| Original Features | ||||||
| Column Name | Data Type | Description | Categories | Missing Values | Comment | |
| id | INTEGER | Unique numerical identifier for patients. | Continuous | none | ||
| gender | TEXT | Patient gender profile | male, female, other | none | "other" is removed because it occurs in a single record, which makes it statistically insignificant and a potential source of noise. Tested as statistically insignificant (Chi-Square Test), not included in formal analysis. | |
| age | NUMERIC | Patient's age at data collection | Continuous | none | Bin to medically accepted (CDC) categories. | |
| hypertension | INTEGER | Presence or absence of hypertension in patient profile. | 0 (no) and 1 (yes) | none | Change to no hypertension (0) and hypertension (1) for easier interpretation. | |
| heart_disease | INTEGER | Presence or absence of heart disease in patient profile. | 0 (no) and 1 (yes) | none | Change to no heart disease (0) and heart disease (1) for easier interpretation. | |
| ever_married | TEXT | Patient's history of marriage | no / yes | none | Change to never married (no) and married (yes) for easier interpretation. | |
| work_type | TEXT | Patient's job type. | private, self-employed, govt_job, children, never worked | none | Consider standardization if implementing modeling. | |
| residence_type | TEXT | Patient's residence type | urban, rural | none | Tested as statistically insignificant (Chi-Square Test), not included in formal analysis. | |
| avg_glucose_level | NUMERIC | Patient's average glucose level | Continuous | none | Bin to medically accepted (CDC) categories. | |
| bmi | NUMERIC | Patient's body mass index (BMI) | Continuous | 201 null values | Bin to medically accepted (CDC) categories. | |
| smoking_status | TEXT | Patient's smoking history | never smoked, smokes, formerly smoked, unknown | none | Investigate "unknown" to test significance. | |
| stroke | INTEGER | Whether the patient has ever had a stroke. Target feature. | 0 (no stroke) and 1 (stroke) | none | Keep 0 and 1 for modeling. Convert to had stroke (1) and no stroke (0) for easier interpretation in data analysis. | |
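
The comments above mention a Chi-Square test for `gender` and `residence_type`. As a minimal sketch of what such a test computes, the block below calculates the Pearson chi-square statistic for a contingency table in plain Python. The counts are hypothetical and are not taken from the dataset; the function name and the choice of a 0.05 significance level are assumptions made for illustration, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts, NOT from the dataset:
# rows = residence_type (urban, rural), columns = stroke (0, 1).
counts = [[950, 50], [955, 45]]

statistic = chi_square_statistic(counts)

# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed level).
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
is_significant = statistic > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-square = {statistic:.4f}, significant = {is_significant}")
```

A statistic below the critical value means the test finds no evidence of an association between the feature and `stroke` at the chosen level, which is the situation the comments describe for the excluded features.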
| Feature Engineered Columns | ||||||
| Column Name | Data Type | Description | Categories | Missing Values | Comment | |
| age_group | TEXT | Binned age into CDC standard categories | children (0-17), young adult (18-24), adult (25-34), midlife adult (35-44), older adult (45-54), pre-seniors (55-64) | none | ||
| hypertension_status | TEXT | Converted binary (0/1) values to no hypertension / hypertension | no hypertension, hypertension | none | ||
| heart_disease_status | TEXT | Converted binary (0/1) values to no heart disease / heart disease | no heart disease, heart disease | none | ||
| ever_married_status | TEXT | Converted no/yes to never married / married | never married, married | none | ||
| stroke_status | TEXT | Converted binary (0/1) to no stroke / had stroke | no stroke, had stroke | none | ||
| glucose_category | TEXT | Binned continuous data to CDC standard categories | hypoglycemic (<70), normal (70-99), pre-diabetic (100-125), diabetic (126-199), high diabetes (200+) | none | ||
| bmi_category | TEXT | Binned continuous data to CDC standard categories | underweight, normal weight, overweight, obesity class 1, obesity class 2, obesity class 3 | none | ||
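
The engineered columns above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's actual code: the function names and the sample record are hypothetical, the glucose bins are treated as half-open intervals so that values such as 99.5 fall into one category, the BMI cutoffs (18.5, 25, 30, 35, 40) are the commonly used adult thresholds and are not listed in the table, and a missing BMI is passed through as `None` because the table does not say how the 201 null values are handled.

```python
# Label maps taken from the Categories column of the engineered features.
HYPERTENSION_LABELS = {0: "no hypertension", 1: "hypertension"}
HEART_DISEASE_LABELS = {0: "no heart disease", 1: "heart disease"}
EVER_MARRIED_LABELS = {"no": "never married", "yes": "married"}
STROKE_LABELS = {0: "no stroke", 1: "had stroke"}


def glucose_category(level):
    """Bin avg_glucose_level using the ranges listed for glucose_category."""
    if level < 70:
        return "hypoglycemic"
    if level < 100:
        return "normal"
    if level < 126:
        return "pre-diabetic"
    if level < 200:
        return "diabetic"
    return "high diabetes"


def bmi_category(bmi):
    """Bin bmi; cutoffs are assumed, and None (missing) is passed through."""
    if bmi is None:
        return None
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "obesity class 1"
    if bmi < 40:
        return "obesity class 2"
    return "obesity class 3"


def engineer_features(record):
    """Return a copy of one patient record (a dict) with engineered columns added."""
    out = dict(record)
    out["hypertension_status"] = HYPERTENSION_LABELS[record["hypertension"]]
    out["heart_disease_status"] = HEART_DISEASE_LABELS[record["heart_disease"]]
    out["ever_married_status"] = EVER_MARRIED_LABELS[record["ever_married"].lower()]
    out["stroke_status"] = STROKE_LABELS[record["stroke"]]
    out["glucose_category"] = glucose_category(record["avg_glucose_level"])
    out["bmi_category"] = bmi_category(record["bmi"])
    return out


# Hypothetical patient record, for illustration only.
example = {
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "yes",
    "avg_glucose_level": 105.9,
    "bmi": 32.5,
    "stroke": 1,
}
engineered = engineer_features(example)
print(engineered["glucose_category"], "|", engineered["bmi_category"])
```

The original `0`/`1` columns are left in place in the returned record, matching the note that `stroke` keeps its numeric coding for modeling while the `*_status` columns serve interpretation.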