Original Features
Column Name | Data Type | Description | Categories | Missing Values | Comment
id | INTEGER | Unique numerical identifier for each patient. | Continuous | none |
gender | TEXT | Patient's gender. | male, female, other | none | The "other" category was removed because it has only one instance, too few to be statistically meaningful and a potential source of noise. Not statistically significant in a chi-square test; excluded from the formal analysis.
age | NUMERIC | Patient's age at data collection. | Continuous | none | Bin into CDC age categories.
hypertension | INTEGER | Presence or absence of hypertension. | 0 (no), 1 (yes) | none | Relabel as no hypertension (0) and hypertension (1) for easier interpretation.
heart_disease | INTEGER | Presence or absence of heart disease. | 0 (no), 1 (yes) | none | Relabel as no heart disease (0) and heart disease (1) for easier interpretation.
ever_married | TEXT | Whether the patient has ever been married. | no, yes | none | Relabel as never married (no) and married (yes) for easier interpretation.
work_type | TEXT | Patient's job type. | private, self-employed, govt_job, children, never worked | none | Consider standardizing if used for modeling.
residence_type | TEXT | Patient's residence type. | urban, rural | none | Not statistically significant in a chi-square test; excluded from the formal analysis.
avg_glucose_level | NUMERIC | Patient's average glucose level. | Continuous | none | Bin into CDC glucose categories.
bmi | NUMERIC | Patient's body mass index (BMI). | Continuous | 201 null values | Bin into CDC BMI categories.
smoking_status | TEXT | Patient's smoking history. | never smoked, smokes, formerly smoked, unknown | none | Investigate the "unknown" category and test its significance.
stroke | INTEGER | Whether the patient has ever had a stroke. Target feature. | 0 (no stroke), 1 (stroke) | none | Keep as 0/1 for modeling; relabel as no stroke (0) and had stroke (1) for easier interpretation in the data analysis.
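The chi-square screening applied to gender and residence_type can be illustrated with a minimal standard-library sketch. It assumes each feature has been reduced to two categories (for gender, after dropping "other") and cross-tabulated against stroke into a 2x2 table. The function name chi_square_2x2, the counts, and the 0.05 significance level are hypothetical; the table does not state which threshold was used. No Yates continuity correction is applied here, whereas scipy.stats.chi2_contingency applies one by default for 2x2 tables, so its statistic would differ slightly.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    `table` is [[a, b], [c, d]]: rows are the two feature categories
    (e.g. urban, rural) and columns are stroke = 0 and stroke = 1.
    Returns (statistic, p_value). No Yates continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom, and for df = 1
    # the chi-square survival function is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only; not taken from the dataset.
counts = [[90, 10],   # category A: 90 without stroke, 10 with stroke
          [60, 40]]   # category B: 60 without stroke, 40 with stroke
stat, p = chi_square_2x2(counts)
ALPHA = 0.05  # assumed significance level
print(f"chi2 = {stat:.1f}, p = {p:.2g}, significant: {p < ALPHA}")
```

With these made-up counts the statistic is 24.0 and p is far below 0.05. Under the rule described in the table, a feature is excluded from the formal analysis when its p-value stays above the chosen threshold.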
Feature Engineered Columns
Column Name | Data Type | Description | Categories | Missing Values | Comment
age_group | TEXT | Age binned into CDC standard categories. | children (0-17), young adult (18-24), adult (25-34), midlife adult (35-44), older adult (45-54), pre-seniors (55-64) | none |
hypertension_status | TEXT | Binary (0/1) values converted to no hypertension / hypertension. | no hypertension, hypertension | none |
heart_disease_status | TEXT | Binary (0/1) values converted to no heart disease / heart disease. | no heart disease, heart disease | none |
ever_married_status | TEXT | no/yes values converted to never married / married. | never married, married | none |
stroke_status | TEXT | Binary (0/1) values converted to no stroke / had stroke. | no stroke, had stroke | none |
glucose_category | TEXT | Continuous glucose values binned into CDC standard categories. | hypoglycemic (<70), normal (70-99), pre-diabetic (100-125), diabetic (126-199), high diabetes (200+) | none |
bmi_category | TEXT | Continuous BMI values binned into CDC standard categories. | underweight, normal weight, overweight, obesity class 1, obesity class 2, obesity class 3 | none |
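The binning and relabeling that produce these engineered columns can be sketched in plain Python. The function names (age_group, glucose_category, bmi_category, engineer_features) and the STATUS_LABELS dictionary are hypothetical. Bin edges follow the ranges listed for age_group and glucose_category, treated as half-open intervals so that non-integer values such as an average glucose of 99.5 land in a defined bin. The table lists no age group for 65 and over, so the sketch uses a placeholder label "seniors". The BMI cut-offs (18.5, 25, 30, 35, 40) are the CDC adult thresholds, assumed here because the table does not list them; CDC assesses children and teens with BMI-for-age percentiles rather than these adult cut-offs.

```python
import math

# Label maps for the *_status columns, using the category values from the
# table (the dictionary itself is illustrative).
STATUS_LABELS = {
    "hypertension": {0: "no hypertension", 1: "hypertension"},
    "heart_disease": {0: "no heart disease", 1: "heart disease"},
    "ever_married": {"no": "never married", "yes": "married"},
    "stroke": {0: "no stroke", 1: "had stroke"},
}


def age_group(age):
    """Bin age in years; upper bounds are exclusive, so 17.5 is still a child."""
    for upper, label in [(18, "children"), (25, "young adult"), (35, "adult"),
                         (45, "midlife adult"), (55, "older adult"),
                         (65, "pre-seniors")]:
        if age < upper:
            return label
    return "seniors"  # placeholder: the table lists no group for ages 65+


def glucose_category(level):
    """Bin average glucose using the table's ranges as half-open intervals."""
    if level < 70:
        return "hypoglycemic"
    if level < 100:
        return "normal"
    if level < 126:
        return "pre-diabetic"
    if level < 200:
        return "diabetic"
    return "high diabetes"


def bmi_category(bmi):
    """Bin BMI with CDC adult cut-offs (assumed; the table gives no thresholds)."""
    if bmi is None or math.isnan(bmi):
        return None  # bmi has missing values; leave them uncategorized here
    for upper, label in [(18.5, "underweight"), (25, "normal weight"),
                         (30, "overweight"), (35, "obesity class 1"),
                         (40, "obesity class 2")]:
        if bmi < upper:
            return label
    return "obesity class 3"


def engineer_features(record):
    """Return a copy of one patient record with the engineered columns added."""
    out = dict(record)
    out["age_group"] = age_group(record["age"])
    out["glucose_category"] = glucose_category(record["avg_glucose_level"])
    out["bmi_category"] = bmi_category(record["bmi"])
    for column, labels in STATUS_LABELS.items():
        out[f"{column}_status"] = labels[record[column]]
    return out


# Hypothetical patient record, for illustration only.
patient = {"id": 1, "age": 58, "hypertension": 1, "heart_disease": 0,
           "ever_married": "yes", "avg_glucose_level": 105.0, "bmi": 27.3,
           "stroke": 0}
row = engineer_features(patient)
print(row["age_group"], "|", row["glucose_category"], "|", row["bmi_category"])
```

For this made-up record the script prints `pre-seniors | pre-diabetic | overweight`. The original 0/1 columns are left in place, consistent with keeping stroke as 0/1 for modeling while using the text labels for analysis.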