SQL Data
feature         category          stroke_status  count
ever_married    married           had stroke     81
ever_married    married           no stroke      2333
ever_married    never married     had stroke     9
ever_married    never married     no stroke      1659
gender          female            had stroke     48
gender          female            no stroke      2336
gender          male              had stroke     42
gender          male              no stroke      1656
heart_disease   heart disease     had stroke     13
heart_disease   heart disease     no stroke      83
heart_disease   no heart disease  had stroke     77
heart_disease   no heart disease  no stroke      3909
hypertension    hypertension      had stroke     16
hypertension    hypertension      no stroke      252
hypertension    no hypertension   had stroke     74
hypertension    no hypertension   no stroke      3740
residence_type  rural             had stroke     42
residence_type  rural             no stroke      1984
residence_type  urban             had stroke     48
residence_type  urban             no stroke      2008
smoking_status  formerly smoked   had stroke     22
smoking_status  formerly smoked   no stroke      563
smoking_status  never smoked      had stroke     24
smoking_status  never smoked      no stroke      1461
smoking_status  smokes            had stroke     25
smoking_status  smokes            no stroke      634
smoking_status  unknown           had stroke     19
smoking_status  unknown           no stroke      1334
work_type       children          had stroke     2
work_type       children          no stroke      685
work_type       govt_job          had stroke     16
work_type       govt_job          no stroke      510
work_type       never_worked      no stroke      22
work_type       private           had stroke     59
work_type       private           no stroke      2351
work_type       self-employed     had stroke     13
work_type       self-employed     no stroke      424

Sum of count
Row Labels            had stroke  no stroke  Grand Total
ever_married          90          3992       4082
  married             81          2333       2414
  never married       9           1659       1668
gender                90          3992       4082
  female              48          2336       2384
  male                42          1656       1698
heart_disease         90          3992       4082
  heart disease       13          83         96
  no heart disease    77          3909       3986
hypertension          90          3992       4082
  hypertension        16          252        268
  no hypertension     74          3740       3814
residence_type        90          3992       4082
  rural               42          1984       2026
  urban               48          2008       2056
smoking_status        90          3992       4082
  formerly smoked     22          563        585
  never smoked        24          1461       1485
  smokes              25          634        659
  unknown             19          1334       1353
work_type             90          3992       4082
  children            2           685        687
  govt_job            16          510        526
  never_worked                    22         22
  private             59          2351       2410
  self-employed       13          424        437
Grand Total           630         27944      28574

Expected Values
category          had stroke    no stroke
married           53.22390985   2360.77609
never married     36.77609015   1631.22391

female            52.56246938   2331.437531
male              37.43753062   1660.562469

heart disease     2.116609505   93.88339049
no heart disease  87.88339049   3898.11661

hypertension      5.908868202   262.0911318
no hypertension   84.0911318    3729.908868

rural             44.66927976   1981.33072
urban             45.33072024   2010.66928

formerly smoked   12.89808917   572.1019108
never smoked      32.74130328   1452.258697
smokes            14.52964233   644.4703577
unknown           29.83096521   1323.169035

children          15.14698677   671.8530132
govt_job          11.59725625   514.4027438
never_worked      0.485056345   21.51494366
private           53.13571779   2356.864282
self-employed     9.634982852   427.3650171

Chi-Square Test
feature         p-value      significant
ever_married    1.7144E-09   yes
gender          0.3238055    no
heart_disease   1.93107E-14  yes
hypertension    1.40644E-05  yes
residence_type  0.569317433  no
smoking_status  0.000122057  yes
work_type       0.004206256  yes

The Chi-Square Test indicates that, for patients under 65, gender and residence_type are not statistically significant predictors of stroke, as their p-values exceed the 0.05 significance threshold.

The gender and residence_type features will be dropped from the analysis going forward.

Mann-Whitney Test
feature            p-value
bmi                1.06E-06
avg_glucose_level  0.0001

The Mann–Whitney U Test was conducted using Python, as Excel does not support this test natively.

The script can be found in the python_stat_test folder under the filename mann_whitney_test.py.

Both BMI and average glucose level returned p-values well below the 0.05 threshold, indicating that they are statistically significant predictors of stroke in patients under 65.
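
The expected values and chi-square p-values can be cross-checked outside the spreadsheet. The following is a minimal sketch for the two-category features only, assuming the reported p-values come from a Pearson chi-square test without continuity correction; the function name chi_square_2x2 is illustrative and not part of the original workbook. It uses only the Python standard library, relying on the fact that with 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (expected, statistic, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # Valid for 1 degree of freedom only (i.e. a 2x2 table).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value


# Observed counts from the SQL data: one row per category,
# columns are (had stroke, no stroke).
ever_married = [[81, 2333], [9, 1659]]
gender = [[48, 2336], [42, 1656]]

for name, observed in [("ever_married", ever_married), ("gender", gender)]:
    expected, statistic, p_value = chi_square_2x2(observed)
    print(f"{name}: chi2={statistic:.3f}, p={p_value:.4g}, "
          f"significant={p_value < 0.05}")
```

Features with more than two categories (smoking_status, work_type) have more degrees of freedom, so the erfc shortcut does not apply to them; a general chi-square routine would be needed there.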
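
For readers unfamiliar with the Mann–Whitney U test, the sketch below shows the general idea in plain Python. It is not the project's mann_whitney_test.py, whose contents are not shown here. It assumes a two-sided test using the normal approximation with tie correction and no continuity correction, and the sample values are made up for illustration, not taken from the dataset. The names mann_whitney_u, bmi_had_stroke and bmi_no_stroke are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties receive average ranks and the variance is tie-corrected.
    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value


# Illustrative values only (not from the stroke dataset).
bmi_had_stroke = [31.2, 29.8, 34.1, 27.5, 33.0]
bmi_no_stroke = [24.3, 26.1, 22.8, 28.4, 25.0, 23.7]

u, p = mann_whitney_u(bmi_had_stroke, bmi_no_stroke)
print(f"U={u}, p={p:.4f}, significant={p < 0.05}")
```

The normal approximation is reasonable for large groups like those in this analysis; for very small samples an exact method is preferable, and a library routine such as scipy.stats.mannwhitneyu can select one.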