| Section | Details |
| --- | --- |
| Business Problem | To identify the leading predictors of stroke in people under the age of 65, enabling public health organizations and care providers to develop targeted early prevention strategies, awareness campaigns, and screening protocols tailored to non-elderly populations. |
| Business Tasks | 1) Quantify stroke occurrence in patients under 65. 2) Identify the strongest predictors of stroke in this population. 3) Understand how age interacts with key risk factors like hypertension, glucose level, and smoking. 4) Pinpoint the age at which intervention is most impactful for each condition. 5) Support public health planning with actionable recommendations for early screening and prevention. |
| Target Audiences | 1) Public health officials and preventative care organizations. 2) Health-focused nonprofits and awareness campaign designers. 3) Recruiters and hiring managers seeking data analysts with strong SQL, Excel, Tableau, and data storytelling skills. |
| Why Under 65 | 1) Studies have shown that age is a dominant predictor of stroke risk, but younger adults also experience strokes and are often overlooked in aggregate statistics that include older patients. 2) Early identification may reduce long-term disability, cost of care, and workforce disruption. 3) See under65_context for justification on excluding 65+ patients based on distribution skew and stroke prevalence. |
| Core Questions | 1) What demographic or health-related factors are most strongly associated with stroke in people under 65? 2) How does age interact with each top risk factor? 3) What data-driven insights can support targeted outreach to younger patients at risk? |
| Deliverables | 1) Clean and filtered SQL dataset focused on under-65 patients. 2) A short, well-documented EDA summary with insights and key visuals. 3) A Tableau Public dashboard showcasing top findings. 4) A recruiter-facing README summary to showcase analysis process and results. |

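
Business Tasks 1 and 3 can be illustrated with a minimal sketch: it computes the overall stroke rate in an under-65 subset, then the rate per age band with and without hypertension. The records are made-up toy values, and the field names (`age`, `hypertension`, `stroke`), the 10-year bands, and the helper functions are assumptions for illustration, not the project's actual schema or results.

```python
from collections import defaultdict

# Toy records for illustration only; these are NOT real patient data.
# Field names (age, hypertension, stroke) are assumed, not taken from the project.
RECORDS = [
    {"age": 34, "hypertension": 0, "stroke": 0},
    {"age": 41, "hypertension": 1, "stroke": 0},
    {"age": 47, "hypertension": 0, "stroke": 0},
    {"age": 52, "hypertension": 1, "stroke": 1},
    {"age": 58, "hypertension": 1, "stroke": 1},
    {"age": 59, "hypertension": 0, "stroke": 0},
    {"age": 63, "hypertension": 0, "stroke": 1},
    {"age": 71, "hypertension": 1, "stroke": 1},  # excluded below: 65 or older
]

MAX_AGE = 64  # oldest whole-year age kept in the under-65 subset


def age_band(age, width=10):
    """Return a label such as '50-59' for the band containing `age`."""
    low = int(age) // width * width
    high = min(low + width - 1, MAX_AGE)  # cap the last band at 64
    return f"{low}-{high}"


def stroke_rate(rows):
    """Share of rows with stroke == 1, or None for an empty group."""
    rows = list(rows)
    if not rows:
        return None
    return sum(r["stroke"] for r in rows) / len(rows)


def rate_by_band_and_factor(rows, factor):
    """Group rows by (age band, factor value) and compute the stroke rate per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(age_band(r["age"]), r[factor])].append(r)
    return {key: stroke_rate(group) for key, group in sorted(groups.items())}


under_65 = [r for r in RECORDS if r["age"] < 65]
overall = stroke_rate(under_65)
by_group = rate_by_band_and_factor(under_65, "hypertension")

print(f"Overall stroke rate under 65: {overall:.1%}")
for (band, has_factor), rate in by_group.items():
    print(f"{band} | hypertension={has_factor} | stroke rate={rate:.1%}")
```

Comparing rates within the same age band, rather than across the whole subset, is one simple way to separate the effect of a risk factor from the effect of age itself. With groups this small the numbers carry no meaning; the sketch only shows the shape of the calculation.
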
| Step | Tool | Purpose |
| --- | --- | --- |
| Data cleaning and filtering | PostgreSQL | Extract the under-65 subset, clean nulls, bin ages, and filter outliers. |
| Exploratory data analysis (EDA) | PostgreSQL, Excel, and Python | Run descriptive queries, summarize trends, and calculate percentages. |
| Data visualization | Tableau and Excel | Create clear, presentation-ready visuals for insights and stakeholder communication. |
| Version control | Git, GitHub, Excel (Save As) | Maintain reproducibility and track changes throughout the project. |
| Documentation | Excel | Log all cleaning and EDA steps, with the reasoning behind each and the resulting insights. |
| Presentation | Slide deck and video presentation | Communicate key findings and recommendations clearly to non-technical stakeholders (5-10 minutes). |
| Project write-up | README (GitHub) | Summarize goals, methods, insights, visuals, and tools for recruiters. |
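
The cleaning and EDA rows above could look roughly like the following sketch. It uses plain Python in place of PostgreSQL so that it runs on its own. The column names (`bmi`, `avg_glucose_level`), the BMI plausibility bounds, the age bin edges, and the choice to drop rows with nulls rather than impute them are all assumptions for illustration; the rows are toy values, not project data.

```python
from collections import Counter

# Toy rows for illustration only; column names and values are assumptions.
RAW_ROWS = [
    {"id": 1, "age": 29.0, "bmi": 22.4, "avg_glucose_level": 88.0},
    {"id": 2, "age": 44.0, "bmi": None, "avg_glucose_level": 101.5},  # missing BMI
    {"id": 3, "age": 51.0, "bmi": 31.0, "avg_glucose_level": 140.2},
    {"id": 4, "age": 57.0, "bmi": 97.6, "avg_glucose_level": 95.3},   # implausible BMI
    {"id": 5, "age": 62.0, "bmi": 27.8, "avg_glucose_level": 210.7},
    {"id": 6, "age": 70.0, "bmi": 26.1, "avg_glucose_level": 120.0},  # 65 or older
]

AGE_LIMIT = 65                                      # keep patients strictly younger
BMI_RANGE = (10.0, 70.0)                            # assumed bounds, not a clinical standard
AGE_BINS = [(0, 17), (18, 34), (35, 49), (50, 64)]  # assumed bin edges


def assign_age_bin(age):
    """Return the label of the bin containing `age`, or None if no bin matches."""
    for low, high in AGE_BINS:
        if low <= int(age) <= high:
            return f"{low}-{high}"
    return None


def clean(rows):
    """Keep under-65 rows without nulls or BMI outliers, and attach an age bin."""
    kept = []
    for row in rows:
        if row["age"] is None or row["age"] >= AGE_LIMIT:
            continue  # extract the under-65 subset
        if any(value is None for value in row.values()):
            continue  # clean nulls (here: drop the row)
        if not BMI_RANGE[0] <= row["bmi"] <= BMI_RANGE[1]:
            continue  # filter outliers
        kept.append({**row, "age_bin": assign_age_bin(row["age"])})  # bin ages
    return kept


def percent_by_bin(rows):
    """Percentage of rows in each age bin, rounded to one decimal place."""
    counts = Counter(r["age_bin"] for r in rows)
    total = sum(counts.values())
    return {age_bin: round(100 * n / total, 1) for age_bin, n in counts.items()}


cleaned = clean(RAW_ROWS)
print([r["id"] for r in cleaned])
print(percent_by_bin(cleaned))
```

In PostgreSQL the same steps would typically map to a `WHERE` clause for the age and null filters and a `CASE` expression for the age bins. Logging how many rows each filter removes fits the Documentation step in the table.
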