World Life Expectancy - Part 2: Exploratory Data Analysis

OBJECTIVE: Conduct exploratory data analysis (EDA) on global life expectancy trends to uncover patterns, correlations, and disparities across countries, years, and socioeconomic factors. This analysis builds on the cleaned dataset prepared in Part 1.

BACKGROUND: Understanding life expectancy requires more than just clean data. It calls for context-driven analysis. This project explores how life expectancy varies by GDP, BMI, development status, and over time. It also investigates which countries saw the greatest or smallest changes in life expectancy over the past 15+ years.

The data cleaning completed in Part 1 was essential for making the dataset suitable for deeper analysis. While this project did not aim to fully answer specific research questions or include visualizations, the cleaned data structure enables future exploration.

This analysis was conducted entirely in SQL, with a focus on demonstrating how key trends and relationships can be explored through queries, aggregations, and statistical breakdowns.

With a reliable dataset in place, the following types of questions can be explored:

  1. How much has life expectancy changed across countries?

  2. What’s the relationship between life expectancy and economic/health factors?

  3. How do developed and developing countries compare?

  4. Are there countries with unusually slow or fast improvements?

TECH STACK:

  1. SQL: All analysis conducted using aggregations, window functions, and filtering

  2. Window Functions: Used SUM() OVER() to analyze cumulative trends in adult mortality

  3. GROUP BY & HAVING Clauses: Applied for segment-level insights and data integrity checks

  4. Ordering & Ranking: Used to surface top/bottom performers in change, GDP, and BMI metrics

PROCESS:

  1. Descriptive Trends:
    ◇ Calculated each country's min, max, and range in life expectancy
    ◇ Ranked countries by greatest and smallest improvements in life expectancy over time
    ◇ Summarized average global life expectancy by year, observing upward trends

  2. Comparative Analysis:
    ◇ Aggregated average life expectancy by development status (Developed vs. Developing)
    ◇ Compared GDP with life expectancy, which suggested a positive link between economic output and health outcomes
    ◇ Assessed relationship between BMI and life expectancy, highlighting extremes

  3. Cumulative Metrics:
    ◇ Used a window function to compute running (cumulative) adult mortality totals by country over time
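
As a rough illustration of the descriptive and comparative steps above, the sketch below runs two GROUP BY aggregations: average life expectancy per year and per development status. It uses Python's built-in sqlite3 module only to keep the example self-contained. The table name world_life_expectancy, the country names, and all values are made up for illustration and are not taken from the project's dataset or its actual queries.

```python
import sqlite3

# Illustrative rows only (made-up values, not from the real dataset).
rows = [
    # Country, Year, Status, Life expectancy
    ("Country A", 2007, "Developed", 78.0),
    ("Country A", 2008, "Developed", 79.0),
    ("Country B", 2007, "Developing", 60.0),
    ("Country B", 2008, "Developing", 62.0),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table name; column names follow the data dictionary.
conn.execute(
    'CREATE TABLE world_life_expectancy '
    '(Country TEXT, Year INTEGER, Status TEXT, "Life expectancy" REAL)'
)
conn.executemany("INSERT INTO world_life_expectancy VALUES (?, ?, ?, ?)", rows)

# Average life expectancy per year across all countries.
by_year = conn.execute(
    """
    SELECT Year, ROUND(AVG("Life expectancy"), 2) AS avg_life_exp
    FROM world_life_expectancy
    GROUP BY Year
    ORDER BY Year
    """
).fetchall()

# Country count and average life expectancy per development status.
by_status = conn.execute(
    """
    SELECT Status,
           COUNT(DISTINCT Country) AS countries,
           ROUND(AVG("Life expectancy"), 1) AS avg_life_exp
    FROM world_life_expectancy
    GROUP BY Status
    ORDER BY Status
    """
).fetchall()

print(by_year)
print(by_status)
```

Counting distinct countries next to the average is a deliberate choice here: a status group with very few countries can produce an average that looks more meaningful than it is.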

KEY INSIGHTS:

While this project did not perform in-depth statistical validation, the following trends were observed through exploratory queries:

  • Global Growth: Most countries have experienced measurable improvements in life expectancy since 2000

  • Top Improvers: Certain countries showed gains of over 20 years in life expectancy

  • Low Variance: Some countries had minimal change, prompting further investigation into health systems or stability

  • Economic Correlation: Countries with higher average GDPs tend to have higher life expectancy

  • BMI Patterns: Nations with extreme average BMI levels often show distinctive life expectancy trends

  • Development Disparity: Developed countries still hold a significant life expectancy lead over developing nations
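
One way a GDP comparison like the one above could be expressed without a formal correlation test is to split countries into high-GDP and low-GDP groups and compare their average life expectancy. The sketch below does this with conditional aggregation (CASE inside SUM and AVG). The cutoff of 1500, the table name, and all data values are hypothetical and chosen only for illustration. Python's sqlite3 is used just to make the example runnable.

```python
import sqlite3

# Illustrative rows only (made-up values, not from the real dataset).
rows = [
    # Country, Year, GDP, Life expectancy
    ("Country A", 2007, 4000.0, 78.0),
    ("Country A", 2008, 5000.0, 80.0),
    ("Country B", 2007, 500.0, 60.0),
    ("Country B", 2008, 700.0, 62.0),
    ("Country C", 2007, 1000.0, 64.0),
    ("Country C", 2008, 1200.0, 66.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE world_life_expectancy '
    '(Country TEXT, Year INTEGER, GDP REAL, "Life expectancy" REAL)'
)
conn.executemany("INSERT INTO world_life_expectancy VALUES (?, ?, ?, ?)", rows)

GDP_CUTOFF = 1500  # hypothetical threshold, not a value from the project

# Step 1 (CTE): per-country averages, skipping groups that average to zero.
# Step 2: count countries and average life expectancy on each side of the cutoff.
split = conn.execute(
    """
    WITH country_avg AS (
        SELECT Country,
               AVG(GDP) AS avg_gdp,
               AVG("Life expectancy") AS avg_life_exp
        FROM world_life_expectancy
        GROUP BY Country
        HAVING AVG(GDP) > 0 AND AVG("Life expectancy") > 0
    )
    SELECT
        SUM(CASE WHEN avg_gdp >= :cutoff THEN 1 ELSE 0 END) AS high_gdp_count,
        AVG(CASE WHEN avg_gdp >= :cutoff THEN avg_life_exp END) AS high_gdp_life_exp,
        SUM(CASE WHEN avg_gdp < :cutoff THEN 1 ELSE 0 END) AS low_gdp_count,
        AVG(CASE WHEN avg_gdp < :cutoff THEN avg_life_exp END) AS low_gdp_life_exp
    FROM country_avg
    """,
    {"cutoff": GDP_CUTOFF},
).fetchone()

print(split)
```

A split like this only shows a difference between two groups. It does not measure the strength of the relationship, which is consistent with the exploratory scope described above.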

CHALLENGES & SOLUTIONS:

⚠️ Challenge 1: Raw data contained outliers and zero values that distorted aggregates
Solution: Used HAVING clauses to exclude invalid records from analysis
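
To make the idea concrete, here is a minimal sketch of how HAVING can serve both as an integrity check and as a filter. The exact conditions used in the project are not stated here, so the `MIN(...) = 0` and `MIN(...) <> 0` tests are assumptions, as are the table name and the made-up rows. Python's sqlite3 is used only so the example runs on its own.

```python
import sqlite3

# Illustrative rows only (made-up values). Country B has a zero that
# stands in for a missing measurement.
rows = [
    ("Country A", 2007, 70.0),
    ("Country A", 2008, 72.0),
    ("Country B", 2007, 0.0),
    ("Country B", 2008, 66.0),
    ("Country C", 2007, 55.0),
    ("Country C", 2008, 60.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE world_life_expectancy '
    '(Country TEXT, Year INTEGER, "Life expectancy" REAL)'
)
conn.executemany("INSERT INTO world_life_expectancy VALUES (?, ?, ?)", rows)

# Integrity check: which countries contain at least one zero value?
flagged = conn.execute(
    """
    SELECT Country
    FROM world_life_expectancy
    GROUP BY Country
    HAVING MIN("Life expectancy") = 0
    ORDER BY Country
    """
).fetchall()

# Filtered aggregate: per-country averages, excluding flagged countries
# so a zero cannot drag an average down.
clean = conn.execute(
    """
    SELECT Country, ROUND(AVG("Life expectancy"), 1) AS avg_life_exp
    FROM world_life_expectancy
    GROUP BY Country
    HAVING MIN("Life expectancy") <> 0
    ORDER BY Country
    """
).fetchall()

print(flagged)
print(clean)
```

Note that HAVING drops the whole country group, not just the zero rows. Without the filter, Country B would report an average of 33.0, which is an artifact of the zero rather than a real value.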

⚠️ Challenge 2: Needed to track change over time at a granular level
Solution: Applied MIN(), MAX(), and difference calculations grouped by country
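
The MIN(), MAX(), and difference calculation described above might look something like the following sketch, which ranks countries by the size of their increase. The table name and all values are made up for illustration. Python's sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

# Illustrative rows only (made-up values, not from the real dataset).
rows = [
    ("Country A", 2007, 70.0),
    ("Country A", 2008, 71.5),
    ("Country A", 2009, 73.0),
    ("Country B", 2007, 50.0),
    ("Country B", 2008, 58.0),
    ("Country B", 2009, 64.0),
    ("Country C", 2007, 80.0),
    ("Country C", 2008, 80.2),
    ("Country C", 2009, 80.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE world_life_expectancy '
    '(Country TEXT, Year INTEGER, "Life expectancy" REAL)'
)
conn.executemany("INSERT INTO world_life_expectancy VALUES (?, ?, ?)", rows)

# Lowest value, highest value, and their difference per country,
# ordered so the largest increases come first.
ranking = conn.execute(
    """
    SELECT Country,
           MIN("Life expectancy") AS min_life_exp,
           MAX("Life expectancy") AS max_life_exp,
           ROUND(MAX("Life expectancy") - MIN("Life expectancy"), 1) AS increase
    FROM world_life_expectancy
    GROUP BY Country
    ORDER BY increase DESC
    """
).fetchall()

top_improver = ranking[0]      # largest max-min gap
smallest_change = ranking[-1]  # smallest max-min gap

print(ranking)
```

One caveat with this approach: MAX minus MIN measures the spread of values, not the change from the first year to the last. The two agree only when life expectancy rises steadily over the period.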

⚠️ Challenge 3: Cumulative adult mortality trends were not visible in static aggregates
Solution: Leveraged SUM() OVER(PARTITION BY Country ORDER BY Year) to track trends year by year
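
A minimal sketch of the window function named above, assuming a table called world_life_expectancy and using made-up mortality values. The running total restarts for each country because of PARTITION BY and accumulates in year order because of ORDER BY. Python's sqlite3 is used only to make the example runnable (window functions require SQLite 3.25 or newer).

```python
import sqlite3

# Illustrative rows only (made-up values, not from the real dataset).
rows = [
    # Country, Year, Adult Mortality
    ("Country A", 2007, 100),
    ("Country A", 2008, 90),
    ("Country A", 2009, 80),
    ("Country B", 2007, 200),
    ("Country B", 2008, 210),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE world_life_expectancy '
    '(Country TEXT, Year INTEGER, "Adult Mortality" INTEGER)'
)
conn.executemany("INSERT INTO world_life_expectancy VALUES (?, ?, ?)", rows)

# Running total of adult mortality per country, accumulated year by year.
running = conn.execute(
    """
    SELECT Country,
           Year,
           "Adult Mortality",
           SUM("Adult Mortality") OVER (
               PARTITION BY Country
               ORDER BY Year
           ) AS running_total
    FROM world_life_expectancy
    ORDER BY Country, Year
    """
).fetchall()

for row in running:
    print(row)
```

Unlike GROUP BY, the window function keeps every original row, so each year's value can be read next to the cumulative total up to that year.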

DATA SOURCES:

  • Cleaned Dataset: world_life_expectancy_cleaned.sql

  • Source: Derived from raw global life expectancy data used in Part 1

  • Timeframe: Covers 15+ years of life expectancy, GDP, BMI, and adult mortality records by country

DATA DICTIONARY:

  • Country: Nation or territory name

  • Year: Observation year

  • Life expectancy: Average years a newborn is expected to live

  • GDP: Gross domestic product per capita

  • BMI: Average body mass index by country and year

  • Adult Mortality: Death rate of adults (typically ages 15–60) per 1,000 population

  • Status: Country classification (Developed or Developing)

VIEW THE SQL ANALYSIS CODE:

Want to explore how the analysis was built in SQL?

👉 Click here to view the full SQL exploratory analysis code, which includes trend detection, country ranking, correlation breakdowns, and cumulative mortality insights.

This code demonstrates a structured, data-driven approach to uncovering meaningful patterns in global health.

The GitHub repository is well-documented, query-by-query, to support reproducibility and transparency.
