Data Science with Python


Beginner → Intermediate

Learn practical data analysis, visualization, and storytelling using Python — master pandas, NumPy, and Matplotlib through real projects and dashboards.


Data Science with Python: From Beginner to Analyst — Learn, Analyze, and Visualize.
Step-by-step, project-driven course to help you move from data basics to business-ready dashboards and forecasts.

Duration: 10–12 weeks · Format: Hands-on notebooks · Level: Beginner → Intermediate


Why this course

  • Practical learning path. Build confidence in Python through real-world datasets and structured projects.
  • Python-first stack. Master pandas, NumPy, Matplotlib, Seaborn, Plotly, statsmodels, and basics of scikit-learn.
  • Learn by doing. Each module includes guided notebooks, exercises, and mini-projects.
  • Portfolio-ready capstone. Create a business dashboard and predictive model (sales or churn) to showcase your skills.

Who this course is for

  • Beginners who want to learn data analysis with Python from scratch
  • Students preparing for data analytics and data science roles
  • Junior analysts ready to move from Excel to Python
  • Anyone curious about turning data into insights and stories

Learning Outcomes

By the end of the course, you’ll be able to:

  • Load, clean, and manipulate data using pandas and NumPy.
  • Perform EDA (Exploratory Data Analysis) with meaningful visualizations.
  • Apply statistical methods and simple regression for insight generation.
  • Analyze time series and forecast business metrics.
  • Build interactive dashboards to present and explain findings.
  • Complete a capstone project: predictive model + interactive business dashboard.

Course Modules

Module 0: Data Science in the Real World

Outcome:
Learners understand why each skill matters.


PHASE 1 — Core Technical Foundations (Weeks 1–2)

Module 1: Python for Data Analysis

Lab: Analyze a real business dataset (10k+ rows)


Module 2: SQL for Data Analysts (Lite — Functional Level)


Module Goal: By the end of this module, learners can confidently query a real database, pull results into Python, and combine SQL with pandas: enough to work as a junior data analyst without needing a separate SQL course.

💡 This module covers SQL at a functional analyst level. If you want to go deeper into advanced SQL, database design, and query optimization, stay tuned for our dedicated SQL for Analysts course.

Topics:

  1. Why SQL for Data Analysts?
    • SQL vs Excel vs Python — when to use what
    • How companies store data (tables, databases, schemas — conceptual only)
    • Setting up SQLite + converting Superstore CSV to a database
  2. Your First Queries — SELECT, WHERE, ORDER BY
    • SELECT specific columns
    • DISTINCT and LIMIT
    • Filtering with WHERE (=, >, <, BETWEEN, LIKE)
    • Sorting with ORDER BY
  3. Aggregations — Summarizing Data
    • COUNT, SUM, AVG, MIN, MAX
    • GROUP BY
    • Filtering groups with HAVING
    • Difference between WHERE and HAVING
  4. Combining Tables — JOINs
    • What is a JOIN and why it exists
    • INNER JOIN
    • LEFT JOIN
    • Handling NULLs that appear after a JOIN
  5. Subqueries — Queries Inside Queries (Basic)
    • Subquery in WHERE clause
    • Subquery in FROM clause
    • When to use a subquery vs a JOIN (conceptual — not exhaustive)
  6. SQL Meets Python
    • Loading SQLite into Python with sqlite3
    • Running queries with pd.read_sql()
    • When to query in SQL vs filter in pandas
    • Exporting query results to a DataFrame for further analysis
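To give a flavor of topic 6 (SQL Meets Python), here is a minimal sketch using an in-memory SQLite table; the `orders` table and its values are made up for illustration:

```python
import sqlite3
import pandas as pd

# Build a tiny in-memory, Superstore-style table (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)

# Aggregate in SQL, then hand the result to pandas for further analysis
query = "SELECT region, SUM(sales) AS total_sales FROM orders GROUP BY region"
df = pd.read_sql(query, conn)
conn.close()
```

The same pattern works against the Superstore database you build in topic 1: do the heavy aggregation in SQL, then continue the analysis on the resulting DataFrame.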

Lab: 10 Guided Queries on the Superstore Database

Mini Project:
Analyze a business database using SQL + Python.

Want to master SQL fully? Check out our SQL for Analysts course.


PHASE 2 — Data Cleaning & Exploration (Weeks 3–4)

Module 3: Data Cleaning & Wrangling (Real-World Data Preparation)

Objective: Help learners transform raw, messy data into clean, structured, and analysis-ready datasets, while building a strong foundation in real-world data preprocessing workflows.

  1. Data Cleaning Foundations & Real-World Data Issues
    • What data cleaning actually means in practice
    • Types of real-world data issues (missing values, duplicates, incorrect formats)
    • Why “clean data” is essential for reliable insights
    • The data cleaning workflow (overview before diving deep)
  2. Handling Missing Values (Data Gaps & Decisions)
    • Types of missing data (MCAR, MAR, MNAR — intuitive understanding)
    • Detecting missing values in pandas
    • Evaluating impact (how much missing data is too much?)
    • Strategies:
      • Dropping data
      • Imputation (mean, median, mode)
      • Forward/backward fill
      • Conditional filling
    • When missing data itself is meaningful
  3. Data Types & Conversions (Structuring Data Correctly)
    • Understanding data types (numeric, categorical, datetime, boolean)
    • Checking and interpreting data types in pandas
    • Common issues:
      • Numbers stored as strings
      • Incorrect date formats
      • Mixed data types in a column
    • Converting data types:
      • astype() for basic conversions
      • to_datetime() for dates
      • to_numeric() for numeric data
    • Handling conversion errors and invalid values
  4. Data Manipulation — Filtering, Grouping & Merging
    • Filtering data using conditions
    • Selecting and organizing relevant columns
    • Sorting data for better interpretation
    • Grouping and aggregation (groupby)
    • Transforming data using group-level context
    • Merging datasets (joins: inner, left, etc.)
    • Concatenation for combining datasets
  5. Feature Engineering & Data Transformation
    • What feature engineering is and why it matters
    • Creating new features (mathematical, ratio-based)
    • Working with date and time features
    • Encoding categorical variables
    • Binning and segmentation
    • Scaling and normalization
    • Handling skewed data
    • Interaction features and feature selection basics
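As a preview of the workflow this module teaches, here is a small sketch that coerces types, imputes missing values, and engineers two features; the messy frame is invented for illustration:

```python
import pandas as pd

# A deliberately messy frame: numbers stored as strings, a bad date, gaps
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-07", "not a date"],
    "price": ["10.5", None, "8.0"],
    "qty": [2, 3, None],
})

clean = raw.copy()
# to_datetime / to_numeric with errors="coerce" turn invalid values into NaT/NaN
clean["order_date"] = pd.to_datetime(clean["order_date"], errors="coerce")
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# Impute the numeric gaps with the median, one of the simple strategies above
clean["price"] = clean["price"].fillna(clean["price"].median())
clean["qty"] = clean["qty"].fillna(clean["qty"].median())

# Feature engineering: a derived revenue column and a date part
clean["revenue"] = clean["price"] * clean["qty"]
clean["order_month"] = clean["order_date"].dt.month
```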

Lab: End-to-End Data Cleaning & Preparation Workflow

Focus: Applying everything in a real scenario

This lab ties all concepts together through a practical dataset.

Includes:

  • Identifying data quality issues
  • Handling missing values
  • Cleaning and structuring data
  • Performing filtering and grouping
  • Creating new features
  • Preparing a final analysis-ready dataset

Module 4: Exploratory Data Analysis (EDA)

  1. Introduction to EDA
    • What is EDA in real-world workflows
    • Analyst vs data scientist thinking
    • Asking the right questions
    • From data → insights → decisions
  2. Descriptive Statistics
    • Mean, median, mode
    • Variance, standard deviation
    • Percentiles
    • When averages mislead
  3. Distribution Analysis
    • Normal vs skewed distributions
    • Histograms, KDE plots
    • Skewness & kurtosis (intuitive)
    • Real-world interpretation
  4. Correlation vs Causation
    • Correlation basics
    • Correlation matrix
    • Heatmaps
    • Why correlation ≠ causation
    • Confounding variables
  5. Outlier Detection (EDA Perspective)
    • Outliers as signals, not just noise
    • Visual detection (boxplot, scatter)
    • Business interpretation
    • When to keep vs investigate
  6. Segment-Based EDA
    • Group analysis (groupby)
    • Comparing segments (region, customer type)
    • Cohort-style thinking (intro level)
    • Finding hidden patterns
  7. Transformation Techniques (EDA Context)
    • Log transformation for skew
    • Binning (creating categories)
    • Scaling intuition (basic)
    • Feature creation for analysis
  8. EDA Workflow & Checklist Framework
    • Step-by-step EDA process
    • What to check first
    • Common pitfalls
    • Reusable checklist
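A compact sketch of the EDA moves above (descriptive statistics, a correlation matrix, segment comparison) on synthetic data; the column names are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "region": rng.choice(["East", "West"], size=n),
    "ad_spend": rng.normal(100, 20, size=n),
})
# Simulate sales that correlate with ad spend, plus noise
df["sales"] = 3 * df["ad_spend"] + rng.normal(0, 30, size=n)

# Descriptive statistics and the correlation matrix
summary = df[["ad_spend", "sales"]].describe()
corr = df[["ad_spend", "sales"]].corr()

# Segment-based EDA: compare groups with groupby
by_region = df.groupby("region")["sales"].agg(["mean", "median", "std"])
```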

Lab: Customer Behavior Analysis

👉 Output:

  • Charts
  • Observations
  • Business conclusions

PHASE 3 — Visualization, Communication & Statistics (Weeks 5–6)

Module 5: Data Visualization & Business Storytelling

  • Matplotlib & Seaborn
  • Business chart selection
  • Dashboard design principles
  • KPI definition
  • Executive storytelling
  • Avoiding misleading visuals
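One of the principles above, starting bar charts at zero so heights are not visually misleading, in a minimal Matplotlib sketch (the regions and figures are invented):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical KPI: revenue by region, the kind of chart a dashboard leads with
regions = ["East", "West", "Central", "South"]
revenue = [120_000, 95_000, 70_000, 88_000]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(regions, revenue, color="steelblue")
ax.set_title("Revenue by Region (FY24)")
ax.set_ylabel("Revenue ($)")
# Start the y-axis at zero so bar heights are proportional to the values
ax.set_ylim(bottom=0)
fig.savefig("revenue_by_region.png", dpi=150)
```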

Mini Project:
Insight-driven executive dashboard.


Module 6: Statistics for Decision-Making

  • Probability intuition
  • Confidence intervals
  • Hypothesis testing
  • A/B testing
  • Bootstrapping
  • Common statistical mistakes
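Bootstrapping, one of the topics above, fits in a few lines of NumPy; the order values here are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily order values; in the lab you'd load real experimental data
orders = rng.normal(loc=50, scale=12, size=500)

# Bootstrap: resample with replacement many times, record the statistic each time
boot_means = np.array([
    rng.choice(orders, size=orders.size, replace=True).mean()
    for _ in range(2000)
])

# A 95% confidence interval for the mean from the bootstrap distribution
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```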

Lab:
Analyze experimental business data.


PHASE 4 — Predictive Modeling Foundations (Weeks 7–8)

Module 7: Regression Modeling

  • Linear & multiple regression
  • Assumptions & diagnostics
  • Residual analysis
  • Multicollinearity (VIF)
  • Regularization (Ridge, Lasso)
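A minimal sketch of multiple regression and residual analysis; for self-containedness it uses plain NumPy least squares on simulated data with invented coefficients, while statsmodels (part of the course stack) layers the diagnostics listed above on top of this idea:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Two hypothetical predictors of revenue: ad spend and store count
ad_spend = rng.normal(100, 15, size=n)
stores = rng.integers(1, 20, size=n)
revenue = 2.0 * ad_spend + 5.0 * stores + rng.normal(0, 10, size=n)

# Multiple regression via ordinary least squares (design matrix with intercept)
X = np.column_stack([np.ones(n), ad_spend, stores])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
intercept, b_ad, b_stores = coef

# Residual analysis: residuals should be centered on zero with no pattern
residuals = revenue - X @ coef
```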

Mini Project:
Predict revenue or demand.


Module 8: Classification Models

  • Logistic regression
  • Decision trees
  • Random forests
  • Feature importance
  • Confusion matrix
  • Precision–recall tradeoffs
  • Imbalanced datasets
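A sketch of the shape of the churn mini-project using scikit-learn basics; the features and the churn-generating rule are entirely synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 600
# Hypothetical churn features: tenure in months and support tickets filed
tenure = rng.uniform(1, 60, size=n)
tickets = rng.poisson(2, size=n)
# Synthetic rule: short tenure plus many tickets raises churn probability
logit = -0.08 * tenure + 0.6 * tickets - 0.5
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([tenure, tickets])
X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))  # rows: actual, cols: predicted
accuracy = model.score(X_test, y_test)
```

In the module itself you go beyond accuracy: the confusion matrix above is the starting point for precision, recall, and the imbalanced-data discussion.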

Mini Project:
Customer churn or risk prediction.


PHASE 5 — Model Validation & Forecasting (Weeks 9–10)

Module 9: Model Validation & Optimization

  • Train/test vs cross-validation
  • Bias–variance tradeoff
  • Grid vs random search
  • ROC & AUC
  • Model selection frameworks
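Cross-validation in scikit-learn is a single call; this sketch scores a ridge model on five folds of simulated data (the coefficients are invented):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, size=200)

# 5-fold cross-validation: five train/test splits instead of one,
# giving a more stable estimate of generalization performance
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
mean_r2 = scores.mean()
```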

Module 10: Time Series & Forecasting

  • Trend & seasonality
  • Rolling statistics
  • Stationarity intuition
  • Time-aware splits
  • ARIMA (conceptual)
  • Prophet overview
  • Forecast evaluation (MAPE)
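A sketch of rolling statistics and MAPE on an invented monthly series, using a naive seasonal forecast (each month predicted by the same month a year earlier):

```python
import numpy as np
import pandas as pd

# Two years of hypothetical monthly sales with trend plus seasonality
idx = pd.date_range("2023-01-01", periods=24, freq="MS")
t = np.arange(24)
sales = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

# A 12-month rolling mean smooths out seasonality to expose the trend
trend = sales.rolling(window=12).mean()

# Naive seasonal forecast: predict each month with the value 12 months earlier
forecast = sales.shift(12).dropna()
actual = sales.loc[forecast.index]

# MAPE: mean absolute percentage error of the forecast
mape = (np.abs((actual - forecast) / actual)).mean() * 100
```

ARIMA and Prophet, covered conceptually in this module, are judged against exactly this kind of naive baseline.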

Lab:
Sales or demand forecasting.


PHASE 6 — Capstone & Career Launch (Weeks 11–12)

Module 11: Capstone Project

Choose One:

  • Sales forecasting system
  • Customer churn prediction system

Deliverables:

  • Cleaned dataset + EDA notebook
  • Validated predictive model
  • Interactive dashboard (Streamlit / Plotly)
  • GitHub repository
  • README documentation
  • 2–3 page executive business brief

Course Format & Assessment

  • Guided labs: Weekly coding notebooks.
  • Mini projects: Practical exercises after each module.
  • Peer feedback: Optional code reviews.
  • Final project: Dashboard + predictive model submission.

Prerequisites

  • No prior coding or math background required.
  • Basic computer literacy and willingness to learn by doing.

Pricing & Enrollment Options

  • Self-paced: Lifetime access + community.
  • Cohort-based (optional): Live Q&A and feedback sessions.
  • Certificate: Earn a verified certificate to showcase your achievement.

FAQ

Q: Is this course beginner-friendly?
A: Yes! It starts from Python basics and gradually builds to intermediate projects.

Q: What tools will I learn?
A: pandas, NumPy, Matplotlib, Seaborn, Plotly/Streamlit, statsmodels, and scikit-learn basics.

Q: What’s the final project?
A: A sales or churn prediction dashboard built using real-world data.

Q: How long will it take to complete?
A: Typically 10–12 weeks at 4–6 hours per week.


Ready to start your data journey?
Enroll Now • Preview Free Lesson