jabrankhawaja/weather-etl-pipeline

🌦️ Weather ETL & Analytics Pipeline

A modular Python project that fetches real‑time weather data, transforms it, stores it in SQLite, and runs SQL analytics — built with a data engineering mindset.


🚀 Features

  • ETL Pipeline (Extract → Transform → Load)
  • Closest‑timestamp matching for humidity & pressure
  • SQLite storage with idempotent inserts (UNIQUE(city, timestamp) + INSERT OR IGNORE)
  • SQL analytics:
    • Latest weather per city
    • Temperature ranking (window function)
    • Duplicate detection
  • CLI menu for easy interaction
  • Fully modular architecture
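The closest-timestamp matching mentioned above can be sketched as follows. This is a minimal illustration, not the repository's actual `transform.py`: it assumes the API returns hourly readings as parallel lists keyed by ISO timestamps, and the names `closest_index`, `hourly`, and `matched` are hypothetical.

```python
from datetime import datetime


def closest_index(target_iso: str, candidate_isos: list[str]) -> int:
    """Return the index of the candidate timestamp nearest to target_iso."""
    target = datetime.fromisoformat(target_iso)
    return min(
        range(len(candidate_isos)),
        key=lambda i: abs(datetime.fromisoformat(candidate_isos[i]) - target),
    )


# Hypothetical hourly payload; the real API response shape may differ.
hourly = {
    "time": ["2024-01-01T10:00", "2024-01-01T11:00", "2024-01-01T12:00"],
    "humidity": [80, 75, 70],
    "pressure": [1012.0, 1011.5, 1011.0],
}

# The current reading was taken at 11:20, so the 11:00 hourly slot is nearest.
idx = closest_index("2024-01-01T11:20", hourly["time"])
matched = {
    "humidity": hourly["humidity"][idx],
    "pressure": hourly["pressure"][idx],
}
print(matched)
```

A linear scan is enough for a day or two of hourly values; for much longer series, a binary search over sorted timestamps would be the usual alternative.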

📁 Project Structure

weather/
├── api.py        # API calls (geocoding + weather)
├── transform.py  # Data cleaning + timestamp matching
├── db.py         # SQLite operations
├── analytics.py  # SQL analytics queries
├── pipeline.py   # ETL orchestration
└── main.py       # CLI entry point


▶️ Run the Project

Install dependencies:

pip install requests

Start the CLI:

python main.py

Menu options:

  • Run ETL (fetch & save weather)
  • Show latest weather
  • Show ranked temperatures
  • Show duplicates
  • Exit
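The three "Show" options correspond to the analytics queries listed under Features. The sketch below shows one way such queries could look, run against an in-memory database. The table name `weather`, its columns, and the sample rows are assumptions for illustration, and the queries in `analytics.py` may differ. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real table may carry more columns.
conn.execute(
    """
    CREATE TABLE weather (
        city TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        temperature REAL,
        UNIQUE(city, timestamp)
    )
    """
)
conn.executemany(
    "INSERT OR IGNORE INTO weather VALUES (?, ?, ?)",
    [
        ("Oslo", "2024-01-01T10:00", -3.0),
        ("Oslo", "2024-01-01T11:00", -2.0),
        ("Cairo", "2024-01-01T11:00", 19.5),
    ],
)

# Latest weather per city: a correlated subquery picks each city's newest row.
latest = conn.execute(
    """
    SELECT city, timestamp, temperature
    FROM weather AS w
    WHERE timestamp = (SELECT MAX(timestamp) FROM weather WHERE city = w.city)
    ORDER BY city
    """
).fetchall()

# Temperature ranking: RANK() is a window function over all stored readings.
ranked = conn.execute(
    """
    SELECT city, temperature,
           RANK() OVER (ORDER BY temperature DESC) AS temp_rank
    FROM weather
    ORDER BY temp_rank
    """
).fetchall()

# Duplicate detection: any (city, timestamp) pair stored more than once.
duplicates = conn.execute(
    """
    SELECT city, timestamp, COUNT(*) AS n
    FROM weather
    GROUP BY city, timestamp
    HAVING COUNT(*) > 1
    """
).fetchall()

print(latest, ranked, duplicates, sep="\n")
```

With the UNIQUE(city, timestamp) constraint in place, the duplicate query is expected to return no rows, so it serves mainly as a sanity check on the load step.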

🧠 Data Engineering Concepts Used

  • Idempotent ETL (safe to rerun)
  • UNIQUE constraints for data integrity
  • INSERT OR IGNORE to prevent duplicates
  • Window functions for ranking
  • Subqueries for latest‑record selection
  • Modular pipeline design
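To illustrate why the load step is safe to rerun, here is a small sketch combining the UNIQUE constraint with INSERT OR IGNORE. The function name `load_rows`, the table layout, and the sample data are hypothetical and are not taken from `db.py`.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert rows, skipping existing (city, timestamp) pairs; return the row count."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS weather (
            city TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            temperature REAL,
            humidity REAL,
            pressure REAL,
            UNIQUE(city, timestamp)
        )
        """
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO weather VALUES (?, ?, ?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [
    ("Oslo", "2024-01-01T11:00", -2.0, 75.0, 1011.5),
    ("Cairo", "2024-01-01T11:00", 19.5, 40.0, 1016.0),
]

first_count = load_rows(conn, batch)   # first run inserts both rows
second_count = load_rows(conn, batch)  # rerun hits the UNIQUE constraint; nothing is added
print(first_count, second_count)
```

Because conflicting rows are silently skipped, a rerun never overwrites stored values. If later readings should replace earlier ones for the same key, an upsert (`INSERT ... ON CONFLICT DO UPDATE`) would be the alternative design.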

📌 Requirements

requests


📄 License

MIT License.
