A modular Python project that fetches real‑time weather data, transforms it, stores it in SQLite, and runs SQL analytics — built with a data engineering mindset.
- ETL Pipeline (Extract → Transform → Load)
- Closest‑timestamp matching for humidity & pressure
- SQLite storage with idempotent inserts (
UNIQUE(city, timestamp)+INSERT OR IGNORE) - SQL analytics:
- Latest weather per city
- Temperature ranking (window function)
- Duplicate detection
- CLI menu for easy interaction
- Fully modular architecture
weather/ ├── api.py # API calls (geocoding + weather) ├── transform.py # Data cleaning + timestamp matching ├── db.py # SQLite operations ├── analytics.py # SQL analytics queries ├── pipeline.py # ETL orchestration └── main.py # CLI entry point
Install dependencies:
pip install requests
Start the CLI:
python main.py
Menu options:
- Run ETL (fetch & save weather)
- Show latest weather
- Show ranked temperatures
- Show duplicates
- Exit
- Idempotent ETL (safe to rerun)
- UNIQUE constraints for data integrity
- INSERT OR IGNORE to prevent duplicates
- Window functions for ranking
- Subqueries for latest‑record selection
- Modular pipeline design
requests
MIT License.