☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Dec 29, 2025 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Python package implementing ML feature engineering and pre-processing for polars or pandas dataframes.
Agentic & No-Code Data Transformations
A tool to read CSV files with CSVW metadata and transform them into other formats.
Easy ETL
DEPRECATED: YAML-based data transformations
An open-source framework for automated sensitive data classification and adaptive privacy-preserving transformations in data pipelines.
Reusable Python classes that extend open source PySpark capabilities. Example implementations are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
A working ETL framework that uses user data from the Spotify API: Python for extraction and transformation, SQL for data loading and staging, Airflow for data orchestration and monitoring, and Power BI for reporting.
[VLDB'25] Official repo for Paper "Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation"
fast-resource is a data transformation layer that sits between the database and the application's users, enabling quick data retrieval. It further enhances performance by caching data using Redis and Memcached.
A GUI and library for exploring, analyzing, and flattening JSON files of any size (including multi-gigabyte files) into CSVs, with CSV transformations and dynamic CSV filtering, all with low memory utilization.
This project uses Python and Power BI to analyze and visualize the insurance portfolio of an anonymous company that implemented an aggressive growth plan in 2021 across the counties of Florida.
A parser for the NuScenes, Lyft, Waymo, and a2d2 datasets.
A Holistic Platform for Automating Data Preparation
This project is a powerful Streamlit application designed to provide users with seamless access and analysis of data from multiple YouTube channels. This intuitive tool leverages the Google API to retrieve a comprehensive range of information, including channel details, video statistics, and viewer engagement metrics.
Python scripts that scrape US gas prices