AI & Analytics

I've just open-sourced MessyData, a synthetic dirty data generator. It lets you programmatically generate data with anomalies and data quality issues.

Reddit r/datascience

Summary

MessyData is a newly released open-source Python tool that programmatically generates synthetic data containing anomalies and data quality issues, including missing values and duplicate records. Because the flaws are introduced deliberately, BI professionals can use the output to test and demonstrate data workflows against realistic dirty data.
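
To make the idea concrete, here is a minimal sketch of injecting missing values and duplicate records into clean rows. It uses only the Python standard library and does not use MessyData's actual API, which the summary does not describe. The function names (`make_clean_rows`, `make_messy`), the column names, and the rate parameters are all hypothetical, chosen for illustration.

```python
import random


def make_clean_rows(n, seed=0):
    """Build n clean example rows (hypothetical schema, for illustration only)."""
    rng = random.Random(seed)
    return [
        {
            "order_id": i,
            "customer": f"cust_{rng.randint(1, 20)}",
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, n + 1)
    ]


def make_messy(rows, missing_rate=0.1, duplicate_rate=0.05, seed=0):
    """Return a copy of rows with injected missing values and duplicates.

    This is an illustrative stand-in, not MessyData's implementation.
    """
    rng = random.Random(seed)
    messy = []
    for row in rows:
        dirty = dict(row)  # copy so the clean input is never mutated
        # Blank out the amount with probability missing_rate.
        if rng.random() < missing_rate:
            dirty["amount"] = None
        messy.append(dirty)
        # Append an exact duplicate with probability duplicate_rate.
        if rng.random() < duplicate_rate:
            messy.append(dict(dirty))
    return messy


clean = make_clean_rows(1000, seed=42)
messy = make_messy(clean, missing_rate=0.1, duplicate_rate=0.05, seed=42)

missing_count = sum(1 for r in messy if r["amount"] is None)
duplicate_count = len(messy) - len(clean)
print(f"rows: {len(messy)}, missing amounts: {missing_count}, duplicates: {duplicate_count}")
```

Seeding the random generator keeps the injected flaws reproducible, so a cleaning or validation step can be checked against a known number of missing values and duplicates.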
