Save Disk Space on Large Datasets with Parquet

To save disk space on large datasets, store them as Parquet files instead of CSV. Parquet is a binary, columnar format that is typically written with compression, so the files usually take up less space on disk than the equivalent uncompressed CSV files.

For a dataset with 1 million rows and 10 columns, the CSV file takes about 189.59 MB, while the Parquet file takes about 78.96 MB. That saves approximately 110.63 MB of storage, or roughly 58%.

Full code for the comparison.
