To save disk space on large datasets, use Parquet files instead of CSV. Because Parquet files are compressed, they take up less space on disk and in memory than uncompressed CSV files.
For a 1 million row, 10 column dataset, storing it as CSV takes about 189.59 MB, while storing it as Parquet takes around 78.96 MB, saving approximately 110.63 MB of storage.
2 thoughts on “Save Disk Space on Large Datasets with Parquet”
This is so useful.
Thank you!
Comments are closed.