Retrieving all rows of a large dataset into memory can cause out-of-memory errors. Spark DataFrames are evaluated lazily: transformations are not executed until an action such as collect() is invoked.
This allows you to reduce the size of the DataFrame through operations such as filtering or aggregating before bringing the results into driver memory.
As a result, you can manage memory usage more efficiently and avoid unnecessary computations.