The Traditional Way
Traditional database systems require you to define a table schema up front and then run a separate import step before you can work with CSV data. This can be tedious and time-consuming.
To demonstrate this, let’s create a CSV file called customers.csv.
import pandas as pd
# Create a sample dataframe
data = {
"name": ["Alice", "Bob", "Charlie", "David", "Eve"],
"age": [25, 32, 45, 19, 38],
"city": ["New York", "London", "Paris", "Berlin", "Tokyo"],
}
df = pd.DataFrame(data)
# Save the dataframe as a CSV file
df.to_csv("customers.csv", index=False)
To load this CSV file into Postgres, you need to run the following statements:
-- Create the table
CREATE TABLE customers (
name VARCHAR(100),
age INT,
city VARCHAR(100)
);
-- Load data from CSV
COPY customers
FROM 'customers.csv'
DELIMITER ','
CSV HEADER;
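Note that COPY reads the file from the server’s filesystem, so Postgres generally expects an absolute path and the server process needs read access to it. If the file lives on your client machine instead, psql’s \copy meta-command streams it from the client side. A minimal sketch, assuming you are connected with psql and the path points at your local file:
-- Client-side load via psql (path is illustrative)
\copy customers FROM 'customers.csv' DELIMITER ',' CSV HEADER;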
The DuckDB Way
In contrast, DuckDB reads CSV files directly from disk, eliminating the need for explicit table creation and data loading.
import duckdb
duckdb.sql("SELECT * FROM 'customers.csv'")
┌─────────┬───────┬──────────┐
│ name │ age │ city │
│ varchar │ int64 │ varchar │
├─────────┼───────┼──────────┤
│ Alice │ 25 │ New York │
│ Bob │ 32 │ London │
│ Charlie │ 45 │ Paris │
│ David │ 19 │ Berlin │
│ Eve │ 38 │ Tokyo │
└─────────┴───────┴──────────┘
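The query returns a DuckDB relation, and if you want to continue in pandas, you can convert it directly with the relation’s df() method. A small sketch:
import duckdb
# Run the query against the CSV and convert the result to a pandas DataFrame
df = duckdb.sql("SELECT * FROM 'customers.csv'").df()
print(df.head())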
By using DuckDB, you can simplify your data import process and focus on analyzing your data.
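For example, you can run aggregations against the file without loading it into a table first. An illustrative query over the sample data:
import duckdb
# Aggregate directly over the CSV file, no CREATE TABLE or COPY required
duckdb.sql("""
    SELECT city, AVG(age) AS avg_age
    FROM 'customers.csv'
    GROUP BY city
    ORDER BY avg_age DESC
""").show()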
Installation
To use DuckDB, install it with pip:
pip install duckdb
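Once installed, a quick one-liner confirms the import works and shows which version you got:
python -c "import duckdb; print(duckdb.__version__)"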