Simplify CSV Data Management with DuckDB

The Traditional Way

Traditional database systems require you to define a table schema up front and then run a separate import step before you can work with CSV data. This can be tedious and time-consuming.

To demonstrate this, let’s create a CSV file called customers.csv.

import pandas as pd

# Create a sample dataframe
data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "age": [25, 32, 45, 19, 38],
    "city": ["New York", "London", "Paris", "Berlin", "Tokyo"],
}

df = pd.DataFrame(data)

# Save the dataframe as a CSV file
df.to_csv("customers.csv", index=False)

To load this CSV file into Postgres, you need to create a table first and then copy the data into it:

-- Create the table
CREATE TABLE customers (
    name VARCHAR(100),
    age INT,
    city VARCHAR(100)
);

-- Load data from CSV
COPY customers
FROM 'customers.csv'
DELIMITER ','
CSV HEADER;

The DuckDB Way

In contrast, DuckDB can query CSV files directly from disk, eliminating the need for explicit table creation and data loading.

import duckdb

duckdb.sql("SELECT * FROM 'customers.csv'")
┌─────────┬───────┬──────────┐
│  name   │  age  │   city   │
│ varchar │ int64 │ varchar  │
├─────────┼───────┼──────────┤
│ Alice   │    25 │ New York │
│ Bob     │    32 │ London   │
│ Charlie │    45 │ Paris    │
│ David   │    19 │ Berlin   │
│ Eve     │    38 │ Tokyo    │
└─────────┴───────┴──────────┘

By using DuckDB, you can simplify your data import process and focus on analyzing your data.
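
Because DuckDB queries the CSV file directly, you can also run aggregations in the same statement, with no import step in between. The snippet below is a minimal sketch that assumes the customers.csv file created above; it groups the rows by city and converts the result to a pandas DataFrame with .df():

import duckdb

# Aggregate directly over the CSV file -- no table creation or loading required
result = duckdb.sql("""
    SELECT city, AVG(age) AS avg_age
    FROM 'customers.csv'
    GROUP BY city
    ORDER BY city
""")

# Convert the result to a pandas DataFrame for further analysis
print(result.df())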

Installation

To use DuckDB, you can install it using pip:

pip install duckdb 

Link to DuckDB.
