Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Filter by Categories
About Article
Analyze Data
Archive
Best Practices
Better Outputs
Blog
Code Optimization
Code Quality
Command Line
Course
Daily tips
Dashboard
Data Analysis & Manipulation
Data Engineer
Data Visualization
DataFrame
Delta Lake
DevOps
DuckDB
Environment Management
Feature Engineer
Git
Jupyter Notebook
LLM
LLM Tools
Machine Learning
Machine Learning & AI
Machine Learning Tools
Manage Data
MLOps
Natural Language Processing
Newsletter Archive
NumPy
Pandas
Polars
PySpark
Python Helpers
Python Tips
Python Utilities
Scrape Data
SQL
Testing
Time Series
Tools
Visualization
Visualization & Reporting
Workflow & Automation
Workflow Automation

TimescaleDB: Time-Series Superpowers for PostgreSQL

TimescaleDB: Time-Series Superpowers for PostgreSQL

Table of Contents

Introduction

PostgreSQL is optimized for transactional workloads where individual rows are read and updated. Common operations include updating user profiles, processing payments, or changing order states. Access is random and row-focused.

PostgreSQL: designed for row-level access

┌────┐
│ r1 │ ◄── read/update
├────┤
│ r2 │
├────┤
│ r3 │ ◄── delete
├────┤
│ r4 │
├────┤
│ r5 │ ◄── read/update
└────┘

Time-series data is the opposite: you constantly append new readings, query by time range, and periodically delete old data in bulk. Instead of touching individual rows, every operation works on ranges of time.

Time-series: needs range-level access

┌────┬────┬────┬────┐
│ t1 │ t2 │ t3 │ t4 │ ◄── append
├────┼────┼────┼────┤
│    │    │    │    │
│    query range    │
│    ◄──────────►   │
├────┼────┼────┼────┤
│ t1 │ t2 │         │
│ ▼  │ ▼  │         │
│drop│drop│         │
└────┴────┴─────────┘
     ▲
     └── bulk delete old chunks

PostgreSQL can handle this pattern at smaller scales, but time-series tables don’t remain small. They grow continuously, and operations that were efficient early on becomes less efficient over time.

As datasets scale into the hundreds of millions of rows, inserts, queries, and deletes all start to slow down.

Why PostgreSQL Struggles with Time-Series Data

Writes: Every Insert Updates Every Index

As a PostgreSQL table grows, inserts get progressively slower because every insert does two things:

  • Append the row to the table, which stores the actual data in insertion order (fast)
  • Update every index by finding the correct sorted position for the new value, so queries can locate rows without scanning the full table (gets slower as the index grows)

For example, inserting a sensor reading at 10:30:00 triggers both steps:

INSERT INTO sensor_data VALUES ('10:30:00', 'sensor_3', 22.5)

Step 1: Append row to table
┌───────────┬──────────┬──────┐
│ 10:00:01  │ sensor_1 │ 21.3 │
│ 10:00:05  │ sensor_2 │ 22.1 │
│ 10:29:58  │ sensor_1 │ 21.8 │
│ 10:30:00  │ sensor_3 │ 22.5 │ ◄ append to end (fast)
└───────────┴──────────┴──────┘

Step 2: Update time index
┌──────────┬────────┐
│ 10:00:01 │ row 1  │
│ 10:00:05 │ row 2  │
│ 10:29:58 │ row 3  │
│ 10:30:00 │ row 4  │ ◄ find sorted position, insert (slow)
│ 10:30:02 │ row 5  │
└──────────┴────────┘

With more rows, the index grows larger and finding the correct position takes more work. Multiply this by every index on the table, and inserts get progressively slower.

Reads: No Way to Skip Irrelevant Time Ranges

Time-series queries almost always filter by time, but PostgreSQL stores all rows together with no time-based grouping. To find rows matching a time range, it must search through the entire index, even when you only need the most recent hour.

For example, if your table holds three months of sensor data and you query the last hour, PostgreSQL still searches from the beginning:

SELECT * FROM sensor_data WHERE time > '10:00:00'

┌──────────────────────────────────────────────┐
│ Jan    Feb    Mar    Apr ... Apr 25 10:00+   │
│ ◄──────── scanned ────────► ◄── needed ──►   │
└──────────────────────────────────────────────┘

Deletes: Row-by-Row Removal Locks the Table

Time-series tables grow continuously, and at some point you need to delete old data. But PostgreSQL’s DELETE operates one row at a time, removing each entry from the table, updating every index, and logging the change for crash recovery.

For example, if you have three months of sensor data and decide to keep only April, PostgreSQL processes every row in January, February, and March one at a time:

DELETE FROM sensor_data WHERE time < '2026-04-01'

┌────────┬────────┬────────┬────────┐
│  Jan   │  Feb   │  Mar   │  Apr   │
│  drop  │  drop  │  drop  │  keep  │
└────────┴────────┴────────┴────────┘
 Millions of rows, deleted one at a time

TimescaleDB is a PostgreSQL extension designed to solve all three of these problems. Let’s see how it solves each of these problems.

What Is TimescaleDB?

TimescaleDB is a PostgreSQL extension. It installs into your existing PostgreSQL instance without replacing anything. Your queries, connections, backups, and monitoring all stay the same.

What it adds is a set of features designed specifically for time-series workloads:

  • Hypertables: Automatically split large tables into smaller time-based chunks
  • time_bucket(): Group data by any time interval (15 minutes, 6 hours, 3 days)
  • Continuous aggregates: Pre-computed views that only process new data on refresh
  • Columnar compression: Reduce storage by 90-95% without changing how you query the data
  • Retention policies: Drop old data instantly without row-by-row deletion

To see how these features work in practice, let’s set up TimescaleDB and try each one.

Stay Current with CodeCut

Easy-to-digest articles on Python, AI, and open-source tools. Delivered twice a week.

Setup

The fastest way to start is with Docker. This runs TimescaleDB with PostgreSQL 18:

docker run -d --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  timescale/timescaledb-ha:pg18

Connect to the PostgreSQL instance running in the container:

psql -d "postgres://postgres:password@localhost:5432/postgres"

Verify TimescaleDB is installed:

SELECT default_version
FROM pg_available_extensions
WHERE name = 'timescaledb';
 default_version
-----------------
 2.26.3

Note: If you’re not using Docker, see Install TimescaleDB for other setup options.

Hypertables: Automatic Time-Based Partitioning

A hypertable looks and behaves like a regular PostgreSQL table. You insert, query, and join it the same way.

The difference is behind the scenes: TimescaleDB automatically splits the data into time-based “chunks” and skips irrelevant chunks during queries.

SELECT * FROM sensor_data WHERE time > 'Jan 13'

┌─────────┐  ┌─────────┐  ┌─────────┐
│ Chunk 1 │  │ Chunk 2 │  │ Chunk 3 │
│ Jan 1-7 │  │ Jan 8-14│  │ Jan 15+ │
│  skip   │  │ partial │  │  scan   │
└─────────┘  └─────────┘  └─────────┘
     ✗            ✓            ✓

Only chunks overlapping the time range are read.

Creating a Hypertable

Creating a hypertable requires only one change to a standard CREATE TABLE statement: adding WITH (tsdb.hypertable).

Let’s create one for storing sensor readings:

CREATE TABLE sensor_data (
    time TIMESTAMPTZ NOT NULL,
    sensor_id TEXT NOT NULL,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION
) WITH (tsdb.hypertable);

That’s it. TimescaleDB automatically:

  • Uses the time column to decide how to split data into chunks
  • Creates time-based chunks (default: 7-day intervals)
  • Automatically indexes the time column for fast range queries

You can customize the chunk interval and partition column if needed:

CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id TEXT NOT NULL,
    cpu_usage DOUBLE PRECISION,
    memory_usage DOUBLE PRECISION
) WITH (
    tsdb.hypertable,
    tsdb.partition_column = 'time',
    tsdb.chunk_interval = '1 day'
);

Loading Data

Let’s generate a realistic dataset: 5 sensors reporting temperature and humidity every 10 seconds for 7 days. This produces about 300,000 rows.

INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
SELECT
    ts,
    'sensor_' || device_num,
    20 + 10 * sin(extract(epoch FROM ts) / 43200) + random() * 2,
    40 + 20 * cos(extract(epoch FROM ts) / 43200) + random() * 5
FROM generate_series(
    '2026-04-21 00:00:00'::timestamptz,
    '2026-04-28 00:00:00'::timestamptz,
    INTERVAL '10 seconds'
) AS ts
CROSS JOIN generate_series(1, 5) AS device_num;
INSERT 0 302405

The sin() and cos() functions create realistic daily temperature and humidity cycles, with random() adding noise.

Inspecting Chunks

Query the chunk metadata to see how TimescaleDB partitioned the data:

SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;
    chunk_name    |      range_start       |       range_end        | is_compressed
------------------+------------------------+------------------------+---------------
 _hyper_1_1_chunk | 2026-04-16 00:00:00+00 | 2026-04-23 00:00:00+00 | f
 _hyper_1_2_chunk | 2026-04-23 00:00:00+00 | 2026-04-30 00:00:00+00 | f

Each chunk spans exactly 7 days (2026-04-16 to 2026-04-23, then 2026-04-23 to 2026-04-30). Let’s query the last 2 days and see which chunks get used:

-- First 3 rows
SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00'
ORDER BY time ASC LIMIT 3;

-- Last 3 rows
SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00'
ORDER BY time DESC LIMIT 3;
          time          | sensor_id |    temperature     |      humidity
------------------------+-----------+--------------------+--------------------
 2026-04-26 00:00:10+00 | sensor_5  | 31.067574603033847 |  32.42945016679141
 2026-04-26 00:00:10+00 | sensor_1  | 29.770821180002446 |  33.87789610251711
 2026-04-26 00:00:10+00 | sensor_2  |   30.5820979427979 | 36.640107625668165
(3 rows)

          time          | sensor_id |    temperature     |      humidity
------------------------+-----------+--------------------+--------------------
 2026-04-28 00:00:00+00 | sensor_1  |  18.82289094155276 | 61.290098283252824
 2026-04-28 00:00:00+00 | sensor_2  | 17.298999086810714 |  60.28357262461673
 2026-04-28 00:00:00+00 | sensor_3  |  18.31179563705799 | 61.943984006436224
(3 rows)

All rows fall between April 26 and April 28. To verify that only the second chunk was scanned, use EXPLAIN to explain the query plan:

EXPLAIN SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00';
                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_2_chunk_sensor_data_time_idx on _hyper_1_2_chunk  (cost=0.42..3618.97 rows=85792 width=33)
   Index Cond: ("time" > '2026-04-26 00:00:00+00')

The plan only references _hyper_1_2_chunk (Apr 23-30). The first chunk _hyper_1_1_chunk (Apr 16-23) doesn’t appear at all, meaning TimescaleDB skipped it entirely.

time_bucket(): Flexible Time-Based Aggregations

Time-series analysis often requires grouping data into fixed time windows: average temperature every 15 minutes, total requests per 6-hour block, or peak usage per 3-day period.

PostgreSQL’s built-in date_trunc() only supports fixed calendar intervals like hour, day, or month. For custom intervals like 15 minutes, you have to manually calculate which bucket each timestamp falls into:

-- PostgreSQL workaround for 15-minute buckets
SELECT
    date_trunc('hour', time)
        + INTERVAL '15 min' * FLOOR(EXTRACT(MINUTE FROM time) / 15)
        AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp
FROM sensor_data
WHERE sensor_id = 'sensor_1'
  AND time BETWEEN '2026-04-27 00:00:00' AND '2026-04-27 02:00:00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
         bucket         | sensor_id |      avg_temp
------------------------+-----------+--------------------
 2026-04-27 00:00:00+00 | sensor_1  | 13.542631344509111
 2026-04-27 00:15:00+00 | sensor_1  |  13.23800229831265
 2026-04-27 00:30:00+00 | sensor_1  |  13.20352104441286
 2026-04-27 00:45:00+00 | sensor_1  | 13.119135414712746
 2026-04-27 01:00:00+00 | sensor_1  | 12.889897733150352
 2026-04-27 01:15:00+00 | sensor_1  | 12.876690040717648
 2026-04-27 01:30:00+00 | sensor_1  |  12.72343003629354
 2026-04-27 01:45:00+00 | sensor_1  |  12.58139781130447
 2026-04-27 02:00:00+00 | sensor_1  |   11.8816092609178

TimescaleDB’s time_bucket() produces the same result with a single function call:

SELECT
    time_bucket('15 minutes', time) AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp,
    AVG(humidity) AS avg_humidity
FROM sensor_data
WHERE sensor_id = 'sensor_1'
  AND time BETWEEN '2026-04-27 00:00:00' AND '2026-04-27 02:00:00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
         bucket         | sensor_id |      avg_temp      |    avg_humidity
------------------------+-----------+--------------------+--------------------
 2026-04-27 00:00:00+00 | sensor_1  | 13.542631344509111 | 29.297833301485014
 2026-04-27 00:15:00+00 | sensor_1  |  13.23800229831265 |  29.57904716854577
 2026-04-27 00:30:00+00 | sensor_1  |  13.20352104441286 |  30.02442153150598
 2026-04-27 00:45:00+00 | sensor_1  | 13.119135414712746 | 30.389413838577834
 2026-04-27 01:00:00+00 | sensor_1  | 12.889897733150352 |  30.57742656702146
 2026-04-27 01:15:00+00 | sensor_1  | 12.876690040717648 | 30.799536994969888
 2026-04-27 01:30:00+00 | sensor_1  |  12.72343003629354 | 31.520854251589334
 2026-04-27 01:45:00+00 | sensor_1  |  12.58139781130447 | 31.613018125020453
 2026-04-27 02:00:00+00 | sensor_1  |   11.8816092609178 |   33.0892996678552

Continuous Aggregates: Pre-Computed Views That Refresh Incrementally

When you run the same aggregation query repeatedly (e.g., hourly averages for a dashboard), PostgreSQL recomputes it from raw data every time. A materialized view solves this by storing the query result in a table, so future reads are instant.

The problem is that materialized views don’t update themselves. Every time new data arrives, you have to manually run REFRESH MATERIALIZED VIEW to keep the results current. This rebuilds the entire view from scratch, which can take minutes to hours on large tables.

TimescaleDB’s continuous aggregates take a smarter approach: they remember where they left off and only process new data on each refresh.

Let’s compare the two approaches with an example:

Materialized view:    recompute ALL buckets on refresh
┌──────┬──────┬──────┬──────┬──────┬──────┐
│ Jan  │ Feb  │ Mar  │ Apr  │ Apr  │ Apr  │
│ redo │ redo │ redo │ redo │ redo │ redo │
└──────┴──────┴──────┴──────┴──────┴──────┘

Continuous aggregate: only recompute CHANGED buckets
┌──────┬──────┬──────┬──────┬──────┬──────┐
│ Jan  │ Feb  │ Mar  │ Apr  │ Apr  │ Apr  │
│ skip │ skip │ skip │ skip │ skip │ redo │
└──────┴──────┴──────┴──────┴──────┴──────┘
                                     ▲
                                 new data here

With a materialized view, all six buckets are recomputed even though only the last one has new data. With a continuous aggregate, the first five are skipped entirely.

Creating a Continuous Aggregate

Let’s create a continuous aggregate that computes hourly averages, min, max, and reading count per sensor. The syntax is similar to a materialized view, with timescaledb.continuous added:

CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MIN(temperature) AS min_temp,
    MAX(temperature) AS max_temp,
    AVG(humidity) AS avg_humidity,
    COUNT(*) AS readings
FROM sensor_data
GROUP BY hour, sensor_id;

This creates the aggregate and materializes the existing data. You can query it like any view:

SELECT hour, sensor_id, avg_temp, readings
FROM sensor_hourly
WHERE sensor_id = 'sensor_1'
ORDER BY hour DESC
LIMIT 5;
          hour          | sensor_id |      avg_temp      | readings
------------------------+-----------+--------------------+----------
 2026-04-28 00:00:00+00 | sensor_1  |  18.82289094155276 |        1
 2026-04-27 23:00:00+00 | sensor_1  |  17.62569258253398 |      360
 2026-04-27 22:00:00+00 | sensor_1  | 16.916727264064487 |      360
 2026-04-27 21:00:00+00 | sensor_1  | 16.140865785926938 |      360
 2026-04-27 20:00:00+00 | sensor_1  | 15.457800483137094 |      360

Each hour has 360 readings (one every 10 seconds = 6 per minute x 60 minutes). The midnight bucket has just 1 reading because that’s the last timestamp in our dataset.

Refreshing After New Data Arrives

Now let’s insert a new reading and see how the aggregate handles it:

INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
VALUES ('2026-04-28 00:10:00', 'sensor_1', 25.3, 51.2);

Then refresh only the time range that changed:

CALL refresh_continuous_aggregate('sensor_hourly', '2026-04-28 00:00:00', '2026-04-28 02:00:00');

Now query the aggregate to see the update:

SELECT hour, sensor_id, avg_temp, readings
FROM sensor_hourly
WHERE sensor_id = 'sensor_1'
ORDER BY hour DESC
LIMIT 5;
          hour          | sensor_id |      avg_temp      | readings
------------------------+-----------+--------------------+----------
 2026-04-28 00:00:00+00 | sensor_1  |  22.06144547077638 |        2
 2026-04-27 23:00:00+00 | sensor_1  |  17.62569258253398 |      360
 2026-04-27 22:00:00+00 | sensor_1  | 16.916727264064487 |      360
 2026-04-27 21:00:00+00 | sensor_1  | 16.140865785926938 |      360
 2026-04-27 20:00:00+00 | sensor_1  | 15.457800483137094 |      360

The midnight bucket went from 1 reading to 2, with its average updated to (18.82289 + 25.3) / 2 ≈ 22.06. The 20:00 through 23:00 buckets are unchanged because they fall outside the refresh window (2026-04-28 00:00:00 to 2026-04-28 02:00:00).

Adding a Refresh Policy

In practice, you wouldn’t refresh manually every time new data arrives. Instead, set up a policy that runs the refresh automatically:

SELECT add_continuous_aggregate_policy(
    'sensor_hourly',
    start_offset  => INTERVAL '3 hours',
    end_offset    => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

This tells TimescaleDB:

  • schedule_interval: Run the refresh job every hour
  • start_offset: Look back 3 hours for any data that changed
  • end_offset: Don’t process the most recent hour (it’s likely still receiving data)

The refresh processes only the buckets that fall within the start_offset to end_offset window. On a table with months of data, this takes seconds instead of minutes:

Table with 3 months of data, policy runs at 10:00:

│ Jan │ Feb │ Mar │ ... │ 7:00 │ 8:00 │ 9:00 │10:00│
│skip │skip │skip │     │ redo │ redo │ redo │skip │
│◄──    untouched    ──►│◄──     3 hrs    ──►│     │

Monitoring Refresh Jobs

Check the status of refresh jobs:

SELECT job_id, application_name, schedule_interval, next_start
FROM timescaledb_information.jobs
WHERE application_name LIKE '%Continuous%';
 job_id |              application_name              | schedule_interval |          next_start
--------+--------------------------------------------+-------------------+-------------------------------
   1002 | Refresh Continuous Aggregate Policy [1002] | 01:00:00          | 2026-04-30 05:37:23.960309+00

This confirms the policy is active: job 1002 refreshes the sensor_hourly aggregate every hour, with the next run scheduled for the time shown in next_start.

Compression: Columnar Storage with 90%+ Reduction

As time-series tables grow, storage becomes a concern. A sensor writing every 10 seconds across 5 devices generates over 300,000 rows per week. Over months, that adds up to gigabytes of data.

The good news is that time-series data compresses far better than transactional data. In a users table, every row has a different name, email, and address. In a sensor table, the same sensor IDs appear over and over, timestamps increment by the same interval, and readings barely change from one row to the next. This repetition means the database can store the data in a fraction of the space:

10 rows of sensor data:

│ sensor_1 │ 10:00:10 │ 22.1 │
│ sensor_1 │ 10:00:20 │ 22.3 │
│ sensor_1 │ 10:00:30 │ 22.2 │
│ sensor_2 │ 10:00:10 │ 45.0 │
│ sensor_2 │ 10:00:20 │ 45.2 │
│ sensor_2 │ 10:00:30 │ 44.8 │

After compression:

  sensor_id: "sensor_1" x3, "sensor_2" x3
  time:      10:00:10, +10s, +10s, 10:00:10, +10s, +10s
  temp:      22.1, +0.2, -0.1, 45.0, +0.2, -0.4

TimescaleDB applies different compression strategies depending on the data type:

  • Timestamps: Stores the difference between consecutive values (like the +10s, +10s, +10s pattern above)
  • Repeated strings (like sensor IDs): Stores the value once and references it across all rows
  • Floating-point values: Stores the difference between consecutive readings (like the +0.2, -0.1 pattern above)

When Compression Happens

You wouldn’t want to compress data that’s still being written to. Each insert into a compressed chunk requires decompression, which is slow. TimescaleDB avoids this by default: hypertables created with WITH (tsdb.hypertable) automatically compress only chunks older than 7 days, leaving recent data uncompressed for fast writes.

┌─────────────┐ ┌─────────────┐ ┌────────────--─┐
│  Apr 7-14   │ │  Apr 14-21  │ │  Apr 21-28    │
│ compressed  │ │ compressed  │ │ uncompressed  │
│ (storage ↓) │ │ (storage ↓) │ │ (fast writes) │
└─────────────┘ └─────────────┘ └────────────-─-┘
│◄──   older than 7 days   ──►│ │◄──  recent ──►│
        auto-compressed          still receiving
                                   new inserts

To use a different threshold, remove the default policy and add a new one. For example, to compress data older than 30 days instead of 7:

-- Remove the default policy created with the hypertable
CALL remove_columnstore_policy('sensor_data');

-- Add a new policy with the desired interval
CALL add_columnstore_policy('sensor_data', after => INTERVAL '30 days');

Configuring Compression

Compression is already enabled by default when you create a hypertable. No extra setup is needed for basic compression. However, you can improve query performance on compressed data by telling TimescaleDB how to organize the rows:

CREATE TABLE sensor_data_v2 (
    time TIMESTAMPTZ NOT NULL,
    sensor_id TEXT NOT NULL,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION
) WITH (
    tsdb.hypertable,
    tsdb.segmentby = 'sensor_id',
    tsdb.orderby = 'time DESC'
);
  • segmentby = 'sensor_id': Compresses each sensor’s data into a separate block. A query for sensor_1 only reads that block, skipping all other sensors.
  • orderby = 'time DESC': Sorts each block newest first. A query for the latest reading finds it at the top without scanning the entire block.

For example, if you query WHERE sensor_id = 'sensor_1', here’s what happens with and without segmentby:

Without segmentby (one block):

┌──────────────────────────────────────────┐
│ sensor_1, sensor_2 data mixed together   │
│ query for sensor_1 decompresses ALL      │
└──────────────────────────────────────────┘

With segmentby = 'sensor_id' (one block per sensor):

┌─ sensor_1 ─────┐  ┌─ sensor_2 ─────┐
│ time DESC      │  │ time DESC      │
│ 10:00:30 22.2  │  │ 10:00:30 44.8  │
│ 10:00:20 22.3  │  │ 10:00:20 45.2  │
│ 10:00:10 22.1  │  │ 10:00:10 45.0  │
└────────────────┘  └────────────────┘
      read ✓              skip

With 100 sensors, this means a query for one sensor decompresses 1% of the data instead of all of it.

Checking Compression Results

The automatic policy runs daily, so let’s compress a chunk manually to see the results immediately:

SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

Now check the chunk status:

SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;
    chunk_name    |      range_start       |       range_end        | is_compressed
------------------+------------------------+------------------------+---------------
 _hyper_1_1_chunk | 2026-04-16 00:00:00+00 | 2026-04-23 00:00:00+00 | t
 _hyper_1_2_chunk | 2026-04-23 00:00:00+00 | 2026-04-30 00:00:00+00 | f

is_compressed = t confirms the first chunk is now compressed.

Compression doesn’t change how you write queries. The same SQL works on both compressed and uncompressed data:

-- This queries the compressed chunk (_hyper_1_1_chunk) directly
SELECT sensor_id, AVG(temperature) AS avg_temp, COUNT(*) AS readings
FROM sensor_data
WHERE time BETWEEN '2026-04-21 00:00:00' AND '2026-04-22 00:00:00'
  AND sensor_id = 'sensor_1'
GROUP BY sensor_id;
 sensor_id |      avg_temp      | readings
-----------+--------------------+----------
 sensor_1  | 15.388563698009191 |     8641

Retention Policies: Automated Data Lifecycle Management

Time-series tables grow continuously, and at some point old data needs to go. In plain PostgreSQL, DELETE removes rows one at a time, which is slow on large tables.

TimescaleDB solves this by dropping entire chunks instead of deleting individual rows. A chunk is a separate table internally, so dropping it is instant regardless of how many rows it contains.

Retention policy: drop_after => 6 months

│  Oct   │  Nov   │  Dec   │ ... │  Mar   │  Apr   │  May   │
│  DROP  │  DROP  │  DROP  │     │  keep  │  keep  │  keep  │
│◄──── older than 6 months  ────►│◄─── within 6 months  ───►│
        dropped instantly                 untouched

To have TimescaleDB drop old chunks automatically on a schedule, add a retention policy:

SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '6 months');

This runs a background job that automatically drops chunks containing data older than 6 months.

Combining Retention with Continuous Aggregates

Deleting all old data saves storage but means you can no longer answer questions like “how does this month compare to last year?” A common solution is to delete the raw readings after a short period, but keep the pre-computed summaries (hourly averages, daily totals) for much longer.

Here, raw data from sensor_data is deleted after 30 days to save storage, while the sensor_hourly continuous aggregate retains hourly averages for 1 year:

-- Keep raw data for 30 days
SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '30 days');

-- Keep hourly aggregates for 1 year
SELECT add_retention_policy('sensor_hourly', drop_after => INTERVAL '1 year');

This gives you detailed data for recent debugging and long-term trends for historical analysis, without the storage cost of keeping every raw reading forever.

Key Takeaways

TimescaleDB adds time-series capabilities to PostgreSQL without requiring you to switch databases or learn a new query language.

TimescaleDB is worth considering when your PostgreSQL tables are growing with timestamped data and you’re noticing slower inserts, expensive aggregation queries, or increasing storage costs. Since it’s an extension, you can try it on a single table without changing anything else in your database.

Related Tutorials

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top

Work with Khuyen Tran

Work with Khuyen Tran