Table of Contents
- Introduction
- Why PostgreSQL Struggles with Time-Series Data
- What Is TimescaleDB?
- Setup
- Hypertables: Automatic Time-Based Partitioning
time_bucket(): Flexible Time-Based Aggregations- Continuous Aggregates: Pre-Computed Views That Refresh Incrementally
- Compression: Columnar Storage with 90%+ Reduction
- Retention Policies: Automated Data Lifecycle Management
- Key Takeaways
Introduction
PostgreSQL is optimized for transactional workloads where individual rows are read and updated. Common operations include updating user profiles, processing payments, or changing order states. Access is random and row-focused.
PostgreSQL: designed for row-level access
┌────┐
│ r1 │ ◄── read/update
├────┤
│ r2 │
├────┤
│ r3 │ ◄── delete
├────┤
│ r4 │
├────┤
│ r5 │ ◄── read/update
└────┘
Time-series data is the opposite: you constantly append new readings, query by time range, and periodically delete old data in bulk. Instead of touching individual rows, every operation works on ranges of time.
Time-series: needs range-level access
┌────┬────┬────┬────┐
│ t1 │ t2 │ t3 │ t4 │ ◄── append
├────┼────┼────┼────┤
│ │ │ │ │
│ query range │
│ ◄──────────► │
├────┼────┼────┼────┤
│ t1 │ t2 │ │
│ ▼ │ ▼ │ │
│drop│drop│ │
└────┴────┴─────────┘
▲
└── bulk delete old chunks
PostgreSQL can handle this pattern at smaller scales, but time-series tables don’t remain small. They grow continuously, and operations that were efficient early on becomes less efficient over time.
As datasets scale into the hundreds of millions of rows, inserts, queries, and deletes all start to slow down.
Why PostgreSQL Struggles with Time-Series Data
Writes: Every Insert Updates Every Index
As a PostgreSQL table grows, inserts get progressively slower because every insert does two things:
- Append the row to the table, which stores the actual data in insertion order (fast)
- Update every index by finding the correct sorted position for the new value, so queries can locate rows without scanning the full table (gets slower as the index grows)
For example, inserting a sensor reading at 10:30:00 triggers both steps:
INSERT INTO sensor_data VALUES ('10:30:00', 'sensor_3', 22.5)
Step 1: Append row to table
┌───────────┬──────────┬──────┐
│ 10:00:01 │ sensor_1 │ 21.3 │
│ 10:00:05 │ sensor_2 │ 22.1 │
│ 10:29:58 │ sensor_1 │ 21.8 │
│ 10:30:00 │ sensor_3 │ 22.5 │ ◄ append to end (fast)
└───────────┴──────────┴──────┘
Step 2: Update time index
┌──────────┬────────┐
│ 10:00:01 │ row 1 │
│ 10:00:05 │ row 2 │
│ 10:29:58 │ row 3 │
│ 10:30:00 │ row 4 │ ◄ find sorted position, insert (slow)
│ 10:30:02 │ row 5 │
└──────────┴────────┘
With more rows, the index grows larger and finding the correct position takes more work. Multiply this by every index on the table, and inserts get progressively slower.
Reads: No Way to Skip Irrelevant Time Ranges
Time-series queries almost always filter by time, but PostgreSQL stores all rows together with no time-based grouping. To find rows matching a time range, it must search through the entire index, even when you only need the most recent hour.
For example, if your table holds three months of sensor data and you query the last hour, PostgreSQL still searches from the beginning:
SELECT * FROM sensor_data WHERE time > '10:00:00'
┌──────────────────────────────────────────────┐
│ Jan Feb Mar Apr ... Apr 25 10:00+ │
│ ◄──────── scanned ────────► ◄── needed ──► │
└──────────────────────────────────────────────┘
Deletes: Row-by-Row Removal Locks the Table
Time-series tables grow continuously, and at some point you need to delete old data. But PostgreSQL’s DELETE operates one row at a time, removing each entry from the table, updating every index, and logging the change for crash recovery.
For example, if you have three months of sensor data and decide to keep only April, PostgreSQL processes every row in January, February, and March one at a time:
DELETE FROM sensor_data WHERE time < '2026-04-01'
┌────────┬────────┬────────┬────────┐
│ Jan │ Feb │ Mar │ Apr │
│ drop │ drop │ drop │ keep │
└────────┴────────┴────────┴────────┘
Millions of rows, deleted one at a time
TimescaleDB is a PostgreSQL extension designed to solve all three of these problems. Let’s see how it solves each of these problems.
What Is TimescaleDB?
TimescaleDB is a PostgreSQL extension. It installs into your existing PostgreSQL instance without replacing anything. Your queries, connections, backups, and monitoring all stay the same.
What it adds is a set of features designed specifically for time-series workloads:
- Hypertables: Automatically split large tables into smaller time-based chunks
time_bucket(): Group data by any time interval (15 minutes, 6 hours, 3 days)- Continuous aggregates: Pre-computed views that only process new data on refresh
- Columnar compression: Reduce storage by 90-95% without changing how you query the data
- Retention policies: Drop old data instantly without row-by-row deletion
To see how these features work in practice, let’s set up TimescaleDB and try each one.
Stay Current with CodeCut
Easy-to-digest articles on Python, AI, and open-source tools. Delivered twice a week.
Setup
The fastest way to start is with Docker. This runs TimescaleDB with PostgreSQL 18:
docker run -d --name timescaledb \
-p 5432:5432 \
-e POSTGRES_PASSWORD=password \
timescale/timescaledb-ha:pg18
Connect to the PostgreSQL instance running in the container:
psql -d "postgres://postgres:password@localhost:5432/postgres"
Verify TimescaleDB is installed:
SELECT default_version
FROM pg_available_extensions
WHERE name = 'timescaledb';
default_version
-----------------
2.26.3
Note: If you’re not using Docker, see Install TimescaleDB for other setup options.
Hypertables: Automatic Time-Based Partitioning
A hypertable looks and behaves like a regular PostgreSQL table. You insert, query, and join it the same way.
The difference is behind the scenes: TimescaleDB automatically splits the data into time-based “chunks” and skips irrelevant chunks during queries.
SELECT * FROM sensor_data WHERE time > 'Jan 13'
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │
│ Jan 1-7 │ │ Jan 8-14│ │ Jan 15+ │
│ skip │ │ partial │ │ scan │
└─────────┘ └─────────┘ └─────────┘
✗ ✓ ✓
Only chunks overlapping the time range are read.
Creating a Hypertable
Creating a hypertable requires only one change to a standard CREATE TABLE statement: adding WITH (tsdb.hypertable).
Let’s create one for storing sensor readings:
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
) WITH (tsdb.hypertable);
That’s it. TimescaleDB automatically:
- Uses the
timecolumn to decide how to split data into chunks - Creates time-based chunks (default: 7-day intervals)
- Automatically indexes the
timecolumn for fast range queries
You can customize the chunk interval and partition column if needed:
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
device_id TEXT NOT NULL,
cpu_usage DOUBLE PRECISION,
memory_usage DOUBLE PRECISION
) WITH (
tsdb.hypertable,
tsdb.partition_column = 'time',
tsdb.chunk_interval = '1 day'
);
Loading Data
Let’s generate a realistic dataset: 5 sensors reporting temperature and humidity every 10 seconds for 7 days. This produces about 300,000 rows.
INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
SELECT
ts,
'sensor_' || device_num,
20 + 10 * sin(extract(epoch FROM ts) / 43200) + random() * 2,
40 + 20 * cos(extract(epoch FROM ts) / 43200) + random() * 5
FROM generate_series(
'2026-04-21 00:00:00'::timestamptz,
'2026-04-28 00:00:00'::timestamptz,
INTERVAL '10 seconds'
) AS ts
CROSS JOIN generate_series(1, 5) AS device_num;
INSERT 0 302405
The sin() and cos() functions create realistic daily temperature and humidity cycles, with random() adding noise.
Inspecting Chunks
Query the chunk metadata to see how TimescaleDB partitioned the data:
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;
chunk_name | range_start | range_end | is_compressed
------------------+------------------------+------------------------+---------------
_hyper_1_1_chunk | 2026-04-16 00:00:00+00 | 2026-04-23 00:00:00+00 | f
_hyper_1_2_chunk | 2026-04-23 00:00:00+00 | 2026-04-30 00:00:00+00 | f
Each chunk spans exactly 7 days (2026-04-16 to 2026-04-23, then 2026-04-23 to 2026-04-30). Let’s query the last 2 days and see which chunks get used:
-- First 3 rows
SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00'
ORDER BY time ASC LIMIT 3;
-- Last 3 rows
SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00'
ORDER BY time DESC LIMIT 3;
time | sensor_id | temperature | humidity
------------------------+-----------+--------------------+--------------------
2026-04-26 00:00:10+00 | sensor_5 | 31.067574603033847 | 32.42945016679141
2026-04-26 00:00:10+00 | sensor_1 | 29.770821180002446 | 33.87789610251711
2026-04-26 00:00:10+00 | sensor_2 | 30.5820979427979 | 36.640107625668165
(3 rows)
time | sensor_id | temperature | humidity
------------------------+-----------+--------------------+--------------------
2026-04-28 00:00:00+00 | sensor_1 | 18.82289094155276 | 61.290098283252824
2026-04-28 00:00:00+00 | sensor_2 | 17.298999086810714 | 60.28357262461673
2026-04-28 00:00:00+00 | sensor_3 | 18.31179563705799 | 61.943984006436224
(3 rows)
All rows fall between April 26 and April 28. To verify that only the second chunk was scanned, use EXPLAIN to explain the query plan:
EXPLAIN SELECT * FROM sensor_data
WHERE time > '2026-04-26 00:00:00';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using _hyper_1_2_chunk_sensor_data_time_idx on _hyper_1_2_chunk (cost=0.42..3618.97 rows=85792 width=33)
Index Cond: ("time" > '2026-04-26 00:00:00+00')
The plan only references _hyper_1_2_chunk (Apr 23-30). The first chunk _hyper_1_1_chunk (Apr 16-23) doesn’t appear at all, meaning TimescaleDB skipped it entirely.
time_bucket(): Flexible Time-Based Aggregations
Time-series analysis often requires grouping data into fixed time windows: average temperature every 15 minutes, total requests per 6-hour block, or peak usage per 3-day period.
PostgreSQL’s built-in date_trunc() only supports fixed calendar intervals like hour, day, or month. For custom intervals like 15 minutes, you have to manually calculate which bucket each timestamp falls into:
-- PostgreSQL workaround for 15-minute buckets
SELECT
date_trunc('hour', time)
+ INTERVAL '15 min' * FLOOR(EXTRACT(MINUTE FROM time) / 15)
AS bucket,
sensor_id,
AVG(temperature) AS avg_temp
FROM sensor_data
WHERE sensor_id = 'sensor_1'
AND time BETWEEN '2026-04-27 00:00:00' AND '2026-04-27 02:00:00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
bucket | sensor_id | avg_temp
------------------------+-----------+--------------------
2026-04-27 00:00:00+00 | sensor_1 | 13.542631344509111
2026-04-27 00:15:00+00 | sensor_1 | 13.23800229831265
2026-04-27 00:30:00+00 | sensor_1 | 13.20352104441286
2026-04-27 00:45:00+00 | sensor_1 | 13.119135414712746
2026-04-27 01:00:00+00 | sensor_1 | 12.889897733150352
2026-04-27 01:15:00+00 | sensor_1 | 12.876690040717648
2026-04-27 01:30:00+00 | sensor_1 | 12.72343003629354
2026-04-27 01:45:00+00 | sensor_1 | 12.58139781130447
2026-04-27 02:00:00+00 | sensor_1 | 11.8816092609178
TimescaleDB’s time_bucket() produces the same result with a single function call:
SELECT
time_bucket('15 minutes', time) AS bucket,
sensor_id,
AVG(temperature) AS avg_temp,
AVG(humidity) AS avg_humidity
FROM sensor_data
WHERE sensor_id = 'sensor_1'
AND time BETWEEN '2026-04-27 00:00:00' AND '2026-04-27 02:00:00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
bucket | sensor_id | avg_temp | avg_humidity
------------------------+-----------+--------------------+--------------------
2026-04-27 00:00:00+00 | sensor_1 | 13.542631344509111 | 29.297833301485014
2026-04-27 00:15:00+00 | sensor_1 | 13.23800229831265 | 29.57904716854577
2026-04-27 00:30:00+00 | sensor_1 | 13.20352104441286 | 30.02442153150598
2026-04-27 00:45:00+00 | sensor_1 | 13.119135414712746 | 30.389413838577834
2026-04-27 01:00:00+00 | sensor_1 | 12.889897733150352 | 30.57742656702146
2026-04-27 01:15:00+00 | sensor_1 | 12.876690040717648 | 30.799536994969888
2026-04-27 01:30:00+00 | sensor_1 | 12.72343003629354 | 31.520854251589334
2026-04-27 01:45:00+00 | sensor_1 | 12.58139781130447 | 31.613018125020453
2026-04-27 02:00:00+00 | sensor_1 | 11.8816092609178 | 33.0892996678552
Continuous Aggregates: Pre-Computed Views That Refresh Incrementally
When you run the same aggregation query repeatedly (e.g., hourly averages for a dashboard), PostgreSQL recomputes it from raw data every time. A materialized view solves this by storing the query result in a table, so future reads are instant.
The problem is that materialized views don’t update themselves. Every time new data arrives, you have to manually run REFRESH MATERIALIZED VIEW to keep the results current. This rebuilds the entire view from scratch, which can take minutes to hours on large tables.
TimescaleDB’s continuous aggregates take a smarter approach: they remember where they left off and only process new data on each refresh.
Let’s compare the two approaches with an example:
Materialized view: recompute ALL buckets on refresh
┌──────┬──────┬──────┬──────┬──────┬──────┐
│ Jan │ Feb │ Mar │ Apr │ Apr │ Apr │
│ redo │ redo │ redo │ redo │ redo │ redo │
└──────┴──────┴──────┴──────┴──────┴──────┘
Continuous aggregate: only recompute CHANGED buckets
┌──────┬──────┬──────┬──────┬──────┬──────┐
│ Jan │ Feb │ Mar │ Apr │ Apr │ Apr │
│ skip │ skip │ skip │ skip │ skip │ redo │
└──────┴──────┴──────┴──────┴──────┴──────┘
▲
new data here
With a materialized view, all six buckets are recomputed even though only the last one has new data. With a continuous aggregate, the first five are skipped entirely.
Creating a Continuous Aggregate
Let’s create a continuous aggregate that computes hourly averages, min, max, and reading count per sensor. The syntax is similar to a materialized view, with timescaledb.continuous added:
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
AVG(temperature) AS avg_temp,
MIN(temperature) AS min_temp,
MAX(temperature) AS max_temp,
AVG(humidity) AS avg_humidity,
COUNT(*) AS readings
FROM sensor_data
GROUP BY hour, sensor_id;
This creates the aggregate and materializes the existing data. You can query it like any view:
SELECT hour, sensor_id, avg_temp, readings
FROM sensor_hourly
WHERE sensor_id = 'sensor_1'
ORDER BY hour DESC
LIMIT 5;
hour | sensor_id | avg_temp | readings
------------------------+-----------+--------------------+----------
2026-04-28 00:00:00+00 | sensor_1 | 18.82289094155276 | 1
2026-04-27 23:00:00+00 | sensor_1 | 17.62569258253398 | 360
2026-04-27 22:00:00+00 | sensor_1 | 16.916727264064487 | 360
2026-04-27 21:00:00+00 | sensor_1 | 16.140865785926938 | 360
2026-04-27 20:00:00+00 | sensor_1 | 15.457800483137094 | 360
Each hour has 360 readings (one every 10 seconds = 6 per minute x 60 minutes). The midnight bucket has just 1 reading because that’s the last timestamp in our dataset.
Refreshing After New Data Arrives
Now let’s insert a new reading and see how the aggregate handles it:
INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
VALUES ('2026-04-28 00:10:00', 'sensor_1', 25.3, 51.2);
Then refresh only the time range that changed:
CALL refresh_continuous_aggregate('sensor_hourly', '2026-04-28 00:00:00', '2026-04-28 02:00:00');
Now query the aggregate to see the update:
SELECT hour, sensor_id, avg_temp, readings
FROM sensor_hourly
WHERE sensor_id = 'sensor_1'
ORDER BY hour DESC
LIMIT 5;
hour | sensor_id | avg_temp | readings
------------------------+-----------+--------------------+----------
2026-04-28 00:00:00+00 | sensor_1 | 22.06144547077638 | 2
2026-04-27 23:00:00+00 | sensor_1 | 17.62569258253398 | 360
2026-04-27 22:00:00+00 | sensor_1 | 16.916727264064487 | 360
2026-04-27 21:00:00+00 | sensor_1 | 16.140865785926938 | 360
2026-04-27 20:00:00+00 | sensor_1 | 15.457800483137094 | 360
The midnight bucket went from 1 reading to 2, with its average updated to (18.82289 + 25.3) / 2 ≈ 22.06. The 20:00 through 23:00 buckets are unchanged because they fall outside the refresh window (2026-04-28 00:00:00 to 2026-04-28 02:00:00).
Adding a Refresh Policy
In practice, you wouldn’t refresh manually every time new data arrives. Instead, set up a policy that runs the refresh automatically:
SELECT add_continuous_aggregate_policy(
'sensor_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
This tells TimescaleDB:
schedule_interval: Run the refresh job every hourstart_offset: Look back 3 hours for any data that changedend_offset: Don’t process the most recent hour (it’s likely still receiving data)
The refresh processes only the buckets that fall within the start_offset to end_offset window. On a table with months of data, this takes seconds instead of minutes:
Table with 3 months of data, policy runs at 10:00:
│ Jan │ Feb │ Mar │ ... │ 7:00 │ 8:00 │ 9:00 │10:00│
│skip │skip │skip │ │ redo │ redo │ redo │skip │
│◄── untouched ──►│◄── 3 hrs ──►│ │
Monitoring Refresh Jobs
Check the status of refresh jobs:
SELECT job_id, application_name, schedule_interval, next_start
FROM timescaledb_information.jobs
WHERE application_name LIKE '%Continuous%';
job_id | application_name | schedule_interval | next_start
--------+--------------------------------------------+-------------------+-------------------------------
1002 | Refresh Continuous Aggregate Policy [1002] | 01:00:00 | 2026-04-30 05:37:23.960309+00
This confirms the policy is active: job 1002 refreshes the sensor_hourly aggregate every hour, with the next run scheduled for the time shown in next_start.
Compression: Columnar Storage with 90%+ Reduction
As time-series tables grow, storage becomes a concern. A sensor writing every 10 seconds across 5 devices generates over 300,000 rows per week. Over months, that adds up to gigabytes of data.
The good news is that time-series data compresses far better than transactional data. In a users table, every row has a different name, email, and address. In a sensor table, the same sensor IDs appear over and over, timestamps increment by the same interval, and readings barely change from one row to the next. This repetition means the database can store the data in a fraction of the space:
10 rows of sensor data:
│ sensor_1 │ 10:00:10 │ 22.1 │
│ sensor_1 │ 10:00:20 │ 22.3 │
│ sensor_1 │ 10:00:30 │ 22.2 │
│ sensor_2 │ 10:00:10 │ 45.0 │
│ sensor_2 │ 10:00:20 │ 45.2 │
│ sensor_2 │ 10:00:30 │ 44.8 │
After compression:
sensor_id: "sensor_1" x3, "sensor_2" x3
time: 10:00:10, +10s, +10s, 10:00:10, +10s, +10s
temp: 22.1, +0.2, -0.1, 45.0, +0.2, -0.4
TimescaleDB applies different compression strategies depending on the data type:
- Timestamps: Stores the difference between consecutive values (like the
+10s, +10s, +10spattern above) - Repeated strings (like sensor IDs): Stores the value once and references it across all rows
- Floating-point values: Stores the difference between consecutive readings (like the
+0.2, -0.1pattern above)
When Compression Happens
You wouldn’t want to compress data that’s still being written to. Each insert into a compressed chunk requires decompression, which is slow. TimescaleDB avoids this by default: hypertables created with WITH (tsdb.hypertable) automatically compress only chunks older than 7 days, leaving recent data uncompressed for fast writes.
┌─────────────┐ ┌─────────────┐ ┌────────────--─┐
│ Apr 7-14 │ │ Apr 14-21 │ │ Apr 21-28 │
│ compressed │ │ compressed │ │ uncompressed │
│ (storage ↓) │ │ (storage ↓) │ │ (fast writes) │
└─────────────┘ └─────────────┘ └────────────-─-┘
│◄── older than 7 days ──►│ │◄── recent ──►│
auto-compressed still receiving
new inserts
To use a different threshold, remove the default policy and add a new one. For example, to compress data older than 30 days instead of 7:
-- Remove the default policy created with the hypertable
CALL remove_columnstore_policy('sensor_data');
-- Add a new policy with the desired interval
CALL add_columnstore_policy('sensor_data', after => INTERVAL '30 days');
Configuring Compression
Compression is already enabled by default when you create a hypertable. No extra setup is needed for basic compression. However, you can improve query performance on compressed data by telling TimescaleDB how to organize the rows:
CREATE TABLE sensor_data_v2 (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
) WITH (
tsdb.hypertable,
tsdb.segmentby = 'sensor_id',
tsdb.orderby = 'time DESC'
);
segmentby = 'sensor_id': Compresses each sensor’s data into a separate block. A query forsensor_1only reads that block, skipping all other sensors.orderby = 'time DESC': Sorts each block newest first. A query for the latest reading finds it at the top without scanning the entire block.
For example, if you query WHERE sensor_id = 'sensor_1', here’s what happens with and without segmentby:
Without segmentby (one block):
┌──────────────────────────────────────────┐
│ sensor_1, sensor_2 data mixed together │
│ query for sensor_1 decompresses ALL │
└──────────────────────────────────────────┘
With segmentby = 'sensor_id' (one block per sensor):
┌─ sensor_1 ─────┐ ┌─ sensor_2 ─────┐
│ time DESC │ │ time DESC │
│ 10:00:30 22.2 │ │ 10:00:30 44.8 │
│ 10:00:20 22.3 │ │ 10:00:20 45.2 │
│ 10:00:10 22.1 │ │ 10:00:10 45.0 │
└────────────────┘ └────────────────┘
read ✓ skip
With 100 sensors, this means a query for one sensor decompresses 1% of the data instead of all of it.
Checking Compression Results
The automatic policy runs daily, so let’s compress a chunk manually to see the results immediately:
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
Now check the chunk status:
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;
chunk_name | range_start | range_end | is_compressed
------------------+------------------------+------------------------+---------------
_hyper_1_1_chunk | 2026-04-16 00:00:00+00 | 2026-04-23 00:00:00+00 | t
_hyper_1_2_chunk | 2026-04-23 00:00:00+00 | 2026-04-30 00:00:00+00 | f
is_compressed = t confirms the first chunk is now compressed.
Compression doesn’t change how you write queries. The same SQL works on both compressed and uncompressed data:
-- This queries the compressed chunk (_hyper_1_1_chunk) directly
SELECT sensor_id, AVG(temperature) AS avg_temp, COUNT(*) AS readings
FROM sensor_data
WHERE time BETWEEN '2026-04-21 00:00:00' AND '2026-04-22 00:00:00'
AND sensor_id = 'sensor_1'
GROUP BY sensor_id;
sensor_id | avg_temp | readings
-----------+--------------------+----------
sensor_1 | 15.388563698009191 | 8641
Retention Policies: Automated Data Lifecycle Management
Time-series tables grow continuously, and at some point old data needs to go. In plain PostgreSQL, DELETE removes rows one at a time, which is slow on large tables.
TimescaleDB solves this by dropping entire chunks instead of deleting individual rows. A chunk is a separate table internally, so dropping it is instant regardless of how many rows it contains.
Retention policy: drop_after => 6 months
│ Oct │ Nov │ Dec │ ... │ Mar │ Apr │ May │
│ DROP │ DROP │ DROP │ │ keep │ keep │ keep │
│◄──── older than 6 months ────►│◄─── within 6 months ───►│
dropped instantly untouched
To have TimescaleDB drop old chunks automatically on a schedule, add a retention policy:
SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '6 months');
This runs a background job that automatically drops chunks containing data older than 6 months.
Combining Retention with Continuous Aggregates
Deleting all old data saves storage but means you can no longer answer questions like “how does this month compare to last year?” A common solution is to delete the raw readings after a short period, but keep the pre-computed summaries (hourly averages, daily totals) for much longer.
Here, raw data from sensor_data is deleted after 30 days to save storage, while the sensor_hourly continuous aggregate retains hourly averages for 1 year:
-- Keep raw data for 30 days
SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '30 days');
-- Keep hourly aggregates for 1 year
SELECT add_retention_policy('sensor_hourly', drop_after => INTERVAL '1 year');
This gives you detailed data for recent debugging and long-term trends for historical analysis, without the storage cost of keeping every raw reading forever.
Key Takeaways
TimescaleDB adds time-series capabilities to PostgreSQL without requiring you to switch databases or learn a new query language.
TimescaleDB is worth considering when your PostgreSQL tables are growing with timestamped data and you’re noticing slower inserts, expensive aggregation queries, or increasing storage costs. Since it’s an extension, you can try it on a single table without changing anything else in your database.
Related Tutorials
- Implement Semantic Search in Postgres Using pgvector and Ollama: Another PostgreSQL extension that specializes Postgres for a workload it wasn’t originally built for, this time vector search for RAG.
- A Deep Dive into DuckDB for Data Scientists: An embedded analytical alternative for cases where you need fast aggregations on a single machine without running a database server.
- Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames: A columnar processing engine in Python, useful for analyzing data pulled out of TimescaleDB.




