Time Series Databases
A Time Series Database (TSDB) is a database specifically optimized for storing, querying, and analyzing time-stamped data points. Unlike traditional databases that treat time as just another column, TSDBs are architected around the assumption that data arrives chronologically and is primarily queried by time ranges.
What is Time Series Data?
Time series data consists of observations collected at successive points in time. Each data point typically includes:
- Timestamp: When the measurement was taken
- Metric/Measurement: What was measured (e.g., `ohlcv`, `trade`)
- Value: The actual measurement (e.g., `price=42150.50`, `volume=1.5`)
- Tags/Labels: Metadata for filtering (e.g., `exchange=binance`, `pair=BTC-USDT`)
| timestamp | metric | value | tags |
|---|---|---|---|
| 2024-01-15T10:00:00.123Z | trade | 42150.50 | exchange=binance,pair=BTC-USDT,side=buy |
| 2024-01-15T10:00:00.124Z | trade | 42150.75 | exchange=binance,pair=BTC-USDT,side=sell |
| 2024-01-15T10:00:00.125Z | trade | 42151.00 | exchange=coinbase,pair=BTC-USD,side=buy |
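As a concrete (illustrative) representation, one such data point could be modeled in Python like this; the class and field names are generic and not tied to any particular TSDB:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataPoint:
    """One time series observation: when, what, how much, and under which labels."""
    timestamp: datetime                 # when the measurement was taken
    metric: str                         # what was measured, e.g. "trade"
    value: float                        # the measurement itself, e.g. a price
    tags: dict[str, str] = field(default_factory=dict)  # metadata used for filtering

point = DataPoint(
    timestamp=datetime(2024, 1, 15, 10, 0, 0, 123000, tzinfo=timezone.utc),
    metric="trade",
    value=42150.50,
    tags={"exchange": "binance", "pair": "BTC-USDT", "side": "buy"},
)
```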
Key Characteristics
| Feature | Description |
|---|---|
| Write-heavy workloads | Optimized for high ingestion rates (millions of points/sec) |
| Append-only writes | Data is rarely updated or deleted after insertion |
| Time-based queries | Queries almost always include time range predicates |
| Automatic downsampling | Built-in aggregation over time windows |
| Data retention policies | Automatic expiration of old data |
| Compression | Specialized algorithms for time series data (often 10-20x) |
Advantages of TSDBs
1. Optimized Storage
TSDBs use specialized compression algorithms that exploit the nature of time series data:
- Delta encoding for timestamps (see the sketch after this list)
- Run-length encoding for repeated values
- Gorilla compression for floating-point values
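Delta encoding is the simplest of these to illustrate. A minimal Python sketch, assuming timestamps arrive as integer milliseconds; real engines such as Gorilla go further, with delta-of-delta timestamps and XOR-compressed float values:

```python
def delta_encode(timestamps: list[int]) -> tuple[int, list[int]]:
    """Keep the first timestamp and store only the differences between neighbours."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return timestamps[0], deltas

def delta_decode(first: int, deltas: list[int]) -> list[int]:
    """Rebuild the original timestamps by cumulatively summing the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out

# Regularly spaced timestamps (here: milliseconds, 1s apart) produce tiny,
# highly repetitive deltas that run-length or variable-length coding shrinks further.
ts = [1705312800000, 1705312801000, 1705312802000, 1705312803000]
first, deltas = delta_encode(ts)         # deltas == [1000, 1000, 1000]
assert delta_decode(first, deltas) == ts
```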
2. High Write Throughput
Designed to sustain ingestion rates of millions of data points per second.
3. Fast Time-Range Queries
Data is organized and indexed by time, making range queries extremely efficient:
```sql
-- Candle generation
SELECT
  first(price) as open,
  max(price)   as high,
  min(price)   as low,
  last(price)  as close,
  sum(volume)  as volume
FROM trades
WHERE pair = 'ETH-USDT' AND time > now() - 24h
GROUP BY time(1h)
```
4. Built-in Aggregation Functions
Native support for time-based operations:
- `avg()`, `sum()`, `min()`, `max()`, `count()`
- `rate()`, `derivative()`, `difference()`
- `moving_average()`, `percentile()`
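As a rough illustration of what two of these operations compute, a Python sketch of a simple `rate()` and `moving_average()` over in-memory samples; actual TSDB implementations handle counter resets, irregular intervals, and window alignment far more carefully:

```python
def rate(samples: list[tuple[float, float]]) -> float:
    """Per-second rate of change between the first and last (time, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average over a fixed-size window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

samples = [(0.0, 100.0), (10.0, 150.0), (20.0, 230.0)]
print(rate(samples))                        # 6.5 (units per second)
print(moving_average([1, 2, 3, 4, 5], 3))   # [1.0, 1.5, 2.0, 3.0, 4.0]
```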
5. Automatic Data Management
- Retention policies: Automatically delete data older than X days
- Continuous queries: Pre-compute and store aggregations
- Downsampling: Reduce granularity of old data (1s → 1m → 1h)
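A minimal Python sketch of downsampling, assuming raw 1-second samples are averaged into 1-minute buckets; in practice a TSDB does this internally via continuous queries or materialized views rather than application code:

```python
from collections import defaultdict

def downsample(points: list[tuple[int, float]], bucket_seconds: int = 60) -> list[tuple[int, float]]:
    """Average raw samples into coarser time buckets (e.g. 1s -> 1m)."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:                             # ts: unix timestamp in seconds
        buckets[ts - ts % bucket_seconds].append(value)  # align to the bucket start
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

raw = [(1705312800 + i, 42000.0 + i) for i in range(120)]  # two minutes of 1s samples
print(downsample(raw))  # two (bucket_start, average) pairs, one per minute
```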
When to Use a TSDB (GOOD) ✅
| Use Case | Example |
|---|---|
| Trade history | Every executed trade across all exchanges |
| OHLCV candles | 1s, 1m, 5m, 15m, 1h, 4h, 1d candlestick data |
| Infrastructure monitoring | Server CPU, memory, disk metrics |
| Application Performance Monitoring (APM) | Response times, error rates, throughput |
| IoT sensor data | Temperature, humidity, pressure readings |
| Financial data | Stock prices, trading volumes, order flow |
| Real-time analytics | User activity streams, clickstream data |
| DevOps observability | Logs, traces, metrics (the "three pillars") |
When NOT to Use a TSDB (BAD) ❌
| Scenario | Why? | Better Alternative |
|---|---|---|
| Transactional data | Most TSDBs lack full ACID transaction support | PostgreSQL, MySQL |
| Relational data | No JOINs, no foreign keys | Traditional RDBMS |
| Document storage | Not designed for complex nested structures | MongoDB, Elasticsearch |
| Frequent updates | Append-only architecture | PostgreSQL, DynamoDB |
| Ad-hoc queries | Optimized for time-range, not arbitrary queries | ClickHouse, BigQuery |
| Small datasets | Overhead not worth it for simple use cases | SQLite, PostgreSQL |
Popular TSDB Options
Open Source
| Database | Language | Best For | Notes |
|---|---|---|---|
| InfluxDB | Go | General purpose, DevOps | Most popular, Flux query language |
| TimescaleDB | C (PostgreSQL extension) | SQL compatibility | Full PostgreSQL features |
| Prometheus | Go | Kubernetes/cloud-native monitoring | Pull-based, great with Grafana |
| VictoriaMetrics | Go | Prometheus-compatible, high performance | Drop-in Prometheus replacement |
| QuestDB | Java/C++ | High-performance analytics | SQL support, fast ingestion |
| ClickHouse | C++ | Analytics at scale | Column-oriented, very fast |
Cloud/Managed
| Service | Provider | Notes |
|---|---|---|
| Amazon Timestream | AWS | Serverless, auto-scaling |
| Azure Time Series Insights | Azure | IoT-focused |
| Google Cloud Bigtable | GCP | Not pure TSDB but works well |
| InfluxDB Cloud | InfluxData | Managed InfluxDB |
| Timescale Cloud | Timescale | Managed TimescaleDB |
Architecture Comparison
Traditional RDBMS vs TSDB
| Traditional RDBMS | Time Series DB |
|---|---|
| Row-oriented storage | Column-oriented or hybrid storage |
| B-tree indexes | Time-partitioned indexes (LSM trees, TSI) |
| Optimized for random reads/writes | Optimized for sequential writes, range reads |
| ACID transactions | Eventual consistency (usually) |
| Complex JOINs supported | No JOINs, denormalized data model |
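To make the write/read pattern on the right concrete, a rough Python sketch of time-partitioned storage: each write appends to the partition for its time bucket, and a range query only touches the partitions overlapping the requested range. Real engines layer LSM trees, compression, and indexes on top; the one-hour bucket here is an arbitrary choice:

```python
from collections import defaultdict

BUCKET = 3600  # seconds covered by one partition (one hour, chosen arbitrarily)

class TimePartitionedStore:
    def __init__(self) -> None:
        # partition start time -> append-only list of (timestamp, value)
        self.partitions: dict[int, list[tuple[int, float]]] = defaultdict(list)

    def write(self, ts: int, value: float) -> None:
        """Writes are sequential appends inside the partition covering ts."""
        self.partitions[ts - ts % BUCKET].append((ts, value))

    def query_range(self, start: int, end: int) -> list[tuple[int, float]]:
        """A range read only touches the partitions overlapping [start, end)."""
        result: list[tuple[int, float]] = []
        for bucket_start in range(start - start % BUCKET, end, BUCKET):
            result.extend(p for p in self.partitions.get(bucket_start, []) if start <= p[0] < end)
        return result

store = TimePartitionedStore()
store.write(1705312800, 42150.50)
store.write(1705316400, 42199.00)
print(store.query_range(1705312800, 1705316400))  # only the first partition is scanned
```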
Data Model Concepts
Tags vs Fields
Most TSDBs distinguish between:
- Tags (indexed): Low-cardinality metadata for filtering, e.g. `host`, `region`, `service`, `environment`
- Fields (not indexed): The actual measurements, e.g. `cpu_percent`, `memory_bytes`, `request_latency`
```
measurement: http_requests
tags:        method=GET, status=200, endpoint=/api/users
fields:      count=1523, latency_ms=45.2
timestamp:   2024-01-15T10:00:00Z
```
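As a concrete example, the same point could be built with the InfluxDB 2.x Python client (a sketch assuming the `influxdb-client` package; other TSDBs draw the same distinction with their own APIs): tags go through `.tag()` and are indexed, fields go through `.field()` and are not:

```python
from datetime import datetime, timezone
from influxdb_client import Point

point = (
    Point("http_requests")                   # measurement
    .tag("method", "GET")                    # tags: indexed, low-cardinality metadata
    .tag("status", "200")
    .tag("endpoint", "/api/users")
    .field("count", 1523)                    # fields: not indexed, the measured values
    .field("latency_ms", 45.2)
    .time(datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc))
)
print(point.to_line_protocol())  # renders the point in InfluxDB line protocol
```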
Cardinality
High cardinality (many unique tag values) is the enemy of TSDBs:
```
# ❌ BAD: user_id as a tag (millions of unique values)
http_requests,user_id=abc123 count=1

# ✅ GOOD: user_id as a field, aggregate by other tags
http_requests,endpoint=/api/users count=1,user_id="abc123"
```
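A small Python sketch of why this matters: the number of series a TSDB must index grows with the number of distinct tag combinations, so a user-level tag multiplies it by the number of users (the tag counts below are made up for illustration):

```python
def series_count(unique_tag_values: dict[str, int]) -> int:
    """Worst-case series cardinality: the product of unique values per tag."""
    total = 1
    for n in unique_tag_values.values():
        total *= n
    return total

# user_id kept as a field: the index only has to cover the request metadata
print(series_count({"endpoint": 50, "method": 4, "status": 10}))
# -> 2000 series

# user_id promoted to a tag: every user multiplies the series count
print(series_count({"endpoint": 50, "method": 4, "status": 10, "user_id": 1_000_000}))
# -> 2,000,000,000 series
```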
Performance Tips
- Batch writes: Send multiple points in a single request (see the sketch after this list)
- Use appropriate precision: Don't use nanoseconds if seconds suffice
- Limit tag cardinality: Keep unique tag combinations under control
- Use retention policies: Don't store high-resolution data forever
- Pre-aggregate: Use continuous queries for common aggregations
- Shard by time: Most TSDBs do this automatically
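As referenced in the batching tip above, a minimal sketch of batched writes using the InfluxDB 2.x Python client; the URL, token, org, and bucket are placeholders, and most other TSDB clients expose an equivalent bulk write path:

```python
from datetime import datetime, timedelta, timezone
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details -- substitute your own deployment's values.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

start = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)

# Build a batch of points in memory, then send them in one request
# instead of paying one round trip per point.
batch = [
    Point("trade")
    .tag("exchange", "binance")
    .tag("pair", "BTC-USDT")
    .field("price", 42150.50 + i * 0.25)
    .field("volume", 1.5)
    .time(start + timedelta(seconds=i))
    for i in range(1000)
]

write_api.write(bucket="market-data", record=batch)
client.close()
```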