Time Series Databases
A Time Series Database (TSDB) is a database specifically optimized for storing, querying, and analyzing time-stamped data points. Unlike traditional databases that treat time as just another column, TSDBs are architected around the assumption that data arrives chronologically and is primarily queried by time ranges.
What is Time Series Data?
Time series data consists of observations collected at successive points in time. Each data point typically includes:
- Timestamp: When the measurement was taken
- Metric/Measurement: What was measured (e.g., `ohlcv`, `trade`)
- Value: The actual measurement (e.g., `price=42150.50`, `volume=1.5`)
- Tags/Labels: Metadata for filtering (e.g., `exchange=binance`, `pair=BTC-USDT`)
| timestamp | metric | value | tags |
|---|---|---|---|
| 2024-01-15T10:00:00.123Z | trade | 42150.50 | exchange=binance,pair=BTC-USDT,side=buy |
| 2024-01-15T10:00:00.124Z | trade | 42150.75 | exchange=binance,pair=BTC-USDT,side=sell |
| 2024-01-15T10:00:00.125Z | trade | 42151.00 | exchange=coinbase,pair=BTC-USD,side=buy |
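As a concrete (illustrative) representation, one such data point could be modeled in Python like this; the class and field names are generic and not tied to any particular TSDB:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataPoint:
    """One time series observation: when, what, how much, and under which labels."""
    timestamp: datetime                 # when the measurement was taken
    metric: str                         # what was measured, e.g. "trade"
    value: float                        # the measurement itself, e.g. a price
    tags: dict[str, str] = field(default_factory=dict)  # metadata used for filtering

point = DataPoint(
    timestamp=datetime(2024, 1, 15, 10, 0, 0, 123000, tzinfo=timezone.utc),
    metric="trade",
    value=42150.50,
    tags={"exchange": "binance", "pair": "BTC-USDT", "side": "buy"},
)
```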
Key Characteristics
| Feature | Description |
|---|---|
| Write-heavy workloads | Optimized for high ingestion rates (millions of points/sec) |
| Append-only writes | Data is rarely updated or deleted after insertion |
| Time-based queries | Queries almost always include time range predicates |
| Automatic downsampling | Built-in aggregation over time windows |
| Data retention policies | Automatic expiration of old data |
| Compression | Specialized algorithms for time series data (often 10-20x) |
Advantages of TSDBs
1. Optimized Storage
TSDBs use specialized compression algorithms that exploit the nature of time series data:
- Delta encoding for timestamps (see the sketch after this list)
- Run-length encoding for repeated values
- Gorilla compression for floating-point values
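Delta encoding is the simplest of these to illustrate. A minimal Python sketch, assuming timestamps arrive as integer milliseconds; real engines such as Gorilla go further, with delta-of-delta timestamps and XOR-compressed float values:

```python
def delta_encode(timestamps: list[int]) -> tuple[int, list[int]]:
    """Keep the first timestamp and store only the differences between neighbours."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return timestamps[0], deltas

def delta_decode(first: int, deltas: list[int]) -> list[int]:
    """Rebuild the original timestamps by cumulatively summing the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out

# Regularly spaced timestamps (here: milliseconds, 1s apart) produce tiny,
# highly repetitive deltas that run-length or variable-length coding shrinks further.
ts = [1705312800000, 1705312801000, 1705312802000, 1705312803000]
first, deltas = delta_encode(ts)         # deltas == [1000, 1000, 1000]
assert delta_decode(first, deltas) == ts
```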
2. High Write Throughput
Designed to sustain ingestion rates of millions of data points per second.
3. Fast Time-Range Queries
Data is organized and indexed by time, making range queries extremely efficient:
```sql
-- Candle generation
SELECT
  first(price) as open,
  max(price)   as high,
  min(price)   as low,
  last(price)  as close,
  sum(volume)  as volume
FROM trades
WHERE pair = 'ETH-USDT' AND time > now() - 24h
GROUP BY time(1h)
```
4. Built-in Aggregation Functions
Native support for time-based operations:
- `avg()`, `sum()`, `min()`, `max()`, `count()`
- `rate()`, `derivative()`, `difference()`
- `moving_average()`, `percentile()`
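As a rough illustration of what two of these operations compute, a Python sketch of a simple `rate()` and `moving_average()` over in-memory samples; actual TSDB implementations handle counter resets, irregular intervals, and window alignment far more carefully:

```python
def rate(samples: list[tuple[float, float]]) -> float:
    """Per-second rate of change between the first and last (time, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average over a fixed-size window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

samples = [(0.0, 100.0), (10.0, 150.0), (20.0, 230.0)]
print(rate(samples))                        # 6.5 (units per second)
print(moving_average([1, 2, 3, 4, 5], 3))   # [1.0, 1.5, 2.0, 3.0, 4.0]
```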
5. Automatic Data Management
- Retention policies: Automatically delete data older than X days
- Continuous queries: Pre-compute and store aggregations
- Downsampling: Reduce granularity of old data (1s → 1m → 1h)
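A minimal Python sketch of downsampling, assuming raw 1-second samples are averaged into 1-minute buckets; in practice a TSDB does this internally via continuous queries or materialized views rather than application code:

```python
from collections import defaultdict

def downsample(points: list[tuple[int, float]], bucket_seconds: int = 60) -> list[tuple[int, float]]:
    """Average raw samples into coarser time buckets (e.g. 1s -> 1m)."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:                             # ts: unix timestamp in seconds
        buckets[ts - ts % bucket_seconds].append(value)  # align to the bucket start
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

raw = [(1705312800 + i, 42000.0 + i) for i in range(120)]  # two minutes of 1s samples
print(downsample(raw))  # two (bucket_start, average) pairs, one per minute
```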
When to Use a TSDB (GOOD) ✅
| Use Case | Example |
|---|---|
| Trade history | Every executed trade across all exchanges |
| OHLCV candles | 1s, 1m, 5m, 15m, 1h, 4h, 1d candlestick data |
| Infrastructure monitoring | Server CPU, memory, disk metrics |
| Application Performance Monitoring (APM) | Response times, error rates, throughput |
| IoT sensor data | Temperature, humidity, pressure readings |
| Financial data | Stock prices, trading volumes, order flow |
| Real-time analytics | User activity streams, clickstream data |
| DevOps observability | Logs, traces, metrics (the "three pillars") |
When NOT to Use a TSDB (BAD) ❌
| Scenario | Why? | Better Alternative |
|---|---|---|
| Transactional data | Most TSDBs lack full ACID transaction support | PostgreSQL, MySQL |
| Relational data | No JOINs, no foreign keys | Traditional RDBMS |
| Document storage | Not designed for complex nested structures | MongoDB, Elasticsearch |
| Frequent updates | Append-only architecture | PostgreSQL, DynamoDB |
| Ad-hoc queries | Optimized for time-range, not arbitrary queries | ClickHouse, BigQuery |
| Small datasets | Overhead not worth it for simple use cases | SQLite, PostgreSQL |
Popular TSDB Options
Open Source
| Database | Language | Best For | Notes |
|---|---|---|---|
| InfluxDB | Go | General purpose, DevOps | Most popular, Flux query language |
| TimescaleDB | C (PostgreSQL extension) | SQL compatibility | Full PostgreSQL features |
| Prometheus | Go | Kubernetes/cloud-native monitoring | Pull-based, great with Grafana |
| VictoriaMetrics | Go | Prometheus-compatible, high performance | Drop-in Prometheus replacement |
| QuestDB | Java/C++ | High-performance analytics | SQL support, fast ingestion |
| ClickHouse | C++ | Analytics at scale | Column-oriented, very fast |
Cloud/Managed
| Service | Provider | Notes |
|---|---|---|
| Amazon Timestream | AWS | Serverless, auto-scaling |
| Azure Time Series Insights | Azure | IoT-focused |
| Google Cloud Bigtable | GCP | Not pure TSDB but works well |
| InfluxDB Cloud | InfluxData | Managed InfluxDB |
| Timescale Cloud | Timescale | Managed TimescaleDB |
Architecture Comparison
Traditional RDBMS vs TSDB
| Traditional RDBMS | Time Series DB |
|---|---|
| Row-oriented storage | Column-oriented or hybrid storage |
| B-tree indexes | Time-partitioned indexes (LSM trees, TSI) |
| Optimized for random reads/writes | Optimized for sequential writes, range reads |
| ACID transactions | Eventual consistency (usually) |
| Complex JOINs supported | No JOINs, denormalized data model |
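To make the write/read pattern on the right concrete, a rough Python sketch of time-partitioned storage: each write appends to the partition for its time bucket, and a range query only touches the partitions overlapping the requested range. Real engines layer LSM trees, compression, and indexes on top; the one-hour bucket here is an arbitrary choice:

```python
from collections import defaultdict

BUCKET = 3600  # seconds covered by one partition (one hour, chosen arbitrarily)

class TimePartitionedStore:
    def __init__(self) -> None:
        # partition start time -> append-only list of (timestamp, value)
        self.partitions: dict[int, list[tuple[int, float]]] = defaultdict(list)

    def write(self, ts: int, value: float) -> None:
        """Writes are sequential appends inside the partition covering ts."""
        self.partitions[ts - ts % BUCKET].append((ts, value))

    def query_range(self, start: int, end: int) -> list[tuple[int, float]]:
        """A range read only touches the partitions overlapping [start, end)."""
        result: list[tuple[int, float]] = []
        for bucket_start in range(start - start % BUCKET, end, BUCKET):
            result.extend(p for p in self.partitions.get(bucket_start, []) if start <= p[0] < end)
        return result

store = TimePartitionedStore()
store.write(1705312800, 42150.50)
store.write(1705316400, 42199.00)
print(store.query_range(1705312800, 1705316400))  # only the first partition is scanned
```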
Data Model Concepts
Tags vs Fields
Most TSDBs distinguish between:
- Tags (indexed): Low-cardinality metadata for filtering, e.g. `host`, `region`, `service`, `environment`
- Fields (not indexed): The actual measurements, e.g. `cpu_percent`, `memory_bytes`, `request_latency`
```
measurement: http_requests
tags:        method=GET, status=200, endpoint=/api/users
fields:      count=1523, latency_ms=45.2
timestamp:   2024-01-15T10:00:00Z
```
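As a concrete example, the same point could be built with the InfluxDB 2.x Python client (a sketch assuming the `influxdb-client` package; other TSDBs draw the same distinction with their own APIs): tags go through `.tag()` and are indexed, fields go through `.field()` and are not:

```python
from datetime import datetime, timezone
from influxdb_client import Point

point = (
    Point("http_requests")                   # measurement
    .tag("method", "GET")                    # tags: indexed, low-cardinality metadata
    .tag("status", "200")
    .tag("endpoint", "/api/users")
    .field("count", 1523)                    # fields: not indexed, the measured values
    .field("latency_ms", 45.2)
    .time(datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc))
)
print(point.to_line_protocol())  # renders the point in InfluxDB line protocol
```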
Cardinality
High cardinality (many unique tag values) is the enemy of TSDBs:
```
# ❌ BAD: user_id as a tag (millions of unique values)
http_requests,user_id=abc123 count=1

# ✅ GOOD: user_id as a field, aggregate by other tags
http_requests,endpoint=/api/users count=1,user_id="abc123"
```
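A small Python sketch of why this matters: the number of series a TSDB must index grows with the number of distinct tag combinations, so a user-level tag multiplies it by the number of users (the tag counts below are made up for illustration):

```python
def series_count(unique_tag_values: dict[str, int]) -> int:
    """Worst-case series cardinality: the product of unique values per tag."""
    total = 1
    for n in unique_tag_values.values():
        total *= n
    return total

# user_id kept as a field: the index only has to cover the request metadata
print(series_count({"endpoint": 50, "method": 4, "status": 10}))
# -> 2000 series

# user_id promoted to a tag: every user multiplies the series count
print(series_count({"endpoint": 50, "method": 4, "status": 10, "user_id": 1_000_000}))
# -> 2,000,000,000 series
```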
Performance Tips
- Batch writes: Send multiple points in a single request (see the sketch after this list)
- Use appropriate precision: Don't use nanoseconds if seconds suffice
- Limit tag cardinality: Keep unique tag combinations under control
- Use retention policies: Don't store high-resolution data forever
- Pre-aggregate: Use continuous queries for common aggregations
- Shard by time: Most TSDBs do this automatically
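As referenced in the batching tip above, a minimal sketch of batched writes using the InfluxDB 2.x Python client; the URL, token, org, and bucket are placeholders, and most other TSDB clients expose an equivalent bulk write path:

```python
from datetime import datetime, timedelta, timezone
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details -- substitute your own deployment's values.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

start = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)

# Build a batch of points in memory, then send them in one request
# instead of paying one round trip per point.
batch = [
    Point("trade")
    .tag("exchange", "binance")
    .tag("pair", "BTC-USDT")
    .field("price", 42150.50 + i * 0.25)
    .field("volume", 1.5)
    .time(start + timedelta(seconds=i))
    for i in range(1000)
]

write_api.write(bucket="market-data", record=batch)
client.close()
```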