
Time Series Databases

A Time Series Database (TSDB) is a database specifically optimized for storing, querying, and analyzing time-stamped data points. Unlike traditional databases that treat time as just another column, TSDBs are architected around the assumption that data arrives chronologically and is primarily queried by time ranges.

What is Time Series Data?

Time series data consists of observations collected at successive points in time. Each data point typically includes:

  • Timestamp: When the measurement was taken
  • Metric/Measurement: What was measured (e.g., ohlcv, trade)
  • Value: The actual measurement (e.g., price=42150.50, volume=1.5)
  • Tags/Labels: Metadata for filtering (e.g., exchange=binance, pair=BTC-USDT)

timestamp                | metric | value    | tags
-------------------------|--------|----------|------------------------------------------
2024-01-15T10:00:00.123Z | trade  | 42150.50 | exchange=binance,pair=BTC-USDT,side=buy
2024-01-15T10:00:00.124Z | trade  | 42150.75 | exchange=binance,pair=BTC-USDT,side=sell
2024-01-15T10:00:00.125Z | trade  | 42151.00 | exchange=coinbase,pair=BTC-USD,side=buy

Key Characteristics

Feature                 | Description
------------------------|--------------------------------------------------------------
Write-heavy workloads   | Optimized for high ingestion rates (millions of points/sec)
Append-only writes      | Data is rarely updated or deleted after insertion
Time-based queries      | Queries almost always include time range predicates
Automatic downsampling  | Built-in aggregation over time windows
Data retention policies | Automatic expiration of old data
Compression             | Specialized algorithms for time series data (often 10-20x)

Advantages of TSDBs

1. Optimized Storage

TSDBs use specialized compression algorithms that exploit the nature of time series data:

  • Delta encoding for timestamps
  • Run-length encoding for repeated values
  • Gorilla compression for floating-point values
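
A minimal sketch of the idea behind delta-of-delta timestamp encoding (Python, not tied to any particular engine; real implementations add bit-level packing on top):

# Regularly spaced timestamps reduce to a start value plus mostly zero deltas,
# which compress extremely well.
def delta_of_delta(timestamps):
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0], deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]

ts = [1705312800, 1705312801, 1705312802, 1705312803]  # one-second spacing
print(delta_of_delta(ts))  # [1705312800, 1, 0, 0]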

2. High Write Throughput

Designed to sustain ingestion of millions of data points per second, largely by relying on sequential, append-only writes.

3. Fast Time-Range Queries

Data is organized and indexed by time, making range queries extremely efficient:

-- Candle generation
SELECT
  first(price) AS open,
  max(price)   AS high,
  min(price)   AS low,
  last(price)  AS close,
  sum(volume)  AS volume
FROM trades
WHERE pair = 'ETH-USDT' AND time > now() - 24h
GROUP BY time(1h)
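
The query above uses InfluxQL-style syntax; exact syntax varies by database. As a rough equivalent in application code, a pandas sketch (the trades DataFrame below is illustrative):

import pandas as pd

# Illustrative trades: a DatetimeIndex plus price and volume columns.
trades = pd.DataFrame(
    {"price": [42150.50, 42150.75, 42151.00], "volume": [1.5, 0.2, 0.8]},
    index=pd.to_datetime(["2024-01-15 10:00", "2024-01-15 10:20", "2024-01-15 11:05"]),
)

candles = trades["price"].resample("1h").ohlc()            # open, high, low, close per hour
candles["volume"] = trades["volume"].resample("1h").sum()  # summed volume per hour
print(candles)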

4. Built-in Aggregation Functions

Native support for time-based operations:

  • avg(), sum(), min(), max(), count()
  • rate(), derivative(), difference()
  • moving_average(), percentile()
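
As a rough illustration of what rate() and moving_average() compute (pandas again; the sample values are made up):

import pandas as pd

# A monotonically increasing counter sampled every 10 seconds.
counter = pd.Series(
    [100, 150, 230],
    index=pd.to_datetime(["2024-01-15 10:00:00", "2024-01-15 10:00:10", "2024-01-15 10:00:20"]),
)

# rate(): change in value divided by elapsed seconds between samples.
per_second = counter.diff() / counter.index.to_series().diff().dt.total_seconds()

# moving_average(): mean over a sliding window of samples.
smoothed = counter.rolling(window=2).mean()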

5. Automatic Data Management

  • Retention policies: Automatically delete data older than X days
  • Continuous queries: Pre-compute and store aggregations
  • Downsampling: Reduce granularity of old data (1s → 1m → 1h)

When to Use a TSDB (GOOD) ✅

Use Case                                  | Example
------------------------------------------|----------------------------------------------
Trade history                             | Every executed trade across all exchanges
OHLCV candles                             | 1s, 1m, 5m, 15m, 1h, 4h, 1d candlestick data
Infrastructure monitoring                 | Server CPU, memory, disk metrics
Application Performance Monitoring (APM)  | Response times, error rates, throughput
IoT sensor data                           | Temperature, humidity, pressure readings
Financial data                            | Stock prices, trading volumes, order flow
Real-time analytics                       | User activity streams, clickstream data
DevOps observability                      | Logs, traces, metrics (the "three pillars")

When NOT to Use a TSDB (BAD) ❌

Scenario           | Why?                                             | Better Alternative
-------------------|--------------------------------------------------|------------------------
Transactional data | Most TSDBs lack ACID transaction guarantees      | PostgreSQL, MySQL
Relational data    | Limited or no JOINs and foreign keys             | Traditional RDBMS
Document storage   | Not designed for complex nested structures       | MongoDB, Elasticsearch
Frequent updates   | Append-only architecture                         | PostgreSQL, DynamoDB
Ad-hoc queries     | Optimized for time-range, not arbitrary queries  | ClickHouse, BigQuery
Small datasets     | Overhead not worth it for simple use cases       | SQLite, PostgreSQL

Open Source

Database        | Language                 | Best For                                 | Notes
----------------|--------------------------|------------------------------------------|------------------------------------
InfluxDB        | Go                       | General purpose, DevOps                  | Most popular, Flux query language
TimescaleDB     | C (PostgreSQL extension) | SQL compatibility                        | Full PostgreSQL features
Prometheus      | Go                       | Kubernetes/cloud-native monitoring       | Pull-based, great with Grafana
VictoriaMetrics | Go                       | Prometheus-compatible, high performance  | Drop-in Prometheus replacement
QuestDB         | Java/C++                 | High-performance analytics               | SQL support, fast ingestion
ClickHouse      | C++                      | Analytics at scale                       | Column-oriented, very fast

Cloud/Managed

Service                     | Provider   | Notes
----------------------------|------------|---------------------------------
Amazon Timestream           | AWS        | Serverless, auto-scaling
Azure Time Series Insights  | Azure      | IoT-focused
Google Cloud Bigtable       | GCP        | Not a pure TSDB, but works well
InfluxDB Cloud              | InfluxData | Managed InfluxDB
Timescale Cloud             | Timescale  | Managed TimescaleDB

Architecture Comparison

Traditional RDBMS vs TSDB

┌────────────────────────────────────────────────┐
│               Traditional RDBMS                │
├────────────────────────────────────────────────┤
│ • Row-oriented storage                         │
│ • B-tree indexes                               │
│ • Optimized for random reads/writes            │
│ • ACID transactions                            │
│ • Complex JOINs supported                      │
└────────────────────────────────────────────────┘

┌────────────────────────────────────────────────┐
│                 Time Series DB                 │
├────────────────────────────────────────────────┤
│ • Column-oriented or hybrid storage            │
│ • Time-partitioned indexes (LSM trees, TSI)    │
│ • Optimized for sequential writes, range reads │
│ • Eventual consistency (usually)               │
│ • No JOINs, denormalized data model            │
└────────────────────────────────────────────────┘

Data Model Concepts

Tags vs Fields

Most TSDBs distinguish between:

  • Tags (indexed): Low-cardinality metadata for filtering
    • host, region, service, environment
  • Fields (not indexed): The actual measurements
    • cpu_percent, memory_bytes, request_latency
measurement: http_requests
tags: method=GET, status=200, endpoint=/api/users
fields: count=1523, latency_ms=45.2
timestamp: 2024-01-15T10:00:00Z
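
For example, with the influxdb-client Python package (a sketch; the measurement, tag, and field names mirror the example above), tags and fields are attached through separate calls:

from influxdb_client import Point

# Tags are indexed metadata for filtering; fields hold the measured values.
point = (
    Point("http_requests")
    .tag("method", "GET")
    .tag("status", "200")
    .tag("endpoint", "/api/users")
    .field("count", 1523)
    .field("latency_ms", 45.2)
    .time("2024-01-15T10:00:00Z")
)
print(point.to_line_protocol())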

Cardinality

High cardinality (many unique tag values) is the enemy of TSDBs:

# ❌ BAD: user_id as a tag (millions of unique values)
http_requests,user_id=abc123 count=1

# ✅ GOOD: user_id as a field, aggregate by other tags
http_requests,endpoint=/api/users count=1,user_id="abc123"

Performance Tips

  1. Batch writes: Send multiple points in a single request
  2. Use appropriate precision: Don't use nanoseconds if seconds suffice
  3. Limit tag cardinality: Keep unique tag combinations under control
  4. Use retention policies: Don't store high-resolution data forever
  5. Pre-aggregate: Use continuous queries for common aggregations
  6. Shard by time: Most TSDBs do this automatically
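
As a sketch of tip 1, using the influxdb-client Python package's batching write API (the URL, token, org, and bucket below are placeholders):

from influxdb_client import InfluxDBClient, Point, WriteOptions

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Points are buffered and flushed in batches instead of one HTTP request per point.
with client.write_api(write_options=WriteOptions(batch_size=5000, flush_interval=1000)) as write_api:
    for i in range(10_000):
        write_api.write(
            bucket="market-data",
            record=Point("trade").tag("pair", "BTC-USDT").field("price", 42150.5 + i),
        )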