📈 A High Performance Time-Series Storage Solution - Part 1

If you need to consume high-frequency time-series data in your application, there are some excellent proprietary time-series database offerings that provide great features and performance.  For many reasons, however, a proprietary system isn't always the most appropriate option - in these cases you might consider building your own storage and retrieval solution.

This post details an efficient storage solution for ingesting and querying high-frequency time-series data. The example uses Service Fabric Actors and Azure CosmosDB, but the concept can easily be adapted to your own platform and storage system. Service Fabric Actors provide a turn-based access model, which means actor state can serve as our cache without any locking to prevent concurrent writes, even across multiple VMs.

In summary, the solution is a write-back caching strategy that pre-aggregates and compresses time-series into fixed intervals. This reduces the volume of historic records, making historic queries much faster and storage cheaper, while allowing a longer recent history of data to remain in cache for fast access. This strategy is inspired by time-series databases and can easily be implemented using existing resources.

The solution is geared towards high-frequency telemetry, where the number of writes is high, and the reads are most commonly performed against recent history.  It's also adapted to suit append-heavy workloads, and while it supports updating older records, it may not be the right solution if updates are frequent.

Aggregation

Querying large volumes of high-frequency time-series data can be optimised by pre-aggregating data into larger intervals.  For example, querying a few months' worth of 1-minute interval data (at 1440 samples per day) is far less efficient than querying a few months' worth of daily data.

The aggregation interval can be adjusted to suit your application, and should be small enough to reduce the number of updates on existing data, but large enough to significantly reduce the number of records being queried.
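As a rough sketch of this idea (the interval length, function names, and data shapes here are illustrative, not from the post), incoming samples can be bucketed by the start of their aggregation interval, so that each interval becomes a single stored record:

```python
from collections import defaultdict

# Illustrative sketch: group raw (timestamp, value) samples into fixed
# aggregation intervals. The interval length is a tuning knob; one day
# is used here only as an example.
INTERVAL_SECONDS = 86_400

def bucket_key(epoch_seconds: int) -> int:
    """Return the start of the interval a sample belongs to."""
    return epoch_seconds - (epoch_seconds % INTERVAL_SECONDS)

def aggregate(samples):
    """Group samples into {interval_start: [(ts, value), ...]} buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_key(ts)].append((ts, value))
    return dict(buckets)

# One day of 1-minute samples collapses into a single record.
samples = [(86_400 * 2 + 60 * i, float(i)) for i in range(1440)]
buckets = aggregate(samples)
```

With daily intervals, a few months of 1-minute data shrinks from roughly 130,000 rows to fewer than 100 records to scan.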

Compression

The compression algorithm used in this example was developed by Facebook for their Gorilla time-series database.  It takes advantage of the inherent repetitive properties of time-series data, storing the delta of deltas for timestamps, and XORing each value with the previous one to store only the 'meaningful' bits, omitting leading and trailing zeros.  For regular-interval sampling, each timestamp is reduced to a single control bit indicating that the interval is the same as the previous one.

This algorithm is very fast and has a high compression ratio for numeric time-series data, however the approach outlined in this post can be implemented using any time-series compression algorithm.

There are many open-source implementations of Gorilla compression; this .NET Core implementation of Gorilla compression has been tried and tested.
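To illustrate the timestamp half of the algorithm, here is a simplified delta-of-delta sketch (plain integer lists rather than packed bits - a real encoder packs these into variable-length bit ranges, so the names and structure here are illustrative only):

```python
# Simplified Gorilla-style delta-of-delta timestamp encoding.

def encode_timestamps(timestamps):
    """Store the first timestamp, the first delta, then delta-of-deltas.

    For regular-interval sampling every delta-of-delta is 0, which a
    real encoder reduces to a single control bit per sample.
    """
    head, first_delta = timestamps[0], timestamps[1] - timestamps[0]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return head, first_delta, dods

def decode_timestamps(head, first_delta, dods):
    """Rebuild the original timestamps by re-accumulating the deltas."""
    timestamps, delta = [head, head + first_delta], first_delta
    for dod in dods:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps
```

For a perfectly regular series the delta-of-delta stream is all zeros, which is where the high compression ratio for telemetry comes from.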

Write-Back Caching

A write-back cache is a system that immediately caches new data and moves it to long-term storage only after a period of time.  This works well for time-series data, as recent data is commonly accessed more frequently than historic data. Requests can be served using a combination of the cache and the database, concatenating the data before sending a response.

Write-back caching works well with Gorilla compression, as we can quickly append new values to the latest period of compressed data.  The decreased size of the compressed time-series allows more of it to be stored in cache, meaning we can keep a longer period of recent history, leading to more cache hits.

An overview of the caching strategy: compression is performed immediately on appended data, which is aggregated into fixed intervals and, after a time, moved out of the cache and into a database for historic querying.

A cache miss means the remaining data is queried from the database. This will not be as fast as serving data from the cache; however, aggregating data into longer intervals means database queries run much faster for high-frequency time-series.
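A minimal sketch of that read path (the `StubStore` class and `query_range` helper are hypothetical stand-ins for the actor's in-memory cache and the CosmosDB queries):

```python
class StubStore:
    """Stand-in for either the in-memory cache or the database."""
    def __init__(self, samples):
        self.samples = samples

    def query(self, start, end):
        # Half-open range [start, end) over (timestamp, value) pairs.
        return [(t, v) for t, v in self.samples if start <= t < end]

def query_range(cache, db, start, end, cache_start):
    """Serve a request by concatenating database and cache results.

    `cache_start` is the timestamp where cached history begins;
    anything older is a cache miss and falls back to the database.
    """
    results = []
    if start < cache_start:            # older portion: database
        results += db.query(start, min(end, cache_start))
    if end > cache_start:              # recent portion: cache
        results += cache.query(max(start, cache_start), end)
    return results
```

Requests that fall entirely after `cache_start` never touch the database, which is the common case when reads target recent history.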

Service Fabric Actor & CosmosDB Implementation - Part 1

Service Fabric Actors keep state and logic closely coupled, which makes for a great in-memory cache. Compression reduces the memory footprint of cached data, and the cached time period can be shortened if required. This solution maps each distinct series to a single actor instance.

CosmosDB is a good database choice for scalable volumes of data, as it automatically partitions collections as data volume grows, provided an appropriate partition key is chosen.  For this solution, a hash of series id and year is a good partitioning scheme: we'll be querying individual series from within series-specific actors, and adding the year to the partition key means partitions will not grow indefinitely as time rolls on.
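As a sketch, the partition key value could be derived like this (the function name and key format are illustrative; CosmosDB itself hashes the partition key value you supply):

```python
from datetime import datetime, timezone

def partition_key(series_id: str, epoch_seconds: int) -> str:
    """Combine the series id with the sample's year, so each series
    rolls into a fresh partition every year rather than growing one
    partition forever."""
    year = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).year
    return f"{series_id}-{year}"
```

Because every query originates from a series-specific actor, the actor always knows its own series id and the year range it needs, so queries stay within a small, known set of partitions.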

Thanks for reading. Part 2 will dive deeper into the implementation detail, including how updates are handled. Stay tuned!
