OpsBridge.Tech

Thanos

Overview

Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.
Thanos leverages the Prometheus 2.0 storage format to cost-efficiently store historical metric data in any object storage while retaining fast query latencies. Additionally, it provides a global query view across all Prometheus installations and can merge data from Prometheus HA pairs on the fly.
Concretely, the aims of the project are:

  1. Global query view of metrics.
  2. Unlimited retention of metrics.
  3. High availability of components, including Prometheus.
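
The on-the-fly merging of Prometheus HA pairs described in the overview can be pictured with a small sketch. The Python below is a simplified illustration and not Thanos's actual deduplication algorithm: it assumes each replica attaches a label named `replica` (a hypothetical name), drops that label, and keeps one sample per timestamp so that a gap in one replica is filled from the other.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumed name of the label that distinguishes HA replicas


def deduplicate(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only in the replica label.

    Each series is a dict: {"labels": {...}, "samples": [(timestamp, value), ...]}.
    For every timestamp the first value seen wins; gaps in one replica
    are filled from the other.
    """
    merged = defaultdict(dict)  # label set without replica -> {timestamp: value}
    for series in series_list:
        key = tuple(sorted(
            (name, value)
            for name, value in series["labels"].items()
            if name != replica_label
        ))
        for timestamp, value in series["samples"]:
            merged[key].setdefault(timestamp, value)
    return [
        {"labels": dict(key), "samples": sorted(samples.items())}
        for key, samples in merged.items()
    ]


replica_a = {
    "labels": {"__name__": "up", "job": "api", "replica": "a"},
    "samples": [(0, 1.0), (15, 1.0), (45, 1.0)],  # missed the scrape at t=30
}
replica_b = {
    "labels": {"__name__": "up", "job": "api", "replica": "b"},
    "samples": [(0, 1.0), (15, 1.0), (30, 1.0), (45, 1.0)],
}
result = deduplicate([replica_a, replica_b])
print(result)
```

The two replica series collapse into a single series without the `replica` label, which is the view a dashboard user would expect to see.
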
Design

Thanos is a set of components that can be composed into a highly available Prometheus setup with long-term storage capabilities. Its main goals are operational simplicity and retaining Prometheus’s reliability properties.

The Prometheus metric data model and the 2.0 storage format are the foundational layers of all components in the system.

Components

Following the KISS and Unix philosophies, Thanos is composed of a set of components, each of which fulfills a specific role.

 

Figure: Deployment with Thanos Sidecar for Kubernetes.

Figure: Deployment via Receive, used to scale out or to integrate with other remote-write-compatible sources.

1. Sidecar

Thanos integrates with existing Prometheus servers as a sidecar process, which runs on the same machine or in the same pod as the Prometheus server.

The purpose of Thanos Sidecar is to back up Prometheus’s data into an object storage bucket, and give other Thanos components access to the Prometheus metrics via a gRPC API.

Sidecar makes use of Prometheus’s reload endpoint. Make sure it’s enabled with the flag --web.enable-lifecycle.

2. Store Gateway

As Thanos Sidecar backs up data into the object storage bucket of your choice, you can decrease Prometheus’s retention in order to store less data locally. However, we need a way to query all that historical data again. Store Gateway does just that, by implementing the same gRPC data API as Sidecar, but backing it with data it can find in your object storage bucket. Just like sidecars and query nodes, Store Gateway exposes a Store API and needs to be discovered by Thanos Querier.

3. Compactor

A local Prometheus installation periodically compacts older data to improve query efficiency. Since Sidecar backs up data into an object storage bucket as soon as possible, we need a way to apply the same process to data in the bucket.

Thanos Compactor simply scans the object storage bucket and performs compaction where required. At the same time, it is responsible for creating downsampled copies of data in order to speed up queries.
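
To make the idea of downsampling concrete, here is a minimal sketch. It is not the Compactor's implementation: it assumes a five-minute window chosen for illustration and keeps only a count, sum, minimum, and maximum per window, so that a query over a long range reads a few aggregates instead of every raw sample.

```python
def downsample(samples, resolution_seconds=300):
    """Aggregate (timestamp, value) samples into fixed-size windows.

    Returns {window_start: {"count", "sum", "min", "max"}} ordered by window start.
    """
    windows = {}
    for timestamp, value in samples:
        window_start = timestamp - timestamp % resolution_seconds
        agg = windows.get(window_start)
        if agg is None:
            windows[window_start] = {
                "count": 1, "sum": value, "min": value, "max": value,
            }
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(windows.items()))


# Ten raw samples, one per minute, with values 0.0 through 9.0.
raw = [(t, float(t // 60)) for t in range(0, 600, 60)]
downsampled = downsample(raw)
for window_start, agg in downsampled.items():
    print(window_start, agg)
```

An average for a window can still be derived as `sum / count`, which is why storing several aggregates is more useful than storing a single averaged value.
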

4. Receiver

The thanos receive command implements the Prometheus Remote Write API. It builds on top of the existing Prometheus TSDB and retains its usefulness while extending its functionality with long-term storage, horizontal scalability, and downsampling. Prometheus instances are configured to continuously write metrics to it, and Thanos Receive then uploads TSDB blocks to an object storage bucket every 2 hours by default. Thanos Receive exposes the StoreAPI so that Thanos Queriers can query received metrics in real time.
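
The two-hour block cadence can be illustrated with a short sketch. It assumes, as an illustration only, that blocks are aligned to multiples of the block duration since the Unix epoch and that timestamps are in milliseconds; the function names are hypothetical and do not come from Thanos.

```python
from collections import defaultdict

BLOCK_DURATION_MS = 2 * 60 * 60 * 1000  # two hours, matching the default described above


def block_range(timestamp_ms, block_duration_ms=BLOCK_DURATION_MS):
    """Return the [start, end) range of the block a sample falls into."""
    start = timestamp_ms - timestamp_ms % block_duration_ms
    return start, start + block_duration_ms


def group_into_blocks(samples, block_duration_ms=BLOCK_DURATION_MS):
    """Group (timestamp_ms, value) samples by the block that would contain them."""
    blocks = defaultdict(list)
    for timestamp_ms, value in samples:
        blocks[block_range(timestamp_ms, block_duration_ms)].append((timestamp_ms, value))
    return dict(blocks)


samples = [(0, 1.0), (7_199_999, 2.0), (7_200_000, 3.0)]
blocks = group_into_blocks(samples)
for (start, end), contents in blocks.items():
    print(start, end, contents)
```

Once a block's time range has passed, it no longer receives new samples and can be uploaded to the bucket as a unit.
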

5. Ruler/Rule

If a Prometheus instance running with Thanos Sidecar does not have enough retention, or if you want alerts or recording rules that require a global view, Thanos has a component for that: the Ruler, which evaluates rules and alerts on top of a given Thanos Querier.

6. Querier/Query

Now that we have set up Sidecar for one or more Prometheus instances, we want to use Thanos’s global Query Layer to evaluate PromQL queries against all instances at once.

The Querier component is stateless and horizontally scalable, and can be deployed with any number of replicas. Once connected to Thanos Sidecar, it automatically detects which Prometheus servers need to be contacted for a given PromQL query.
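
One way to picture how a querier could decide which stores to contact is the sketch below. It is a simplified model under stated assumptions, not the Querier's actual code: each store is assumed to advertise a time range and a set of external labels, and the `Store` class, its fields, and `select_stores` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    name: str
    min_time: int  # earliest timestamp the store can serve
    max_time: int  # latest timestamp the store can serve
    external_labels: dict = field(default_factory=dict)


def select_stores(stores, query_start, query_end, required_labels=None):
    """Return stores whose time range overlaps the query and whose
    external labels do not contradict the requested label values."""
    required_labels = required_labels or {}
    selected = []
    for store in stores:
        overlaps = store.min_time <= query_end and store.max_time >= query_start
        # A store is excluded only if it advertises the label with a different value.
        conflicts = any(
            name in store.external_labels and store.external_labels[name] != value
            for name, value in required_labels.items()
        )
        if overlaps and not conflicts:
            selected.append(store)
    return selected


stores = [
    Store("sidecar-eu", 900, 1000, {"cluster": "eu"}),
    Store("sidecar-us", 900, 1000, {"cluster": "us"}),
    Store("store-gateway", 0, 900),
]
recent_eu = [s.name for s in select_stores(stores, 950, 1000, {"cluster": "eu"})]
historical = [s.name for s in select_stores(stores, 100, 950)]
print(recent_eu)
print(historical)
```

A query for recent data in one cluster touches a single sidecar, while a query reaching back in time also involves the store that serves historical data.
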

Thanos Querier also implements Prometheus’s official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus’s UI for ad-hoc querying and checking the status of the Thanos stores.
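
Because the Querier speaks the Prometheus HTTP API, a client can build a request URL the same way it would for Prometheus. The sketch below only constructs the URL and does not send it. The host name and port are placeholders, and the `dedup` and `partial_response` query parameters are Thanos-specific additions to the Prometheus API; check the documentation of your Thanos version before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_url(base_url, promql, dedup=True, partial_response=False):
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    params = {
        "query": promql,
        # Thanos-specific parameters (assumed to be supported by the target Querier).
        "dedup": str(dedup).lower(),
        "partial_response": str(partial_response).lower(),
    }
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Placeholder address; replace it with the address of your own Querier.
url = build_query_url("http://thanos-query.example:10902/", "sum(up) by (job)")
parsed = urlparse(url)
query_params = parse_qs(parsed.query)
print(url)
```

Tools such as Grafana issue requests of this shape when the Querier is configured as a Prometheus data source.
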

7. Query Frontend

The thanos query-frontend command implements a service that can be put in front of Thanos Queriers to improve the read path. It is based on the Cortex Query Frontend component, so it shares common features such as query splitting and results caching.

Query Frontend is fully stateless and horizontally scalable.
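
Splitting and results caching can be sketched in a few lines. This is an illustrative model rather than the Query Frontend's implementation: it assumes a 24-hour split interval aligned to multiples of that interval, uses a plain dictionary as the cache, and the function names and the `fake_fetch` backend are hypothetical.

```python
def split_by_interval(start, end, interval=24 * 3600):
    """Split [start, end] into sub-ranges aligned to multiples of interval."""
    ranges = []
    cursor = start
    while cursor < end:
        boundary = cursor - cursor % interval + interval
        ranges.append((cursor, min(boundary, end)))
        cursor = boundary
    return ranges


def cached_range_query(start, end, fetch, cache, interval=24 * 3600):
    """Answer a range query from per-interval results, fetching only cache misses."""
    results = []
    for sub_range in split_by_interval(start, end, interval):
        if sub_range not in cache:
            cache[sub_range] = fetch(*sub_range)
        results.extend(cache[sub_range])
    return results


calls = []


def fake_fetch(sub_start, sub_end):
    """Stand-in for a downstream querier; records each call it receives."""
    calls.append((sub_start, sub_end))
    return [(sub_start, 1.0)]


cache = {}
first = cached_range_query(10 * 3600, 60 * 3600, fake_fetch, cache)
second = cached_range_query(10 * 3600, 60 * 3600, fake_fetch, cache)
print(split_by_interval(10 * 3600, 60 * 3600))
print(len(calls))
```

The second identical query is served entirely from the cache, and the smaller sub-queries of the first one could also have been executed in parallel.
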

When Should You Use Thanos?

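Thanos is an excellent choice if you need a global query view across Prometheus installations, long-term retention of metrics, or high availability for Prometheus itself.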

However, Thanos may introduce additional complexity in terms of infrastructure management, so consider your use case before deploying it.

Alternatives to Thanos

Several alternatives to Thanos exist for scaling Prometheus, providing long-term storage, high availability, and global querying. Here are some of the most popular options:

1. Cortex

Best for: Multi-tenancy, cloud-native environments, and horizontal scaling.

🔗 More Info: https://cortexmetrics.io/

2. Mimir (Grafana Mimir)

Best for: Organizations using Grafana for visualization and needing a scalable Prometheus backend.

🔗 More Info: https://grafana.com/oss/mimir/

3. VictoriaMetrics

Best for: High-performance storage and efficient resource utilization.

🔗 More Info: https://victoriametrics.com/

4. OpenTelemetry (OTel) with Prometheus Exporter

Best for: Organizations already adopting OpenTelemetry for observability.

🔗 More Info: https://opentelemetry.io/

Choosing the Right Alternative
Feature Thanos Cortex Mimir VictoriaMetrics OpenTelemetry
Global Querying ⚠️ Limited
Long-Term Storage
Multi-Tenancy ⚠️ Limited ⚠️ Limited
High Availability
Ease of Deployment ⚠️ Complex ⚠️ Complex ⚠️ Complex ✅ Easy ⚠️ Complex
Cost-Effectiveness ⚠️ Can be expensive ✅ Very efficient ⚠️ Depends on backend

🚀 How OpsBridge Can Help

At OpsBridge, we specialize in designing and implementing scalable monitoring solutions using Prometheus and Thanos. Whether you need help with deploying Thanos, optimizing your Prometheus setup, or managing long-term storage efficiently, our DevOps experts can provide the right strategy and hands-on support.

Our services include:

✅ Setting up and managing Thanos and Prometheus for high availability.

✅ Optimizing storage and query performance for cost efficiency.

✅ Implementing alerting and monitoring best practices to improve system reliability.

✅ Providing custom solutions tailored to your infrastructure needs.

👉 If you’re looking for expert guidance on scaling your monitoring stack, contact us today!

 

Conclusion

Thanos is a powerful solution for scaling Prometheus, providing high availability, global querying, and long-term storage for metrics. By leveraging Thanos, DevOps and SRE teams can ensure reliability and observability across large-scale deployments without losing valuable monitoring data.

If you’re looking to enhance your monitoring setup, integrating Thanos with Prometheus is a great step forward. Have experience with Thanos? Share your thoughts with us!

 

 

Source: Thanos
