This guide explains how to configure the OpenTelemetry Collector and OTLP (OpenTelemetry Protocol) settings in Cortex. Context The OpenTelemetry Collector can write the metrics it collects to Cortex in either the Prometheus remote write or the OTLP format. Push with Prometheus format To push metrics in the Prometheus remote write format, we can use the prometheusremotewrite exporter in the OpenTelemetry Collector. In the exporters and service sections of the OpenTelemetry Collector YAML file, we can add the following: expor...
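As a minimal sketch of what that exporter block might look like (the endpoint host and tenant ID are placeholders, and it assumes an otlp receiver is defined elsewhere in the collector config):

```yaml
exporters:
  prometheusremotewrite:
    # Cortex remote write API (placeholder host).
    endpoint: http://cortex.example.com/api/v1/push
    headers:
      # Tenant ID header, required when Cortex multi-tenancy is enabled.
      X-Scope-OrgID: team-1

service:
  pipelines:
    metrics:
      receivers: [otlp]          # assumes an otlp receiver is configured
      exporters: [prometheusremotewrite]
```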
Overview Parquet mode in Cortex is an experimental feature that converts TSDB blocks to Parquet format for improved query performance and storage efficiency on older data. This feature is particularly beneficial for long-term storage scenarios where data is accessed less frequently but still needs to be queried efficiently. The parquet mode consists of two main components:
- Parquet Converter: converts TSDB blocks to Parquet format
- Parquet Queryable: enables querying of Parquet files with fallb...
Context Cortex Alertmanager notification setup mostly follows the syntax of Prometheus Alertmanager, since it is based on the same codebase. The following describes how to load the configuration so that Alertmanager can use it to send notifications when an alert fires. Configuring the Cortex Alertmanager storage backend With the introduction of Cortex 1.8, the storage backend config option shifted to the new pattern #3888. You can find the new configuration here. Note that wh...
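As a hedged illustration of the new-style storage block (the bucket name, endpoint, and region below are placeholders; adjust for your backend), an S3-backed Alertmanager store might look like:

```yaml
alertmanager_storage:
  backend: s3
  s3:
    bucket_name: cortex-alertmanager              # placeholder bucket
    endpoint: s3.dualstack.us-east-1.amazonaws.com
    region: us-east-1
```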
All Cortex components take the tenant ID from a header X-Scope-OrgID on each request. A tenant (also called “user” or “org”) is the owner of a set of series written to and queried from Cortex. All Cortex components trust this value completely: if you need to protect your Cortex installation from accidental or malicious calls, then you must add an additional layer of protection. Typically, this means you run Cortex behind a reverse proxy, and you must ensure that all callers, both mach...
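For instance, a trusted caller such as Prometheus can attach the header itself, or the reverse proxy can inject it; the endpoint and tenant ID below are placeholders, and in a protected setup the header should only be settable by the proxy:

```yaml
remote_write:
  - url: http://cortex.example.com/api/v1/push   # placeholder Cortex endpoint
    headers:
      # Tenant ID that Cortex components will trust for this request.
      X-Scope-OrgID: team-1
```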
This doc is likely out of date. It should be updated for blocks storage. You will want to estimate how many nodes are required, how many of each component to run, and how much storage space will be required. In practice, these will vary greatly depending on the metrics being sent to Cortex. Some key parameters are:
- The number of active series. If you have Prometheus already, you can query prometheus_tsdb_head_series to see this number.
- Sampling rate, e.g. a new sample for each series every mi...
Context One option for scaling the ruler is to scale it horizontally. However, with multiple ruler instances running, they need to coordinate to determine which instance evaluates which rule. Similar to the ingesters, the rulers establish a hash ring to divide up the responsibility of evaluating rules. Config In order to enable sharding in the ruler, the following flag needs to be set: -ruler.enable-sharding=true In addition, the ruler requires its own ring to be configured, for ins...
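A minimal sketch of what this might look like in the config file, assuming a Consul-backed ring (the Consul address is a placeholder):

```yaml
ruler:
  enable_sharding: true
  ring:
    kvstore:
      store: consul
      consul:
        host: consul.example.svc:8500   # placeholder Consul address
```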
Context You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does replication for redundancy, and it doesn’t make sense to ingest the same data twice. So in Cortex, we made sure we can dedupe the data we receive from HA pairs of Prometheus. We do this via the following: Assume that there are two teams, each running their own Prometheus, monitoring different services. Let’s call these Prometheus servers T1 and T2. Now, if the teams are r...
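A hedged sketch of the pieces involved: the HA tracker on the Cortex side, plus identical external labels (with a unique replica label) on each Prometheus replica. Label values, the Consul address, and the choice of KV store are placeholders:

```yaml
# Cortex: accept HA samples and track the elected replica per cluster.
limits:
  accept_ha_samples: true
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: consul
      consul:
        host: consul.example.svc:8500   # placeholder

# Prometheus (each replica of a pair): same cluster label, unique replica label.
global:
  external_labels:
    cluster: team-1
    __replica__: replica-1
```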
Cortex supports data encryption at rest for some storage backends. S3 The Cortex S3 client supports the following server-side encryption (SSE) modes:
- SSE-S3
- SSE-KMS
Blocks storage The blocks storage S3 server-side encryption can be configured as follows. s3_sse_config The s3_sse_config configures the S3 server-side encryption.
sse:
  # Enable AWS Server Side Encryption. Supported values: SSE-KMS, SSE-S3.
  # CLI flag: -<prefix>.s3.sse.type
  [type: <string> | default = ""]
  # KMS Key ID used to encrypt o...
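As a hedged example of how this might be wired up for the blocks storage bucket (the bucket name and KMS key ID are placeholders):

```yaml
blocks_storage:
  s3:
    bucket_name: cortex-blocks               # placeholder bucket
    sse:
      type: SSE-KMS
      kms_key_id: "0000-placeholder-key-id"  # placeholder KMS key ID
```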
Cortex ingesters are semi-stateful. A running ingester holds several hours of time series data in memory before it is flushed to long-term storage. When an ingester shuts down because of a rolling update or maintenance, the in-memory data must not be discarded in order to avoid any data loss. The Cortex blocks storage requires ingesters to run with a persistent disk where the TSDB WAL and blocks are stored (e.g. a StatefulSet when deployed on Kubernetes). During a rolling update, the l...
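A heavily abridged sketch of what such a StatefulSet might look like; the image tag, replica count, volume size, and mount path are placeholders, not prescriptions from this guide:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester
spec:
  serviceName: ingester
  replicas: 3
  updateStrategy:
    type: RollingUpdate                 # replace one pod at a time
  selector:
    matchLabels:
      name: ingester
  template:
    metadata:
      labels:
        name: ingester
    spec:
      containers:
        - name: ingester
          image: quay.io/cortexproject/cortex:v1.17.0   # placeholder version
          args: ["-target=ingester", "-config.file=/etc/cortex/cortex.yaml"]
          volumeMounts:
            - name: ingester-data
              mountPath: /data          # TSDB WAL and blocks live on this PV
  volumeClaimTemplates:
    - metadata:
        name: ingester-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```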
This guide explains how to scale up and down ingesters. If you’re looking for how to run ingester rolling updates, please refer to the dedicated guide. Scaling up Adding more ingesters to a Cortex cluster is considered a safe operation. When a new ingester starts, it registers with the hash ring and the distributors reshard received series accordingly. Ingesters that were previously receiving those series will see data stop arriving and will consider those series “idle”. If you...
Context Prometheus introduces native histograms, a new sample type, to address several problems that classic histograms (the originally implemented histograms) have. Please refer to the Prometheus Native Histograms document if you need more detail. This guide explains how to configure native histograms in Cortex. How to configure native histograms This section explains how to configure native histograms in Cortex. Enable Ingestion To ingest native histogram inges...
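A hedged sketch of the ingestion side, assuming the enable switch lives under the TSDB block of the blocks storage config; the exact key name is my assumption and should be checked against the config reference for your Cortex version:

```yaml
blocks_storage:
  tsdb:
    # Assumed key: allow ingesters to accept native histogram samples.
    enable_native_histograms: true
```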
Since Cortex is a multi-tenant system, it supports applying limits to each tenant to prevent any single one from using too many resources. In order to help operators understand how close tenants are to their limits, the overrides-exporter module can expose limits as Prometheus metrics. Context To update configuration without restarting, Cortex allows operators to supply a runtime_config file that will be periodically reloaded. This file can be specified under the runtime_config section of the...
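A hedged sketch of what such a runtime config file might contain, with per-tenant overrides (tenant names and limit values are made up for illustration):

```yaml
# Contents of the periodically reloaded runtime config file.
overrides:
  tenant-a:
    ingestion_rate: 350000
    max_series_per_metric: 300000
  tenant-b:
    max_series_per_metric: 30000
```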
Context The compactor is bounded by a maximum index file size of 64GB. If compaction fails because this index file size limit is exceeded, partition compaction can be enabled to allow the compactor to compact into multiple blocks whose index files stay within the limit. Enable Partition Compaction In order to enable partition compaction, the following flag needs to be set: -compactor.sharding-enabled=true # Enable sharding tenants across multiple compactor instances. This is required to enable partition ...
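In the config file, the same prerequisite can be expressed under the compactor block (a sketch of only the flag shown above; the remaining partition compaction settings are covered by the rest of this guide):

```yaml
compactor:
  # Required before partition compaction can be enabled.
  sharding_enabled: true
```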
This guide explains how to configure the Ruler to evaluate rules via the Query Frontend instead of the Ingester/Store Gateway, and the pros and cons of rule evaluation via the Query Frontend. How to enable By default, the Ruler queries both the Ingesters and the Store Gateway, depending on the rule time range, when evaluating rules (alerting rules or recording rules). If you have set -ruler.frontend-address, then the Ruler queries the Query Frontend for rule evaluation. The address should be the gRPC listen a...
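A minimal sketch of that setting in the config file (the address is a placeholder for your Query Frontend's gRPC listen address):

```yaml
ruler:
  # gRPC listen address of the Query Frontend (placeholder host/port).
  frontend_address: query-frontend.cortex.svc:9095
```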
Cortex is a distributed system with significant traffic between its services. To allow for secure communication, Cortex supports TLS between all its components. This guide describes the process of setting up TLS. Generation of certs to configure TLS The first step to securing inter-service communication in Cortex with TLS is generating certificates. A Certificate Authority (CA) will be used for this purpose, which should be private to the organization, as any certificates signed by this CA wil...
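Once certificates exist, they are referenced from both the server-side and client-side gRPC config of each component. A hedged sketch, using the distributor-to-ingester path as an example (all file paths are placeholders):

```yaml
# Server side (e.g. the ingester): terminate TLS on the gRPC port
# and require client certificates signed by the private CA.
server:
  grpc_tls_config:
    cert_file: /certs/ingester.crt
    key_file: /certs/ingester.key
    client_auth_type: RequireAndVerifyClientCert
    client_ca_file: /certs/root.crt

# Client side (e.g. the distributor): present a client certificate
# and verify the ingester's certificate against the same CA.
ingester_client:
  grpc_client_config:
    tls_enabled: true
    tls_cert_path: /certs/distributor.crt
    tls_key_path: /certs/distributor.key
    tls_ca_path: /certs/root.crt
```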
Cortex must be deployed with due care over system configuration, using principles such as “least privilege” to limit any exposure due to flaws in the source code. You must configure authorisation and authentication externally to Cortex; see this guide. Information about security disclosures and mailing lists is in the main repo.
Cortex leverages sharding techniques to horizontally scale both single and multi-tenant clusters beyond the capacity of a single node. Background The default sharding strategy employed by Cortex distributes the workload across the entire pool of instances running a given service (e.g. ingesters). For example, on the write path, each tenant’s series are sharded across all ingesters, regardless of how many active series the tenant has or how many different tenants are in the cluster. The defau...
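For contrast, a hedged sketch of how shuffle sharding could be switched on for the write path, bounding each tenant to a sub-set of ingesters (the shard size value is illustrative only):

```yaml
distributor:
  sharding_strategy: shuffle-sharding
limits:
  # Number of ingesters each tenant's series are sharded across.
  ingestion_tenant_shard_size: 12
```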
Cortex uses Jaeger or OpenTelemetry to implement distributed tracing. We have found tracing invaluable for troubleshooting the behavior of Cortex in production. Jaeger Dependencies In order to send traces, you will need to set up a Jaeger deployment. A deployment includes either the Jaeger all-in-one binary or a distributed system of agents, collectors, and queriers. If running on Kubernetes, Jaeger Kubernetes is an excellent resource. Configuration In order to configure Cortex to send t...
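Since the Jaeger client is configured through environment variables, a hedged fragment of a Kubernetes container spec might look like this (agent hostname, sampler settings, and image tag are placeholders):

```yaml
containers:
  - name: querier
    image: quay.io/cortexproject/cortex:v1.17.0   # placeholder version
    env:
      - name: JAEGER_AGENT_HOST
        value: jaeger-agent.tracing.svc
      - name: JAEGER_SAMPLER_TYPE
        value: ratelimiting
      - name: JAEGER_SAMPLER_PARAM
        value: "7"
```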
PromQL is powerful and can produce query requests that fetch a very wide range of data and process a huge number of samples. Heavy queries can cause:
- CPU on any query component to be partially exhausted, increasing latency and causing incoming queries to queue up with a high chance of time-out.
- CPU on any query component to be fully exhausted, causing GC to slow down and leading to the pod being out-of-memory killed.
- Heap memory on any query component to be exhausted, leading to the pod being o...
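One common mitigation is to cap what a single query may fetch via per-tenant limits; a hedged sketch with purely illustrative values:

```yaml
limits:
  # Guard rails against heavy queries; tune the numbers for your workload.
  max_fetched_series_per_query: 100000
  max_fetched_chunk_bytes_per_query: 2147483648   # ~2GiB
  max_query_length: 721h                          # roughly 30 days
```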
Tenant ID naming The tenant ID (also called “user ID” or “org ID”) is the unique identifier of a tenant within a Cortex cluster. The tenant ID is opaque information to Cortex, which doesn’t make any assumptions on its format/content, but its naming has two limitations: supported characters and length. Supported characters The following character sets are generally safe for use in the tenant ID:
- Alphanumeric characters: 0-9, a-z, A-Z
- Special characters: exclamation point (!), hyphen (-), unders...
Blocks storage The blocks storage is a Cortex storage engine based on the Prometheus TSDB, which only requires an object store (e.g. AWS S3, Google GCS, …) as backend storage. For more information, please refer to the Cortex blocks storage documentation. Chunk A chunk is an object containing compressed timestamp-value pairs. A single chunk contains timestamp-value pairs for a single series. Churn Churn is the frequency at which series become idle. A series becomes idle once it’s not exported an...
Cortex requires a Key-Value (KV) store to store the ring. It can use traditional KV stores like Consul or Etcd, but it can also build its own KV store on top of the memberlist library using a gossip algorithm. This short guide shows how to start Cortex in single-binary mode with a memberlist-based ring. To reduce the number of required dependencies in this guide, it will use blocks storage with no shipping to external stores. The storage engine and external storage configuration are not depen...
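A hedged sketch of the relevant pieces of a single-binary config using memberlist for the ring (peer names are placeholders; 7946 is the default memberlist bind port):

```yaml
memberlist:
  join_members:
    - cortex-1:7946      # placeholder peer addresses
    - cortex-2:7946

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
```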
Cortex supports data replication for different services. By default, data is transparently replicated across the whole pool of service instances, regardless of whether these instances are all running within the same availability zone (or data center, or rack) or in different ones. It is entirely possible that all the replicas for a given piece of data are held within the same availability zone, even if the Cortex cluster spans multiple zones. Storing multiple replicas of a given piece of data within the s...
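As a hedged sketch (the exact key nesting follows my reading of the ingester lifecycler ring config and should be checked against the config reference; zone names are placeholders), zone-aware replication for ingesters might be enabled like this:

```yaml
ingester:
  lifecycler:
    # Zone this ingester instance runs in (placeholder zone name).
    availability_zone: us-east-1a
    ring:
      # Spread each series' replicas across different zones.
      zone_awareness_enabled: true
      replication_factor: 3
```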
Because Cortex is designed to run multiple instances of each component (ingester, querier, etc.), you probably want to automate the placement and shepherding of these instances. Most users choose Kubernetes to do this, but this is not mandatory. Configuration Resource requests If using Kubernetes, each container should specify resource requests so that the scheduler can place them on a node with sufficient capacity. For example, an ingester might request:
resources:
  requests:
    cpu: 4
    memory: 10...