
Top 10 OpenTelemetry Collector Components

Understanding and managing the performance of your applications can be a significant challenge – but it doesn’t have to be. This is where OpenTelemetry comes in, offering a powerful framework for collecting and exporting telemetry data (traces, metrics, and logs) from your applications. At the core of the project sits the OpenTelemetry Collector, a versatile and highly configurable service designed to streamline the collection, processing, and export of telemetry data. It is sometimes also called an observability or telemetry pipeline.

Introduction

The Collector acts as an intermediary between your application and various backend analysis tools. Instead of having your application directly send data to multiple destinations, the Collector handles this complexity for you. This offers several advantages, including reduced resource consumption in your application, centralized configuration, and improved data security.

Components of the OpenTelemetry Collector

The Collector's functionality is built upon five key component types:

  1. Receivers: These are the entry points for telemetry data. They act as listeners and scrapers, accepting data in various formats (e.g., OTLP and Prometheus) from your application.
  2. Processors: Once data is received, processors manipulate it before it's exported. Common uses include filtering, enriching data with metadata, and data sampling to reduce volume.
  3. Exporters: Exporters send the processed telemetry data to various backend systems (e.g., Dash0, Jaeger and Grafana) for storage, visualization, and analysis.
  4. Connectors: Connectors link pipelines together and can translate one signal type into another (for example, deriving metrics from spans), enabling complex data processing pipelines and flexible routing of telemetry data.
  5. Extensions: Extensions can enrich the capabilities of the other components or implement cross-cutting concerns such as performance analysis (via PProf) and authentication.

By combining these components in various configurations, you can create a robust and customized telemetry pipeline tailored to your specific needs. In this article, we'll explore the top 10 OpenTelemetry Collector components that every user should know.
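
To make this concrete, here is a minimal sketch that wires all five component types into a single configuration. The spanmetrics connector and the health_check extension are used purely for illustration (both ship with the Contrib distribution), and the backend endpoint is a placeholder:

extensions:
  health_check:            # exposes a liveness/readiness endpoint for the Collector itself

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://backend.example.com:4318   # placeholder backend endpoint

connectors:
  spanmetrics:             # derives request metrics from the spans flowing through the traces pipeline

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics, otlphttp]
    metrics:
      receivers: [spanmetrics]
      processors: [batch]
      exporters: [otlphttp]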

A telemetry pipeline visualized using OTelBin.io – a free visualization solution for OpenTelemetry Collector pipelines.

A good starting point is always the Contrib distribution, which contains, among other things, every component mentioned in this article. When in doubt about which components are supported by which distribution, we recommend checking the manifest.yaml files in the release repository, or using OTelBin's distribution validation capability. The latter spawns a collector instance with your configuration file to check whether there are any validation issues.

1: OTLP Receiver & Exporter

The OpenTelemetry Protocol (OTLP) is the lingua franca of the OpenTelemetry project, a standardized format for encoding and transmitting telemetry data. This is where the OTLP Receiver and Exporter shine, forming a powerful duo that simplifies and streamlines your telemetry pipeline.

Technically, they are two separate components – but let’s be honest: you always need both!

OTLP Receiver: The Universal Ingestion Point

The OTLP Receiver acts as the primary gateway for telemetry data into your OpenTelemetry Collector. It's designed to accept data in the OTLP format, regardless of the source. This means whether your applications are instrumented with Java, Python, Go, or any other OpenTelemetry SDK, the OTLP Receiver can seamlessly ingest traces, metrics, and logs.

Key Advantages

  • Standardization: By relying on OTLP, you avoid vendor lock-in and ensure compatibility across a wide range of tools and systems.
  • Flexibility: The receiver supports both gRPC and HTTP protocols, providing options for different deployment scenarios and network configurations (see the sketch below).
  • Efficiency: OTLP's compact binary encoding minimizes network overhead, making it suitable for high-volume data streams.
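
To make the two protocol options from the list above explicit, here is a small sketch. The port numbers are the conventional OTLP ports; binding to 0.0.0.0 is an assumption that exposes the receiver on all interfaces, which you may want to restrict:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # conventional OTLP/gRPC port
      http:
        endpoint: 0.0.0.0:4318   # conventional OTLP/HTTP port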

OTLP Exporter: Your Gateway to Observability Backends

On the other end of the pipeline, the OTLP Exporter takes the processed telemetry data and forwards it to backend systems that support the OTLP format – like Dash0!

Note that there are two OTLP exporters. The OTLP exporter transmits data via gRPC, whereas the OTLP HTTP exporter transmits data via HTTP.

Key Advantages

  • Future-Proofing: As OTLP gains wider adoption, the exporter ensures your telemetry pipeline remains compatible with emerging analysis tools and platforms.
  • Simplified Integration: Connecting to OTLP-compliant backends is straightforward, reducing configuration complexity.
  • Data Integrity: OTLP preserves the semantic meaning of your telemetry data, ensuring accurate analysis and insights.

Configuration Example

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:
    endpoint: otelcol:4317
  otlphttp:
    endpoint: https://otelcol:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]

2: Batch Processor

While the OTLP Receiver and Exporter handle the entry and exit of telemetry data, the Batch Processor optimizes the flow in between. In the world of high-volume data streams, sending individual data points one by one can create significant overhead. The Batch Processor addresses this by grouping data into batches before forwarding them to the exporter.

The Batch Processor accumulates incoming spans, metrics, or logs until one of the following conditions is met:

  • Maximum Batch Size: A predefined number of items have been collected.
  • Timeout: A specified time interval has elapsed.

Once a batch is complete, it's sent to the next component in the pipeline, typically an exporter. This significantly reduces the number of individual transmissions, leading to improved performance and reduced resource consumption.

Placement in the Pipeline

For optimal results, the Batch Processor should be placed after any processors that might drop data, such as sampling or filtering processors. This ensures that batching occurs on the final set of data being exported.
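
As an illustration of that ordering, here is a sketch of the pipeline wiring only (the memory_limiter and filter processors are used purely as examples of components that should run before batching; their own configuration is omitted):

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first, then processors that may drop data, then batch last
      processors: [memory_limiter, filter, batch]
      exporters: [otlphttp]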

By strategically incorporating the Batch Processor into your OpenTelemetry Collector pipeline, you can significantly enhance the efficiency of your telemetry data processing, ensuring smooth and reliable data delivery to your observability backend.

Key Advantages

  • Reduced Overhead: Minimizes the number of network calls and API requests, easing the burden on both the Collector and the backend system.
  • Improved Throughput: Enables the Collector to handle larger volumes of data more efficiently.
  • Better Resource Utilization: Conserves network bandwidth and processing power, especially beneficial in resource-constrained environments.

Configuration Example

processors:
  batch:
    send_batch_size: 10000
    timeout: 2s

3: Filelog Receiver

While OpenTelemetry is often associated with traces and metrics, logs remain a crucial source of information for understanding application behavior and troubleshooting issues. The Filelog Receiver brings this valuable data into your OpenTelemetry ecosystem by collecting logs directly from files. There is also the JournalD receiver that we have written about in the past.

The Filelog Receiver tails log files, meaning it continuously monitors them for new entries and captures those entries as they are written. This allows you to ingest log data in real-time, providing insights into your application's activity.

Key Advantages

  • Versatile Log Collection: Gathers logs from various sources, including application logs, system logs, and container logs.
  • Flexible Configuration: Allows you to specify the files or directories to monitor, include or exclude files based on patterns, and define how frequently to check for new log entries.
  • Structured Log Support: Can parse structured log formats like JSON, enabling easier querying and analysis of log data.
  • Parsing: The Filelog Receiver can parse various log formats, such as CRI-O, containerd, and Docker, with the support of operators – and through this it can also derive Kubernetes resource attributes!
  • Multi-line Log Handling: Correctly handles logs that span multiple lines, ensuring that complete log entries are captured.

Configuration Example

Configuring the Filelog Receiver involves specifying the include paths (files or directories to monitor) and any exclusion patterns. You can also define the file rotation strategy and how the receiver should handle log files that are renamed or moved.

receivers:
  filelog:
    include:
      - /var/log/example/my.log
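
The bullet points above map to concrete configuration options. The following sketch is hypothetical – the paths and the timestamp pattern are assumptions – but it illustrates include/exclude patterns, reading only new entries, and multi-line handling:

receivers:
  filelog:
    include:
      - /var/log/example/*.log        # hypothetical application log files
    exclude:
      - /var/log/example/*.gz         # skip rotated, compressed archives
    start_at: end                     # only tail new entries instead of re-reading history
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'   # a new entry starts with a date (assumed format)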

4: The Prometheus Receiver

Prometheus has become a de facto standard for monitoring, especially in Kubernetes environments. Its pull-based model and powerful querying language make it a popular choice for collecting and analyzing metrics. The Prometheus Receiver acts as a bridge between the OpenTelemetry Collector and your existing Prometheus infrastructure.

The Prometheus Receiver essentially functions as a Prometheus scraper within the Collector. So it isn’t really a “receiver” in the passive sense – it actively goes out and scrapes data – and this technicality tends to confuse people. The Prometheus Receiver uses the same configuration format as Prometheus, allowing you to define scrape targets and specify how metrics should be collected. This means you can leverage your existing Prometheus configurations without significant changes.

Key Advantages

  • Compatibility: Supports the majority of Prometheus configuration options (see the details here), including service discovery, relabeling, and metric filtering.
  • Efficiency: Leverages Prometheus's efficient scraping mechanisms for collecting metrics with minimal overhead.
  • Transformation: Converts scraped Prometheus metrics into the OpenTelemetry metric format, enabling seamless integration with other OpenTelemetry components and backends.
  • Exemplars Support: Can collect exemplars, which are specific data points associated with metrics, providing richer context for analysis (available in OpenMetrics format).

Configuration Example

The Prometheus Receiver utilizes the familiar scrape_configs structure from Prometheus. This allows you to define scrape jobs, specify targets, and configure scraping intervals and timeouts.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(request_duration_seconds.*|response_duration_seconds.*)"
              action: keep

5: Kubeletstats Receiver

While the Prometheus Receiver can scrape metrics from various sources, including Kubernetes, the Kubeletstats Receiver offers a more specialized approach to gathering Kubernetes-specific metrics. It taps directly into the kubelet, the agent running on each node in your cluster, providing granular insights into node and pod performance.

The Kubeletstats Receiver connects to the kubelet API on each node to collect a wide range of metrics. These include:

  • Node Metrics: CPU usage, memory usage, disk I/O, network traffic, and resource limits.
  • Pod Metrics: CPU usage, memory usage, network traffic, and restart counts for each pod.
  • Container Metrics: CPU usage, memory usage, and network traffic for each container within a pod.
  • Volume Metrics: Available bytes, capacity bytes, and inodes for persistent volumes.

Key Advantages

  • Comprehensive Kubernetes Monitoring: Provides a detailed view of resource utilization and performance across your Kubernetes cluster.
  • Direct Kubelet Access: Gathers metrics directly from the source, ensuring accuracy and minimizing latency.
  • Resource Efficiency: Optimized for collecting Kubernetes metrics, reducing overhead compared to generic scraping approaches.
  • Rich Metadata: Enriches metrics with Kubernetes metadata (node name, pod name, namespace, etc.), enabling powerful filtering and analysis.

Configuration Example

Configuring the Kubeletstats Receiver involves specifying the collection interval and enabling or disabling specific metric groups (container, pod, node, volume). You can also configure authentication options for secure communication with the kubelet.

receivers:
  kubeletstats:
    collection_interval: 10s

This configuration instructs the receiver to collect metrics every 10 seconds.
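
Beyond the collection interval, you can select metric groups and configure how the receiver authenticates against the kubelet. The following sketch assumes the node name is injected into the K8S_NODE_NAME environment variable (for example via the Kubernetes downward API):

receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: serviceAccount                       # authenticate using the Collector's service account
    endpoint: https://${env:K8S_NODE_NAME}:10250    # kubelet endpoint on the local node (assumed env var)
    insecure_skip_verify: true                      # often needed because kubelet certificates are self-signed
    metric_groups:                                  # only collect these metric groups
      - node
      - pod
      - container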

By utilizing the Kubeletstats Receiver, you gain in-depth visibility into the performance of your Kubernetes cluster, empowering you to optimize resource utilization, troubleshoot issues effectively, and ensure the smooth operation of your applications.

6: K8sattributes Processor

While the Kubeletstats Receiver focuses on collecting Kubernetes-specific metrics, the K8sattributes Processor enhances your existing telemetry data by adding valuable Kubernetes context. This processor automatically associates your traces, metrics, and logs with relevant Kubernetes metadata, such as pod name, namespace, deployment, and node. The information is added to the OpenTelemetry resources.

As a Kubernetes user, you will want this processor so you can clearly see which Kubernetes pod, ReplicaSet, and Deployment in which namespace is responsible for a given piece of telemetry. It's beautiful!

The K8sattributes Processor leverages the Kubernetes API to gather metadata about the pods running in your cluster. It then uses this information to enrich your telemetry data by adding attributes that indicate the Kubernetes resources associated with each data point. This enrichment process is crucial for understanding the relationship between your application's performance and its underlying Kubernetes environment.

Key Advantages

  • Enhanced Context: Provides valuable context for your telemetry data, making it easier to understand where and how your application is running within the Kubernetes cluster.
  • Improved Troubleshooting: Allows you to quickly identify the pods, deployments, and nodes associated with performance issues or errors.
  • Powerful Filtering and Grouping: Enables you to filter and group your telemetry data based on Kubernetes attributes, facilitating targeted analysis and investigation.
  • Simplified Correlation: Helps you correlate telemetry data from different sources (traces, metrics, logs) based on their shared Kubernetes context.

Configuration Example

The processor offers various configuration options, including:

  • Authentication: Specify how to authenticate with the Kubernetes API.
  • Filtering: Include or exclude specific Kubernetes metadata.
  • Passthrough: When enabled, the processor only annotates telemetry with the pod IP and leaves the actual metadata association to a downstream Collector.

processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false

This configuration instructs the processor to use the Collector's service account to authenticate with the Kubernetes API and to perform the metadata association itself, rather than running in passthrough mode (where it would only annotate telemetry with the pod IP and leave the association to a downstream Collector).
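
A slightly more elaborate, hypothetical sketch shows the extract section, which controls which metadata fields and pod labels end up as resource attributes (the team label is an assumption):

processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:                      # resource attributes to add from the Kubernetes API
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.node.name
      labels:
        - tag_name: team             # attribute name to set on the telemetry (assumed)
          key: team                  # pod label to read (assumed)
          from: pod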

By incorporating the k8sattributes processor into your OpenTelemetry Collector pipeline, you significantly enhance the value of your telemetry data by adding crucial Kubernetes context. This enables more effective troubleshooting, analysis, and optimization of your applications running in a Kubernetes environment.

7: Attributes and Resource Processors

While the K8sattributes Processor adds Kubernetes-specific context, the Attributes Processor gives you granular control over the attributes attached to your telemetry data. This powerful tool allows you to modify, add, or remove attributes from spans, metrics, and logs, enabling you to customize your telemetry to meet your specific needs.

Note that there is also a Resource Processor that comes in handy for modifying resource attributes.
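
As a minimal, hypothetical sketch, the Resource Processor uses the same action-based configuration, but applies it to resource attributes:

processors:
  resource:
    attributes:
      - key: deployment.environment    # assumed attribute name
        value: production
        action: upsert                 # insert the resource attribute, or update it if it already exists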

Key Advantages

  • Data Customization: Tailor your telemetry data to include the specific information you need for analysis and troubleshooting.
  • Data Enrichment: Add context and meaning to your data by incorporating attributes like environment, version, or user information.
  • Data Masking: Protect sensitive data by hashing or removing attributes that contain confidential information.
  • Data Standardization: Ensure consistency in your telemetry data by enforcing naming conventions and data types.

Configuration Example

The Attributes Processor is configured with a list of actions, each specifying the action to perform, the target attribute key, and the desired value or transformation.

processors:
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
      - key: enduser.name
        action: hash

This configuration adds an environment attribute with the value "production" and hashes the enduser.name attribute.

By mastering the Attributes Processor, you gain fine-grained control over your telemetry data, enabling you to customize it to meet your specific needs and extract the maximum value from your observability efforts.

8: Filter Processor

In today’s complex systems, the sheer volume of telemetry data can be overwhelming. The Filter Processor helps you selectively drop spans, metrics, and logs.

Key Advantages

  • Reduced Data Volume: Focus on the most important data by discarding irrelevant or redundant information. For example, by dropping metrics that are known to be noisy or irrelevant to you.
  • Cost Optimization: Lower data storage and processing costs by filtering out unnecessary data.
  • Improved Signal-to-Noise Ratio: Enhance the clarity of your observability data by reducing noise and highlighting critical signals.
  • Targeted Analysis: Isolate specific data for focused analysis and troubleshooting.

Configuration Example

The Filter Processor can be configured with conditions written in the OpenTelemetry Transformation Language (OTTL), or with the older include/exclude match rules shown below. These rules specify the conditions under which data should be kept or dropped.

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^http.*"]
    spans:
      exclude:
        match_type: strict
        span_names: ["/ping"]

This configuration includes only metrics whose names start with http and excludes spans named /ping.
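
For comparison, here is a sketch of roughly the same rules expressed with OTTL conditions. Note the inverted logic: with OTTL, anything that matches a condition is dropped:

processors:
  filter/ottl:
    error_mode: ignore
    metrics:
      metric:
        # drop every metric whose name does not start with "http"
        - not IsMatch(name, "^http.*")
    traces:
      span:
        # drop spans for the /ping endpoint
        - name == "/ping"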

9: Tail Sampling Processor

While the Filter Processor allows you to discard telemetry data based on predefined rules, the Tail Sampling Processor offers a more dynamic approach specifically for tracing data. It employs a "tail-based" sampling strategy, meaning it makes sampling decisions after the entire trace has been received.

The Tail Sampling Processor analyzes complete traces and applies sampling policies to determine which traces should be kept or dropped. This allows you to prioritize traces that exhibit specific characteristics, such as:

  • High Latency: Capture traces that exceed a defined duration threshold, helping you identify performance bottlenecks.
  • Errors: Retain traces that contain errors or exceptions, enabling you to focus on troubleshooting and root cause analysis.
  • Specific Attributes: Sample traces based on the presence or value of specific attributes, allowing you to target traces related to certain users, services, or operations.
  • Rate Limiting: Sample traces based on a defined rate, ensuring that you capture a representative sample of your application's behavior while controlling data volume.

Key Advantages

  • Intelligent Prioritization: Focus on the most important traces by sampling based on their content and characteristics.
  • Reduced Data Volume: Control the amount of data being exported while retaining crucial information.
  • Flexibility: Define custom sampling policies to meet your specific needs and observability goals.

Configuration Example

By leveraging the Tail Sampling Processor, you can implement intelligent sampling strategies that prioritize the most relevant traces, ensuring that you capture the crucial information needed for effective observability while managing data volume and costs. However, this comes with complexity and overhead. We recommend investing in it only when you have a concrete need for it.

The configuration is pretty advanced; a small example wouldn’t do it the justice it deserves. Consequently, we recommend that you check out its official documentation instead, which also includes common use case examples.

10: Transform Processor

The Transform Processor takes data manipulation to the next level, providing a powerful engine for modifying your telemetry data using the OpenTelemetry Transformation Language (OTTL). This processor enables you to perform complex transformations on spans, metrics, and logs, going beyond the capabilities of the Attributes Processor.

The Transform Processor allows you to:

  • Apply Conditional Logic: Modify telemetry data based on conditions evaluated using OTTL. This enables you to perform different transformations depending on the characteristics of the data.
  • Access and Modify Nested Fields: Traverse and modify nested data structures within your telemetry data, providing granular control over individual fields and attributes.
  • Perform Calculations: Execute mathematical operations on numeric values within your telemetry data, enabling you to derive new metrics or adjust existing values.
  • Aggregate Data: Combine multiple data points into a single value, useful for summarizing information or creating new metrics.
  • String Manipulation: Perform string operations like concatenation, substring extraction, and regular expression matching.

Key Advantages

  • Advanced Data Manipulation: Execute complex transformations on your telemetry data to meet specific needs and analysis requirements.
  • Flexibility: Leverage the full power of OTTL to define custom transformations and data manipulations.
  • Data Enrichment: Derive new insights and add value to your telemetry data by performing calculations and aggregations.
  • Data Standardization: Enforce consistent data formats and naming conventions across your telemetry data.

Configuration Example

The Transform Processor is configured with a set of OTTL statements that define the transformations to be applied. These statements can include conditions, functions, and expressions for manipulating the telemetry data.

processors:
  transform:
    trace_statements:
      - context: resource
        statements:
          - set(attributes["k8s.namespace.name"], attributes["namespace"])
          - delete_key(attributes, "namespace")

This configuration copies the value of the namespace resource attribute into the semantic-convention key k8s.namespace.name and then removes the original namespace attribute. This barely scratches the surface of what this processor can do. Check out its documentation to learn a lot more.
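
To hint at the conditional logic and string manipulation mentioned above, here is one more small, hypothetical sketch – the attribute name and the masking pattern are assumptions, not part of any standard:

processors:
  transform/sanitize:
    log_statements:
      - context: log
        statements:
          # Conditional logic: only set a default when the attribute is missing
          - set(attributes["deployment.environment"], "production") where attributes["deployment.environment"] == nil
          # String manipulation: mask anything that looks like an email address in the log body
          - replace_pattern(body, "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+", "***")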

Conclusion

The OpenTelemetry Collector is a powerful and versatile tool that empowers you to collect, process, and export telemetry data from your applications. By combining the right components and configuring them carefully, you can create a robust and customized observability pipeline that provides valuable insights into your system's performance and health.

To ensure your OpenTelemetry Collector configuration is accurate and effective, consider utilizing OTelBin. OTelBin allows you to visualize and validate your collector configurations against various collector distributions, providing feedback on potential issues and ensuring that your pipeline is set up correctly.