What is a Span?

A span in a distributed tracing system represents a single unit of work or operation. It is the fundamental building block that captures the execution and timing of an individual operation within a distributed transaction. Each span encapsulates essential information about that operation, including its name, start and end times, attributes (tags), and its causal relationships with other spans.

Spans and Traces

Multiple spans come together to form a trace, which represents the complete journey of a request through a distributed system. Within a trace, each span's timestamps place it on the request's timeline, and parent-child relationships between spans form a hierarchy that shows how operations are nested within, or depend on, one another. Span links associate a span with other related spans, including spans in different traces. Links are useful for batch processing or asynchronous operations, where a single parent-child relationship cannot capture how the operations relate, for example when one batch job processes messages that originated from many separate requests.

Origins in the Dapper Paper

Google's Dapper paper, published in 2010, introduced the concept of spans as we know them today. In Dapper, a span represents the basic unit of work and contains annotations and key-value pairs that describe the work being performed. The paper defined spans as having a unique identifier, parent span identifier, and timing information, establishing the foundation for modern distributed tracing systems.

Evolution of Distributed Tracing Systems

The distributed tracing landscape has evolved through several key systems and standards:

  • Zipkin: Created by Twitter in 2012 and inspired by Google's Dapper paper. It was the first widely adopted open-source implementation of distributed tracing.
  • Jaeger: Built at Uber and open-sourced in 2017, offering a more modern architecture and better scalability than Zipkin.
  • OpenTracing: Emerged in 2016 as a vendor-neutral API standard, allowing developers to instrument applications without vendor lock-in.
  • OpenTelemetry: Created in 2019 by merging OpenTracing and OpenCensus, and has since become the de facto standard for instrumentation and telemetry data collection.

OpenTelemetry has effectively become the successor to both OpenTracing and OpenCensus, providing a unified approach to observability that includes not just tracing, but also metrics and logs. While Zipkin and Jaeger continue to be popular trace visualization and storage backends, they now commonly integrate with OpenTelemetry for data collection.

Implementation of Spans in OpenTracing and OpenTelemetry

OpenTracing standardized the span concept across different tracing systems. OpenTelemetry, which merged OpenTracing and OpenCensus, further refined the span specification. In both systems, spans maintain these core characteristics:

  • Operation name that describes the work being done
  • Start and end timestamps
  • Span context (at minimum a trace ID and a span ID), plus a reference to the parent span
  • Attributes (key-value pairs)
  • Events (timestamped logs)
  • Links to related spans
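
The span context is what ties individual spans into a single trace. The Go sketch below models it without any tracing library, using the ID sizes from W3C Trace Context: a 16-byte trace ID and an 8-byte span ID, where all-zero values are invalid. A new trace gets random IDs; a child span keeps its parent's trace ID, receives a fresh span ID, and remembers the parent's span ID. The `SpanContext` type and the `newRootContext` and `newChildContext` helpers are hypothetical; the OpenTelemetry Go API provides its own `trace.SpanContext` type for this purpose.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// ID sizes as used by W3C Trace Context and OpenTelemetry.
type TraceID [16]byte
type SpanID [8]byte

// SpanContext is a hypothetical, simplified stand-in for the context
// that identifies a span and is propagated between services.
type SpanContext struct {
	TraceID TraceID
	SpanID  SpanID
	Sampled bool // the "sampled" bit of the trace flags
}

// IsValid reports whether both IDs are non-zero; all-zero IDs are invalid.
func (sc SpanContext) IsValid() bool {
	return sc.TraceID != TraceID{} && sc.SpanID != SpanID{}
}

// newRootContext starts a new trace with a random trace ID and span ID.
// (A random all-zero ID is possible in theory but ignored in this sketch.)
func newRootContext() (SpanContext, error) {
	sc := SpanContext{Sampled: true}
	if _, err := rand.Read(sc.TraceID[:]); err != nil {
		return SpanContext{}, err
	}
	if _, err := rand.Read(sc.SpanID[:]); err != nil {
		return SpanContext{}, err
	}
	return sc, nil
}

// newChildContext keeps the parent's trace ID, draws a new span ID, and
// returns the parent's span ID so the child span can record it.
func newChildContext(parent SpanContext) (SpanContext, SpanID, error) {
	child := parent
	if _, err := rand.Read(child.SpanID[:]); err != nil {
		return SpanContext{}, SpanID{}, err
	}
	return child, parent.SpanID, nil
}

// Hex encodings as they appear in propagation headers.
func (id TraceID) String() string { return hex.EncodeToString(id[:]) }
func (id SpanID) String() string  { return hex.EncodeToString(id[:]) }
```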

Semantic Conventions for Span Attributes

OpenTelemetry defines semantic conventions for span attributes to ensure consistency across different services and applications. Here are common examples:

  • HTTP Requests:
    • http.request.method: "GET", "POST", "PUT"
    • http.response.status_code: 200, 404, 500
    • url.full: "https://api.example.com/users"
  • Database Operations:
    • db.system: "mysql", "postgresql", "mongodb"
    • db.statement: "SELECT * FROM users"
    • db.operation: "query", "insert", "update"
  • RPC Calls:
    • rpc.system: "grpc", "jsonrpc"
    • rpc.service: "PaymentService"
    • rpc.method: "ProcessPayment"
  • General Service Information (resource attributes, which apply to every span a service emits):
    • service.name: "payment-processor"
    • service.version: "1.0.0"
    • service.instance.id: "instance-abc123"
A Dash0 Span detail view showing standard and custom Span attributes

Span Kinds

Spans are categorized into kinds based on their role in an interaction. OpenTelemetry defines five span kinds:

  • INTERNAL: The default kind, representing operations internal to a service
  • SERVER: Represents the handling of an incoming request on the server side
  • CLIENT: Represents outgoing requests from a service to an external system
  • PRODUCER: Indicates the sending of a message to a message broker or queue
  • CONSUMER: Represents the processing of a message received from a message broker or queue
Filtering for available Span Kinds in Dash0
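
The five kinds can be modeled as a small Go enumeration. The sketch below is hypothetical (the OpenTelemetry Go API defines its own `trace.SpanKind` constants), and it also encodes the usual pairing across a boundary: a CLIENT span in the caller corresponds to a SERVER span in the callee, and a PRODUCER span on the sending side corresponds to a CONSUMER span on the receiving side.

```go
package main

// SpanKind is a hypothetical enumeration of the five OpenTelemetry span kinds.
type SpanKind int

const (
	Internal SpanKind = iota // zero value, so it is the default
	Server                   // handling an incoming request
	Client                   // making an outgoing request
	Producer                 // sending a message to a broker or queue
	Consumer                 // processing a message from a broker or queue
)

func (k SpanKind) String() string {
	names := [...]string{"INTERNAL", "SERVER", "CLIENT", "PRODUCER", "CONSUMER"}
	if k < 0 || int(k) >= len(names) {
		return "UNKNOWN"
	}
	return names[k]
}

// counterpartKind returns the kind usually found on the other side of a
// service or messaging boundary, and false for kinds that stay local.
func counterpartKind(k SpanKind) (SpanKind, bool) {
	switch k {
	case Client:
		return Server, true
	case Server:
		return Client, true
	case Producer:
		return Consumer, true
	case Consumer:
		return Producer, true
	default:
		return Internal, false
	}
}
```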

Context Propagation and Correlation

Context propagation is essential for maintaining trace context across service boundaries in distributed systems. Here's how it works:

  • Correlation Headers: These are special HTTP headers that pass trace context between services. The most common ones are:
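    • traceparent: Contains the format version, the trace ID, the ID of the calling span (the parent ID), and trace flags such as the sampled bit
    • tracestate: Carries vendor-specific trace information as a list of key-value pairs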
  • W3C Trace Context: This is the standard specification for propagating context across service boundaries, ensuring interoperability between different tracing systems.
  • Baggage API: This is a mechanism for carrying arbitrary key-value pairs alongside the trace context. Unlike trace context, baggage is application-specific data that can include:
    • User IDs
    • Tenant information
    • Custom correlation identifiers

Code Examples

Here's how to create a span and add attributes and events to it in Go using OpenTelemetry:

```go
package spans

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func performOperation(ctx context.Context) {
	// Obtain a tracer from the globally registered TracerProvider.
	tracer := otel.Tracer("service-name")

	// Start a span; the returned ctx carries it, so spans started from ctx become its children.
	ctx, span := tracer.Start(ctx, "operation-name")
	// End the span when the function returns.
	defer span.End()

	// Add attributes to the span
	span.SetAttributes(
		attribute.String("customer.id", "123"),
		attribute.Int64("items.count", 5),
		attribute.Float64("order.total", 99.99),
	)

	// Add events
	span.AddEvent("processing.started")
	// ... perform work, passing ctx to any calls that should create child spans ...
	span.AddEvent("processing.completed")
}
```

And here's the equivalent example in Java:

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TracingExample {
    private final Tracer tracer;

    public TracingExample(OpenTelemetry openTelemetry) {
        // Obtain a tracer from the configured OpenTelemetry instance.
        this.tracer = openTelemetry.getTracer("service-name");
    }

    public void performOperation() {
        Span span = tracer.spanBuilder("operation-name").startSpan();
        // Make the span current so spans created inside this block become its children.
        try (Scope scope = span.makeCurrent()) {
            // Add attributes
            span.setAttribute("customer.id", "123");
            span.setAttribute("items.count", 5);
            span.setAttribute("order.total", 99.99);

            // Add events
            span.addEvent("processing.started");
            // ... perform work ...
            span.addEvent("processing.completed");
        } finally {
            // Always end the span, even if the work throws.
            span.end();
        }
    }
}
```

Best Practices for Span Usage

When working with spans, consider these important practices:

  • Keep span names concise but descriptive
  • Add relevant attributes that aid in troubleshooting
  • Maintain proper parent-child relationships between spans
  • End spans as soon as the operation completes
  • Use span events to mark significant points in the operation's lifecycle
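
Keeping span names concise also means keeping their cardinality low: OpenTelemetry's guidance is to name a span after the general class of operation rather than embedding values that change on every request, and to record such values as attributes instead. The Go helper below is a hypothetical sketch of that idea, producing names in the `{method} {route}` style used by the HTTP conventions. It assumes that any purely numeric path segment is an identifier; a real HTTP instrumentation would use the route template matched by the router instead.

```go
package main

import (
	"strings"
	"unicode"
)

// spanNameFor builds a low-cardinality span name such as "GET /users/{id}"
// by replacing purely numeric path segments with a placeholder.
func spanNameFor(method, path string) string {
	segments := strings.Split(path, "/")
	for i, seg := range segments {
		isNumeric := seg != "" && strings.IndexFunc(seg, func(r rune) bool { return !unicode.IsDigit(r) }) == -1
		if isNumeric {
			segments[i] = "{id}"
		}
	}
	return method + " " + strings.Join(segments, "/")
}
```

With this approach, requests for `/users/314159` and `/users/42` share the span name `GET /users/{id}`, which keeps them groupable in a tracing backend while the concrete ID can still be stored as an attribute.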

Understanding spans is crucial for implementing effective distributed tracing. Whether using open-source frameworks or commercial solutions, the fundamental concept of spans provides the foundation for tracking and understanding the behavior of distributed systems.