What is a Span?

Spans and Traces

Multiple spans come together to form a trace, which represents the complete journey of a request through a distributed system. Spans within a trace are typically organized chronologically, with each span's timestamps indicating its position in the request's timeline. Parent-child relationships between spans create a hierarchical structure, showing how operations are nested or dependent on each other. Span links provide a way to associate related spans across different traces, useful for scenarios like batch processing or asynchronous operations where traditional parent-child relationships don't fully capture the relationship between operations.
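To make the parent-child hierarchy concrete, here is a minimal sketch in Go that rebuilds a trace tree from a flat list of spans. The `Span` struct and the `childrenByParent` and `renderTree` helpers are hypothetical stand-ins invented for this illustration and are not part of any tracing library. The sketch assumes the root span has an empty parent ID.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Span is a simplified, hypothetical stand-in for a recorded span.
type Span struct {
	ID       string
	ParentID string // empty for the root span (an assumption of this sketch)
	Name     string
	Start    time.Time
	End      time.Time
}

// childrenByParent groups spans by parent ID and orders siblings by start time.
func childrenByParent(spans []Span) map[string][]Span {
	children := make(map[string][]Span)
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	for _, siblings := range children {
		sort.Slice(siblings, func(i, j int) bool {
			return siblings[i].Start.Before(siblings[j].Start)
		})
	}
	return children
}

// renderTree returns an indented outline of the trace, one span per line,
// with each span's duration in parentheses.
func renderTree(spans []Span) string {
	children := childrenByParent(spans)
	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s (%s)\n", strings.Repeat("  ", depth), s.Name, s.End.Sub(s.Start))
			walk(s.ID, depth+1)
		}
	}
	walk("", 0)
	return b.String()
}

func main() {
	t0 := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	ms := time.Millisecond
	spans := []Span{
		{ID: "c", ParentID: "a", Name: "INSERT orders", Start: t0.Add(60 * ms), End: t0.Add(90 * ms)},
		{ID: "a", Name: "POST /checkout", Start: t0, End: t0.Add(100 * ms)},
		{ID: "b", ParentID: "a", Name: "charge card", Start: t0.Add(10 * ms), End: t0.Add(50 * ms)},
	}
	fmt.Print(renderTree(spans))
}
```

Tracing backends do essentially this when they draw a waterfall view: the parent IDs provide the nesting, and the timestamps provide the order within each level.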

Origins in the Dapper Paper

Google's Dapper paper, published in 2010, introduced the concept of spans as we know them today. In Dapper, a span represents the basic unit of work and contains annotations and key-value pairs that describe the work being performed. The paper defined spans as having a unique identifier, parent span identifier, and timing information, establishing the foundation for modern distributed tracing systems.

Evolution of Distributed Tracing Systems

The distributed tracing landscape has evolved through several key systems and standards:

Zipkin: Created by Twitter in 2012, inspired by Google's Dapper paper. It introduced the first widely-adopted open-source implementation of distributed tracing.
Jaeger: Developed at Uber from 2015 and open-sourced in 2017, offering a more modern architecture and better scalability than Zipkin.
OpenTracing: Emerged in 2016 as a vendor-neutral API standard, allowing developers to instrument applications without vendor lock-in.
OpenTelemetry: Created in 2019 by merging OpenTracing and OpenCensus, becoming the de facto standard for instrumentation and telemetry data collection.

OpenTelemetry has effectively become the successor to both OpenTracing and OpenCensus, providing a unified approach to observability that includes not just tracing, but also metrics and logs. While Zipkin and Jaeger continue to be popular trace visualization and storage backends, they now commonly integrate with OpenTelemetry for data collection.

Implementation of Spans in OpenTracing and OpenTelemetry

OpenTracing standardized the span concept across different tracing systems. OpenTelemetry, which merged OpenTracing and OpenCensus, further refined the span specification. In both systems, spans maintain these core characteristics:

Operation name that describes the work being done
Start and end timestamps
Span context (trace ID and span ID, plus trace flags and trace state in OpenTelemetry) and a reference to the parent span
Attributes (key-value pairs)
Events (timestamped logs)
Links to related spans

Semantic Conventions for Span Attributes

OpenTelemetry defines semantic conventions for span attributes to ensure consistency across different services and applications. Here are common examples:

HTTP Requests:
- http.request.method: "GET", "POST", "PUT" (formerly http.method)
- http.response.status_code: 200, 404, 500
- url.full: "https://api.example.com/users" (formerly http.url)
Database Operations:
- db.system: "mysql", "postgresql", "mongodb"
- db.statement: "SELECT * FROM users"
- db.operation: "query", "insert", "update"
RPC Calls:
- rpc.system: "grpc", "jsonrpc"
- rpc.service: "PaymentService"
- rpc.method: "ProcessPayment"
General Service Information:
- service.name: "payment-processor"
- service.version: "1.0.0"
- service.instance.id: "instance-abc123"
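One way to keep attribute keys consistent is to define them once as constants and build the attribute set through a single helper. The sketch below does this for an outgoing HTTP request. It uses plain Go with no OpenTelemetry dependency, the helper `httpClientAttributes` and the constant names are hypothetical, and the key strings assume the current stable HTTP conventions (`http.request.method`, `url.full`, `http.response.status_code`).

```go
package main

import "fmt"

// Attribute keys, assuming the stable OpenTelemetry HTTP semantic conventions.
const (
	keyHTTPRequestMethod      = "http.request.method"
	keyHTTPResponseStatusCode = "http.response.status_code"
	keyURLFull                = "url.full"
)

// httpClientAttributes is a hypothetical helper that returns the attributes
// a client span might carry for one outgoing HTTP request.
func httpClientAttributes(method, fullURL string, statusCode int) map[string]any {
	return map[string]any{
		keyHTTPRequestMethod:      method,
		keyURLFull:                fullURL,
		keyHTTPResponseStatusCode: statusCode,
	}
}

func main() {
	attrs := httpClientAttributes("GET", "https://api.example.com/users", 200)
	// Print in a fixed order, because Go map iteration order is not defined.
	for _, key := range []string{keyHTTPRequestMethod, keyURLFull, keyHTTPResponseStatusCode} {
		fmt.Printf("%s = %v\n", key, attrs[key])
	}
}
```

In a real service, the OpenTelemetry SDKs ship generated semantic-convention packages that serve the same purpose, so hand-written constants like these are rarely needed.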

A Dash0 Span detail view showing standard and custom Span attributes

Commonly used Span Kinds

Spans are categorized into different kinds based on their role in the system. OpenTelemetry defines five span kinds:

INTERNAL: Default span kind, representing internal operations within a service
SERVER: Represents the handling of an incoming request on the server side
CLIENT: Represents outgoing requests from a service to an external system
PRODUCER: Indicates the sending of a message to a message broker or queue
CONSUMER: Represents the processing of a message received from a message broker or queue
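The kinds pair up across service boundaries: a CLIENT span is normally the parent of a SERVER span in the called service, and a PRODUCER span relates to a CONSUMER span on the receiving side. The following sketch models that pairing in plain Go. The `SpanKind` type and `remoteCounterpart` function are hypothetical and only mirror the five kinds listed above. They are not the OpenTelemetry API, whose enum uses different numeric values.

```go
package main

import "fmt"

// SpanKind is a hypothetical enum mirroring the five span kinds.
type SpanKind int

const (
	Internal SpanKind = iota
	Server
	Client
	Producer
	Consumer
)

var kindNames = [...]string{"INTERNAL", "SERVER", "CLIENT", "PRODUCER", "CONSUMER"}

// String returns the upper-case name used in the list above.
func (k SpanKind) String() string {
	if k < 0 || int(k) >= len(kindNames) {
		return "UNKNOWN"
	}
	return kindNames[k]
}

// remoteCounterpart returns the kind usually found on the other side of a
// service boundary. The second result is false for INTERNAL spans, which
// have no remote peer.
func remoteCounterpart(k SpanKind) (SpanKind, bool) {
	switch k {
	case Client:
		return Server, true
	case Server:
		return Client, true
	case Producer:
		return Consumer, true
	case Consumer:
		return Producer, true
	default:
		return Internal, false
	}
}

func main() {
	for _, k := range []SpanKind{Internal, Server, Client, Producer, Consumer} {
		if peer, ok := remoteCounterpart(k); ok {
			fmt.Printf("%s pairs with %s\n", k, peer)
		} else {
			fmt.Printf("%s has no remote counterpart\n", k)
		}
	}
}
```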

Filtering for available Span Kinds in Dash0

Context Propagation and Correlation

Context propagation is essential for maintaining trace context across service boundaries in distributed systems. Here's how it works:

Correlation Headers: These are special HTTP headers that pass trace context between services. The most common ones are:
- traceparent: Contains the format version, trace ID, parent span ID, and trace flags
- tracestate: Allows vendors to add custom trace information
W3C Trace Context: This is the standard specification for propagating context across service boundaries, ensuring interoperability between different tracing systems.
Baggage API: This is a mechanism for carrying arbitrary key-value pairs alongside the trace context. Unlike trace context, baggage is application-specific data that can include:
- User IDs
- Tenant information
- Custom correlation identifiers
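To illustrate what travels in the traceparent header, the sketch below parses a version 00 value of the form `version-traceid-parentid-flags` and builds the header a service would send on its own outgoing call. The `TraceParent` type and its helpers are hypothetical names for this example. In practice the OpenTelemetry SDK propagators handle this, and this simplified parser rejects any version other than 00.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the four fields of a W3C traceparent header.
type TraceParent struct {
	Version  string
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters: the caller's span ID
	Flags    byte   // bit 0 is the "sampled" flag
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent validates and splits a version 00 traceparent value.
func parseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	if parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: this sketch only supports version 00")
	}
	if !isLowerHex(parts[1], 32) || parts[1] == strings.Repeat("0", 32) {
		return TraceParent{}, errors.New("traceparent: invalid trace ID")
	}
	if !isLowerHex(parts[2], 16) || parts[2] == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: invalid parent ID")
	}
	if !isLowerHex(parts[3], 2) {
		return TraceParent{}, errors.New("traceparent: invalid trace flags")
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return TraceParent{}, err
	}
	return TraceParent{Version: parts[0], TraceID: parts[1], ParentID: parts[2], Flags: byte(flags)}, nil
}

// Sampled reports whether the caller recorded this trace.
func (tp TraceParent) Sampled() bool { return tp.Flags&0x01 == 0x01 }

// WithParent returns the header value for an outgoing request: the trace ID
// and flags are kept, and the current span's ID becomes the parent ID.
func (tp TraceParent) WithParent(spanID string) string {
	return fmt.Sprintf("%s-%s-%s-%02x", tp.Version, tp.TraceID, spanID, tp.Flags)
}

func main() {
	incoming := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tp, err := parseTraceParent(incoming)
	if err != nil {
		fmt.Println("ignoring invalid header:", err)
		return
	}
	fmt.Println("trace ID:", tp.TraceID, "sampled:", tp.Sampled())
	fmt.Println("outgoing:", tp.WithParent("b7ad6b7169203331"))
}
```

The trace ID stays constant across every hop, which is what lets a backend stitch spans from different services into one trace. Only the parent ID changes as each service substitutes the ID of its own span.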

Code Examples

Here's how to create and add attributes to spans in Go using OpenTelemetry:

package tracing

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// performOperation wraps a unit of work in a span and annotates it.
func performOperation(ctx context.Context) {
    tracer := otel.Tracer("service-name")
    
    ctx, span := tracer.Start(ctx, "operation-name")
    defer span.End()
    
    // Add attributes to the span
    span.SetAttributes(
        attribute.String("customer.id", "123"),
        attribute.Int64("items.count", 5),
        attribute.Float64("order.total", 99.99),
    )
    
    // Add events
    span.AddEvent("processing.started")
    // ... perform work ...
    span.AddEvent("processing.completed")
}

And here's the equivalent example in Java:

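import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public class TracingExample {
    private final Tracer tracer;

    // Obtain a tracer from the OpenTelemetry instance supplied by the application
    public TracingExample(OpenTelemetry openTelemetry) {
        this.tracer = openTelemetry.getTracer("service-name");
    }

    public void performOperation() {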
        Span span = tracer.spanBuilder("operation-name")
            .startSpan();
            
        try (var scope = span.makeCurrent()) {
            // Add attributes
            span.setAttribute("customer.id", "123");
            span.setAttribute("items.count", 5);
            span.setAttribute("order.total", 99.99);
            
            // Add events
            span.addEvent("processing.started");
            // ... perform work ...
            span.addEvent("processing.completed");
        } finally {
            span.end();
        }
    }
}

Best Practices for Span Usage

When working with spans, consider these important practices:

Keep span names concise but descriptive
Add relevant attributes that aid in troubleshooting
Maintain proper parent-child relationships between spans
End spans as soon as the operation completes
Use span events to mark significant points in the operation's lifecycle

Understanding spans is crucial for implementing effective distributed tracing. Whether using open-source frameworks or commercial solutions, the fundamental concept of spans provides the foundation for tracking and understanding the behavior of distributed systems.