Resources in OpenTelemetry document which systems the telemetry is describing, and they are often the difference between telemetry from which you can gain insights and “just data”.
When instrumenting services with OpenTelemetry, adhering to semantic conventions ensures consistent, accurate, and meaningful telemetry data across systems.
The resource attributes specified in the OpenTelemetry semantic conventions fall into two broad categories:
- Those that represent technical information about where and how components of your system are running, like the Kubernetes resource attributes.
- Attributes that describe logical, architectural or organizational aspects of your systems: what service is powered by this Kubernetes pod? Which role does it play in the service’s architecture? Who is responsible for this service in terms of operations and paying the infrastructure bills?
This metadata can be used to great effect in your observability tool to contextualize the telemetry you are looking at.
In this blog post, we explore how to annotate your telemetry so that you know which service it comes from, which is usually the first question on your mind when you are paged by an alert, quickly followed by “How bad is it?” and “How many people are going to be upset because of this?”¹.
Overview of the service.* namespace

At the time of writing, the service.* namespace of the OpenTelemetry Semantic Conventions specifies the following attributes:
- service.name: Logical name of the service. Example: "checkout"
- service.namespace: Optional namespace to group services. Example: "acme-webstore"
- service.version: Optional version of the deployed service. Example: "v1.2.3"
- service.instance.id: Optional unique identifier of the service instance. Example: "instance-12345"
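To make this concrete, here is a minimal sketch of setting these attributes in code with the OpenTelemetry Python SDK; the values are the example values from the list above, and in practice you would usually prefer environment variables so they can vary per deployment:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Resource.create merges these attributes with the SDK defaults and with
# anything set via OTEL_RESOURCE_ATTRIBUTES or OTEL_SERVICE_NAME.
resource = Resource.create(
    {
        "service.name": "checkout",
        "service.namespace": "acme-webstore",
        "service.version": "v1.2.3",
        "service.instance.id": "instance-12345",
    }
)

# Every span emitted through this provider carries the resource above.
trace.set_tracer_provider(TracerProvider(resource=resource))
```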
In the beginning, there is the name
In many cultures, the ability to name things, to know the true name of things, confers power over them. In observability, it is very much the same. Indeed, service.name is the most critical resource attribute of them all, to the extent that it is one of the very few resource attributes that an OpenTelemetry SDK is required to set in the metadata it sends.
In most situations, the easiest way to set the service.name resource attribute is via the OTEL_SERVICE_NAME environment variable, although if you set up the OpenTelemetry SDK with code in your application, you might also hard-code the value there.
So, each component you deploy and monitor with OpenTelemetry should have its service name specified. And each deployment of the same component should have the same case-sensitive name: the last thing you want to have to deal with during an outage is to figure out that it’s checkout in one production deployment and CheckOut in another, as that makes querying and comparing more difficult and confusing.
A similar benefit in terms of consistency comes from using the same capitalization style across different services (e.g., camel case, kebab case or snake case), although that tends to be far less important in the grand scheme of things, as long as all deployments of the same component use the same case-sensitive name. Special mention goes to naming schemes, because who doesn’t like having a fleet of components named after planets, fantasy characters, or dog breeds?
Use service.namespace for logical grouping
If you have multiple services that logically belong to the same group or system, like the frontend and checkout services of the same acme webshop, use service.namespace to group them. For example:
service.namespace = "acme-webstore"
service.name = "frontend"
service.name = "checkout"
Having a common namespace for all the components of your application helps in grouping things. It also helps avoid mixing telemetry across different components with the same name: many applications have a frontend and an api service, and you will likely want to treat them separately for troubleshooting and monitoring.
We recommend using service.namespace for the names of the products and applications that the service is mainly involved in.
Service versions
The service.version attribute is meant to represent your own definition of the version of a service. Good options for its value include the name of the Git tag you deployed (or the first seven characters of the Git commit hash if you continuously deploy from main), but pretty much whatever you think of as a “version” goes.

Annotating the telemetry of your services with the version is invaluable when you are running rollouts, during which you definitely want to compare how the new version and the old one behave.
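As a sketch of how you might wire this up, assuming a hypothetical SERVICE_VERSION environment variable that your CI/CD pipeline stamps at build or deploy time (the SDK does not read this variable on its own):

```python
import os

from opentelemetry.sdk.resources import Resource

# SERVICE_VERSION is a hypothetical env var set by your CI/CD pipeline,
# e.g. to the Git tag or the first seven characters of the commit hash.
resource = Resource.create(
    {"service.version": os.environ.get("SERVICE_VERSION", "unknown")}
)
```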
Ensure unique service.instance.id values
Optionally, you can annotate the instances of your service components with unique identifiers using the service.instance.id attribute. This is particularly useful if there is no other unambiguous notion of identity for your instance, in which case you should generate a random UUID. However, you could also reuse an already-existing unique identifier that is tightly coupled with the service instance, like a Kubernetes pod UID, an AWS ECS task ARN, or the machine-id of the host it runs on (as long as there is only one service instance on the host). The only strong requirement is that each combination of service.namespace, service.name, and service.instance.id in your telemetry must uniquely identify one instance of your services.
However, setting the service.instance.id attribute to the same value as a technical identifier like a pod UID does not remove the need to set the technical attribute too. That is, if you use the pod UID for the service.instance.id attribute, you should also set that value for k8s.pod.uid. Moreover, using the same value could confuse team members who are less savvy in terms of metadata (“why do we have the same service.instance.id and k8s.pod.uid for this component, but for this other one it looks completely different?”).
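Here is a sketch of that pattern on Kubernetes. K8S_POD_UID is a hypothetical environment variable you would have to inject yourself via the Downward API (it is not set automatically), with a random UUID as the fallback:

```python
import os
import uuid

from opentelemetry.sdk.resources import Resource

# K8S_POD_UID is a hypothetical env var injected via the Kubernetes
# Downward API (fieldRef: metadata.uid); Kubernetes does not set it for you.
pod_uid = os.environ.get("K8S_POD_UID")

# Fall back to a random UUID when no natural notion of identity exists.
attributes = {"service.instance.id": pod_uid or str(uuid.uuid4())}

if pod_uid:
    # Set the technical attribute too, even though it carries the same value.
    attributes["k8s.pod.uid"] = pod_uid

resource = Resource.create(attributes)
```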
Bonus: Define Environments for Context
While it is not part of the service.* semantic conventions namespace, the deployment.environment.name attribute can do wonders to clarify the environment in which your services are running. Common values include:
- "Production" (or prod), or
- "staging" / “test” / “qa”
- "development" / “dev”
Combined with the service attributes and cloud.region, this provides a great, often self-contained overview of what you need to know to understand which system is involved. For example:
service.namespace = "easy-booking"
service.name = "payment"
cloud.region = "us-east-1"
deployment.environment.name = "production"
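Since the same build artifact should run unchanged in every environment, the environment name is best passed in from the outside. A sketch, assuming a hypothetical DEPLOY_ENV variable set per deployment (for example, via Helm values or your CD pipeline):

```python
import os

from opentelemetry.sdk.resources import Resource

# DEPLOY_ENV is a hypothetical env var set per deployment, so the same
# artifact can report "production", "staging", or "development" as needed.
resource = Resource.create(
    {"deployment.environment.name": os.environ.get("DEPLOY_ENV", "development")}
)
```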
Conclusions
Resource attributes make the difference between telemetry and “just data”. In this post, we have covered best practices for annotating your telemetry so that you will always know which service has issues, and where. OpenTelemetry, with its semantic conventions, provides a rich, well-thought-out vocabulary for annotating your telemetry according to industry-standard best practices.
And if you want to make sure that you make the best use of your telemetry metadata, give Dash0 a spin. We are building Dash0 around resources and semantic conventions with what we call Resource centricity: all your telemetry, irrespective of which agent or instrumentation collects it, is correlated with the rest through the metadata you provide as resource attributes.
─────────
¹ For some specific services, like pretty much any checkout service I have ever come across, the “Who is going to be angry?” question trumps the “How many?” in very palpable, leadership-facing ways.