
AI in Dash0: Building the Future of Observability

Since the launch of ChatGPT in November 2022, generative AI has surged to the forefront of innovation. With tech giants investing over $200 billion in CapEx this year—particularly in GPU clouds for training and running AI models—the rise of AI is reshaping industries. Companies like OpenAI, Anthropic, Meta (with its open-source LLaMA), and Europe’s Mistral lead the foundational model market.

In observability, the integration of generative AI is quickly becoming a cornerstone of modern application architectures. Dash0 customers are already leveraging this trend, and we’re committed to staying at the forefront of innovation by rethinking how observability for LLM technologies can work based on OpenTelemetry.

At the same time, we believe that AI can make observability smarter, faster, and more intuitive.

This article outlines how Dash0 is embracing AI. While the field is evolving rapidly, our vision prioritizes usability, practicality, and impact over hype. Let’s dive in.

Key Thoughts on AI in Observability

1. Generative AI Will Enhance Observability, Not Redefine It

While we don’t believe that generative AI will fundamentally change how observability systems are built (for now), it can make life significantly easier for platform and application engineers. By automating tedious, error-prone manual tasks and analyzing ever-growing volumes of observability data, AI can remove friction from workflows. That frees teams to focus on the critical problems that require a mix of contextual understanding, causal inference, and creativity: inherently human strengths that AI can increasingly support.

2. Observability Doesn’t Need a Chat-Bot or Co-Pilot

We’re not here to reinvent Clippy. Engineers don’t need a chatbot explaining the obvious—they need tools that work seamlessly in the background, enhancing their workflows without distraction and with high fidelity of results. Observability solutions should solve real problems, not add gimmicks (or noise).

3. Natural Language Queries Aren’t Always the Answer

While asking for logs or metrics in natural language is appealing, complex queries like those in PromQL often require a level of understanding and precision that’s hard to replicate in plain language. For example, how would you craft a PromQL query like this using natural language?

record: job_instance_method_path:demo_api_request_errors_50x_requests:rate5m
expr: >
  rate(demo_api_request_duration_seconds_count{status="500",job="demo"}[5m]) * 50
    > on(job, instance, method, path)
  rate(demo_api_request_duration_seconds_count{status="200",job="demo"}[5m])

Instead, Dash0 focuses on intuitive query builders that let users construct complex queries for all common use cases easily and repeatably, with just a few clicks and a high-level structure that mirrors how you think about these problems.

That said, we are also experimenting with alternative approaches to generating PromQL with GenAI, e.g., drawing on information from dashboards or user activity as context.

4. Root Cause Analysis Needs More Than GenAI

While GenAI enriches and optimizes observability data, it’s not yet the silver bullet for root cause analysis in large, connected microservice environments. Instead of promising an all-knowing solution, Dash0 combines GenAI with machine learning, statistics, and user-driven exploration. This approach delivers multiple plausible causes, empowering users to decide based on context, previous experience, and knowledge that is not encoded in telemetry data.

AI as Seamless User Experience

At Dash0, we aim to integrate AI invisibly—enhancing usability without requiring users to learn new workflows. A great analogy is the Apple Photos app: you can select text from a photo, copy it, or translate it with a click. The AI works seamlessly in the background, improving usability without explicit user interaction.


Example showing how to select and copy text or translate all text in the photo of our KubeCon booth in Salt Lake City.

We want to replicate this experience in observability. Let’s explore how we’re achieving this with Log AI, our first generative AI feature.

Our litmus test is the following: if a feature's selling point is that it is made with AI, it is probably not a great feature. AI is a powerful means to an end. And that end is Observability, simplified.

Log AI: Automatic Log Level Detection

Logs are either structured or unstructured.

  • Structured Logs: Contain a message and a set of key/value pairs. These key-value pairs may follow standards to ease their reuse and improve consistency, like using the (fantastic) OpenTelemetry semantic conventions.
  • Unstructured Logs: Text messages without semantic tags, often requiring manual patterns or toilsome regular expressions to parse log levels and other metadata.
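To make the distinction concrete, here is a minimal sketch in Python. The field names are illustrative, loosely following OpenTelemetry attribute naming, and do not reflect any particular vendor’s schema:

```python
import json
import re

# A structured log record: severity and metadata are explicit key/value pairs
# (attribute names loosely follow OpenTelemetry semantic conventions).
structured = json.dumps({
    "timestamp": "2024-11-18T09:15:02Z",
    "severity_text": "ERROR",
    "body": "payment failed",
    "http.request.method": "POST",
    "http.response.status_code": 502,
})

# The same event as an unstructured line: everything is baked into free text.
unstructured = "2024-11-18T09:15:02Z ERROR payment failed method=POST status=502"

# Structured logs can be queried directly...
record = json.loads(structured)
assert record["severity_text"] == "ERROR"

# ...while unstructured logs need hand-written (and often fragile) patterns.
match = re.search(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b", unstructured)
assert match is not None and match.group(1) == "ERROR"
```

The regular expression works for this one line, but each new log format tends to need its own pattern, which is exactly the toil the article describes.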

Structured logs are unquestionably better to work with for observability. They are easier to query, and they provide more insights, more readily.

However, unstructured logs are far more commonplace than structured ones, and this degrades the observability experience of many practitioners.

To simplify this, Dash0 introduces Log AI, which automatically detects and assigns log levels to unstructured logs (and this is just the first step). This eliminates manual configuration, enabling users to filter, search, and alert on logs effortlessly. As we are an OpenTelemetry-native observability tool, we extract the log level and map it to the log severity fields of the OpenTelemetry semantic conventions.

In other words, we use AI to handle all the hard work of finding structure where there’s too little of it. And the precision with which we do it is stellar.
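To illustrate the underlying idea (a deliberately simplified heuristic sketch, not Dash0’s actual model), a conservative detector might only tag a line when an unambiguous, uppercase level token is present, and map it to the severity numbers defined in the OpenTelemetry log data model:

```python
import re
from typing import Optional, Tuple

# OpenTelemetry log data model severity numbers for the common level names.
OTEL_SEVERITY = {
    "TRACE": 1, "DEBUG": 5, "INFO": 9,
    "WARN": 13, "ERROR": 17, "FATAL": 21,
}

# Deliberately uppercase-only and boundary-checked: being conservative here
# is what keeps false positives low (e.g., "info" inside prose won't match).
LEVEL_RE = re.compile(
    r"(?:^|[\s\[(])(TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)(?:[\s\]):,]|$)"
)

def detect_level(line: str) -> Optional[Tuple[str, int]]:
    """Return (severity_text, severity_number), or None if no level is found.

    Returning None for ambiguous lines is the point: it is better to leave
    a log "gray" than to tag it incorrectly.
    """
    match = LEVEL_RE.search(line)
    if not match:
        return None
    text = "WARN" if match.group(1) == "WARNING" else match.group(1)
    return text, OTEL_SEVERITY[text]

print(detect_level("2024-11-18 09:15:02 [ERROR] payment failed"))  # ('ERROR', 17)
print(detect_level("user clicked the info panel"))                 # None
```

A real model must handle far messier input (multi-line stack traces, custom level names, localized messages), which is where the trained approach earns its keep over a handful of regexes.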

Why Log AI?

We are very particular about the observability functions we offer. They must just work. They must not send you down rabbit holes. And they must be delightful. So, as we set out to solve the problem of unstructured logs, we set stringent requirements:

  • High Specificity: Avoid false positives (e.g., assigning a log level where none exists).
  • Max Accuracy: Ensure assigned log levels are always correct by using metrics evaluated on a combination of community and proprietary data.
  • Good Sensitivity: Extract log levels wherever possible without errors.

Here’s a snapshot of Log AI in action:

Before: Most logs are “gray” (unknown status).

After: Log AI assigns accurate levels (e.g., ERROR, WARN, INFO, DEBUG), almost entirely eliminating “gray” logs.

This feature detects log levels for over 98% of unstructured logs with 0% false positives—a game-changer for scalability and usability. In other words, out of 50 logs without severity, our model guesses 49 correctly and does not misclassify the others.
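These figures correspond to standard classification metrics. A toy calculation with hypothetical counts (chosen to match the numbers above) shows how the three requirements relate:

```python
def evaluation_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the three requirements from a labeled evaluation set.

    tp: logs with a level, where the model assigned the correct one
    fp: logs where the model assigned a level that is absent or wrong
    fn: logs with a level, where the model assigned none
    tn: logs without a level, where the model correctly assigned none
    """
    return {
        # Specificity: of the logs with no level, how many were left alone?
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
        # Accuracy of assignments: of the levels we assigned, how many were right?
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
        # Sensitivity (recall): of the logs with a level, how many did we extract?
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
    }

# Hypothetical counts matching the article's claim: out of 50 unstructured
# logs with a detectable level, 49 are assigned correctly and none wrongly.
print(evaluation_summary(tp=49, fp=0, fn=1, tn=950))
# {'specificity': 1.0, 'precision': 1.0, 'sensitivity': 0.98}
```

Note that sensitivity (49 of 50 levels extracted) and the zero false-positive rate are independent dials; the conservative design trades a little sensitivity for perfect specificity.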

Scaling AI Without Extra Costs

One of our core principles is ensuring that AI features don’t lead to skyrocketing costs for customers. By optimizing the implementation of Log AI, Dash0 processes billions of logs daily without additional overhead.

Affordable scalability.

The Future of AI in Dash0

Log AI is just the beginning. Here’s what’s next:

  • Extracting More Structured Data: Automatically identifying additional patterns and metadata embedded in unstructured logs.
  • Grouping Logs and Detecting Patterns: Classifying log templates to simplify the analysis.
  • Optimizing Trace Readability: Making complex traces easier to navigate and interpret.
  • Advanced Analytics: Combining machine learning, statistics, and GenAI-enriched data to uncover patterns, outliers, and anomalies in large datasets.

Our goal is to deliver an “Apple Photos experience” for observability—powerful AI-driven features that are intuitive, seamless, and effective.

Conclusion

Generative AI in Dash0 empowers developers and platform engineers to tackle observability challenges with less effort and more precision. Instead of hype-driven features, we focus on practical, impactful tools that deliver great value for real-world use cases. As the field evolves, so will Dash0—always with a commitment to usability, transparency, and scalability.

If you’re ready to experience next-generation observability, try Dash0 today!