Observability costs out of control

Observability costs can spiral out of control.

This became a publicly discussed topic when Datadog revealed that Coinbase paid them $65M per year to observe their applications and systems.

The issue of escalating observability costs can be attributed to two primary factors. First, the pricing models employed by observability providers are complex and can lead to significant expenses. Second, customers often struggle to understand where and when costs are incurred, resulting in unexpected and substantial bills at the end of the billing cycle. This lack of transparency is a significant challenge for organizations trying to manage their observability spend effectively.

Observability Pricing Models to contain and control cost

At Dash0, we strive to offer transparent and straightforward pricing that is easy for users to understand and prevents unexpected observability bills at the end of the month.

As a SaaS-based observability provider, it is crucial for us that what a customer pays maps to the cost they generate on our side. Without that mapping, we cannot reliably calculate our own costs, which creates uncertainty and risk for our business.

Therefore, we invite you to explore Dash0's pricing model and how we strike a balance between straightforward and fair pricing while considering our cost structure as a vendor.

Observability Pricing by Host

One model that was more popular in the on-premises world (where the customer essentially paid for the infrastructure themselves) is the per-host or per-agent model. It is an easy pricing model because the customer can count their hosts and knows what to expect at the end of the month.

The issue with that model is that one host can produce a huge volume of logs or traces while another mostly idles and sends only a small fraction of that data. A fixed per-host price does not account for that imbalance.

One solution to that problem was to take the host's size into account, e.g., the number of CPUs and the amount of RAM. The rationale was that a larger host can create more data than a small one, which is often, but not always, true. This already makes the pricing much more complex, and you need configurators to calculate the price per host.

This pricing model also influenced customers' architecture decisions, since running more, smaller hosts generated higher costs. Dynamic resizing of the infrastructure (a core use case of the cloud) is also problematic, because you either have to charge per second or minute, or charge for the maximum number of hosts seen in a month.

So pricing by host was not an option for us: it leads to unpredictable bills and can be unfair to both parties.

Observability Pricing by User

Pricing by user is a familiar approach from SaaS products like Slack, Figma, and GitHub. For observability solutions, however, the number of users doesn't correlate with the amount of data processed and stored. While more users viewing dashboards or querying data can increase processing, some vendors combine user-based pricing with volume-based pricing, which we believe is counterproductive.

We strongly believe that observability should be accessible to all developers and SREs, and a user-based pricing model could actually hinder that goal. Furthermore, it's easy to circumvent user-based pricing by creating generic user roles or accounts shared by multiple individuals.

For Dash0, we wanted to create an easy-to-use observability solution accessible to many users within an organization without being restricted by the pricing model. Therefore, user-based pricing was not a viable option for us.

Observability Pricing by Data Volume

In today's market, a popular pricing model for data ingestion is based on the amount of data processed in gigabytes (GB). This model often includes separate pricing components for processing, querying, and storing data, with different retention rates for each.

This pricing approach is considered fair because the price customers pay tracks the vendor's cost of processing and storing the data. It also facilitates comparisons with other data storage services, such as Amazon Web Services (AWS) S3, which is commonly used for long-term data storage. By comparing observability pricing per GB with S3 pricing, customers can see the premium they pay for the observability service on top of raw storage.

When we analyzed this model, we saw a few issues:

  1. We aim for users to transmit rich spans, logs, and metrics with extensive metadata. This metadata is vital for the context of that data and provides significant value to the user.
  2. Most models operate on "ingested data." However, there is uncertainty about how this is calculated. For example, what happens when OTLP data is sent as binary Protobuf (typically over gRPC) versus as JSON text (over HTTP)? The binary encoding is significantly more compact than the text one. Would the price differ based on the encoding even though processing and storage costs remain the same? Vendors should be clear about how and where they count ingested bytes.
  3. Ingestion and storage differ significantly in size because observability data is highly compressible; compression rates of 10x to 20x are common (see the sketch after this list). As a result, if you ingest 1GB of data, only 50-100MB is stored. Thus, the actual storage cost is not a major pricing concern.
  4. Standardization of metadata can lead to significant deduplication, further reducing the storage cost of observability data.
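
To make the compressibility point in item 3 concrete, here is a minimal, self-contained sketch (Python standard library only). It generates repetitive, metadata-rich JSON log records, roughly the shape of enriched OTLP log records, and measures how much gzip shrinks them. The attribute values are made up, and real ratios depend on your data; the same effect is also why the encoding question from item 2 says little about actual storage cost.

```python
# Minimal sketch: how much typical, repetitive observability data shrinks
# under standard compression. Illustrative only; real ratios depend on data.
import gzip
import json

# 10,000 structured log records with repeated keys and metadata,
# similar in shape to enriched OTLP log records (values are made up).
records = [
    {
        "timestamp": f"2024-01-01T00:00:{i % 60:02d}Z",
        "severity": "INFO",
        "body": f"handled request {i}",
        "service.name": "checkout",
        "k8s.namespace.name": "shop",
        "cloud.region": "eu-west-1",
    }
    for i in range(10_000)
]

raw = "\n".join(json.dumps(r) for r in records).encode()
compressed = gzip.compress(raw)

print(f"raw:        {len(raw) / 1024:.0f} KiB")
print(f"compressed: {len(compressed) / 1024:.0f} KiB")
print(f"ratio:      {len(raw) / len(compressed):.1f}x")
```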

The biggest drawback of data-volume-based pricing is its unpredictability for customers. Estimating the size of all the logs and spans an application generates is hard, and because individual logs and spans vary in size depending on context, volumes are difficult to forecast.

Transparency and ease of understanding were additional factors that led us to opt out of this pricing model.

Observability Pricing by Signal

We considered pricing based on the OpenTelemetry signal (log, metrics, span) because our analysis showed that the costs for small logs/spans were comparable to larger ones.

One advantage of pricing by signal is its simplicity in counting. Every log, span, or metric sent is counted, regardless of the protocol used, compression algorithms applied, or data size.

Another advantage is that every OpenTelemetry log, span, and metric carries metadata based on the OpenTelemetry semantic conventions. This allows logs, spans, and metrics to be grouped easily by service or Kubernetes namespace, making it possible to see how many spans or logs a specific service (or even operation) has sent within a given timeframe and to calculate the cost.
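
As a rough illustration of that grouping, the sketch below counts spans per service and turns the counts into a cost estimate. The price per million spans is a hypothetical placeholder, not a Dash0 rate, and the spans are hard-coded stand-ins for data you would normally get from your pipeline or an API.

```python
# Sketch: attribute signal counts (and thus cost) to services via the
# OpenTelemetry resource attributes every span already carries.
from collections import Counter

PRICE_PER_MILLION_SPANS = 1.00  # hypothetical placeholder, USD

# Stand-ins for real spans; in practice these come from a pipeline or API.
spans = [
    {"service.name": "checkout", "k8s.namespace.name": "shop"},
    {"service.name": "checkout", "k8s.namespace.name": "shop"},
    {"service.name": "payments", "k8s.namespace.name": "shop"},
]

counts = Counter(span["service.name"] for span in spans)

for service, count in counts.items():
    cost = count / 1_000_000 * PRICE_PER_MILLION_SPANS
    print(f"{service}: {count} spans -> ${cost:.6f}")
```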

Optimize Cloud Costs with Real-Time Observability Dashboards

The interactive dashboard created with Dash0 provides a detailed breakdown of costs for spans and metrics. It categorizes costs by service name and displays the cost for logs per Kubernetes namespace. You can view both the count and associated cost of each signal per day. This allows you to easily identify areas of optimization and proactively manage costs.

In addition to total transparency, the dashboard offers insights into cost spikes caused by new service deployments or increased data consumption. This information enables prompt responses to stay within budget constraints. You can set up alerts to receive proactive notifications about significant cost increases, such as a 50% increase for a specific service compared to the previous day.
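
The check behind such an alert is little more than a percentage comparison. A minimal sketch, assuming per-service signal counts for two days are already available (for example, queried via an API); all numbers are made up:

```python
# Sketch of a day-over-day signal-volume spike check per service.
yesterday = {"checkout": 2_000_000, "payments": 500_000}  # made-up counts
today = {"checkout": 3_200_000, "payments": 510_000}

THRESHOLD = 0.5  # alert on a 50% day-over-day increase

for service, count in today.items():
    previous = yesterday.get(service, 0)
    if previous and (count - previous) / previous >= THRESHOLD:
        increase = (count - previous) / previous
        print(f"ALERT: {service} signal volume up {increase:.0%} vs. yesterday")
```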

Furthermore, the dashboard provides advanced options to map observability costs to teams or organizational units for internal billing purposes. By simply adding attributes like "team=ABC" or "ou=123" to your signals, you can create detailed dashboards of costs per team or organizational unit in seconds. This information can be queried via API for automated processing and internal billing.
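
The OpenTelemetry SDKs already provide the mechanism for this kind of tagging: attach the keys to the resource, and every signal emitted by that process carries them. Below is a minimal Python sketch (it requires the opentelemetry-sdk package); "team" and "ou" are the custom example keys from the text, not part of the semantic conventions.

```python
# Sketch: tag all emitted telemetry with cost-attribution attributes.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create(
    {
        "service.name": "checkout",
        "team": "ABC",  # custom attribute used for internal cost mapping
        "ou": "123",    # custom attribute, not a semantic convention
    }
)

trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

# Every span from this tracer now carries team/ou metadata, so a dashboard
# can group signal counts (and therefore cost) by these keys.
with tracer.start_as_current_span("handle-request"):
    pass
```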

All these arguments led to the decision to price by signal count.

Mixing it all together - the overpayment dilemma

The pricing models for Observability platforms are often a complex mix of different approaches, making it challenging to understand and control costs. This complexity stems from the fact that there are multiple dimensions to consider when evaluating costs, such as the number of users, the amount of data ingested, and the specific features used.

Furthermore, it can be difficult to determine which users or data can be removed or reduced without compromising the value or functionality of the Observability platform. This can lead to overpaying for the platform and not aligning spending with the value it generates.

The Dash0 Pricing

So, we ended up with a very simple pricing model:

  • $0.x/M Logs ingested with 30 days retention
  • $0.x/M Spans ingested with 30 days retention
  • $0.x/M Metrics ingested with 13-month retention

Metrics get the longer retention because customers frequently ask for year-over-year comparisons, such as "How did Black Friday traffic and performance compare to the previous year?"
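
To see how a bill adds up under this model, here is a small worked example. Both the per-million prices and the monthly volumes are hypothetical placeholders (the real rates are the $0.x values above); only the arithmetic is the point.

```python
# Worked example of a monthly bill under a pure per-signal model.
# Prices and volumes below are hypothetical placeholders.
PRICES_PER_MILLION = {"logs": 0.50, "spans": 0.50, "metrics": 0.25}  # USD

monthly_volumes = {
    "logs": 800_000_000,
    "spans": 1_200_000_000,
    "metrics": 400_000_000,
}

total = 0.0
for signal, count in monthly_volumes.items():
    cost = count / 1_000_000 * PRICES_PER_MILLION[signal]
    total += cost
    print(f"{signal}: {count:,} signals -> ${cost:,.2f}")

print(f"total: ${total:,.2f}")
```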

Fair Usage Policy

To ensure sustainable and transparent pricing, we're introducing a fair usage policy as an additional factor to our pricing model.

While we encourage customers to use our product to its full potential, we need safeguards to prevent unexpected costs resulting from unanticipated usage patterns.

This policy is closely correlated with processing costs. Incorporating it into the pricing model would add complexity, so we've opted for a more straightforward approach.

For example, we will impose a rate limit on API calls based on the number of signals ingested: for every million signals ingested, customers can make up to a specific number of API calls.

Additionally, we will limit the size of a single span or log record to 50MB (which is really big). This prevents customers from packing massive amounts of data into individual records, which would otherwise lead to significant storage costs for us.
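
Both safeguards boil down to a couple of lines of arithmetic. In the sketch below, the calls-per-million ratio is a hypothetical placeholder, since the exact number is left open above; the 50MB record cap is the one stated in the text.

```python
# Sketch of the two fair-usage safeguards described above.
API_CALLS_PER_MILLION_SIGNALS = 1_000  # hypothetical placeholder
MAX_RECORD_BYTES = 50 * 1024 * 1024    # 50MB cap per span or log record

def api_call_budget(signals_ingested: int) -> int:
    """Allowed API calls for a billing period, proportional to ingestion."""
    return signals_ingested // 1_000_000 * API_CALLS_PER_MILLION_SIGNALS

def within_size_limit(record_bytes: int) -> bool:
    """Reject individual spans/log records larger than the fair-usage cap."""
    return record_bytes <= MAX_RECORD_BYTES

print(api_call_budget(250_000_000))   # 250M signals -> 250,000 calls here
print(within_size_limit(10 * 1024))   # a 10 KiB log record is fine
```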

The most transparent and fair observability pricing

At our company, we're proud to offer the most transparent pricing model in the observability industry. Customers can easily see and predict their observability costs at any given time. We also provide detailed insights into where costs are rooted, based on factors such as services, Kubernetes namespaces, AWS regions, teams, and organizational units.

We believe in being open and honest about what you pay for. Every OpenTelemetry signal you ingest, whether it's a log, a span, or a metric, is clearly accounted for. There are no hidden costs or surprises.

In the future, we aim to provide advanced mechanisms for enhanced cost control. One approach is to set cost limits per service and automatically adjust sampling rates to adhere to those limits. Additionally, our dataset concept allows customers to distribute telemetry data across multiple, independent datasets. This enables the definition of specific qualities for each dataset, such as retention rates, sample rates, cost considerations, and audit trails, which directly impact overall costs. Ultimately, this approach allows for the creation of systems where data retention periods are optimized, and unnecessary data collection is minimized, resulting in further reductions in observability expenses.
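
As a rough illustration of the cost-limit idea, a per-service limit essentially translates into a sampling rate: if the projected spend exceeds the limit, keep only the fraction of signals that fits the budget. The sketch below shows just that arithmetic with hypothetical numbers; it is not a description of how Dash0 will implement the feature.

```python
# Sketch: derive a per-service sampling rate from a monthly cost limit.
def target_sampling_rate(signals_per_month: int,
                         price_per_million: float,
                         monthly_cost_limit: float) -> float:
    projected_cost = signals_per_month / 1_000_000 * price_per_million
    if projected_cost <= monthly_cost_limit:
        return 1.0  # budget not exceeded, keep everything
    return monthly_cost_limit / projected_cost

# 2 billion spans at a hypothetical $0.50 per million would cost $1,000;
# with a $400 limit the service would be sampled down to 40%.
print(target_sampling_rate(2_000_000_000, 0.50, 400.0))
```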

Our goal is to make the world of observability more transparent, easy, and fair for everyone.