SLO Definitions

Getting Started

Service Level Objectives (SLOs) have become the common language that cross-functional teams use to set guardrails and incentives that drive high levels of service reliability. Both Service Level Objectives (SLOs) and Service Level Indicators (SLIs) are built on quantitative measures, typically provided through your APM platform. These measures refer to latency, availability, throughput, or saturation, and they represent points on a digital user journey that contribute to customer experience and satisfaction.

Service Levels

A service level is the percentage currently measured for a Service Level Indicator, sampled over a specific time window(*) from time series data of core metrics that are ingested continuously.
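As a rough illustration of the definition above, a service level can be sketched as the share of good samples that fall inside a rolling time window. The function name `service_level`, the `(timestamp, is_good)` sample shape, and the 7-day window are assumptions made for this example, not part of any product API.

```python
from datetime import datetime, timedelta

def service_level(samples, now, window):
    """Return the percentage of good samples inside the window, or None if empty.

    samples: iterable of (timestamp, is_good) pairs (hypothetical shape).
    """
    cutoff = now - window
    recent = [good for ts, good in samples if cutoff < ts <= now]
    if not recent:
        return None  # no data in the window, so there is nothing to report
    return 100.0 * sum(recent) / len(recent)

now = datetime(2024, 1, 8)
samples = [
    (now - timedelta(days=10), False),  # older than the window, ignored
    (now - timedelta(days=3), True),
    (now - timedelta(days=2), True),
    (now - timedelta(days=1), False),
    (now - timedelta(hours=1), True),
]
print(service_level(samples, now, timedelta(days=7)))  # 75.0
```

Three of the four samples inside the window are good, so the service level is 75%.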

Service Level Indicators

A Service Level Indicator (SLI) is “a carefully defined quantitative measure of some aspect of the level of service that is provided.” SLIs are typically provided through your APM platform. Traditionally, they refer to either latency (response time in milliseconds, including queue/wait time) or availability (the share of requests served successfully). A composite SLI is a group of SLIs attributed to a larger SLO. These indicators are points on a digital user journey that contribute to customer experience and satisfaction. Once you have SLIs set up, you connect them to your SLOs, which are targets against your SLIs.

Service Level Objectives

A Service Level Objective (SLO) is “a target value or range of values for a service level that is measured by an SLI. A natural structure for SLOs is thus SLI ≤ target, or lower bound ≤ SLI ≤ upper bound.”

In practice, the objective is the minimum percentage of requests (e.g. 95.90%) over a specific time window(*) that teams have decided must be good, as measured by the SLI, for the SLO to be met. The value entered is a percentage between 0% and 100%.
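The "lower bound ≤ SLI ≤ upper bound" structure can be expressed as a small check. This is a minimal sketch: the helper name `meets_slo` and the 300 ms latency bound are hypothetical, and the 95.90% figure is the example value used above.

```python
def meets_slo(sli, lower=None, upper=None):
    """True when lower <= sli <= upper; a bound left as None is not enforced."""
    if lower is not None and sli < lower:
        return False
    if upper is not None and sli > upper:
        return False
    return True

# Availability-style SLO: at least 95.90% of requests must be good.
print(meets_slo(97.2, lower=95.90))  # True
print(meets_slo(94.0, lower=95.90))  # False

# Latency-style SLO: hypothetical upper bound of 300 ms on a latency SLI.
print(meets_slo(250, upper=300))     # True
```

Availability targets are usually lower bounds, while latency targets are usually upper bounds, which is why both forms are supported.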

Error Budget Policy

An error budget is the percentage of "wiggle room" remaining in your SLO: the amount of unreliability you can still absorb before the SLO is missed. Generally, you’ll measure it over a rolling window rather than a fixed historical period, which keeps the SLO current as something you can monitor continuously. It’s not enough to know what your error budget is; you also need to know what you’ll do in the event of error budget violations. You can do this through an error budget policy, which defines alerting thresholds and the actions to take so that error budget depletion is addressed.
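To make the arithmetic concrete, the sketch below computes the remaining error budget from request counts and maps it to a policy action. The 99.0% target, the thresholds, and the actions in `POLICY` are hypothetical values chosen for illustration; a real error budget policy is something each team agrees on.

```python
def error_budget_remaining(target_pct, total, bad):
    """Fraction of the error budget still unspent (negative when overspent)."""
    if not 0 <= target_pct < 100:
        raise ValueError("target must be at least 0 and below 100")
    allowed_bad = total * (100.0 - target_pct) / 100.0
    return 1.0 - bad / allowed_bad

# (remaining budget at or below, action), ordered from most to least severe.
# These thresholds and actions are hypothetical.
POLICY = [
    (0.0, "freeze feature releases"),
    (0.25, "page the on-call engineer"),
    (0.50, "notify the service owners"),
]

def policy_action(remaining):
    for threshold, action in POLICY:
        if remaining <= threshold:
            return action
    return "no action"

# 99.0% target over 100,000 requests allows 1,000 bad requests.
remaining = error_budget_remaining(99.0, total=100_000, bad=250)
print(remaining)                 # 0.75
print(policy_action(remaining))  # no action
print(policy_action(0.2))        # page the on-call engineer
```

Working from request counts rather than percentages keeps the calculation easy to audit: 250 bad requests out of an allowance of 1,000 leaves three quarters of the budget.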

Services

Service Level Indicators are grouped by service in the catalog of Service Level Indicators.

Service Level Agreements

A Service Level Agreement (SLA) is “an explicit or implicit contract with your users that includes consequences of meeting (or missing) the SLOs they contain.” SLAs are set by the business rather than by engineers, SREs, or ops. When an SLO is missed, the SLA typically kicks in: it defines the actions taken when your SLO fails, which often carry financial or contractual consequences.

Reliability Target

The reliability target is the minimum percentage of requests (e.g. 95.90%) over a specific time window(*) that teams have decided must be good, as measured by the SLI, for the SLO to be met. The value entered is a percentage between 0% and 100%.

Burn Rate

Error budget burn rate is a number, relative to the reliability target set for an SLO, that describes how fast you are consuming your error budget.

Note: The burn rate reflects recent changes more rapidly than the remaining error budget value.
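A minimal sketch of a burn rate calculation follows. It assumes the common definition, in which burn rate is the observed number of bad requests divided by the number the reliability target allows over the same period, so a value of 1 spends the budget exactly by the end of the window. The function name, the 99.0% target, and the 30-day window are assumptions for the example, and the product may compute the figure differently.

```python
def burn_rate(target_pct, total, bad):
    """Observed bad requests divided by the bad requests the target allows."""
    if not 0 <= target_pct < 100:
        raise ValueError("target must be at least 0 and below 100")
    allowed_bad = total * (100.0 - target_pct) / 100.0
    return bad / allowed_bad

# Last hour: 10,000 requests, 300 bad, against a hypothetical 99.0% target.
rate = burn_rate(99.0, total=10_000, bad=300)
print(rate)  # 3.0

# At a sustained burn rate of 3, a 30-day budget would last about 10 days.
window_days = 30
print(window_days / rate)  # 10.0
```

Because the calculation uses only recent traffic, a sudden spike in errors moves the burn rate immediately, while the remaining budget over the full window changes more slowly.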

For instructions regarding the creation, configuration, and use of User Journeys, Error Budgets, SLOs, and SLIs, refer to the following SLO references:

Blameless SLO Definitions (this document)

An Introductory Guide to Blameless SLOs

A Guide to Getting started with Blameless SLOs

A Guide to Building a New SLO

A Guide to Error Budget Policies

A Guide to Managing Blameless SLOs

A Guide to Understanding your SLOs

Refer to the Google SRE Handbook for more information regarding Site Reliability Engineering.