A Guide to Error Budget Policies

Getting Started

Error Budget Policies (EBPs) create “wiggle room” for downtime. They acknowledge that 100% uptime is unattainable and not worth the expense when it comes to service quality and availability.

If an SLO targets 99.9% availability, for example, then the EBP acknowledges and accepts 0.1% imperfection over the total time the service is expected to be available. By planning for imperfection, the EBP can:

  • Factor in improvements in service (be they hardware or software) as long as they can be integrated, tested, and restored to production within the EBP allowance
  • Monitor and assess actual availability to help define how much downtime vs. uptime is cost-effective
  • Continue to update, patch, and enhance the customer experience as long as the EBP is not exceeded
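
To make the arithmetic concrete, the following Python sketch converts an SLO target into an allowed-downtime figure. The 30-day window and the function name error_budget are assumptions chosen for illustration; they are not Blameless settings or part of any Blameless API.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime an SLO target tolerates over a window.

    slo_target is a fraction, e.g. 0.999 for a 99.9% SLO.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    # The budget is the share of the window the SLO does not cover.
    return window * (1.0 - slo_target)


# Assumed 30-day window: a 99.9% target leaves 0.1% of 30 days,
# which is roughly 43.2 minutes of tolerated downtime.
budget = error_budget(0.999, timedelta(days=30))
print(budget.total_seconds() / 60, "minutes")
```

A stricter target shrinks the budget quickly: under the same assumed window, 99.99% leaves roughly 4.3 minutes.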

Create an Error Budget Policy

  1. Select the Error Budget Policy (EBP) option on the left side of the SLO Landing window. Blameless shifts to the error budget policies page, displaying a list of available error budget policies (if any).
  2. If an EBP exists, you can click on it to open the EBP Details window.
  3. To create a new policy, click the “New Policy” button on the right side of the window. A new modal opens.
  4. Enter a policy name and description. The “Save” button becomes active.
  5. Click the “Save” button.
New Error Budget Policy Creation window

Setting Policy Thresholds

  1. Open the new policy. Within the policy window, you will see a list of thresholds that can be set.
Error Budget Policy Thresholds window
  2. Set one or more automated thresholds.
  3. Select the desired threshold. Thresholds are currently available at 25%, 50%, 75%, and 100% of the error budget.
  4. Select the notification type:
     • Notify via E-mail
     • Notify via Slack
     • Create a Blameless ticket

     For example, you could set up an E-mail notification at 25% and 50%, then “escalate” with a Slack notification at 75%, and finally generate a Blameless Incident ticket at 100% by setting the Severity and the Incident type.

  5. Click the “Save” button to complete the Error Budget Policy.
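
The escalation pattern described in these steps can be sketched in Python as follows. This only illustrates the logic; it is not Blameless code or its API. The function names, the ESCALATION mapping, and the way consumption is computed (downtime divided by budget) are all assumptions made for the example.

```python
# Threshold levels offered in the policy window, as percent of budget consumed.
THRESHOLDS = (25, 50, 75, 100)

# Hypothetical mapping that mirrors the example in the text:
# e-mail at 25% and 50%, Slack at 75%, incident ticket at 100%.
ESCALATION = {
    25: "email",
    50: "email",
    75: "slack",
    100: "incident_ticket",
}


def budget_consumed_pct(downtime_minutes: float, budget_minutes: float) -> float:
    """Return the percentage of the error budget used so far (assumed formula)."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    return 100.0 * downtime_minutes / budget_minutes


def triggered_actions(consumed_pct: float) -> list[tuple[int, str]]:
    """List every (threshold, action) pair that has been reached."""
    return [(t, ESCALATION[t]) for t in THRESHOLDS if consumed_pct >= t]


# 30 minutes of downtime against a 40-minute budget is 75% consumed,
# so the 25%, 50%, and 75% thresholds have all been reached.
consumed = budget_consumed_pct(downtime_minutes=30, budget_minutes=40)
for threshold, action in triggered_actions(consumed):
    print(f"{threshold}% reached: {action}")
```

Keeping the mapping in one table makes the escalation order easy to review next to the thresholds configured in the policy.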

Edit an Error Budget Policy

  1. Select the Error Budget Policies (EBP) option on the left side of the SLO Landing window. Blameless shifts to the error budget policies page, displaying a list of available error budget policies (if any).
New Services window
Note: As with the other windows, the ellipsis (three dots) at the end of each line gives you the following action options for that item:

  • Edit
  • Delete

Warning: If you click Delete, you will receive a warning that you are about to permanently remove the item.

  2. If an EBP exists, you can click on it to open the EBP Details window.

Setting Policy Thresholds

  1. Open the policy you want to edit. Within the policy window, you will see a list of thresholds that can be set.
Error Budget Policy Thresholds window
  2. Set one or more automated thresholds.
  3. Select the desired threshold. Thresholds are currently available at 25%, 50%, 75%, and 100% of the error budget.
  4. Select the notification type:
     • Notify via E-mail
     • Notify via Slack
     • Create a Blameless ticket

     For example, you could set up an E-mail notification at 25% and 50%, then “escalate” with a Slack notification at 75%, and finally generate a Blameless Incident ticket at 100% by setting the Severity and the Incident type.

  5. Click the “Save” button to complete the Error Budget Policy.

For More Information

For instructions regarding the creation, configuration, and use of User Journeys, Error Budgets, SLOs, and SLIs, refer to the following SLO references:

Blameless SLO Definitions

An Introductory Guide to Blameless SLOs

A Guide to Getting started with Blameless SLOs

A Guide to Building a New SLO

A Guide to Error Budget Policies (this document)

A Guide to Managing Blameless SLOs

A Guide to Understanding your SLOs

Refer to the Google SRE Handbook for more information regarding Site Reliability Engineering.