A Guide to Managing Blameless SLOs

Getting Started

Once your SLO components are in place, you need to manage your SLO environment according to the best practices you have identified. To support that effort, this guide describes the following actions:

  • Creating
  • Adding
  • Editing
  • Deleting
note

Check the associations between your SLO components before you make changes: you will not be able to edit or delete some components while they are associated with others.

Managing User Journeys

The User Journey is where it all starts. A user journey is a sequence of tasks that forms the basis on which SLOs are constructed and executed.

Adding User Journeys

The user journey is composed of SLOs, which allow you to examine the state of your reliability conformance. When you launch the SLO Manager, it opens the list of existing User Journeys (if any) by default.

Existing User Journey List

Adding a new User Journey entails the following:

  1. Click on the “+ New Journey” button in the upper right corner of the User Journey step (the opening page when you click on the SLO Manager icon on the left nav bar). This launches the SLO Wizard. The SLO Wizard walks you through the process, showing your progress at the top of the window.
New User Journey opening window
  2. Enter a User Journey Name (*=Required) from the SLO Wizard start page.
  3. Enter a User Journey Definition (*=Required).
  4. Click on the Owner “Unassigned” button to assign an owner.
    a. Start entering the name of the owner you wish to assign. Blameless will provide a list of known, eligible owners.
  5. When all of the required fields have been defined, the “Save” button becomes active.
  6. Click the “Save” button.
  7. Click on the “Next” button. The SLO Wizard advances you to the next step, assigning a known SLI or creating a new SLI.

Editing User Journeys

The user journey is composed of Services, SLIs, and SLOs, which allow you to examine the state of your reliability conformance. If you want to edit the contents of an existing User Journey, there are two ways to open it:

  1. Click on the down arrow to the left of the User Journey name, which opens a "Quick View" of the associated SLOs.
  2. Click on the User Journey name itself, which opens the User Journey window and displays a list of associated SLOs in a Card structure with more details such as:
    • Reliability Target value
    • Service Level value
    • Error Budget Snapshot usage

From the User Journey window, you can then:

  1. Modify an existing SLO by clicking on the Card.
  2. Add an SLO by clicking on one of the "+Add SLO" buttons (there are two; both execute the same creation process).
  3. Change the User Journey Summary value fields that have a pencil icon next to them (i.e., “Description” and “Team”).
note

You must create a service or services and at least one SLI prior to creating the SLO. Also, while you can have multiple SLOs per User Journey, there is only one SLI per SLO.

note

You will notice another button to the left of the “+Add SLO” button. This allows you to change how the SLO displays.

Icon | Description
Bullet list | SLO Table view
Stacked cubes | SLO Card view
New Services window

Deleting User Journeys

  1. Click on the ellipsis (three dots) at the end of the line for the User Journey to be removed. As with the other windows, the ellipsis at the end of each line gives you the following action options for that item:

    • Edit
    • Delete
New Services window
warning

If you click on Delete, you will receive a warning that you are about to permanently remove the item.

Managing SLOs

Best practice is to follow a consistent naming convention, so that you can look at the name of an SLO component and understand its goal. The challenge is to choose a name that is generic enough to let you associate the SLO across multiple queries.

Adding SLOs to a User Journey

Adding an SLO to an existing User Journey occurs via the SLO Wizard, regardless of whether you are a new user or a seasoned SLO user. The SLO Wizard walks you through the steps, showing your progress at the top of the window.

Adding an SLO to an existing User Journey requires the following:

  1. Select a User Journey
  2. Click the “Add SLO” button to start the SLO Wizard
  3. Select an existing SLI or create a new SLI
  4. Configure the SLO (reliability target, relative to the selected SLI type)
  5. Select an Error Budget Policy (optional)

Once these items are complete, you need to save the SLO.

note

The best practice for User Journey analysis is collaboration across teams and groups (for example, engineering, product, site reliability engineering, and customer success) to collect the journey information.

note

For this example, we are using a Latency SLI to create the following SLO. An SLI must be selected or created first to create an SLO against that SLI.

  1. Click on the User Journeys option in the upper left corner of the window.
  2. Click on the desired User Journey. A new window opens. The User Journey window will contain a number of options:
    • A status for the User Journey
    • A list of any existing SLOs associated with the User Journey
    • A summary of the User Journey
    • Options to add a new SLO or edit an existing SLO if desired:
      • Click on the existing SLO Card to open it
      • Click on the “+ Add SLO” button to create a new one
Create a New SLO window
  3. Click on the “+ Add SLO” button. A new "SLI" window opens.
note

The SLI window appears because you must have at least one SLI to create an SLO.

Create a New SLO--determine SLI
  4. Select the SLI option you desire:

    a. Select an existing SLI from the drop-down list provided by Blameless.
    b. Enter a name for the new SLI.

note

Best practice is to choose a name that reflects the User Journey the SLI is associated with.

For example: Login Latency for 95th percentile.

note

You can also search for a specific SLI from the available list using the provided search window under the drop-down.

  5. Associate the SLO with an Error Budget Policy beneath the SLO Name field. When you have selected a valid existing SLI or completed a valid new SLI, the SLI window opens and the “Next” button at the bottom of the screen becomes active.
Create a New SLO--SLI assigned
  6. Click the “Next” button. The SLO Details window opens.
Create a New SLO-assign Error Budget
  7. Enter the SLO Name (a required field).
  8. Specify the percentage of time you need to meet the SLI (required field).
  9. Specify the Latency (seconds or milliseconds).
note

Depending on the percentage you enter, the “Total Error Budget” values in Days, Hours, Minutes, and Seconds will change accordingly.

  10. Specify the Threshold Violation Operator (greater than, less than, etc.).
  11. Select the SLO Status type (required field).
note

“Active” and “Testing” options will affect your Error Budget (decrease your budget) while “Development” will not.

  12. Select an Error Budget to associate with the SLO.
  13. Click the “Save” button.
  14. Click the “Finish” button to complete the process. Blameless returns you to the User Journey step. The new SLO should now appear in the User Journey window as an option.

When the SLO starts, it will (currently) connect to the selected Data source and begin ingesting data for the previous 28-day time window to measure against the SLO(s) that are activated.

note

Be aware that this may take some time to "crunch the numbers" once it has started.
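
The way the “Total Error Budget” values follow from the percentage you enter can be sketched as simple arithmetic. The following is a minimal illustration, assuming a purely time-based budget over the 28-day window described in this guide. The function names `total_error_budget` and `split_budget` are hypothetical, and this is not necessarily how Blameless computes the values it displays.

```python
from datetime import timedelta

WINDOW_DAYS = 28  # window length taken from this guide


def total_error_budget(target_percent: float, window_days: int = WINDOW_DAYS) -> timedelta:
    """Return the time the SLI may miss its target over the window (assumed model)."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    window = timedelta(days=window_days)
    # The budget is the share of the window NOT covered by the target.
    return window * (1 - target_percent / 100)


def split_budget(budget: timedelta) -> tuple[int, int, int, int]:
    """Break a budget into whole days, hours, minutes, and seconds."""
    total_seconds = round(budget.total_seconds())
    days, rest = divmod(total_seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


if __name__ == "__main__":
    for target in (95.0, 99.0, 99.9):
        print(target, split_budget(total_error_budget(target)))
```

Under these assumptions, a 99% target leaves 6 hours, 43 minutes, and 12 seconds of budget over 28 days, while a 99.9% target leaves about 40 minutes and 19 seconds.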

The SLI status is reported with an icon and a tool tip that depend on the state of the data fetch. For example:

SLI Status | Icon type | Tool tip
In Progress | Spinning wheel | This SLI is currently fetching the latest data from your APM.
Backfill completed | Green circle checkmark | Successfully fetched latest data from your APM.
Error | Red circle exclamation point | “Error while fetching…”.
No incoming data | TBD | Future Feature

Examples of these status icons appear in the following image:

Existing SLO components
note

The Error message will be similar to the sample image, based on the type of error and explanation available.

When it is done, Blameless will generate a chart of the data it has digested, based on the parameters set and the Error Budget policy values, and display it below the SLO list on the Step.

SLO Results window with charts

Within the Details window, several icons identify actions you can apply to the elements in the window. The following table lists each icon and the action it performs.

Icon | Type | Action
“...” | Drop-down | Edit SLO; Recalculate Error Budget; Delete SLO
“+” | Action | Add SLO / SLI
Pencil icon | Action | Edit the associated field
“X” | Action | Close Details Window

Editing SLOs

  1. Click on the User Journeys option in the upper left corner of the window.
  2. Click on the desired User Journey. A new window opens. The User Journey window will contain a number of options:
    • A status for the User Journey
    • A list of any existing SLOs associated with the User Journey
    • A summary of the User Journey
    • Options to add a new SLO or edit an existing SLO if desired
  3. Click on the existing SLO Card to open it. The SLO Details window opens.
Create a New SLO-assign Error Budget
  4. Edit the fields you wish to change.
  5. Click on the “Save” button.
  6. Click on the “Finish” button.
note

Depending on the percentage you enter, the “Total Error Budget” values in Days, Hours, Minutes, and Seconds will change accordingly.

note

“Active” and “Testing” options will affect your Error Budget (decrease your budget) while “Development” will not.

When the SLO starts, it will (currently) connect to the selected Data source and begin ingesting data for the previous 28-day time window to measure against the SLO(s) that are activated.

note

Be aware that this may take some time to "crunch the numbers" once it has started.

When it is done, Blameless will generate a chart of the data it has digested, based on the parameters set and the Error Budget policy values, and display it below the SLO list on the landing page.

SLO Results window with charts

Deleting SLOs

note

You can delete an SLO even if it is associated with an SLI or Error Budget Policy.

  1. If you have an SLO to delete, click on the three dots icon in the SLO Title header.
  2. Select the Delete (trash can) icon.
  3. Blameless will warn you about the planned deletion and ask you to confirm it.
SLO Deletion warning
  4. Click the “Delete” button to confirm.

Managing SLIs

Once you have SLIs set up, you connect them to your SLOs, which are targets set against your SLIs. These indicators are points on a digital user journey that contribute to customer experience and satisfaction.

Services

Services is the list of SLIs associated with SLOs.

Adding Services

  1. Select the Service Level Indicators option. The Services window opens and displays a list of services, if any have been created. If no Services exist, the SLO Manager will say so.
note

If there is no SLI associated with a service under the Services Title, the SLI title in the field will be blank.

  2. Click on “+ New Service” to create a new Service.
note

You must create at least one service with at least one SLI prior to adding an SLO to a User Journey.

  3. If there is no pre-existing Service, Blameless will report that none exist.
New Services window
New SLI window

Otherwise, a new modal opens containing the following required (*) fields:

    • Service Name
    • Description
  4. Enter the name for the new service and a description.
  5. Click the “Save” button. The new service will appear on the “Services” landing screen the next time you open it. The resulting SLI is empty until you define it.
note

The SLI list will remain blank until you create an SLI and save it.

Services Details

When you open the desired Services window, you will find the following elements:

  • SLI: A list of SLIs (if any exist) under an SLI tab.

  • Notes: A Notes tab containing a text field where you can add information regarding the service.

  • Service Summary: A summary of the following information regarding the service.

    • Service Description

    • Creation date

    • Last updated

    • Team

note

Both the Description and the Team (members) sections have a pencil icon, signifying these fields can be edited.

New Services window

Update Services

  1. Select an existing Service from the Services List.
  2. Click on the ellipsis (three dots) and select the “Edit” option. The Edit Service modal opens.
  3. Adjust the information as desired in either (or both) required (*) fields.
  4. Click the “Save” button to update the Service.

Deleting Services

  1. Return to the Services window.
  2. Click on the ellipsis (three dots) at the end of the row for the desired Service.
  3. Select the “Delete” option. Blameless will ask for confirmation to delete.
  4. Click on the “Delete” button.

SLIs

SLIs are quantitative measures, typically provided through your APM platform. Traditionally, these refer to either latency, defined as response time (including queue/wait time) in milliseconds, or availability, the ratio of good events to valid events. A collection of SLIs, or a composite SLI, is a group of SLIs attributed to a larger SLO.

Adding SLIs to a Service

Adding an SLI to a Service is done via the Service Level Indicators landing page.

However, you can also create an SLI through the SLO Wizard, because you must select an SLI when you create and add an SLO to a User Journey.

SLI Latency Configuration window
  1. Click on the “Define SLI” button. This will open a new SLI Details window.

  2. Assign an SLI Name (*=required) and enter a description (optional).

    For example: "This SLI measures the latency of the login request for the 95th percentile of login requests hitting the API and Login service".

SLI Latency Configuration window
  3. Select the SLI Type (*=required field). Currently supported options are:
    • Availability measures good metrics vs. valid metrics.
    • Latency measures how long it takes to complete the task.
    • Throughput measures the proportion of the time the data processing rate is faster than a threshold.
    • Saturation measures the proportion of the time your system load is less than a threshold.
  4. Select the Data source (*=required field), based on the integration(s) you activated.
  5. Copy and paste the metric shown in the example field, based on the Data source selected.
  6. Click the “Save” button. Return to the User Journey level. You can now start to set up SLOs with the Error Budget Policies.
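
The four SLI types listed above can each be read as a ratio of "good" observations to all observations. The following is a rough sketch of that idea, assuming simple in-memory samples. All function names and sample shapes are hypothetical; they are not a Blameless API or the query format of any Data source. Treating latency as the proportion of requests faster than a threshold is also an assumption made for illustration.

```python
from typing import Sequence


def availability(good: int, valid: int) -> float:
    """Good events divided by valid events."""
    if valid <= 0:
        raise ValueError("valid must be positive")
    return good / valid


def proportion_meeting(samples: Sequence[float], threshold: float, *, below: bool) -> float:
    """Share of samples on the "good" side of a threshold."""
    if not samples:
        raise ValueError("no samples")
    if below:
        good = sum(1 for s in samples if s < threshold)
    else:
        good = sum(1 for s in samples if s > threshold)
    return good / len(samples)


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Assumed reading: proportion of requests that complete faster than the threshold."""
    return proportion_meeting(latencies_ms, threshold_ms, below=True)


def throughput_sli(rates: Sequence[float], min_rate: float) -> float:
    """Proportion of the time the processing rate is faster than the threshold."""
    return proportion_meeting(rates, min_rate, below=False)


def saturation_sli(loads: Sequence[float], max_load: float) -> float:
    """Proportion of the time the system load is less than the threshold."""
    return proportion_meeting(loads, max_load, below=True)


if __name__ == "__main__":
    print(availability(good=995, valid=1000))
    print(latency_sli([120, 180, 250, 900], threshold_ms=300))
    print(throughput_sli([50, 80, 120, 200], min_rate=100))
    print(saturation_sli([0.2, 0.5, 0.95, 0.7], max_load=0.9))
```

In practice the metric you paste into the SLI configuration comes from your Data source; the sketch only illustrates what each SLI type expresses.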
note

Pingdom is a special integration: it cannot measure the internal metrics of services (which integrations such as Prometheus and Datadog can). It works only with high-level entities: page load time (SLI type "Latency") and the status code of the page (SLI type "Availability").

note

Prometheus currently tracks a block of data based on the oldest available data received.

note

The SLI status is currently reported in two different areas:

  • SLI cards under Service Level Indicators > Service
  • In the Detailed view of each SLI.

Editing SLIs

note

You CANNOT edit an SLI if it is associated with an SLO. Blameless will throw an error message.

SLI Yes Deletion warning
  1. If there is no association, however: Select the SLO to update.
  2. Select an existing SLI from a list of Services.
  3. Click the “Next” button.
  4. Complete your edits.
  5. Click the “Save” button.

Deleting SLIs

note

You cannot delete an SLI if it is associated with one or more SLOs.

  1. Open the SLI in question within the Services List.
  2. Locate the Trash can icon within the SLI Title header.
  3. Click on the Trash can. Blameless will confirm your choice with either a success warning or a decline warning window depending on the association status of the SLI.
  4. Select your desired action.
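
The association rules in this guide (an SLI cannot be deleted while an SLO is associated with it, whereas an SLO can be deleted even if it is associated with an SLI) can be modeled as a simple reference check. The following is a minimal sketch with hypothetical class and method names. It is not the Blameless data model, only an illustration of why a deletion may be declined.

```python
from dataclasses import dataclass, field


@dataclass
class SLO:
    name: str
    sli_name: str  # this guide states there is only one SLI per SLO


@dataclass
class Catalog:
    """Hypothetical in-memory store of SLI names and SLOs."""
    slis: set[str] = field(default_factory=set)
    slos: list[SLO] = field(default_factory=list)

    def slos_using(self, sli_name: str) -> list[str]:
        """Names of the SLOs associated with the given SLI."""
        return [slo.name for slo in self.slos if slo.sli_name == sli_name]

    def delete_sli(self, sli_name: str) -> None:
        """Decline the deletion while any SLO still references the SLI."""
        blockers = self.slos_using(sli_name)
        if blockers:
            raise ValueError(f"SLI {sli_name!r} is still used by: {', '.join(blockers)}")
        self.slis.discard(sli_name)

    def delete_slo(self, slo_name: str) -> None:
        """An SLO can be removed regardless of its SLI association."""
        self.slos = [slo for slo in self.slos if slo.name != slo_name]


if __name__ == "__main__":
    catalog = Catalog(slis={"login-latency"}, slos=[SLO("login-fast", "login-latency")])
    print(catalog.slos_using("login-latency"))  # check associations before deleting
    catalog.delete_slo("login-fast")
    catalog.delete_sli("login-latency")  # succeeds once no SLO references the SLI
```

Checking `slos_using` first mirrors the advice to review associations before you try to delete a component.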
SLI No Deletion warning
SLI Yes Deletion warning

For More Information

For instructions regarding the creation, configuration, and use of User Journeys, Error Budgets, SLOs, and SLIs, refer to the following SLO references:

Blameless SLO Definitions

An Introductory Guide to Blameless SLOs

A Guide to Getting started with Blameless SLOs

A Guide to Building a New SLO

A Guide to Error Budget Policies

A Guide to Managing Blameless SLOs (this document)

A Guide to Understanding your SLOs

Refer to the Google SRE Handbook for more information regarding Site Reliability Engineering.