Getting Started
Once you have assembled your SLO components, you need to manage your SLO environment according to the best practices you have identified. To support that effort, this guide describes the following actions:
- Creating
- Adding
- Editing
- Deleting
Note: Check the associations among your SLO components first. You will not be able to delete or edit some of these components while they are associated with one another.
If you are a new user, Blameless provides the SLO Wizard to guide you through the process. You start with the User Journey.
Start by launching the SLO Manager. Blameless opens to the User Journey Landing page. Next, click on “+New Journey”. The SLO Wizard will walk you through the process, and you can follow that process via the guide icon at the top of the page or by clicking the “Next” button.
Note: You can create a User Journey and leave it blank as a placeholder for future population.
You can continue to the section “Working via the SLO Wizard” for a high-level description of the feature.
As an experienced user, you are probably familiar enough with the process that you do not need the SLO Wizard to create more SLIs, but it remains available for creating new user journeys and adding new SLOs to them. You can continue with the section “Launching the SLO Manager”.
Working via the SLO Wizard
An SLO requires the following:
- Create the User Journey
- Create the SLI
- Create the Error Budget Policy
- Create the SLO
- Set the Thresholds
Note: The best practice for User Journey analysis is collaboration across teams and groups to collect the journey information.
Managing SLIs
Once you have SLIs set up, you connect them to your SLOs, which are targets set against your SLIs. These indicators are points on a digital user journey that contribute to customer experience and satisfaction.
"Services" is the list of SLIs associated with SLOs.
- Select the Service Level Indicators option. The Services window opens and lists any services that have been created. If no services exist, the SLO Manager will say so.
Note: If there is no SLI associated with a service under the Services Title, the SLI title in the field will be blank.
- Click on “+ New Service” to create a new Service.
Note: You must create at least one service with at least one SLI prior to adding an SLO to a User Journey.
- If there is no pre-existing Service, Blameless will report that none exist. Otherwise, a new modal opens containing the following required (*) fields:
  - Service Name
  - Description
- Enter the name for the new service and a description.
- Click the “Save” button. The new service will appear on the “Services” landing screen the next time you open it. The resulting SLI is empty until you define it.
Note: The SLI list will remain blank until you create an SLI and save it.
When you open the desired Services window, you will find the following elements:
- SLI: A list of SLIs (if any exist) under an SLI tab.
- Notes: A Notes tab containing a text field where you can add information regarding the service.
- Service Summary: A summary of the following information regarding the SLI:
  - Service Description
  - Creation date
  - Last updated
  - Team
Note: Both the Description and the Team (members) sections have a pencil icon, signifying that these fields can be edited.
- Select an existing Service from the Services List.
- Click on the ellipsis (three dots) and select the "Edit" option. The Edit Service Modal opens.
- Adjust the information as desired in either (or both) required (*) fields.
- Click the “Save” button to update the Service.
- Return to the Services window.
- Click on the ellipsis (three dots) at the end of the row for the desired Service.
- Select the “Delete” option. Blameless will ask for confirmation to delete.
- Click on the “Delete” button.
SLIs are quantitative measures, typically provided through your APM platform. Traditionally, these refer to either latency or availability. Latency is defined as response time, including queue/wait time, in milliseconds. A collection of SLIs, or composite SLI, is a group of SLIs attributed to a larger SLO.
Adding an SLI to a Service is done via the Service Level Indicators landing page.
However, you can also create an SLI through the SLO Wizard. We offer this additional flexibility because you must select an SLI when you create and add an SLO to a User Journey.
- Click on the “Define SLI” button. This will open a new SLI Details window.
- Assign an SLI Name (* = required) and enter a description (optional). For example: "This SLI measures the latency of the login request for the 95th percentile of login requests hitting the API and Login service".
- Select the SLI Type (* = required field). Currently supported options are:
  - Availability measures good metrics vs. valid metrics.
  - Latency measures how long it takes to complete the task.
  - Throughput measures the proportion of the time the data processing rate is faster than a threshold.
  - Saturation measures the proportion of the time your system load is less than a threshold.
- Select the Data source (* = required field), based on the integration(s) you activated.
- Copy and paste the metric shown in the example field, based on the Data source selected.
- Click the “Save” button. Return to the User Journey level. You can now start to set up SLOs with the Error Budget Policies.
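
The ratio-style definitions of the SLI types listed above can be illustrated with a minimal sketch. It assumes evenly spaced samples, so that the share of samples approximates the share of time. The function names and sample values are hypothetical and are not part of Blameless.

```python
from typing import Sequence


def availability(good: int, valid: int) -> float:
    """Availability-style SLI: ratio of good events to valid events."""
    if valid <= 0:
        raise ValueError("at least one valid event is required")
    return good / valid


def proportion_above(samples: Sequence[float], threshold: float) -> float:
    """Throughput-style SLI: share of samples strictly above the threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s > threshold) / len(samples)


def proportion_below(samples: Sequence[float], threshold: float) -> float:
    """Saturation-style SLI: share of samples strictly below the threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s < threshold) / len(samples)
```

For example, `availability(995, 1000)` describes a service where 995 of 1000 valid requests were good, and `proportion_below(load_samples, 0.9)` would describe how often a hypothetical load metric stayed under 90%.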
Note: Pingdom is a special integration. With it, we can't measure the internal metrics of services (which we can measure with Prometheus, Datadog, etc.). We can work only with high-level entities: page load time (SLI type "Latency") and the status code of the page (SLI type "Availability").
Note: Prometheus currently tracks a block of data based on the oldest available data received.
Note: The SLI status is currently reported in two different areas:
- SLI cards under Service Level Indicators > Service
- In the Detailed view of each SLI.
- If there is no association, however: Select the SLO to update.
- Select an existing SLI from a list of Services.
- Click the “Next” button.
- Complete your edits.
- Click the “Save” button.
Note: You cannot delete an SLI if it is associated with one or more SLOs.
- Open the SLI in question within the Services List.
- Locate the Trash can icon within the SLI Title header.
- Click on the Trash can. Blameless will confirm your choice with either a success warning or a decline warning window depending on the association status of the SLI.
- Select your desired action.
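
As a rough illustration of the association rule in this section, the sketch below models the check that decides whether a delete succeeds or is declined. The `Sli` class, its fields, and the returned messages are hypothetical. This is not the Blameless implementation or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sli:
    """Hypothetical in-memory model of an SLI and the SLOs that use it."""
    name: str
    slo_names: List[str] = field(default_factory=list)


def can_delete(sli: Sli) -> bool:
    """An SLI may be deleted only when no SLO is associated with it."""
    return not sli.slo_names


def delete_message(sli: Sli) -> str:
    """Return a success or decline message, mirroring the two warning windows."""
    if can_delete(sli):
        return f"Deleted SLI '{sli.name}'."
    used_by = ", ".join(sli.slo_names)
    return f"Cannot delete SLI '{sli.name}': associated with {used_by}."
```

Under this model, removing the association from every SLO that uses the SLI is what turns a declined delete into a successful one.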
SLI Ingest Log
Blameless reports the backfilling status of an SLI via a dynamic, round status icon (backfilling, error, backfilled). However, this SLI status is ephemeral, and only some information is currently provided, via a tooltip that appears when users hover over the icon.
Connections between Blameless and the remote monitoring tool could fail at any time, potentially resulting in missing data in the historical graphs. However, Blameless always attempts to backfill the data from the time of the failure. Other issues originating from the data source that could also result in missing data include:
- The monitoring tool might not be able to transmit data, or might have stopped collecting metrics data.
- The metrics used to feed the SLI might be deleted from the source.
To troubleshoot the origin of potential missing data and provide confidence that the data is valid, Blameless reports the history of those data transfer issues and successes by displaying an Error Log.
When reviewing the Service's group of SLIs, Blameless displays a Notes tab for the associated SLIs. When you select a specific SLI, the SLI Detail View is opened and the Notes tab is replaced with a Log tab.
The Error Log contains a table of data, defined below, regarding the state of the information, its status, a timestamp, and the timestamp of the first and last data entry captured from the data source.
| Column name | Description (type, format, values) | JSON property | Sort by |
|---|---|---|---|
| Timestamp | Format: 2021-04-23T12:34:45.173-07:00, with hover showing human-readable date/time | createdAt | Time |
| Ingest phase | Two possible values: Backfill or Regular. Backfill is the initial phase when creating the SLI the first time | type | Alphabetical |
| Status | Value can be one of the 3 strings: inProgress, Error or Success | status | Alphabetical |
| Start Range | Human-readable date/time format (use Blameless standard) | start | Time |
| End Range | Human-readable date/time format (use Blameless standard) | end | Time |
| Message | Long string | errorMessage | Alphabetical |
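
To illustrate how the JSON properties in the table might be consumed, here is a minimal sketch that sorts log entries by `createdAt` and picks out failed ingests. The sample payload and its values are hypothetical (the `start` and `end` strings in particular are placeholders, since their exact format is not specified here), and the helper names are assumptions, not a Blameless API.

```python
import json
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical log payload using the property names from the table above.
SAMPLE = """
[
  {"createdAt": "2021-04-23T13:10:00.000-07:00", "type": "Regular",
   "status": "inProgress", "start": "placeholder", "end": "placeholder",
   "errorMessage": ""},
  {"createdAt": "2021-04-23T12:34:45.173-07:00", "type": "Backfill",
   "status": "Success", "start": "placeholder", "end": "placeholder",
   "errorMessage": ""},
  {"createdAt": "2021-04-23T13:05:00.000-07:00", "type": "Regular",
   "status": "Error", "start": "placeholder", "end": "placeholder",
   "errorMessage": "connection timed out"}
]
"""


def load_entries(raw: str) -> List[Dict[str, Any]]:
    """Parse the JSON text and return entries sorted by createdAt, oldest first."""
    entries = json.loads(raw)
    # createdAt is an ISO 8601 timestamp with a UTC offset, so it parses
    # into a timezone-aware datetime that sorts correctly across offsets.
    return sorted(entries, key=lambda e: datetime.fromisoformat(e["createdAt"]))


def failed_entries(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the entries whose status is the string "Error"."""
    return [e for e in entries if e["status"] == "Error"]


if __name__ == "__main__":
    for entry in failed_entries(load_entries(SAMPLE)):
        print(entry["createdAt"], entry["type"], entry["errorMessage"])
```

Sorting on the parsed timestamp rather than on the raw string keeps the order correct even if entries carry different UTC offsets.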