
What Is an SLA? Service Level Agreements Defined with Real-World Examples

An SLA is a contract that defines what uptime, response time, and support a provider commits to deliver. This guide breaks down every component, how SLA percentages translate to real downtime, and how to monitor whether you're actually meeting your commitments.

Theo Cummings · July 18, 2026 · 11 min read

A Service Level Agreement (SLA) is a formal contract between a service provider and a customer that defines the minimum performance standards the provider commits to deliver. SLAs specify uptime targets, support response times, and the financial remedies customers receive when providers fall short.

SLAs are how providers make availability promises concrete. Without one, a provider can claim their service is "highly available" with no obligation to define what that means. With one, 99.9% uptime is a measurable commitment backed by service credits.

Understanding what SLAs actually contain, how the percentages translate to real downtime, and how to verify compliance puts you in a much stronger position, whether you are a buyer evaluating a vendor or an engineering team building toward commitments of your own.

The five components of an SLA

Most SLAs contain the same five building blocks, even if the language varies by vendor.

1. Uptime commitment

The uptime percentage defines the proportion of time the service must be available. Most providers measure it monthly rather than annually. That matters because the downtime budget resets every month: a single 3-hour outage breaches a 99.9% monthly commitment even if the rest of the year is flawless.

Common uptime tiers:

Uptime | Monthly downtime allowed | Annual downtime allowed
99.0% | 7.3 hours | 3.65 days
99.5% | 3.65 hours | 43.8 hours
99.9% | 43.8 minutes | 8.77 hours
99.95% | 21.9 minutes | 4.38 hours
99.99% | 4.38 minutes | 52.6 minutes

See the complete uptime availability table for the full breakdown from 95% to 99.999%.
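Every value in the table comes from one formula: allowed downtime = (1 - uptime / 100) multiplied by the minutes in the period. A minimal Python sketch, assuming an average month of 365.25 / 12 days (the convention that reproduces the figures above; a contract may define its month differently):

```python
# Allowed downtime for an uptime percentage, in minutes.
# Assumes an average month of 365.25 / 12 days (about 30.44 days), which
# reproduces the table above; your contract may define a month differently.
MINUTES_PER_YEAR = 365.25 * 24 * 60
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12


def allowed_downtime_minutes(uptime_pct: float) -> tuple[float, float]:
    """Return (monthly, annual) allowed downtime in minutes."""
    unavailable_fraction = 1 - uptime_pct / 100
    return (
        unavailable_fraction * MINUTES_PER_MONTH,
        unavailable_fraction * MINUTES_PER_YEAR,
    )


for tier in (99.0, 99.5, 99.9, 99.95, 99.99):
    monthly, annual = allowed_downtime_minutes(tier)
    print(f"{tier}%: {monthly:.1f} min/month, {annual / 60:.2f} h/year")
```

For 99.99% this gives about 52.6 minutes per year, so a single 45-minute incident would use roughly 86% of the annual budget.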

2. Measurement methodology

How a provider measures uptime matters as much as the target itself. The SLA should specify:

  • What counts as "downtime" (full outage, partial degradation, specific endpoints)
  • What is excluded from the uptime calculation (scheduled maintenance, customer-caused incidents, force majeure)
  • How the provider measures availability (their own synthetic checks, internal metrics, or a third party)

A provider who excludes all scheduled maintenance from their SLA calculation can schedule 4 hours of downtime every Sunday night and still claim 99.99% uptime for the remaining time. Read the methodology section before you trust the headline number.
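To see how much a maintenance carve-out can hide, here is a rough Python calculation. The 30-day month, the four 4-hour Sunday windows, and the 4 minutes of unplanned downtime are illustrative assumptions, not figures from any real SLA:

```python
# Compare availability with and without a scheduled-maintenance exclusion.
# Hypothetical month: 30 days, four Sunday maintenance windows of 4 hours,
# plus 4 minutes of unplanned downtime. All numbers are illustrative.
MONTH_MINUTES = 30 * 24 * 60


def availability_pct(downtime_min: float, window_min: float) -> float:
    """Percentage of the measurement window the service was up."""
    return 100 * (window_min - downtime_min) / window_min


maintenance = 4 * 4 * 60  # 960 minutes of scheduled downtime
unplanned = 4             # minutes of unplanned downtime

# What customers experienced: every minute of downtime counts.
actual = availability_pct(maintenance + unplanned, MONTH_MINUTES)

# What the SLA reports when maintenance is removed from both the
# downtime and the measurement window.
reported = availability_pct(unplanned, MONTH_MINUTES - maintenance)

print(f"actual: {actual:.2f}%  reported: {reported:.3f}%")
```

Removing the windows from both the downtime and the measurement period is what keeps the reported figure above 99.99% while customers actually saw about 97.8% availability.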

3. Incident response and resolution targets

Many SLAs include support commitments alongside availability commitments:

  • First response time: How long until a support engineer acknowledges your ticket (common targets: 1 hour for critical, 4 hours for high, 8 hours for normal)
  • Resolution time: How long the provider aims to fix the issue
  • Escalation paths: What happens if first-tier support cannot resolve the incident

These commitments are often tiered by severity. A P1 incident (complete outage) gets faster response than a P3 (minor degradation).
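A sketch of how a first-response target could be checked per ticket, using the common targets listed above. The mapping of P1/P2/P3 to critical/high/normal and the function name are assumptions for illustration:

```python
from datetime import datetime, timedelta

# First-response targets from the common tiers above. Mapping P1/P2/P3 to
# critical/high/normal is an assumption; check your SLA's own definitions.
FIRST_RESPONSE_TARGET = {
    "P1": timedelta(hours=1),  # critical: complete outage
    "P2": timedelta(hours=4),  # high
    "P3": timedelta(hours=8),  # normal: minor degradation
}


def first_response_met(severity: str, opened: datetime, acknowledged: datetime) -> bool:
    """True if the ticket was acknowledged within its severity's target."""
    return acknowledged - opened <= FIRST_RESPONSE_TARGET[severity]


opened = datetime(2026, 7, 1, 9, 0)
print(first_response_met("P1", opened, datetime(2026, 7, 1, 9, 50)))  # True
```

This measures wall-clock time. Many SLAs count only business hours for lower severities, so adjust the calculation to your contract's definition.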

4. Reporting and transparency

The SLA should specify how the provider reports on their performance. Look for:

  • Access to historical uptime data
  • Incident post-mortems for major outages
  • A public status page at a stable URL you can bookmark
  • Export or API access to your uptime data

When the provider controls all of the reporting, it has an obvious conflict of interest. Your own monitoring gives you independent data to verify its claims.

5. Remedies for breach

When a provider misses their uptime commitment, the SLA defines what you get. Service credits are the industry standard.

A common credit structure:

Availability achieved | Credit
99.0% to below 99.9% | 10% of monthly fee
95.0% to below 99.0% | 25% of monthly fee
Below 95.0% | 50% of monthly fee

Important limitations: credits expire, usually within 30-90 days of issuance. Most SLAs cap liability at the monthly fee paid, not your actual losses from the downtime. Credits are also usually not automatic; you have to request them.
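The credit lookup can be sketched in a few lines of Python. The tiers follow the example table above; treating each lower bound as inclusive is an assumption, since real SLAs define their own boundaries:

```python
# Service credit owed under the example tiers above. Inclusive lower
# bounds are an assumption; real SLAs define their own tier edges, and
# most require you to request the credit rather than paying it out.
CREDIT_TIERS = [  # (lower bound of availability %, credit as fraction of fee)
    (99.9, 0.00),
    (99.0, 0.10),
    (95.0, 0.25),
    (0.0, 0.50),
]


def service_credit(availability_pct: float, monthly_fee: float) -> float:
    """Return the credit owed for a month at the given availability."""
    for lower_bound, credit_fraction in CREDIT_TIERS:
        if availability_pct >= lower_bound:
            return monthly_fee * credit_fraction
    raise ValueError("availability must be between 0 and 100")
```

With a hypothetical $2,000 monthly fee, 99.5% availability yields a $200 credit and 97.2% yields $500. The credit never exceeds the fee, which mirrors the liability cap described above.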

Types of SLAs

External SLAs (customer-facing)

These are contracts with your paying customers. If you run a SaaS product and your pricing page says "99.9% uptime guaranteed," you have made an external SLA commitment. These carry legal and commercial weight.

External SLAs typically:

  • Are simpler than vendor SLAs (fewer exceptions and carve-outs)
  • Give customers credits for outages that affect them
  • Require your own uptime monitoring to be defensible

Internal SLAs (between teams)

Engineering teams use internal SLAs to define service commitments between platform teams and product teams. An infrastructure team might commit to 99.95% availability for their internal API gateway, while the product team commits to 99.9% for the customer-facing application that depends on it.

Internal SLAs formalize accountability without legal contracts. They also surface dependency risks: if your infrastructure team's SLA allows more downtime than your customer commitment, you have a structural problem.

Vendor SLAs (your dependencies)

Every service you depend on has its own SLA. Your monitoring tool, CDN, payment processor, database provider, and hosting platform all publish SLAs. Together they define the risk floor for your own uptime commitment: the minimum risk you inherit, and therefore a limit on the uptime you can credibly promise.

If you commit to 99.99% availability to your customers but your database provider only commits to 99.9%, you are depending on a provider whose failure budget is larger than yours.
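One way to quantify this: if your service needs every dependency to be up, and their failures are independent (a simplifying assumption), the best availability you can offer is roughly the product of theirs. A minimal sketch with hypothetical vendor figures:

```python
from math import prod


def composite_availability(*availabilities_pct: float) -> float:
    """Best-case availability of a chain of dependencies that must all be up.

    Assumes independent failures and no redundancy between dependencies.
    """
    return 100 * prod(a / 100 for a in availabilities_pct)


# Illustrative stack: hosting at 99.95%, database at 99.9%, CDN at 99.9%.
ceiling = composite_availability(99.95, 99.9, 99.9)
print(f"{ceiling:.3f}%")
```

Three dependencies at 99.9% to 99.95% already cap the stack at about 99.75%, below a 99.9% commitment and far below 99.99%. Redundancy (for example, a failover database) is what raises that ceiling, and this simple product does not model it.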

SLA vs SLO vs SLI

These three terms describe related concepts at different levels:

Term | What it is | Who it's for
SLA | Contract with commercial penalties | Customers, legal teams
SLO | Internal engineering target | Engineering, product teams
SLI | The actual metric being measured | Engineers, monitoring systems

Your SLO should always be stricter than your SLA. If your SLA commits to 99.9% uptime, your internal SLO might target 99.95%. That buffer is your recovery margin: when you breach the SLO, you still have time to fix the problem before you breach the SLA and owe credits.
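The buffer can be tracked as two downtime budgets, one per target. A rough sketch using the 99.95% SLO and 99.9% SLA pair from the paragraph above; the 30 minutes of downtime and the average-month length are assumptions:

```python
# Track one month against both an internal SLO and the external SLA.
# Assumes an average month of 365.25 / 12 days (43,830 minutes).
MONTH_MINUTES = 365.25 / 12 * 24 * 60


def budget_remaining(target_pct: float, downtime_min: float) -> float:
    """Minutes of downtime still allowed this month (negative = breached)."""
    return (1 - target_pct / 100) * MONTH_MINUTES - downtime_min


downtime_so_far = 30  # hypothetical minutes of measured downtime this month
slo_left = budget_remaining(99.95, downtime_so_far)
sla_left = budget_remaining(99.9, downtime_so_far)
print(f"SLO budget left: {slo_left:.1f} min, SLA budget left: {sla_left:.1f} min")
```

Here the SLO budget (about 21.9 minutes) is already exhausted while about 13.8 minutes of SLA budget remain. That remaining margin is the window to fix the problem before credits are owed.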

The SLI is the measurement itself: the percentage of successful HTTP requests, the ratio of healthy to total checks, the count of minutes with zero detected downtime.

Read SLA vs SLO vs SLI: Key Differences and Real Examples for a full breakdown of how these fit together in practice.

How SLA percentages work in practice

The jump from 99.9% to 99.99% sounds small. The operational difference is not.

99.9% uptime allows 8.77 hours of downtime per year. That is roughly two 4-hour maintenance windows with no margin for unplanned incidents.

99.99% uptime allows 52.6 minutes of downtime per year. A single incident that takes 45 minutes to diagnose and resolve consumes most of your annual budget.

99.999% allows 5.26 minutes per year. At this tier, you need redundant everything: multi-region deployments, automated failover, and incident response measured in seconds.

Most SaaS products target 99.9% because it is achievable with standard cloud infrastructure. Reaching 99.99% requires architectural investment that most early-stage products cannot justify.

What SLA monitoring looks like

You cannot verify whether a provider is meeting its SLA by watching that provider's status page. Providers control their status pages and historically underreport incidents. A 2023 study by Downdetector found that providers confirmed outages on their status pages an average of 22 minutes after users reported issues.

Independent monitoring means sending real checks from outside your network, on a schedule, from multiple regions.

For each service with an SLA commitment:

  1. Run checks every minute from at least 3 regions
  2. Record the timestamp, result, and response time of every check
  3. Calculate monthly availability as: (successful checks / total checks) × 100
  4. Store at least 13 months of data (current month plus prior 12 for year-over-year comparison)
  5. Export data at the end of each month before claiming credits
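Steps 2 and 3 can be sketched as a record type plus the ratio from step 3. The `CheckResult` fields, region names, and failure pattern below are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CheckResult:
    timestamp: datetime
    region: str
    ok: bool
    response_ms: float


def monthly_availability(results: list[CheckResult]) -> float:
    """(successful checks / total checks) x 100, as in step 3 above."""
    if not results:
        raise ValueError("no checks recorded")
    successful = sum(1 for r in results if r.ok)
    return 100 * successful / len(results)


# Hypothetical month: one check per minute from three regions for 30 days,
# with eu-west failing during the first 30 minutes.
start = datetime(2026, 6, 1)
results = [
    CheckResult(
        timestamp=start + timedelta(minutes=m),
        region=region,
        ok=not (region == "eu-west" and m < 30),
        response_ms=120.0,
    )
    for m in range(30 * 24 * 60)
    for region in ("us-east", "eu-west", "ap-south")
]
print(f"{monthly_availability(results):.3f}%")
```

Under a plain check ratio, a single region's failure counts as partial downtime even if the service was reachable from elsewhere, so confirm how the SLA you are verifying defines an outage.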

Vantaj runs checks from 10 global regions and requires failures from multiple regions before triggering an alert. This prevents false positives from single-probe network blips while ensuring you detect real outages fast. Your check history is exportable and serves as evidence when filing SLA credit claims.
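The multi-region confirmation idea can be illustrated with a simple quorum rule: a minute counts as downtime only if at least two distinct regions reported a failure in it. This is a hypothetical sketch of the general technique, not Vantaj's actual algorithm:

```python
from collections import defaultdict
from datetime import datetime


def confirmed_down_minutes(
    failures: list[tuple[datetime, str]], quorum: int = 2
) -> list[datetime]:
    """Minutes in which at least `quorum` distinct regions reported a failure.

    Hypothetical quorum rule for illustration only.
    """
    regions_by_minute = defaultdict(set)
    for timestamp, region in failures:
        minute = timestamp.replace(second=0, microsecond=0)
        regions_by_minute[minute].add(region)
    return sorted(
        minute for minute, regions in regions_by_minute.items() if len(regions) >= quorum
    )


failures = [
    (datetime(2026, 7, 1, 10, 0, 5), "us-east"),   # single-region blip
    (datetime(2026, 7, 1, 10, 1, 10), "us-east"),
    (datetime(2026, 7, 1, 10, 1, 40), "eu-west"),  # second region, same minute
    (datetime(2026, 7, 1, 10, 2, 0), "ap-south"),
]
print(confirmed_down_minutes(failures))  # only the 10:01 minute is confirmed
```

Bucketing by calendar minute is a simplification: failures at 10:00:59 and 10:01:01 from different regions land in separate minutes and would not reach quorum. A sliding window avoids that at the cost of more bookkeeping.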

Read SLA uptime monitoring: how to track and enforce your commitments for the full monitoring setup guide.

Common SLA negotiation points

If you are negotiating an enterprise SLA with a vendor, focus on these five terms:

Scheduled maintenance exclusions: Ask whether maintenance counts against uptime. Some providers exclude all maintenance; others count it. Negotiate for maintenance to count, or set maximums on maintenance frequency and duration.

Credit request windows: Most SLAs require you to request credits within 30 days of the incident. Negotiate for 60-90 days. You may not notice a borderline incident until you review your monitoring data at month end.

Automatic credits: Push for automatic credit issuance. Manual request processes create friction that results in unclaimed credits.

Severity tiers: A 99.9% SLA measured as an average can hide partial outages. Negotiate for full-outage credits to apply at a lower threshold than partial-degradation credits.

Termination rights: Long outages (24+ hours) should give you the right to terminate without penalty. Most standard SLAs do not include this; it requires negotiation.

SLA in the context of your monitoring stack

An SLA is only as good as your ability to verify it. Engineering teams that trust their providers' status pages are flying blind. The ones who maintain independent monitoring know:

  • Whether they are on track to meet their own customer commitments this month
  • Which provider dependencies are consuming their error budget fastest
  • Whether a credit request is justified by their data
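Answering the second question can be as simple as dividing each dependency's attributed downtime by your own monthly budget. A minimal sketch, assuming a 99.9% customer commitment, an average-length month, and that each outage minute can be attributed to a single dependency; the names and numbers are hypothetical:

```python
# Share of your own monthly error budget consumed by each dependency.
# Assumes every minute of downtime is attributed to exactly one dependency.
MONTH_MINUTES = 365.25 / 12 * 24 * 60


def budget_share(
    downtime_by_dependency: dict[str, float], own_target_pct: float
) -> list[tuple[str, float]]:
    """Return (dependency, fraction of budget used), largest first."""
    budget_min = (1 - own_target_pct / 100) * MONTH_MINUTES
    shares = [(name, minutes / budget_min) for name, minutes in downtime_by_dependency.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


for name, share in budget_share({"database": 25.0, "cdn": 5.0, "payments": 12.0}, 99.9):
    print(f"{name}: {share:.0%} of this month's error budget")
```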

Your monitoring setup needs to produce the audit trail that makes SLA management real, not aspirational.

See the uptime monitoring guide for the full stack setup, and the complete uptime availability table to understand exactly how much downtime each SLA tier allows.