The SSL Outage Nobody Saw Coming: Why Certificate Expiry Still Takes Down Production


In 2020, Microsoft Teams went down for hours because of an expired SSL certificate. In 2021, Google Voice had the same problem. Slack, Spotify, LinkedIn — the list goes on. These are companies with thousands of engineers, and they still get bitten by something as predictable as a certificate expiration date.

If it can happen to them, it can happen to you.

Why SSL certificates still expire unexpectedly

On paper, certificate management is a solved problem. Let's Encrypt made free certificates mainstream. ACME clients like Certbot handle automatic renewal. Most hosting platforms bundle SSL by default.

So why do expiry outages keep happening?

Auto-renewal is not a guarantee

Auto-renewal depends on a chain of things going right:

The ACME client is still running and configured correctly
DNS validation records haven't changed
The web server can respond to HTTP-01 challenges
The renewal job isn't silently failing
Your payment method hasn't expired (for paid certificates)

If any link in that chain breaks, the renewal fails. And many auto-renewal setups fail silently: no error notification, no alert, no Slack message. The certificate just doesn't get renewed, and nobody knows until browsers start showing security warnings.
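One way to make a broken renewal loud instead of silent is to exercise it on a schedule. The sketch below assumes Certbot is your ACME client: `certbot renew --dry-run` walks through renewal against Let's Encrypt's staging environment without saving new certificates, and Certbot exits with a non-zero status when a renewal fails. The `renewal_dry_run` wrapper is a hypothetical name; the point is that a failure becomes a value you can alert on.

```python
import subprocess
from typing import Sequence, Tuple


def renewal_dry_run(
    cmd: Sequence[str] = ("certbot", "renew", "--dry-run"),
    timeout: float = 600,
) -> Tuple[bool, str]:
    """Run a renewal dry run and return (succeeded, combined output).

    Assumes Certbot by default; pass a different command for another ACME client.
    """
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The client isn't installed or isn't on PATH: that is a failure too.
        return False, f"{cmd[0]} not found on PATH"
    except subprocess.TimeoutExpired:
        return False, f"renewal dry run timed out after {timeout} seconds"
    # A non-zero exit status means at least one renewal would have failed.
    return result.returncode == 0, result.stdout + result.stderr
```

Run it daily from cron or a systemd timer, and send a `False` result to the same channel as your uptime alerts.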

Certificate sprawl is real

Modern infrastructure means certificates everywhere. Your main domain. Staging environments. Internal APIs. Microservice-to-microservice TLS. Third-party integrations. Wildcard certificates that cover dozens of subdomains.

A mid-size SaaS company can easily have 20 to 50 certificates in play. Tracking them manually in a spreadsheet is a recipe for exactly the kind of outage you're trying to prevent.

The blast radius is bigger than downtime

When an SSL certificate expires, the damage goes beyond a few minutes of downtime:

Browsers show a full-page security error. Users hit an interstitial warning instead of your site, and most of them stop there. For HSTS-enabled domains there's no "proceed anyway" option at all.
API integrations break. If your API serves an expired certificate, every client that validates certificates (which is all of them, hopefully) will reject the connection.
Trust erodes. Users see a security warning and wonder if your site has been compromised. Some will never come back.
SEO can take a hit. Search engine crawlers may fail to fetch pages behind a certificate error, so a prolonged outage can affect how your pages are indexed and ranked.

What proactive SSL monitoring looks like

The fix isn't complicated, but it does require moving from reactive to proactive. Instead of waiting for an outage and then realizing it was the certificate, you monitor the certificate itself.

Expiry countdown alerts

The most basic and most valuable check. A good monitoring tool tells you how many days are left on every certificate you track, and sends alerts at useful intervals — 30 days, 14 days, 7 days, 3 days, 1 day.

Thirty days is enough time to debug a broken auto-renewal process, contact your CA, or manually renew. One day is a fire drill.
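The countdown itself is simple arithmetic on the certificate's `notAfter` field. Here is a minimal sketch, assuming you already have the `notAfter` string in the format Python's `ssl.SSLSocket.getpeercert()` returns (for example `'Jan  1 00:00:00 2030 GMT'`). The function names are illustrative, and the thresholds mirror the 30/14/7/3/1-day schedule above.

```python
import ssl
import time
from typing import Optional

# Alert thresholds in days, matching the schedule described above.
ALERT_THRESHOLDS = (30, 14, 7, 3, 1)


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, default: current time) and the
    certificate's notAfter timestamp, which is interpreted as UTC."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def alert_threshold(days_left: float) -> Optional[int]:
    """Return the tightest threshold the certificate has reached, or None
    if it is still outside the 30-day window."""
    reached = [t for t in ALERT_THRESHOLDS if days_left <= t]
    return min(reached) if reached else None


# Example: a certificate expiring on Jan 1, 2030, checked on Dec 22, 2029.
now = ssl.cert_time_to_seconds("Dec 22 00:00:00 2029 GMT")
days = days_until_expiry("Jan  1 00:00:00 2030 GMT", now=now)
print(days, alert_threshold(days))  # 10.0 14
```

Returning only the tightest threshold keeps a late first run from firing five alerts at once; suppressing repeat alerts for the same threshold is left to whatever scheduler calls this.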

Chain validation

A certificate can be valid and unexpired but still cause errors if the intermediate certificates are wrong. This is surprisingly common after renewals — the leaf certificate gets updated but the server is still serving an old or incomplete chain.

Monitoring should validate the full chain from leaf to root, not just the expiry date.
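In Python's standard library, chain verification against the local trust store happens during the TLS handshake, so a chain check can be as small as attempting a verified connection and reporting why it fails. This is a sketch with a hypothetical `chain_problem` helper. It turns hostname checking off so it reports chain issues only (an expired certificate will also fail here). OpenSSL, which Python uses, does not download missing intermediates the way some browsers do, so an incomplete chain that looks fine in a desktop browser can still break API clients; this check sees what those clients see.

```python
import socket
import ssl
from typing import Optional


def chain_problem(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return None if the served chain verifies against the local trust store,
    otherwise OpenSSL's reason for rejecting it.

    Connection errors (refused, timeout, DNS) are raised, not returned, so they
    can be reported separately from certificate problems.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # chain only; verify_mode stays CERT_REQUIRED
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname still sends SNI, so the server picks the right cert.
            with ctx.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        # e.g. "unable to get local issuer certificate" for a missing intermediate
        return err.verify_message
```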

Hostname verification

Wildcard certificates, multi-domain SANs, and certificate reissuance can all lead to hostname mismatches. If your certificate doesn't cover the domain it's being served on, browsers reject it.
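Verifying TLS clients, including Python's `ssl` module with its default context, already perform this check during the handshake. The sketch below spells out the core rule so you can audit a certificate's names against your own domain inventory: the hostname is compared case-insensitively against the `DNS` entries in `subjectAltName`, and a wildcard covers exactly one left-most label. It assumes a certificate dict in the shape `getpeercert()` returns, the function names are hypothetical, and it deliberately ignores details a real implementation must handle (IP address SANs, internationalized names, and the legacy Common Name fallback).

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified SAN matching: exact names, or a '*.' wildcard that stands in
    for exactly one left-most label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # "*.example.com" -> ".example.com"
        if not hostname.endswith(suffix):
            return False  # also rejects the bare "example.com"
        label = hostname[: -len(suffix)]
        # *.example.com covers api.example.com but not a.b.example.com.
        return bool(label) and "." not in label
    return pattern == hostname


def covered_by(cert: dict, hostname: str) -> bool:
    """True if any DNS name in the certificate's SAN list covers `hostname`."""
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return any(name_matches(pattern, hostname) for pattern in sans)
```

Running `covered_by` for every hostname in your inventory, including the bare apex domain, catches the classic case of a wildcard certificate deployed on a name it doesn't cover.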

Revocation detection

Certificates can be revoked by the issuing CA for various reasons — key compromise, domain ownership disputes, or CA policy violations. A revoked certificate is functionally expired, even if the date says otherwise.
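Python's standard library doesn't query OCSP responders or download CRLs, but it does tell you where a certificate says its revocation status is published. As a first step toward revocation monitoring, this sketch pulls those URLs out of a `getpeercert()` dict, which exposes them under the `OCSP` and `crlDistributionPoints` keys when the certificate includes them. The helper names are hypothetical; actually querying the endpoints needs another tool, such as the `openssl ocsp` command or the OCSP support in the third-party `cryptography` package.

```python
from typing import Dict, Tuple


def revocation_endpoints(cert: dict) -> Dict[str, Tuple[str, ...]]:
    """Collect revocation-related URLs from a getpeercert() dict.

    Note: getpeercert() returns an empty dict if the peer certificate was not
    validated, so call it on a socket opened with a verifying context.
    """
    return {
        "ocsp": tuple(cert.get("OCSP", ())),
        "crl": tuple(cert.get("crlDistributionPoints", ())),
    }


def has_revocation_info(cert: dict) -> bool:
    """False means this approach has nowhere to check revocation status."""
    endpoints = revocation_endpoints(cert)
    return bool(endpoints["ocsp"] or endpoints["crl"])
```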

Building SSL monitoring into your workflow

The goal is to make certificate expiry impossible to miss without adding another dashboard to check.

Add every domain you care about. Not just production — staging, internal tools, and API endpoints all need certificates that work.
Route alerts to your existing channels. SSL alerts should land in the same Slack channel or email thread as your uptime alerts. Don't create a separate workflow.
Treat the first alert as a task, not a notification. When you get a 30-day warning, create a ticket. Investigate the auto-renewal setup. Verify the renewal will succeed before it needs to.
Check after every renewal. Automated renewal happened? Great. Verify the new certificate is valid, the chain is correct, and the hostname matches. Trust but verify.
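Pulling those steps together, a scheduled sweep can check every host in one pass and post anything wrong to the channel your team already watches. This is a sketch under several assumptions: the host list, the 30-day warning window, and the webhook URL are placeholders, and alerts go to a Slack incoming webhook, which accepts a JSON body with a `text` field. Because it uses the default verifying context, an expired certificate, a broken chain, and a hostname mismatch all surface as verification failures instead of slipping through.

```python
import json
import socket
import ssl
import time
import urllib.request
from typing import List, Optional

# Hypothetical inventory: production, staging, internal tools, API endpoints.
HOSTS = ["example.com", "staging.example.com", "api.example.com"]
# Placeholder: the Slack incoming-webhook URL for your uptime-alert channel.
WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE/ME"
WARN_DAYS = 30


def check_host(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return an alert line for `host`, or None if its certificate looks healthy."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as err:
        # Expired, incomplete chain, or name mismatch all end up here.
        return f"{host}: certificate failed verification ({err.verify_message})"
    except OSError as err:
        return f"{host}: could not connect or complete the TLS handshake ({err})"
    days_left = (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400
    if days_left <= WARN_DAYS:
        return f"{host}: certificate expires in {days_left:.0f} days"
    return None


def post_to_slack(lines: List[str]) -> None:
    """Send all alert lines as one message to a Slack incoming webhook."""
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    request = urllib.request.Request(
        WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def run_sweep() -> None:
    """Check every host and post a single message if anything needs attention."""
    alerts = [msg for msg in (check_host(h) for h in HOSTS) if msg]
    if alerts:
        post_to_slack(alerts)
```

Call `run_sweep()` from cron or a CI schedule, and once more right after each renewal to confirm the new certificate is the one actually being served.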

The bottom line

SSL certificate expiry is one of the most preventable causes of production outages. The dates are known in advance. The checks are straightforward. The fix is usually a single command or a click in a dashboard.

The main reason it still causes outages is that teams assume auto-renewal will handle it and don't verify. Add monitoring, route the alerts where your team already works, and treat the warnings seriously.

It's a solved problem — but only if you actually solve it.