Incident Communication Templates: Status Page Updates, Customer Emails, and Slack Announcements

When something breaks in production, communication is a skill separate from debugging. Most engineers are good at one and unprepared for the other. The fix is having templates ready before the incident happens: when your checkout is down at 11 PM, you should not be staring at a blank draft wondering what to say.

This post contains copy-ready templates for every communication touchpoint in a production incident: status page updates, customer emails, and internal Slack messages.

For the postmortem template that comes after the incident, see How to Write an Incident Postmortem.

Principle: Update Before You Know the Cause

The instinct during an incident is to wait until you understand what happened before communicating. This instinct is wrong.

Customers and stakeholders who see nothing for 20 minutes assume the worst: that you don't know, that you don't care, or that you're hiding something. A status page that says "Investigating" within 5 minutes of detection shows that your team is on it, even with no additional information.

Post first. Investigate simultaneously.

Status Page Update Templates

Use these in order as the incident progresses.

Stage 1: Investigating

Post this within 5 minutes of detecting an issue, before you know the cause.

[Service Name] - Investigating
We are investigating reports of [service or feature] being unavailable. Engineers are looking into the issue now.
Next update in 15 minutes.

Commit to the next-update time and keep it. An update that says "still investigating, no new information" is better than silence past your stated window.
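
One way to hold yourself to the stated window is to compute the deadline when you post and check against it. The sketch below is illustrative only; the function names and the example timestamp are assumptions, not part of any status page product.

```python
from datetime import datetime, timedelta, timezone


def next_update_due(posted_at: datetime, promised_minutes: int) -> datetime:
    """Return the time by which the next status update was promised."""
    return posted_at + timedelta(minutes=promised_minutes)


def is_overdue(posted_at: datetime, promised_minutes: int, now: datetime) -> bool:
    """True once the promised window has passed with no new update posted."""
    return now > next_update_due(posted_at, promised_minutes)


# Hypothetical example: "Investigating" posted at 23:00 UTC with a 15-minute promise.
posted = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
print(next_update_due(posted, 15).strftime("%H:%M UTC"))  # 23:15 UTC
```

A reminder bot or a timer in the incident channel could call something like is_overdue to nudge whoever owns the status page.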

Stage 2: Identified

Post this when you know what's wrong, even if you haven't fixed it yet.

[Service Name] - Issue Identified
We have identified the cause: [brief description, e.g., "a database configuration change deployed at 14:32 UTC is causing elevated error rates on the checkout API"].
We are working on a fix. Affected users may experience [specific impact, e.g., "errors when attempting to complete purchases"].
Next update in 20 minutes.

Be specific about the cause. "A database configuration change" is better than "an internal issue." Customers understand that systems are complex. What erodes trust is vagueness, not technical explanations.

Stage 3: Fix in Progress

Post this when a fix is actively being deployed.

[Service Name] - Fix in Progress
We are deploying a fix for the identified issue. We expect service to be fully restored within [time estimate; be conservative and add 50% to your internal estimate].
Current status: [affected features] remain impacted. [Any unaffected features] are operating normally.
Next update in 10 minutes or when the issue is resolved.

If you're not confident in the timeline, say "within the next 30–60 minutes" rather than committing to a time you'll miss.
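
The "add 50%" rule from the template is simple arithmetic. Here is a small sketch of it; rounding up to the next 5 minutes is an added assumption so the public number reads cleanly, and the function name is hypothetical.

```python
import math


def public_estimate_minutes(internal_minutes: float, padding: float = 0.5,
                            round_to: int = 5) -> int:
    """Pad an internal estimate by `padding` (50% by default) and round up.

    Rounding up to a multiple of `round_to` minutes is an assumption of this
    sketch, not something the template requires.
    """
    padded = internal_minutes * (1 + padding)
    return math.ceil(padded / round_to) * round_to


print(public_estimate_minutes(20))  # 30
print(public_estimate_minutes(25))  # 40
```

So an internal guess of 25 minutes becomes "within 40 minutes" on the status page.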

Stage 4: Monitoring

Post this after deploying the fix, before you're confident in full recovery.

[Service Name] - Monitoring
The fix has been deployed. We are monitoring to confirm full recovery.
[Feature/service] should now be functioning normally for most users. If you continue to experience issues, contact us at [support email].
We will post a final update once we have confirmed full recovery.

Don't skip this stage to jump straight to Resolved. A second failure immediately after declaring resolved is worse than staying in Monitoring longer.
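
If your tooling lets you script status changes, the "no jumping straight to Resolved" rule can be encoded as a guard. This is a minimal sketch under assumptions: the Stage enum and can_transition are hypothetical names, and allowing a move back to an earlier stage (for example when a fix does not hold) is a choice made here, not a rule from the templates.

```python
from enum import Enum


class Stage(Enum):
    INVESTIGATING = 1
    IDENTIFIED = 2
    FIX_IN_PROGRESS = 3
    MONITORING = 4
    RESOLVED = 5


def can_transition(current: Stage, target: Stage) -> bool:
    """Allow any stage change except reaching RESOLVED without MONITORING first."""
    if current is Stage.RESOLVED:
        return False  # a resolved incident stays closed; open a new one instead
    if target is Stage.RESOLVED:
        return current is Stage.MONITORING
    return target is not current


print(can_transition(Stage.FIX_IN_PROGRESS, Stage.RESOLVED))  # False
print(can_transition(Stage.MONITORING, Stage.RESOLVED))       # True
```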

Stage 5: Resolved

Post this only when recovery is confirmed stable, not the moment the fix is deployed.

[Service Name] - Resolved
This incident has been resolved. [Feature/service] is fully operational as of [time] UTC.
Incident summary:
Started: [time] UTC
Resolved: [time] UTC
Duration: [X hours Y minutes]
Impact: [who was affected and how; be specific]
Cause: [one honest sentence]
We will publish a full post-incident review within [24/48/72] hours. We apologize for the disruption.
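
The Duration line is easy to get wrong by hand at the end of a long incident. A short sketch that derives it from the Started and Resolved times; the helper names and the example timestamps are illustrative assumptions.

```python
from datetime import datetime, timezone


def _plural(count: int, unit: str) -> str:
    return f"{count} {unit}" if count == 1 else f"{count} {unit}s"


def format_duration(started: datetime, resolved: datetime) -> str:
    """Render the elapsed time as 'X hours Y minutes', rounded down to the minute."""
    if resolved < started:
        raise ValueError("resolved time is before start time")
    total_minutes = int((resolved - started).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{_plural(hours, 'hour')} {_plural(minutes, 'minute')}"


# Hypothetical incident: started 14:32 UTC, resolved 16:05 UTC.
started = datetime(2024, 1, 1, 14, 32, tzinfo=timezone.utc)
resolved = datetime(2024, 1, 1, 16, 5, tzinfo=timezone.utc)
print(format_duration(started, resolved))  # 1 hour 33 minutes
```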

Customer Email Templates

Send customer emails for P1 incidents (all users affected) and significant P2 incidents. For minor or short outages, the status page update is sufficient.

Short Outage — Under 30 Minutes

Subject: Brief service disruption on [date] - Resolved

Hi [Name],
We experienced a brief disruption to [service or feature] on [date] between [start time] and [end time] UTC ([X] minutes total).
During this window, [specific impact, e.g., "users attempting to log in may have received error messages"]. The issue is resolved and no action is needed on your end.
We have identified the cause and have taken steps to prevent recurrence. A full summary is available on our status page: [link].
We're sorry for the disruption.
[Name]
[Title], [Company]

Major Outage — Over 1 Hour or Broad Impact

Subject: Service outage on [date] - [duration] - What happened and what we're doing

Hi [Name],
On [date], [product name] experienced an outage affecting [specific services or features] from [start time] to [end time] UTC ([X hours Y minutes] total).
What happened
[2–3 sentences explaining the cause honestly. Be specific. "A misconfiguration in our database connection pooler caused connections to exhaust under normal load" is better than "an infrastructure issue." Customers understand complex systems; what they don't forgive is vagueness.]
Who was affected
[Describe scope: all users, users on certain plans, users in certain regions, etc.]
What we've done
[List 2–4 concrete steps already completed. Not planned, completed.]
What we're doing to prevent recurrence
[List 2–4 specific changes being implemented. "We have added automated alerting for connection pool saturation" is better than "we are improving our monitoring."]
A full post-incident review [is available here: link / will be published by date].
We recognize that your team depends on [product name] and that this outage had real consequences. We are sorry.
If you have questions, reply to this email directly.
[Founder or CEO name]
[Company Name]

Two notes on this template: send it from the founder or CEO, not a generic support address. The reply-to should be a monitored inbox; customers who reply after a major outage are often your most engaged users, and ignoring replies compounds the trust damage.

Planned Maintenance Notice

Send at least 72 hours before a planned maintenance window.

Subject: Scheduled maintenance - [date] - [start time]–[end time] UTC - [expected impact]

Hi [Name],
We have scheduled maintenance for [product or feature] on [date] from [start time] to [end time] UTC ([X] hours).
Expected impact: [Be specific, e.g., "The API will be unavailable. The dashboard will be read-only. No data will be lost."]
Reason: [Brief explanation, e.g., "We are migrating our database to a new provider to improve performance and reliability."]
If this window conflicts with a critical workflow, contact us at [support email] and we will work with you on a solution.
We will update our status page at [link] throughout the maintenance window.
Thank you for your patience.
[Name]
[Title], [Company]

Internal Slack / Teams Templates

Initial Incident Announcement

Post to #incidents or #engineering when the incident is confirmed.

🔴 INCIDENT OPEN

Service: [service name]
Impact: [brief description]
Severity: P1 / P2 / P3
Incident Commander: @name
Started: [time] UTC

Status page: [link]
Incident channel: #inc-[date]-[short-description]

All incident discussion in #inc-[date]-[short-description] only.

Create a dedicated incident channel immediately. Keeping all technical discussion out of the main engineering channel makes it easier to follow the thread, reconstruct a timeline afterward, and include or exclude people appropriately.
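
Generating the channel name from the #inc-[date]-[short-description] pattern keeps names consistent across incidents. A minimal sketch, with assumptions: the function name is hypothetical, the ISO date format is one choice among many, and the 80-character cap reflects Slack's documented channel-name limit as best I know it, so verify it for your workspace or for Teams.

```python
import re
from datetime import date


def incident_channel(day: date, description: str, max_len: int = 80) -> str:
    """Build '#inc-<date>-<slug>' from a free-text description."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    name = f"inc-{day.isoformat()}-{slug}"
    # Trim to the assumed length limit without leaving a trailing hyphen.
    return "#" + name[:max_len].rstrip("-")


print(incident_channel(date(2024, 3, 5), "Checkout API errors"))
# #inc-2024-03-05-checkout-api-errors
```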

Status Update While Active

Post to the incident channel every 15 minutes.

📍 UPDATE — [time] UTC

Status: [Investigating / Identified / Fix in Progress / Monitoring]
[1–2 sentences on current state and what's being tried]
Next update: [time] UTC

Post even when there's nothing new. "Still investigating, no change" is a valid update. Silence causes teammates and stakeholders to wonder if the incident is being actively worked.

Resolution Announcement

Post to #incidents when the incident is closed.

✅ RESOLVED — [time] UTC

Service: [service name]
Duration: [X hours Y minutes]
Root cause: [1 sentence]
Postmortem: [link / "will be posted within 48 hours"]

Thanks: @names who worked the incident

The Communication Checklist

During any significant incident, run through this in order:

Status page — "Investigating" within 5 minutes of detection
Post to #incidents with severity and incident commander
Open a dedicated incident channel
Update status page at every promised next-update time (roughly every 15 minutes) until resolved
Status page — "Resolved" after confirming stable recovery
Customer email within 2 hours of resolution (P1 and major P2 only)
Resolution posted to #incidents
Postmortem scheduled within 48 hours
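
The time-bound items in this checklist can be turned into concrete deadlines once you know the detection and resolution times. The sketch below assumes the windows stated above (5 minutes, 2 hours, 48 hours); the function and key names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def communication_deadlines(detected: datetime, resolved: datetime) -> dict[str, datetime]:
    """Map each time-bound checklist item to its latest acceptable time."""
    return {
        "status_page_investigating": detected + timedelta(minutes=5),
        "customer_email": resolved + timedelta(hours=2),        # P1 and major P2 only
        "postmortem_scheduled": resolved + timedelta(hours=48),
    }


# Hypothetical incident: detected 23:00 UTC, resolved 00:30 UTC the next day.
detected = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
resolved = datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc)
for item, due in communication_deadlines(detected, resolved).items():
    print(f"{item}: {due:%Y-%m-%d %H:%M} UTC")
```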

Why Most Teams Get This Wrong

The most common communication failure during incidents is over-indexing on technical investigation at the expense of external updates. The engineering team knows work is happening; customers don't. Thirty minutes of silence while your checkout is down means hundreds of customers refreshing, opening support tickets, and tweeting. Five minutes to post a status update prevents most of that.

The second most common failure is being vague about the cause. "We experienced an internal issue" tells customers nothing and signals either that you don't know what happened or that you're hiding it. A specific technical cause, even one most customers don't fully understand, signals honesty and competence. "A database connection pool configuration change we deployed at 2:30 PM caused connection exhaustion under normal traffic load" is better in every dimension.

Frequently Asked Questions

When should I send a customer email vs. relying on the status page?

Send a customer email for any incident lasting over 30 minutes with broad user impact (P1), or any incident lasting over 1 hour regardless of scope. For short outages and minor partial degradations, updating your status page is sufficient. Customers who subscribe to your status page will receive the update automatically.
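
The rule in this answer can be written down as a small decision function, which helps when the incident commander is tired and the thresholds are easy to misremember. This sketch encodes only the rule stated in this answer; the function and parameter names are hypothetical, and "broad impact" is left as a judgment call passed in by the caller.

```python
def should_email_customers(duration_minutes: int, broad_impact: bool) -> bool:
    """Email for any incident over 1 hour, or over 30 minutes with broad impact."""
    if duration_minutes > 60:
        return True
    return broad_impact and duration_minutes > 30


print(should_email_customers(45, broad_impact=True))   # True
print(should_email_customers(45, broad_impact=False))  # False
print(should_email_customers(75, broad_impact=False))  # True
```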

Should I send the customer email before or after the postmortem?

Send the initial customer communication (using the major outage template above) within 2 hours of resolution. It doesn't need to include the full root cause analysis — an honest brief explanation and a commitment to publish the full review is enough. Publish the postmortem separately within 24–48 hours.

How specific should I be about the technical cause?

More specific than you think. Customers and stakeholders trust teams that explain specifically what went wrong over teams that use generic language. You don't need to include stack traces or internal code details, but naming the system that failed and the type of failure builds credibility. Vagueness reads as either incompetence or concealment.

What if the incident is still ongoing when I need to send an update?

Use the Stage 2 (Identified) or Stage 3 (Fix in Progress) template on your status page, and hold the customer email until after resolution. Don't send a customer email while the incident is ongoing: you'll need to send another one after resolution, and two emails in quick succession create confusion. The status page handles active incident communication; email handles post-incident communication.

For the full incident response process from alert to postmortem, see the on-call survival guide.