[{"data":1,"prerenderedAt":781},["ShallowReactive",2],{"\u002Fblog\u002Fincident-communication-templates":3},{"id":4,"title":5,"author":6,"body":8,"category":769,"date":770,"description":771,"extension":772,"image":773,"lastUpdated":773,"meta":774,"navigation":775,"path":776,"readingTime":777,"seo":778,"stem":779,"__hash__":780},"blog\u002Fblog\u002Fincident-communication-templates.md","Incident Communication Templates: Status Page Updates, Customer Emails, and Slack Announcements",{"name":7},"Vantaj Team",{"type":9,"value":10,"toc":739},"minimark",[11,15,18,27,30,35,38,41,44,46,50,53,58,61,84,87,89,93,96,120,123,125,129,132,161,164,166,170,173,197,200,202,206,209,284,286,290,293,297,306,364,366,370,382,472,475,477,481,484,501,562,564,568,572,584,594,597,599,603,606,612,615,617,621,626,632,634,638,641,690,694,697,700,704,708,711,715,718,722,725,729,732],[12,13,14],"p",{},"When something breaks in production, communication is a skill separate from debugging. Most engineers are good at one and unprepared for the other. The fix is having templates ready before the incident happens: when your checkout is down at 11 PM, you should not be staring at a blank draft wondering what to say.",[12,16,17],{},"This post contains copy-ready templates for every communication touchpoint in a production incident: status page updates, customer emails, and internal Slack messages.",[12,19,20,21,26],{},"For the postmortem template that comes after the incident, see ",[22,23,25],"a",{"href":24},"\u002Fblog\u002Fincident-postmortem-template","How to Write an Incident Postmortem",".",[28,29],"hr",{},[31,32,34],"h2",{"id":33},"principle-update-before-you-know-the-cause","Principle: Update Before You Know the Cause",[12,36,37],{},"The instinct during an incident is to wait until you understand what happened before communicating. 
This instinct is wrong.",[12,39,40],{},"Customers and stakeholders who see nothing for 20 minutes assume the worst: that you don't know, that you don't care, or that you're hiding something. A status page that says \"Investigating\" within 3 minutes of an incident starting communicates that your team is on it, even with no additional information.",[12,42,43],{},"Post first. Investigate simultaneously.",[28,45],{},[31,47,49],{"id":48},"status-page-update-templates","Status Page Update Templates",[12,51,52],{},"Use these in order as the incident progresses.",[54,55,57],"h3",{"id":56},"stage-1-investigating","Stage 1: Investigating",[12,59,60],{},"Post this within 5 minutes of detecting an issue, before you know the cause.",[62,63,64,74,81],"blockquote",{},[12,65,66],{},[67,68,69,73],"strong",{},[70,71,72],"span",{},"Service Name"," — Investigating",[12,75,76,77,80],{},"We are investigating reports of ",[70,78,79],{},"service or feature"," being unavailable. Engineers are looking into the issue now.",[12,82,83],{},"Next update in 15 minutes.",[12,85,86],{},"Commit to the next-update time and keep it. An update that says \"still investigating, no new information\" is better than silence past your stated window.",[28,88],{},[54,90,92],{"id":91},"stage-2-identified","Stage 2: Identified",[12,94,95],{},"Post this when you know what's wrong, even if you haven't fixed it yet.",[62,97,98,105,111,117],{},[12,99,100],{},[67,101,102,104],{},[70,103,72],{}," — Issue Identified",[12,106,107,108,26],{},"We have identified the cause: ",[70,109,110],{},"brief description, e.g., \"a database configuration change deployed at 14:32 UTC is causing elevated error rates on the checkout API\"",[12,112,113,114,26],{},"We are working on a fix. Affected users may experience ",[70,115,116],{},"specific impact — e.g., \"errors when attempting to complete purchases\"",[12,118,119],{},"Next update in 20 minutes.",[12,121,122],{},"Be specific about the cause. 
\"A database configuration change\" is better than \"an internal issue.\" Customers understand that systems are complex. What erodes trust is vagueness, not technical explanations.",[28,124],{},[54,126,128],{"id":127},"stage-3-fix-in-progress","Stage 3: Fix in Progress",[12,130,131],{},"Post this when a fix is actively being deployed.",[62,133,134,141,147,158],{},[12,135,136],{},[67,137,138,140],{},[70,139,72],{}," — Fix in Progress",[12,142,143,144,26],{},"We are deploying a fix for the identified issue. We expect service to be fully restored within ",[70,145,146],{},"time estimate — be conservative, add 50% to your internal estimate",[12,148,149,150,153,154,157],{},"Current status: ",[70,151,152],{},"affected features"," remain impacted. ",[70,155,156],{},"Any unaffected features"," are operating normally.",[12,159,160],{},"Next update in 10 minutes or when the issue is resolved.",[12,162,163],{},"If you're not confident in the timeline, say \"within the next 30–60 minutes\" rather than committing to a time you'll miss.",[28,165],{},[54,167,169],{"id":168},"stage-4-monitoring","Stage 4: Monitoring",[12,171,172],{},"Post this after deploying the fix, before you're confident in full recovery.",[62,174,175,182,185,194],{},[12,176,177],{},[67,178,179,181],{},[70,180,72],{}," — Monitoring",[12,183,184],{},"The fix has been deployed. We are monitoring to confirm full recovery.",[12,186,187,190,191,26],{},[70,188,189],{},"Feature\u002Fservice"," should now be functioning normally for most users. If you continue to experience issues, contact us at ",[70,192,193],{},"support email",[12,195,196],{},"We will post a final update once we have confirmed full recovery.",[12,198,199],{},"Don't skip this stage to jump straight to Resolved. 
A second failure immediately after declaring resolved is worse than staying in Monitoring longer.",[28,201],{},[54,203,205],{"id":204},"stage-5-resolved","Stage 5: Resolved",[12,207,208],{},"Post this only when recovery is confirmed stable, not the moment the fix is deployed.",[62,210,211,218,228,233,277],{},[12,212,213],{},[67,214,215,217],{},[70,216,72],{}," — Resolved",[12,219,220,221,223,224,227],{},"This incident has been resolved. ",[70,222,189],{}," is fully operational as of ",[70,225,226],{},"time"," UTC.",[12,229,230],{},[67,231,232],{},"Incident summary:",[234,235,236,246,253,261,269],"ul",{},[237,238,239,242,243,245],"li",{},[67,240,241],{},"Started:"," ",[70,244,226],{}," UTC",[237,247,248,242,251,245],{},[67,249,250],{},"Resolved:",[70,252,226],{},[237,254,255,242,258],{},[67,256,257],{},"Duration:",[70,259,260],{},"X hours Y minutes",[237,262,263,242,266],{},[67,264,265],{},"Impact:",[70,267,268],{},"Who was affected and how — be specific",[237,270,271,242,274],{},[67,272,273],{},"Cause:",[70,275,276],{},"One honest sentence",[12,278,279,280,283],{},"We will publish a full post-incident review within ",[70,281,282],{},"24\u002F48\u002F72 hours",". We apologize for the disruption.",[28,285],{},[31,287,289],{"id":288},"customer-email-templates","Customer Email Templates",[12,291,292],{},"Send customer emails for P1 incidents (all users affected) and significant P2 incidents. 
For minor or short outages, the status page update is sufficient.",[54,294,296],{"id":295},"short-outage-under-30-minutes","Short Outage — Under 30 Minutes",[12,298,299,302,303,217],{},[67,300,301],{},"Subject:"," Brief service disruption on ",[70,304,305],{},"Date",[62,307,308,315,337,344,350,353],{},[12,309,310,311,314],{},"Hi ",[70,312,313],{},"Name",",",[12,316,317,318,320,321,324,325,328,329,332,333,336],{},"We experienced a brief disruption to ",[70,319,79],{}," on ",[70,322,323],{},"date"," between ",[70,326,327],{},"start time"," and ",[70,330,331],{},"end time"," UTC (",[70,334,335],{},"X"," minutes total).",[12,338,339,340,343],{},"During this window, ",[70,341,342],{},"specific impact — e.g., \"users attempting to log in may have received error messages\"",". The issue is resolved and no action is needed on your end.",[12,345,346,347,26],{},"We have identified the cause and have taken steps to prevent recurrence. A full summary is available on our status page: ",[70,348,349],{},"link",[12,351,352],{},"We're sorry for the disruption.",[12,354,355,357,360,361],{},[70,356,313],{},[70,358,359],{},"Title",", ",[70,362,363],{},"Company",[28,365],{},[54,367,369],{"id":368},"major-outage-over-1-hour-or-broad-impact","Major Outage — Over 1 Hour or Broad Impact",[12,371,372,374,375,377,378,381],{},[67,373,301],{}," Service outage on ",[70,376,305],{}," — ",[70,379,380],{},"Duration"," — What happened and what we're doing",[62,383,384,388,410,415,420,425,430,435,440,445,450,455,461,464],{},[12,385,310,386,314],{},[70,387,313],{},[12,389,390,391,360,393,396,397,400,401,403,404,406,407,409],{},"On ",[70,392,323],{},[70,394,395],{},"product name"," experienced an outage affecting ",[70,398,399],{},"specific services or features"," from ",[70,402,327],{}," to ",[70,405,331],{}," UTC — ",[70,408,260],{}," total.",[12,411,412],{},[67,413,414],{},"What happened",[12,416,417],{},[70,418,419],{},"2–3 sentences explaining the cause honestly. Be specific. 
\"A misconfiguration in our database connection pooler caused connections to exhaust under normal load\" is better than \"an infrastructure issue.\" Customers understand complex systems; what they don't forgive is vagueness.",[12,421,422],{},[67,423,424],{},"Who was affected",[12,426,427],{},[70,428,429],{},"Describe scope — all users, users on certain plans, users in certain regions, etc.",[12,431,432],{},[67,433,434],{},"What we've done",[12,436,437],{},[70,438,439],{},"List 2–4 concrete steps already completed — not planned, completed.",[12,441,442],{},[67,443,444],{},"What we're doing to prevent recurrence",[12,446,447],{},[70,448,449],{},"List 2–4 specific changes being implemented. \"We have added automated alerting for connection pool saturation\" is better than \"we are improving our monitoring.\"",[12,451,452,453,26],{},"A full post-incident review is available here: ",[70,454,349],{},[12,456,457,458,460],{},"We recognize that your team depends on ",[70,459,395],{}," and that this outage had real consequences. We are sorry.",[12,462,463],{},"If you have questions, reply to this email directly.",[12,465,466,469],{},[70,467,468],{},"Founder or CEO name",[70,470,471],{},"Company Name",[12,473,474],{},"Two notes on this template: send it from the founder or CEO, not a generic support address. 
The reply-to should be a monitored inbox; customers who reply after a major outage are often your most engaged users, and ignoring replies compounds the trust damage.",[28,476],{},[54,478,480],{"id":479},"planned-maintenance-notice","Planned Maintenance Notice",[12,482,483],{},"Send 72+ hours before a planned maintenance window.",[12,485,486,488,489,377,491,494,495,406,498],{},[67,487,301],{}," Scheduled maintenance — ",[70,490,305],{},[70,492,493],{},"Start time","–",[70,496,497],{},"End time",[70,499,500],{},"Expected impact",[62,502,503,507,523,531,539,545,551,554],{},[12,504,310,505,314],{},[70,506,313],{},[12,508,509,510,320,513,400,515,403,517,332,519,522],{},"We have scheduled maintenance for ",[70,511,512],{},"product or feature",[70,514,323],{},[70,516,327],{},[70,518,331],{},[70,520,521],{},"X hours",").",[12,524,525,242,528],{},[67,526,527],{},"Expected impact:",[70,529,530],{},"Be specific — e.g., \"The API will be unavailable. The dashboard will be read-only. No data will be lost.\"",[12,532,533,242,536],{},[67,534,535],{},"Reason:",[70,537,538],{},"Brief explanation — e.g., \"We are migrating our database to a new provider to improve performance and reliability.\"",[12,540,541,542,544],{},"If this window conflicts with a critical workflow, contact us at ",[70,543,193],{}," and we will work with you on a solution.",[12,546,547,548,550],{},"We will update our status page at ",[70,549,349],{}," throughout the maintenance window.",[12,552,553],{},"Thank you for your patience.",[12,555,556,558,360,560],{},[70,557,313],{},[70,559,359],{},[70,561,363],{},[28,563],{},[31,565,567],{"id":566},"internal-slack-teams-templates","Internal Slack \u002F Teams Templates",[54,569,571],{"id":570},"initial-incident-announcement","Initial Incident Announcement",[12,573,574,575,579,580,583],{},"Post to ",[576,577,578],"code",{},"#incidents"," or ",[576,581,582],{},"#engineering"," when the incident is 
confirmed.",[585,586,591],"pre",{"className":587,"code":589,"language":590},[588],"language-text","🔴 INCIDENT OPEN\n\nService: [service name]\nImpact: [brief description]\nSeverity: P1 \u002F P2 \u002F P3\nIncident Commander: @name\nStarted: [time] UTC\n\nStatus page: [link]\nIncident channel: #inc-[date]-[short-description]\n\nAll incident discussion in #inc-[date]-[short-description] only.\n","text",[576,592,589],{"__ignoreMap":593},"",[12,595,596],{},"Create a dedicated incident channel immediately. Keeping all technical discussion out of the main engineering channel makes it easier to follow the thread, run a timeline afterward, and include or exclude people appropriately.",[28,598],{},[54,600,602],{"id":601},"status-update-while-active","Status Update While Active",[12,604,605],{},"Post to the incident channel every 15 minutes.",[585,607,610],{"className":608,"code":609,"language":590},[588],"📍 UPDATE — [time] UTC\n\nStatus: [Investigating \u002F Identified \u002F Fix in Progress \u002F Monitoring]\n[1–2 sentences on current state and what's being tried]\nNext update: [time] UTC\n",[576,611,609],{"__ignoreMap":593},[12,613,614],{},"Post even when there's nothing new. \"Still investigating, no change\" is a valid update. 
Silence causes teammates and stakeholders to wonder if the incident is being actively worked.",[28,616],{},[54,618,620],{"id":619},"resolution-announcement","Resolution Announcement",[12,622,574,623,625],{},[576,624,578],{}," when the incident is closed.",[585,627,630],{"className":628,"code":629,"language":590},[588],"✅ RESOLVED — [time] UTC\n\nService: [service name]\nDuration: [X hours Y minutes]\nRoot cause: [1 sentence]\nPostmortem: [link \u002F \"will be posted within 48 hours\"]\n\nThanks: @names who worked the incident\n",[576,631,629],{"__ignoreMap":593},[28,633],{},[31,635,637],{"id":636},"the-communication-checklist","The Communication Checklist",[12,639,640],{},"During any significant incident, run through this in order:",[642,643,644,650,656,661,667,673,679,684],"ol",{},[237,645,646,649],{},[67,647,648],{},"Status page — \"Investigating\""," within 5 minutes of detection",[237,651,652,655],{},[67,653,654],{},"Post to #incidents"," with severity and incident commander",[237,657,658],{},[67,659,660],{},"Open a dedicated incident channel",[237,662,663,666],{},[67,664,665],{},"Update status page every 15 minutes"," until resolved",[237,668,669,672],{},[67,670,671],{},"Status page — \"Resolved\""," after confirming stable recovery",[237,674,675,678],{},[67,676,677],{},"Customer email"," within 2 hours of resolution (P1 and major P2 only)",[237,680,681],{},[67,682,683],{},"Resolution posted to #incidents",[237,685,686,689],{},[67,687,688],{},"Postmortem scheduled"," within 48 hours",[31,691,693],{"id":692},"why-most-teams-get-this-wrong","Why Most Teams Get This Wrong",[12,695,696],{},"The most common communication failure during incidents is over-indexing on technical investigation at the expense of external updates. The engineering team knows work is happening; customers don't. Thirty minutes of silence while your checkout is down means hundreds of customers refreshing, opening support tickets, and tweeting. 
Five minutes to post a status update prevents most of that.",[12,698,699],{},"The second most common failure is being vague about the cause. \"We experienced an internal issue\" tells customers nothing and signals either that you don't know what happened or that you're hiding it. A specific technical cause, even one most customers don't fully understand, signals honesty and competence. \"A database connection pool configuration change we deployed at 2:30 PM caused connection exhaustion under normal traffic load\" is better in every dimension.",[31,701,703],{"id":702},"frequently-asked-questions","Frequently Asked Questions",[54,705,707],{"id":706},"when-should-i-send-a-customer-email-vs-relying-on-the-status-page","When should I send a customer email vs. relying on the status page?",[12,709,710],{},"Send a customer email for any incident lasting over 30 minutes with broad user impact (P1), or any incident lasting over 1 hour regardless of scope. For short outages and minor partial degradations, updating your status page is sufficient. Customers who subscribe to your status page will receive the update automatically.",[54,712,714],{"id":713},"should-i-send-the-customer-email-before-or-after-the-postmortem","Should I send the customer email before or after the postmortem?",[12,716,717],{},"Send the initial customer communication (using the major outage template above) within 2 hours of resolution. It doesn't need to include the full root cause analysis; an honest brief explanation and a commitment to publish the full review are enough. Publish the postmortem separately within 24–48 hours.",[54,719,721],{"id":720},"how-specific-should-i-be-about-the-technical-cause","How specific should I be about the technical cause?",[12,723,724],{},"More specific than you think. Customers and stakeholders trust teams that explain specifically what went wrong over teams that use generic language. 
You don't need to include stack traces or internal code details, but naming the system that failed and the type of failure builds credibility. Vagueness reads as either incompetence or concealment.",[54,726,728],{"id":727},"what-if-the-incident-is-still-ongoing-when-i-need-to-send-an-update","What if the incident is still ongoing when I need to send an update?",[12,730,731],{},"Use the Stage 2 (Identified) or Stage 3 (Fix in Progress) template on your status page, and hold the customer email until after resolution. Don't send a customer email while the incident is ongoing: you'll need to send another one after resolution, and two emails in quick succession create confusion. The status page handles active incident communication; email handles post-incident communication.",[12,733,734,735,26],{},"For the full incident response process from alert to postmortem, see the ",[22,736,738],{"href":737},"\u002Fblog\u002Fon-call-survival-guide","on-call survival guide",{"title":593,"searchDepth":740,"depth":740,"links":741},2,[742,743,751,756,761,762,763],{"id":33,"depth":740,"text":34},{"id":48,"depth":740,"text":49,"children":744},[745,747,748,749,750],{"id":56,"depth":746,"text":57},3,{"id":91,"depth":746,"text":92},{"id":127,"depth":746,"text":128},{"id":168,"depth":746,"text":169},{"id":204,"depth":746,"text":205},{"id":288,"depth":740,"text":289,"children":752},[753,754,755],{"id":295,"depth":746,"text":296},{"id":368,"depth":746,"text":369},{"id":479,"depth":746,"text":480},{"id":566,"depth":740,"text":567,"children":757},[758,759,760],{"id":570,"depth":746,"text":571},{"id":601,"depth":746,"text":602},{"id":619,"depth":746,"text":620},{"id":636,"depth":740,"text":637},{"id":692,"depth":740,"text":693},{"id":702,"depth":740,"text":703,"children":764},[765,766,767,768],{"id":706,"depth":746,"text":707},{"id":713,"depth":746,"text":714},{"id":720,"depth":746,"text":721},{"id":727,"depth":746,"text":728},"tutorials","2026-06-26","Pre-written templates for every stage of 
a production incident: from the first 'Investigating' post to the resolved announcement and customer email. Copy, fill in the blanks, send.","md",null,{},true,"\u002Fblog\u002Fincident-communication-templates",9,{"title":5,"description":771},"blog\u002Fincident-communication-templates","Gl0urXsyG0dpNj1FsaH0TwRJ0xbJH93CpC8XE3dnX7Q",1782464113573]