What is a maintenance window in monitoring?

A maintenance window is a scheduled period when planned downtime or degradation is expected. Monitoring tools suppress alerts during this window so engineers are not paged for expected failures. The window is announced on your status page so customers know the downtime is planned.

How far in advance should you announce a maintenance window?

48 hours for routine maintenance (24 hours at the very least for updates under 15 minutes). 1 to 2 weeks for maintenance that affects availability for more than an hour or that affects enterprise customers with SLAs. Emergency maintenance should be announced as soon as the decision is made, even if that is only 30 minutes in advance.

Should maintenance windows count against SLA uptime?

Only if your SLA does not explicitly exclude planned maintenance. Most SLAs include a carve-out for maintenance windows provided they are announced in advance with sufficient notice. Check your SLA language and confirm with your legal team before committing maintenance exclusions to customers.

What happens if maintenance runs longer than the window?

Extend the monitoring suppression window immediately to avoid false alerts. Update your status page to inform customers of the delay and give a new estimated completion time. Do not let the original window expire while work is still in progress.

How to Set Up Maintenance Windows: Planning, Execution, and Monitoring (Complete Guide)

Maintenance windows fail in two predictable ways: the monitoring tool generates false alerts because nobody suppressed it, or customers are surprised because nobody told them. Both are avoidable with the same process - plan, communicate, configure, execute, verify.

This guide covers each step in sequence, including the specific mistakes that turn routine maintenance into incidents.

Step 1: Define scope, duration, and rollback criteria

Before touching anything operational, write down three things:

Scope: Every service component, API endpoint, database, or dependency that will be unavailable or degraded during the window. Be specific. "The app" is not a scope definition. "The API will be unavailable; the dashboard will return a maintenance page; background job processing will be paused" is.

Duration: Your honest estimate, multiplied by 1.5. If you think the database migration takes 45 minutes, schedule a 70-minute window. Estimation errors in maintenance run in one direction: work takes longer than expected and rarely finishes early. A window that ends 10 minutes early is fine. A window that runs 20 minutes past your published end time generates customer anxiety and potentially SLA questions.
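The padding rule is plain arithmetic. A minimal Python sketch follows; rounding up to 5-minute increments is an assumption added here so the result is easy to schedule, not part of the rule itself, and the function name is hypothetical:

```python
import math


def padded_window_minutes(estimate_min: float, factor: float = 1.5, round_to: int = 5) -> int:
    """Pad an honest duration estimate and round it up to a schedulable increment."""
    if estimate_min <= 0:
        raise ValueError("estimate_min must be positive")
    padded = estimate_min * factor
    # Always round up: the buffer exists to absorb overruns, so never trim it.
    return math.ceil(padded / round_to) * round_to


print(padded_window_minutes(45))  # 45 x 1.5 = 67.5, rounded up to 70
```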

Rollback criteria: The specific conditions under which you abort the maintenance and restore service. Define this before you start, not during a crisis. Example: "If the migration has not completed within 80% of the scheduled window, we roll back and reschedule." Engineers under pressure make poor rollback decisions without pre-defined criteria.
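The 80% rule can be written down as a check before the window opens, so nobody has to do the math under pressure. A minimal sketch, assuming the window start and length are known and that "completed" is a single yes/no signal (both function names are hypothetical):

```python
from datetime import datetime, timedelta


def rollback_deadline(start: datetime, window: timedelta, fraction: float = 0.8) -> datetime:
    """Time after which unfinished work triggers the pre-agreed rollback."""
    return start + window * fraction


def must_roll_back(now: datetime, start: datetime, window: timedelta,
                   work_done: bool, fraction: float = 0.8) -> bool:
    # Finished work never triggers a rollback; unfinished work past the deadline does.
    return not work_done and now >= rollback_deadline(start, window, fraction)


start = datetime(2024, 1, 9, 2, 0)
window = timedelta(minutes=70)
print(rollback_deadline(start, window))  # 2024-01-09 02:56:00
```

With the 70-minute window from the duration example, work that is still unfinished at 02:56 triggers the rollback.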

Step 2: Choose your maintenance window timing

The wrong timing can turn routine maintenance into a blow to customer trust.

Rules for timing selection:

Criterion	Guidance
Traffic volume	Pick your lowest-traffic hour. Check your analytics for day-of-week and time-of-day patterns.
Customer time zones	If customers are global, "low traffic" means different things in each zone. Check where your highest-value users are.
Team availability	Someone must be watching during the window. Avoid times when your on-call team's capacity is low.
Dependency schedules	Check whether your cloud provider, CDN, or database has its own maintenance that could overlap.
Release freeze periods	Avoid scheduled maintenance within 48 hours of a major product release.
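To find the lowest-traffic hours, one approach is to export hourly request counts for a typical week from your analytics and scan for the quietest contiguous block. The sketch below assumes a 168-value list starting Monday 00:00 in your primary customer time zone; the export format and function name are hypothetical, and blocks are allowed to wrap from Sunday night into Monday morning.

```python
def quietest_window(hourly_requests: list[int], length_hours: int) -> tuple[int, int]:
    """Return (start_hour, total_requests) for the quietest contiguous block.

    hourly_requests holds one count per hour of the week (hour 0 = Monday 00:00).
    Indexing wraps around, so a block may span Sunday night into Monday.
    """
    n = len(hourly_requests)
    if not 0 < length_hours <= n:
        raise ValueError("length_hours must be between 1 and the series length")
    best_start, best_total = 0, None
    for start in range(n):
        total = sum(hourly_requests[(start + i) % n] for i in range(length_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Synthetic series: flat traffic except a quiet spell on Tuesday 02:00-05:00.
traffic = [5 if h in (26, 27, 28) else 100 for h in range(168)]
print(quietest_window(traffic, 3))  # (26, 15)
```

If customers are global, run the scan once per region, since each time zone shifts the series.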

Practical timing for B2B SaaS: Tuesday through Thursday, 2 AM to 5 AM in your primary customer time zone. Most B2B customers have low usage overnight on weekdays. Weekends feel safer but often have less team coverage.

Practical timing for consumer apps: Sunday through Monday, 3 AM to 6 AM. Consumer traffic peaks on weekday evenings and weekends.

Step 3: Configure monitoring suppression before the window

This is the most commonly skipped step. If you start maintenance without suppressing monitoring, every affected endpoint fires alerts. Your on-call engineer gets paged for failures they already knew were coming. That alert noise trains teams to ignore on-call notifications - the exact behavior that leads to missed real incidents.

How to configure maintenance windows in uptime monitoring tools:

Most monitoring tools support scheduled maintenance windows that suppress alerts during a defined time range. Configure this before the window starts, not during it.

In Vantaj (and most managed monitoring tools):

Go to maintenance settings
Set the start time and end time matching your published window
Select the affected monitors
Choose whether to suppress all alerts or only downtime alerts
Save - alerts for the selected monitors will not fire during the window

Critical detail: Set the monitoring suppression window slightly wider than your announced maintenance window - 15 minutes of buffer on each side. This accounts for starts that run a few minutes late and recoveries that need time to stabilize before monitoring resumes.
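The buffer arithmetic is easy to get wrong at 2 AM, so it can help to compute the suppression times once and enter them in the tool. A minimal sketch with a hypothetical Window type; it only computes times and does not call any monitoring tool's API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Window:
    start: datetime
    end: datetime

    def contains(self, t: datetime) -> bool:
        # Half-open interval: the end instant falls outside the window.
        return self.start <= t < self.end


def suppression_window(announced: Window,
                       buffer: timedelta = timedelta(minutes=15)) -> Window:
    """Widen the published window on both sides for monitoring suppression."""
    return Window(announced.start - buffer, announced.end + buffer)


announced = Window(datetime(2024, 1, 9, 2, 0, tzinfo=timezone.utc),
                   datetime(2024, 1, 9, 3, 10, tzinfo=timezone.utc))
s = suppression_window(announced)
print(s.start.strftime("%H:%M"), s.end.strftime("%H:%M"))  # 01:45 03:25
```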

See the maintenance windows monitoring guide for tool-specific configuration steps.

Step 4: Announce to customers - timing and channel strategy

Customer communication is not optional for maintenance that affects availability. The only open questions are how much lead time to give and which channels to use.

Minimum notice periods by maintenance type:

Maintenance type	Minimum notice	Recommended notice
Routine update, under 15 min	24 hours	48 hours
Significant change, 15–60 min	48 hours	1 week
Major migration, over 1 hour	1 week	2 weeks
Emergency maintenance	ASAP - even 30 min notice is better than none	-
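The notice table maps to a small lookup. This sketch classifies by duration alone, which is a simplification (the table also distinguishes maintenance type), and the function names are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Optional


def notice_period(duration_min: float,
                  emergency: bool = False) -> Optional[tuple[timedelta, timedelta]]:
    """Return (minimum, recommended) notice, or None for emergency maintenance."""
    if emergency:
        return None  # announce as soon as the decision is made
    if duration_min < 15:
        return timedelta(hours=24), timedelta(hours=48)
    if duration_min <= 60:
        return timedelta(hours=48), timedelta(weeks=1)
    return timedelta(weeks=1), timedelta(weeks=2)


def meets_minimum(announced_at: datetime, window_start: datetime, duration_min: float) -> bool:
    # Planned maintenance only: emergency windows have no minimum to check.
    minimum, _recommended = notice_period(duration_min)
    return window_start - announced_at >= minimum
```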

Channels by customer tier:

Channel	When to use
Status page scheduled maintenance	Every planned window, without exception
Email to all paid users	Windows over 30 minutes, or any window during business hours
Email to enterprise accounts	All windows affecting services in their contract
In-app banner	Active users who will encounter the maintenance page
Direct account manager contact	Enterprise accounts with uptime SLAs
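The channel table can be expressed as a per-customer rule. The Customer fields below are hypothetical; map them to whatever your CRM or billing system actually stores. Two assumptions are labeled in the comments: both email rows collapse into a single email, and account managers reach out only when a contracted service is affected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    paid: bool
    enterprise: bool
    has_uptime_sla: bool
    contracted_services: frozenset[str]


def channels_for(customer: Customer, duration_min: float,
                 affected: set[str], during_business_hours: bool) -> list[str]:
    channels = ["status_page"]  # every planned window, without exception
    touches_contract = bool(customer.contracted_services & affected)
    # Assumption: one email covers both email rows (paid users for long or
    # business-hours windows, enterprise accounts for contracted services).
    if (customer.paid and (duration_min > 30 or during_business_hours)) or (
        customer.enterprise and touches_contract
    ):
        channels.append("email")
    # Assumption: the account manager calls only when a contracted service is affected.
    if customer.enterprise and customer.has_uptime_sla and touches_contract:
        channels.append("account_manager")
    # The in-app banner is shown during the window to whoever is active,
    # so it is not selected per customer here.
    return channels
```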

What the announcement should say:

What is being maintained (specific, not vague)
Start time in the customer's local time zone (or UTC with conversion note)
Expected duration
What customers should expect during the window (error page, full unavailability, partial degradation)
Contact for urgent questions
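An announcement that always carries the same five fields is easy to template. A sketch, assuming start times are stored in UTC and that the function name is hypothetical; the example uses a fixed-offset zone for brevity, while zoneinfo.ZoneInfo (also a tzinfo) would handle daylight saving time correctly:

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def announcement(component: str, start_utc: datetime, duration: timedelta,
                 impact: str, contact: str,
                 local_tz: Optional[tzinfo] = None) -> str:
    """Render the five announcement fields as plain text."""
    if start_utc.tzinfo is None:
        raise ValueError("start_utc must be timezone-aware")
    start_utc = start_utc.astimezone(timezone.utc)
    start_line = f"Start: {start_utc:%Y-%m-%d %H:%M} UTC"
    if local_tz is not None:
        # Add the customer's local time next to UTC as the conversion note.
        local = start_utc.astimezone(local_tz)
        start_line += f" ({local:%Y-%m-%d %H:%M} {local.tzname()})"
    minutes = int(duration.total_seconds() // 60)
    return "\n".join([
        f"Scheduled maintenance: {component}",
        start_line,
        f"Expected duration: {minutes} minutes",
        f"What to expect: {impact}",
        f"Urgent questions: {contact}",
    ])


est = timezone(timedelta(hours=-5), "EST")
print(announcement("Primary database upgrade",
                   datetime(2024, 1, 9, 7, 0, tzinfo=timezone.utc),
                   timedelta(minutes=70),
                   "The API returns errors; the dashboard shows a maintenance page.",
                   "support@example.com", est))
```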

For a copy-ready template, see the incident communication templates - the maintenance announcement format applies directly.

Step 5: Execute the window with real-time status updates

Once the window starts, customers need to know it is in progress. A status page that shows "Scheduled Maintenance" and goes silent for 90 minutes creates anxiety.

Update cadence during the window:

Window duration	Update frequency
Under 15 minutes	Start + completion
15–60 minutes	Start + 30-minute check-in + completion
Over 60 minutes	Start + every 30 minutes + completion
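The three rows reduce to one rule: post at the start, every 30 minutes while work continues, and at completion. A minimal sketch returning minute offsets from the window start (for a 15 to 30 minute window the 30-minute check-in would fall after completion, so only start and completion are posted); the function name is hypothetical:

```python
def update_offsets(duration_min: int, interval_min: int = 30) -> list[int]:
    """Minutes after the window start at which to post a status update."""
    offsets = list(range(0, duration_min, interval_min))  # start plus check-ins
    offsets.append(duration_min)  # completion
    return offsets


print(update_offsets(90))  # [0, 30, 60, 90]
```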

What updates should say:

Maintenance is in progress (confirmation)
Current step or phase if multi-stage
Whether progress is on schedule or running behind
Revised completion estimate if behind

If the window runs over:

Immediately extend monitoring suppression (before the original window expires)
Update the status page: "Maintenance is taking longer than expected. New estimated completion: [time]. We will update every 15 minutes."
Notify enterprise accounts directly if they have SLA-sensitive services affected
Do not let the original window expire without updating - customers watching the status page for "Maintenance Complete" will start filing tickets
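Extending suppression is the step that most often gets forgotten. A minimal sketch, assuming you track the current suppression end and a revised completion estimate; it keeps the 15-minute buffer from Step 3 and refuses to proceed silently if suppression has already lapsed. The function name is hypothetical:

```python
from datetime import datetime, timedelta


def extend_suppression(current_end: datetime, new_estimate: datetime, now: datetime,
                       buffer: timedelta = timedelta(minutes=15)) -> datetime:
    """Return the new suppression end for an overrunning window."""
    if now >= current_end:
        # Alerts for the affected monitors may already have fired; investigate
        # them rather than silently re-suppressing.
        raise RuntimeError("suppression already lapsed")
    # Keep the buffer after the new estimate, and never shorten suppression.
    return max(current_end, new_estimate + buffer)
```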

Step 6: Post-window verification before lifting suppression

The most dangerous moment in a maintenance window is right after the work completes. Engineers declare success, re-enable monitoring, and discover that something is still broken - now triggering real alerts rather than the suppressed ones.

Pre-lift checklist:

Health check endpoint returns expected response on all affected services
Database connections established and query latency within normal range
External dependencies confirmed reachable (third-party APIs, payment processors)
Application logs showing normal request flow, no elevated error rate
One full monitoring check cycle completed successfully before lifting suppression
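The checklist works best as a gate that returns reasons rather than a single yes/no. A sketch with hypothetical field names; the 20% tolerance over baseline is an assumed definition of "normal range", not a standard, and database connectivity is assumed to be covered by the health check:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Verification:
    health_checks_ok: bool
    db_latency_ms: float
    db_latency_baseline_ms: float
    dependencies_reachable: bool
    error_rate: float
    error_rate_baseline: float
    last_clean_check_cycle: Optional[datetime]  # end of the last all-green cycle


def lift_blockers(v: Verification, work_finished_at: datetime,
                  tolerance: float = 1.2) -> list[str]:
    """Return reasons not to lift suppression yet (an empty list means go)."""
    blockers = []
    if not v.health_checks_ok:
        blockers.append("health check failing")
    if v.db_latency_ms > v.db_latency_baseline_ms * tolerance:
        blockers.append("database latency above normal range")
    if not v.dependencies_reachable:
        blockers.append("external dependency unreachable")
    if v.error_rate > v.error_rate_baseline * tolerance:
        blockers.append("error rate above baseline")
    # A full check cycle must have completed after the work finished.
    if v.last_clean_check_cycle is None or v.last_clean_check_cycle <= work_finished_at:
        blockers.append("no clean monitoring cycle since work finished")
    return blockers
```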

Run this checklist before marking the window complete, not after. Lifting suppression when services are still recovering generates a wave of alerts that is difficult to distinguish from a new incident.

Step 7: Completion notice and post-window communication

When verification is complete:

Status page update: Mark maintenance complete with the actual end time. If it ran over schedule, acknowledge it: "Maintenance completed at [actual time], approximately X minutes later than scheduled."
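A small helper keeps the completion wording consistent from one window to the next. A minimal sketch, assuming both timestamps are in UTC; the function name is hypothetical:

```python
from datetime import datetime


def completion_notice(scheduled_end: datetime, actual_end: datetime) -> str:
    """Completion text for the status page; acknowledges any overrun."""
    stamp = f"{actual_end:%H:%M} UTC"
    overrun = round((actual_end - scheduled_end).total_seconds() / 60)
    if overrun <= 0:
        return f"Maintenance completed at {stamp}."
    unit = "minute" if overrun == 1 else "minutes"
    return (f"Maintenance completed at {stamp}, approximately "
            f"{overrun} {unit} later than scheduled.")
```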

Subscriber notification: Most status page tools send this automatically when you mark maintenance complete.

Enterprise follow-up: For any window that ran significantly over schedule or caused customer-visible issues beyond the planned scope, send a personal follow-up from the account team within 24 hours.

Internal postmortem (for significant windows): If the maintenance revealed unexpected complexity, caused unplanned downtime, or required rollback, run a brief postmortem. See the guide on how to write an incident postmortem for the format.

Maintenance window mistakes that create incidents

Not suppressing monitoring. The most common mistake. Every affected endpoint fires alerts throughout the window. On-call gets paged for expected failures. Alert trust degrades.

Suppressing monitoring too broadly. Suppressing all monitors site-wide during a database migration means a real application error during the window goes undetected. Suppress only the monitors affected by the specific maintenance.
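Scoping suppression to the affected monitors follows directly from the scope written down in Step 1. A sketch, assuming each monitor is tagged with the components it checks (the tagging scheme and monitor names are hypothetical):

```python
def monitors_to_suppress(monitors: dict[str, set[str]], scope: set[str]) -> list[str]:
    """Return only the monitors that check at least one in-scope component."""
    return sorted(name for name, components in monitors.items() if components & scope)


monitors = {
    "api-health": {"api"},
    "dashboard": {"dashboard", "api"},
    "marketing-site": {"website"},
    "job-queue": {"jobs"},
}
print(monitors_to_suppress(monitors, {"api", "jobs"}))
# ['api-health', 'dashboard', 'job-queue']
```

The marketing site stays monitored, so a real failure there during the window still alerts.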

No rollback criteria. Engineers extend migrations past the window end time hoping they will finish, rather than rolling back per plan. The window overruns, customers notice, and what was scheduled maintenance becomes an unplanned incident.

Announcing too late. A maintenance announcement posted 2 hours before a 3 AM window reaches enterprise customers who have automated workflows running at that time. They have no time to adjust.

No post-window verification. Declaring the window complete while services are still recovering. The first monitor check fires a real alert. On-call has to investigate whether it is a residual maintenance issue or a new problem.

Forgetting to notify subscribers after completion. Customers who subscribed to status page notifications to track the maintenance window get no completion notice. They check the status page manually hours later and wonder what happened.

Using maintenance windows for SLA purposes

If your SLA excludes planned maintenance (most do), document every window correctly:

Record the announcement timestamp and channel
Record the actual start and end time
Note any deviation from the announced scope

This documentation protects you if a customer later claims that downtime during a maintenance window should count against your uptime commitment and entitle them to an SLA credit. Without records showing the window was announced in advance, the claim is harder to dispute.
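The three records listed above are enough to compute how much downtime an SLA could exclude. The exclusion rule in this sketch is hypothetical (downtime inside the announced window is excluded only if the announcement met a minimum notice period, and overrun time always counts), and the 30-day period is an assumption; real SLA language varies, so check yours:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MaintenanceRecord:
    announced_at: datetime
    channel: str
    window_start: datetime  # announced start
    window_end: datetime    # announced end
    actual_start: datetime
    actual_end: datetime
    scope_deviation: str = ""  # empty if the work matched the announced scope


def excluded_downtime(rec: MaintenanceRecord, min_notice: timedelta) -> timedelta:
    """Downtime excludable under the hypothetical clause described above."""
    if rec.window_start - rec.announced_at < min_notice:
        return timedelta(0)  # announced too late: nothing is excluded
    # Only the overlap between the actual work and the announced window is excluded.
    overlap = min(rec.actual_end, rec.window_end) - max(rec.actual_start, rec.window_start)
    return max(overlap, timedelta(0))


def uptime_pct(total_downtime: timedelta, excluded: timedelta,
               period: timedelta = timedelta(days=30)) -> float:
    counted = total_downtime - excluded
    return 100 * (1 - counted / period)
```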

For SLA tracking infrastructure, see the guides on uptime SLA monitoring and on SLA vs SLO vs SLI.