How to Set Up Maintenance Windows: Planning, Execution, and Monitoring (Complete Guide)

A step-by-step guide for setting up maintenance windows that don't break your monitoring, don't surprise your customers, and don't generate false alerts. Covers planning, timing, customer communication, and post-window verification.

Theo Cummings · July 13, 2026 · 11 min read

Maintenance windows fail in two predictable ways: the monitoring tool generates false alerts because nobody suppressed it, or customers are surprised because nobody told them. Both are avoidable with the same process - plan, communicate, configure, execute, verify.

This guide covers each step in sequence, including the specific mistakes that turn routine maintenance into incidents.

Step 1: Define scope, duration, and rollback criteria

Before touching anything operational, write down three things:

Scope: Every service component, API endpoint, database, or dependency that will be unavailable or degraded during the window. Be specific. "The app" is not a scope definition. "The API will be unavailable; the dashboard will return a maintenance page; background job processing will be paused" is.

Duration: Your honest estimate, multiplied by 1.5 and rounded up. If you think the database migration takes 45 minutes, schedule a 70-minute window (45 × 1.5 = 67.5, rounded up). Estimates err in one direction in maintenance: work often takes longer than expected and rarely finishes early. A window that ends 10 minutes early is fine. A window that runs 20 minutes over your published time generates customer anxiety and potentially SLA questions.

Rollback criteria: The specific conditions under which you abort the maintenance and restore service. Define this before you start, not during a crisis. Example: "If the migration has not completed within 80% of the scheduled window, we roll back and reschedule." Engineers under pressure make poor rollback decisions without pre-defined criteria.
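
The duration and rollback rules above reduce to a little arithmetic. The following is a minimal sketch, not a prescribed implementation: the 1.5 multiplier and the 80% threshold come from the text, while the 10-minute rounding step (which reproduces the 45-to-70-minute example) and all function names are assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def padded_window_minutes(estimate_minutes: float, factor: float = 1.5,
                          round_to: int = 10) -> int:
    """Multiply the honest estimate by the buffer factor, then round up.

    The round_to step is an assumption; the article only gives the 1.5 factor.
    """
    return math.ceil(estimate_minutes * factor / round_to) * round_to


def rollback_deadline(window_start: datetime, window_minutes: int,
                      threshold: float = 0.8) -> datetime:
    """Point in time at which an unfinished migration triggers rollback."""
    return window_start + timedelta(minutes=window_minutes * threshold)


def should_roll_back(now: datetime, deadline: datetime, migration_done: bool) -> bool:
    """Roll back only if the work is unfinished when the deadline arrives."""
    return (not migration_done) and now >= deadline


window = padded_window_minutes(45)                      # 70 minutes
start = datetime(2026, 7, 14, 2, 0, tzinfo=timezone.utc)
deadline = rollback_deadline(start, window)             # 56 minutes after start
print(window, deadline.isoformat())
```

Computing the deadline before the window opens means the rollback decision during the window is a clock check rather than a judgment call.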

Step 2: Choose your maintenance window timing

The wrong timing turns routine maintenance into a customer trust event.

Rules for timing selection:

| Criterion | Guidance |
| --- | --- |
| Traffic volume | Pick your lowest-traffic hour. Check your analytics for day-of-week and time-of-day patterns. |
| Customer time zones | If customers are global, "low traffic" means different things in each zone. Check where your highest-value users are. |
| Team availability | Someone must be watching during the window. Avoid times when your on-call team's capacity is low. |
| Dependency schedules | Check whether your cloud provider, CDN, or database has its own maintenance that could overlap. |
| Release freeze periods | Avoid scheduled maintenance within 48 hours of a major product release. |

Practical timing for B2B SaaS: Tuesday through Thursday, 2 AM to 5 AM in your primary customer time zone. Most B2B customers have low usage overnight on weekdays. Weekends feel safer but often have less team coverage.

Practical timing for consumer apps: Sunday through Monday, 3 AM to 6 AM. Consumer traffic peaks on weekday evenings and weekends.
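
If you would rather derive the slot from your own analytics than rely on a rule of thumb, one way to find the lowest-traffic window is sketched below. It assumes you can export 24 hourly request counts for the weekday in question, expressed in your primary customer time zone. The function name and the sample numbers are made up for illustration.

```python
def quietest_window(hourly_requests: list[int], window_hours: int) -> int:
    """Return the start hour of the contiguous window with the least traffic.

    Windows may wrap past midnight, so hour 23 followed by hour 0 is valid.
    Ties resolve to the earliest start hour.
    """
    n = len(hourly_requests)
    totals = {
        start: sum(hourly_requests[(start + i) % n] for i in range(window_hours))
        for start in range(n)
    }
    return min(totals, key=totals.get)


# Invented sample data: requests per hour, index 0 = midnight to 1 AM.
sample = [120, 80, 40, 30, 35, 60, 150, 300, 500, 650, 700, 720,
          710, 690, 680, 660, 600, 520, 450, 380, 300, 250, 200, 160]
print(quietest_window(sample, 3))  # start hour of the quietest 3-hour slot
```

Traffic volume is only one row of the criteria table; a slot chosen this way still needs checking against team availability and dependency schedules.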

Step 3: Configure monitoring suppression before the window

This is the most commonly skipped step. If you start maintenance without suppressing monitoring, every affected endpoint fires alerts. Your on-call engineer gets paged for failures they already knew were coming. That alert noise trains teams to ignore on-call notifications - the exact behavior that leads to missed real incidents.

How to configure maintenance windows in uptime monitoring tools:

Most monitoring tools support scheduled maintenance windows that suppress alerts during a defined time range. Configure this before the window starts, not during it.

In Vantaj (and most managed monitoring tools):

  1. Go to maintenance settings
  2. Set the start time and end time matching your published window
  3. Select the affected monitors
  4. Choose whether to suppress all alerts or only downtime alerts
  5. Save - alerts will not fire during the window

Critical detail: Set the monitoring suppression window slightly wider than your announced maintenance window - 15 minutes of buffer on each side. This accounts for starts that run a few minutes late and recoveries that need time to stabilize before monitoring resumes.
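
A minimal sketch of that buffer calculation, assuming timezone-aware datetimes. The function name is hypothetical and does not correspond to any particular monitoring tool's API; the result would be entered into the tool's maintenance settings by whatever means it supports.

```python
from datetime import datetime, timedelta, timezone


def suppression_window(announced_start: datetime, announced_end: datetime,
                       buffer_minutes: int = 15) -> tuple[datetime, datetime]:
    """Widen the announced window by a buffer on each side."""
    if announced_end <= announced_start:
        raise ValueError("announced_end must be after announced_start")
    buffer = timedelta(minutes=buffer_minutes)
    return announced_start - buffer, announced_end + buffer


announced_start = datetime(2026, 7, 14, 2, 0, tzinfo=timezone.utc)
announced_end = datetime(2026, 7, 14, 3, 10, tzinfo=timezone.utc)
suppress_from, suppress_until = suppression_window(announced_start, announced_end)
print(suppress_from.isoformat(), suppress_until.isoformat())
```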

See maintenance windows monitoring guide for tool-specific configuration steps.

Step 4: Announce to customers - timing and channel strategy

Customer communication is not optional for maintenance that affects availability. The question is how much lead time and which channels.

Minimum notice periods by maintenance type:

| Maintenance type | Minimum notice | Recommended notice |
| --- | --- | --- |
| Routine update, under 15 min | 24 hours | 48 hours |
| Significant change, 15–60 min | 48 hours | 1 week |
| Major migration, over 1 hour | 1 week | 2 weeks |
| Emergency maintenance | ASAP - even 30 min notice is better than none | - |
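
The notice periods can be turned into announcement deadlines. This is a rough sketch with hypothetical function names. It assumes that windows of exactly 15 or 60 minutes fall into the middle tier, and it leaves emergency maintenance out because that has no fixed notice period.

```python
from datetime import datetime, timedelta, timezone


def notice_periods(duration_minutes: int) -> tuple[timedelta, timedelta]:
    """Return (minimum, recommended) notice for a planned window."""
    if duration_minutes < 15:
        return timedelta(hours=24), timedelta(hours=48)
    if duration_minutes <= 60:  # boundary handling is an assumption
        return timedelta(hours=48), timedelta(weeks=1)
    return timedelta(weeks=1), timedelta(weeks=2)


def announce_by(window_start: datetime, duration_minutes: int) -> tuple[datetime, datetime]:
    """Return (recommended deadline, latest acceptable deadline) for the announcement."""
    minimum, recommended = notice_periods(duration_minutes)
    return window_start - recommended, window_start - minimum


window_start = datetime(2026, 7, 14, 2, 0, tzinfo=timezone.utc)
recommended_by, latest_by = announce_by(window_start, 70)
print(recommended_by.isoformat(), latest_by.isoformat())
```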

Channels by customer tier:

| Channel | When to use |
| --- | --- |
| Status page scheduled maintenance | Every planned window, without exception |
| Email to all paid users | Windows over 30 minutes, any time during business hours |
| Email to enterprise accounts | All windows affecting services in their contract |
| In-app banner | Active users who will encounter the maintenance page |
| Direct account manager contact | Enterprise accounts with uptime SLAs |

What the announcement should say:

  • What is being maintained (specific, not vague)
  • Start time in the customer's local time zone (or UTC with conversion note)
  • Expected duration
  • What customers should expect during the window (error page, full unavailability, partial degradation)
  • Contact for urgent questions
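
One way to make sure none of those five elements is left out is to build the announcement from required fields. The sketch below is illustrative only: the class, field names, wording, and contact address are all assumptions, and the start time is rendered in UTC with a conversion note rather than per-customer local time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Announcement:
    what: str               # what is being maintained, stated specifically
    start: datetime         # must be timezone-aware
    duration_minutes: int   # expected duration
    impact: str             # what customers will see during the window
    contact: str            # where to send urgent questions


def render(a: Announcement) -> str:
    """Render a plain-text announcement covering all five required elements."""
    start_utc = a.start.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return "\n".join([
        f"Scheduled maintenance: {a.what}",
        f"Start: {start_utc} (please convert to your local time zone)",
        f"Expected duration: {a.duration_minutes} minutes",
        f"What to expect: {a.impact}",
        f"Urgent questions: {a.contact}",
    ])


notice = render(Announcement(
    what="Primary database migration",
    start=datetime(2026, 7, 14, 2, 0, tzinfo=timezone.utc),
    duration_minutes=70,
    impact="The API will be unavailable; the dashboard will show a maintenance page.",
    contact="support@example.com",
))
print(notice)
```

Because every field is required by the dataclass, a missing element fails at construction time instead of surfacing as a vague announcement.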

For a copy-ready template, see incident communication templates - the maintenance announcement format applies directly.

Step 5: Execute the window with real-time status updates

Once the window starts, customers need to know it is in progress. A status page that shows "Scheduled Maintenance" and goes silent for 90 minutes creates anxiety.

Update cadence during the window:

| Window duration | Update frequency |
| --- | --- |
| Under 15 minutes | Start + completion |
| 15–60 minutes | Start + 30-minute check-in + completion |
| Over 60 minutes | Start + every 30 minutes + completion |
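
All three rows of the cadence table follow one rule: post at the start, at every 30-minute mark that falls strictly inside the window, and at completion. The sketch below uses a hypothetical function name. Note that under this reading a window of 30 minutes or less gets no mid-window check-in, since the 30-minute mark does not fall inside it.

```python
def update_offsets(duration_minutes: int, interval: int = 30) -> list[int]:
    """Minutes after window start at which to post a status update."""
    checkins = range(interval, duration_minutes, interval)
    return [0, *checkins, duration_minutes]


for duration in (10, 45, 90):
    print(duration, update_offsets(duration))
```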

What updates should say:

  • Maintenance is in progress (confirmation)
  • Current step or phase if multi-stage
  • Whether progress is on schedule or running behind
  • Revised completion estimate if behind

If the window runs over:

  1. Immediately extend monitoring suppression (before the original window expires)
  2. Update the status page: "Maintenance is taking longer than expected. New estimated completion: [time]. We will update every 15 minutes."
  3. Notify enterprise accounts directly if they have SLA-sensitive services affected
  4. Do not let the original window expire without updating - customers watching the status page for "Maintenance Complete" will start filing tickets

Step 6: Post-window verification before lifting suppression

The most dangerous moment in a maintenance window is right after the work completes. Engineers declare success, re-enable monitoring, and discover that something is still broken - now triggering real alerts rather than the suppressed ones.

Pre-lift checklist:

  • Health check endpoint returns expected response on all affected services
  • Database connections established and query latency within normal range
  • External dependencies confirmed reachable (third-party APIs, payment processors)
  • Application logs showing normal request flow, no elevated error rate
  • One full monitoring check cycle completed successfully before lifting suppression

Run this checklist before marking the window complete, not after. Lifting suppression when services are still recovering generates a wave of alerts that is difficult to distinguish from a new incident.
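
The checklist works well as a gate: suppression stays on until every item passes. Below is a minimal sketch of such a gate, assuming each item can be expressed as a function that returns True on success. The check names and lambda bodies are placeholders, not real probes.

```python
from typing import Callable


def ready_to_lift(checks: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run every check; return (all passed, names of failed checks).

    A check that raises is treated as failed so one broken probe
    cannot hide the results of the others.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


# Placeholder checks; real ones would call health endpoints, the database, etc.
ok, failed = ready_to_lift({
    "health endpoints": lambda: True,
    "database latency": lambda: True,
    "external dependencies": lambda: False,
    "error rate": lambda: True,
    "full monitoring cycle": lambda: True,
})
print(ok, failed)
```

Running every check, rather than stopping at the first failure, gives the team the full list of what still needs attention before the window is marked complete.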

Step 7: Completion notice and post-window communication

When verification is complete:

Status page update: Mark maintenance complete with the actual end time. If it ran over schedule, acknowledge it: "Maintenance completed at [actual time], approximately [X] minutes later than scheduled."

Subscriber notification: Most status page tools send this automatically when you mark maintenance complete.

Enterprise follow-up: For any window that ran significantly over schedule or caused customer-visible issues beyond the planned scope, send a personal follow-up from the account team within 24 hours.

Internal postmortem (for significant windows): If the maintenance revealed unexpected complexity, caused unplanned downtime, or required rollback, run a brief postmortem. See how to write an incident postmortem for the format.

Maintenance window mistakes that create incidents

Not suppressing monitoring. The most common mistake. Every affected endpoint fires alerts throughout the window. On-call gets paged for expected failures. Alert trust degrades.

Suppressing monitoring too broadly. Suppressing all monitors site-wide during a database migration means a real application error during the window goes undetected. Suppress only the monitors affected by the specific maintenance.
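
Scoped suppression is easier when each monitor is tagged with the components it depends on. The sketch below assumes you maintain such a mapping yourself; the monitor names, component tags, and function name are invented for illustration.

```python
def monitors_to_suppress(monitors: dict[str, set[str]], affected: set[str]) -> list[str]:
    """Select only the monitors that depend on a component under maintenance."""
    return sorted(name for name, components in monitors.items()
                  if components & affected)


# Hypothetical inventory: monitor name -> components it depends on.
inventory = {
    "api-health": {"api", "database"},
    "dashboard": {"dashboard", "api"},
    "marketing-site": {"cdn"},
    "job-queue": {"workers", "database"},
}
print(monitors_to_suppress(inventory, {"database"}))
```

Monitors left out of the result keep alerting during the window, which is what lets a real, unrelated failure surface.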

No rollback criteria. Engineers extend migrations past the window end time hoping they will finish, rather than rolling back per plan. The window overruns, customers notice, and what was scheduled maintenance becomes an unplanned incident.

Announcing too late. A maintenance announcement posted 2 hours before a 3 AM window gives enterprise customers no time to adjust the automated workflows they have running at that hour.

No post-window verification. Declaring the window complete while services are still recovering. The first monitor check fires a real alert. On-call has to investigate whether it is a residual maintenance issue or a new problem.

Forgetting to notify subscribers after completion. Customers who subscribed to status page notifications to track the maintenance window get no completion notice. They check the status page manually hours later and wonder what happened.

Using maintenance windows for SLA purposes

If your SLA excludes planned maintenance (most do), document every window correctly:

  • Record the announcement timestamp and channel
  • Record the actual start and end time
  • Note any deviation from the announced scope

This documentation protects you if a customer later claims the downtime during a maintenance window should count against their SLA credit. Without records showing the window was announced in advance, the claim is harder to dispute.
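
A sketch of what such a record might look like as a data structure. The announcement timestamp, channel, actual start and end, and scope deviations come from the list above. The scheduled start and end fields and the two helper methods are additions assumed here, so that notice given and overrun can be computed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class WindowRecord:
    announced_at: datetime
    channel: str
    scheduled_start: datetime   # assumed field, not in the article's list
    scheduled_end: datetime     # assumed field, not in the article's list
    actual_start: datetime
    actual_end: datetime
    scope_deviations: list[str] = field(default_factory=list)

    def notice_given(self) -> timedelta:
        """How far in advance the window was announced."""
        return self.scheduled_start - self.announced_at

    def overrun(self) -> timedelta:
        """Time past the scheduled end; zero if the window finished on time."""
        return max(self.actual_end - self.scheduled_end, timedelta(0))


utc = timezone.utc
record = WindowRecord(
    announced_at=datetime(2026, 7, 7, 9, 0, tzinfo=utc),
    channel="status page + email",
    scheduled_start=datetime(2026, 7, 14, 2, 0, tzinfo=utc),
    scheduled_end=datetime(2026, 7, 14, 3, 10, tzinfo=utc),
    actual_start=datetime(2026, 7, 14, 2, 3, tzinfo=utc),
    actual_end=datetime(2026, 7, 14, 3, 22, tzinfo=utc),
    scope_deviations=["Background jobs paused 10 minutes longer than announced"],
)
print(record.notice_given(), record.overrun())
```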

For SLA tracking infrastructure, see uptime SLA monitoring and SLA vs SLO vs SLI.