GitHub Outages in 2026: A Month-by-Month Analysis

GitHub reported 25 incidents between May 27 and June 26, 2026. This analysis breaks down the most significant outages by cause, duration, and impact - and identifies the patterns that keep recurring.

Vantaj Team · June 26, 2026 · 11 min read

GitHub is the world's largest code hosting platform, running services that 100 million developers depend on daily. When it goes down, CI/CD pipelines stall, deployments block, and teams lose access to code. Understanding when and why it fails - with real data, not vague status summaries - helps engineering teams build better contingency plans.

This analysis covers every public GitHub incident from May 27 through June 26, 2026, sourced directly from githubstatus.com. All durations, error rates, and root causes are taken from GitHub's own incident postmortems.


Incident Summary: May 27 – June 26, 2026

GitHub reported 25 incidents over this 30-day period. That averages to nearly one incident per calendar day - though most were narrow in scope (Copilot-specific or single-service), and several resolved in under 15 minutes.

| Date | Incident | Duration | Root Cause |
|---|---|---|---|
| May 27 | Git operations, PRs, Issues, API | 69 min | Analytics component CPU saturation (cascade) |
| May 28 | Multiple services elevated errors | 9 min | Partial auth service deployment, rolled back |
| Jun 1 | OpenAI models disruption | Not detailed | Upstream AI provider |
| Jun 1 | Some GitHub services | Not detailed | Not detailed |
| Jun 4 | Webhook APIs and UI degraded | Not detailed | Not detailed |
| Jun 5 | Auth/API (0.11% wrong 404s) + Slack/Teams | 70 min | Authorization component bug with user tokens |
| Jun 6 | EU region: Codeload and Package Registry | 43 min | Network circuit migration disrupted EU PoP |
| Jun 8 | GitHub.com, REST API, GraphQL, Webhooks | 5-12 min | Transient infrastructure capacity, self-resolved |
| Jun 8 | Copilot Code Review failing | Not detailed | Not detailed |
| Jun 11 | Webhooks delayed | ~160 min | Not detailed in postmortem |
| Jun 12 | EU region disruption | Linked to Jun 6 | Network migration (same root cause) |
| Jun 12 | Code Scanning and Billing delays | Not detailed | Not detailed |
| Jun 15 | Feature flag service failure (analytics) | 44 min | Feature flag client transient error, no retry |
| Jun 16 | Pull Requests and Issues (signed-out) | 55 min | Upstream model provider (Opus 4.8) |
| Jun 17 | Copilot availability | Not detailed | Not detailed |
| Jun 18 | Auth/API (9% sporadic 401s, +800ms latency) | 80 min | memcached misconfiguration during rollout |
| Jun 18 | Feature flags service elevated errors | Linked to Jun 15 | Same feature flag service issue |
| Jun 19 | Webhooks incident | Not detailed | Not detailed |
| Jun 19 | Copilot next edit suggestions | Not detailed | Not detailed |
| Jun 23 | Copilot next edit suggestions elevated errors | Not detailed | Not detailed |
| Jun 24 | Some GitHub services | Not detailed | Not detailed |
| Jun 25 | Webhooks latency increased | Not detailed | Not detailed |
| Jun 25 | Webhooks, PRs, Actions, Issues degradation | Resolved 18:27 UTC | Not fully detailed |

The Five Most Significant Incidents

1. May 27 - Git Operations Cascade (69 minutes)

Impact: 3.5% of HTTPS pushes failed. 0.2% of SSH pushes failed. Pull Requests, Issues, GraphQL API degraded.

Root cause: An internal analytics component generated unexpectedly high load, saturating CPU on the underlying infrastructure. Services that depended on Git operations began failing as a cascade.

Resolution: GitHub stopped the offending analytics component. Services recovered shortly after.

What went wrong: An internal background system - not directly user-facing - created enough load to degrade core user-facing services. The analytics component lacked resource limits or circuit breakers that would have contained its impact.

GitHub noted in the postmortem: "We are taking steps to add resource limits and kill switches."
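For readers unfamiliar with the circuit-breaker pattern mentioned above, here is a minimal sketch of the general idea. It is not GitHub's implementation; the class name, thresholds, and cool-down period are all assumptions chosen for illustration. After a run of consecutive failures the breaker rejects calls outright for a cool-down period, so a misbehaving dependency stops consuming shared resources.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures      # assumed threshold
        self.reset_after_s = reset_after_s    # assumed cool-down
        self.clock = clock                    # injectable for testing
        self.failures = 0
        self.opened_at = None                 # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("breaker open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure during the trial re-opens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

A kill switch is the manual counterpart: an operator-controlled flag that forces the breaker open regardless of the failure count.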


2. May 28 - Partial Deployment Triggers Multi-Service Errors (9 minutes)

Impact: 10% of GitHub Actions runs failed to queue or encountered errors. Web experience, REST API, and Git operations all affected.

Root cause: A change partially deployed to an authentication service caused dependent services to fail. The partial rollout state - neither the old version nor the new one fully applied - was the failure mode.

Resolution: GitHub rolled back the change. Recovery was fast because the rollback was straightforward.

What went wrong: The deployment validation process didn't catch that a partial deployment would produce an inconsistent state that downstream services couldn't handle.

GitHub noted: "We are expanding test coverage and improving our deployment validation process."

This is a common pattern in large distributed systems: safe to deploy fully, unsafe to deploy partially.


3. June 5 - Authorization Bug Deletes Slack/Teams Subscriptions (70 minutes)

Impact: 0.11% of authenticated REST API requests returned incorrect "not found" responses. 12% of organizations with active Slack and Teams channel subscriptions had some subscriptions removed. 2% of all channel subscriptions deleted.

Root cause: A change to an internal authorization component introduced a bug that failed to correctly resolve user-to-server token access for organization-owned repositories. The Slack and Teams integrations interpreted the transient "not found" responses as permanent loss of access and deleted the subscriptions.

Resolution: GitHub reverted the authorization component change.

What went wrong: The authorization bug itself was one failure. But the bigger failure mode was the integrations treating a transient error as permanent. When the API returned 404, the Slack integration assumed the repository was gone and removed the subscription - irreversibly. Recovering deleted subscriptions required users to manually re-add them.

This illustrates a dangerous API consumer pattern: treating any "not found" response as permanent and acting on it destructively, rather than distinguishing between transient and durable errors.
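One way an API consumer could guard against this is to require a sustained streak of 404s before doing anything irreversible. The sketch below is a hypothetical policy, not how the Slack or Teams integrations work; the class name, thresholds, and return values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NotFoundTracker:
    """Decide when repeated 404s are durable enough to act on.

    Hypothetical policy: require `min_failures` consecutive 404s spanning
    at least `min_window_s` seconds before treating a repository as gone.
    """
    min_failures: int = 5
    min_window_s: float = 24 * 3600                  # assumed: a full day of 404s
    first_seen: dict = field(default_factory=dict)   # repo -> time of first 404
    streak: dict = field(default_factory=dict)       # repo -> consecutive 404 count

    def record(self, repo: str, status: int, now: float) -> str:
        """Return 'ok', 'retry_later', or 'remove' for one API response."""
        if status != 404:
            # Any other response breaks the streak. Errors such as 401 or 5xx
            # are never treated as evidence that the repository is gone.
            self.first_seen.pop(repo, None)
            self.streak.pop(repo, None)
            return "ok" if status < 400 else "retry_later"
        self.first_seen.setdefault(repo, now)
        self.streak[repo] = self.streak.get(repo, 0) + 1
        long_enough = now - self.first_seen[repo] >= self.min_window_s
        if self.streak[repo] >= self.min_failures and long_enough:
            return "remove"
        return "retry_later"
```

With a one-day window, a 70-minute burst of incorrect 404s would only ever produce `retry_later`, and the first successful response would reset the streak.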


4. June 18 - memcached Misconfiguration Causes 9% Auth Failures (80 minutes)

Impact: ~9% of API requests returned sporadic 401 errors. ~800ms of additional latency on affected requests. Users experienced intermittent "logged out" behavior.

Root cause: A memcached proxy service rollout to GitHub's internal API infrastructure caused the authentication service to pick up an incorrect memcached host configuration. When authentication lookups went to the wrong host, they failed - intermittently, not consistently, which made the issue harder to diagnose.

Resolution: GitHub deployed a configuration change to memcached to use the correct host.

What went wrong: Configuration changes to infrastructure components that authentication depends on require validation before rollout. A canary deployment or pre-rollout config verification step would have caught the incorrect host before production traffic hit it.

GitHub noted plans: "We plan to migrate our authentication system to prevent similar issues."

At 80 minutes, this was the longest incident in the period with a detailed root-cause postmortem. The June 11 webhook delay ran longer (~160 minutes), but its cause was not detailed.


5. June 6 - EU Network Migration Disrupts Package Registry (43 minutes)

Impact: 0.95% average Codeload error rate. 9.2% average Package Registry error rate. Peak Package Registry errors reached 27%. Affected users whose traffic routed through European infrastructure.

Root cause: A planned network circuit migration disrupted connectivity at one of GitHub's European Points of Presence. The traffic-shifting process "did not operate as expected," leaving some production traffic routed through the affected site.

Resolution: Traffic shifted away from the affected PoP.

What went wrong: Planned maintenance caused an unplanned outage. The traffic-shifting procedure had a failure mode that the team hadn't fully anticipated. Package Registry errors hit 27% at peak - significant for teams doing package installs in CI pipelines routed through EU infrastructure.


Recurring Failure Patterns

Across the 25 incidents in this period, four patterns account for most of the impact.

Pattern 1: Webhooks (5 incidents)

Webhooks degraded or failed on June 4, June 11, June 19, and June 25 (twice). No single postmortem in this dataset explains what causes GitHub's webhook delivery to fail repeatedly. The frequency suggests either fragile infrastructure or a shared dependency that's hit by multiple different upstream issues.

For teams that depend on webhooks for CI/CD triggers, deployment notifications, or workflow automations, GitHub webhook failures are a significant operational risk. Having a secondary delivery mechanism or monitoring for missed webhook events is worth the investment.
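Monitoring for missed webhook events can be as simple as a periodic reconciliation pass: poll for recent events, compare against what the webhook receiver actually saw, and replay the difference. The sketch below assumes you already record the IDs of delivered events and have some `poll` function that lists recent event IDs (for example, from a REST API listing); both are hypothetical placeholders.

```python
from typing import Callable, Iterable


def find_missed_events(polled_ids: Iterable[str], received_ids: set[str]) -> list[str]:
    """Return IDs seen by polling but never delivered via webhook, in polled order."""
    return [eid for eid in polled_ids if eid not in received_ids]


def reconcile(poll: Callable[[], list[str]],
              received_ids: set[str],
              handle: Callable[[str], None]) -> int:
    """Run one reconciliation pass; return how many events were replayed."""
    missed = find_missed_events(poll(), received_ids)
    for eid in missed:
        handle(eid)             # process as if the webhook had arrived
        received_ids.add(eid)   # so the next pass does not replay it again
    return len(missed)
```

Running `reconcile` on a schedule bounds how long a dropped or delayed webhook can go unnoticed to roughly one polling interval, provided `handle` is safe to call for an event that later arrives by webhook as well.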

Pattern 2: Copilot AI Services (6 incidents)

Copilot-specific incidents appeared on June 1, June 8, June 17, June 19, and June 23, and the June 16 disruption was also attributed to an upstream model provider. GitHub Copilot depends on external AI model providers (OpenAI, Anthropic), which introduces a dependency layer outside GitHub's direct control.

These incidents are largely independent of core GitHub services. If Copilot completions fail, PRs and Issues continue working normally. But for teams where Copilot is integrated into developer workflows, the frequency of AI model disruptions is notable.

Pattern 3: Deployment-Triggered Failures

Two of the five detailed incidents trace directly to a deployment or rollout: the May 28 partial authentication deployment and the June 18 memcached rollout.

Both could have been caught earlier with stricter pre-deployment validation. Both were resolved by a rollback or a corrective configuration change once the cause was identified, though the June 18 incident still lasted 80 minutes. Both caused disproportionate impact relative to the change being made - the May 28 incident affected 10% of Actions runs from a single partially deployed change.

Pattern 4: Auth and API Instability

The June 5 authorization bug and June 18 memcached issue both affected authentication. Auth is a foundational dependency - when it degrades intermittently, every service that requires authentication sees errors. The 80-minute duration of June 18 and the subscription deletion side effect of June 5 make these the highest-impact incident types in this dataset.


Incident Frequency by Affected Service

| Service | Incidents (May 27 – Jun 26) |
|---|---|
| Webhooks | 5 |
| Copilot / AI features | 6 |
| API / Auth | 4 |
| Core GitHub services (PRs, Issues, Git) | 3 |
| EU / Regional | 2 |
| Other (Code Scanning, Billing) | 2 |

Uptime Estimates

GitHub doesn't publish an overall uptime percentage on its status page. Based on the detailed postmortem durations available:

| Incident | Duration |
|---|---|
| May 27 Git cascade | 69 min |
| May 28 Auth deployment | 9 min |
| Jun 5 Auth/API/Slack | 70 min |
| Jun 6 EU network | 43 min |
| Jun 8 GitHub.com/API | 5-12 min |
| Jun 11 Webhooks | ~160 min |
| Jun 15 Feature flags | 44 min |
| Jun 18 Auth/API memcached | 80 min |
| Total (documented) | ~500 min over 30 days |

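The listed durations sum to roughly 480 to 487 minutes, rounded up here to ~500. Measured against the 30-day window (43,200 minutes), that is about 1.2% of the period, or roughly 98.8% availability if every window is treated as downtime for the affected service. The figure does not account for the many incidents without duration data, and it overstates downtime in another respect: most of these windows were partial degradations on different services, not full outages.

With those caveats, the data is broadly consistent with GitHub's informal track record of 99.x% availability, with occasional multi-hour events and frequent short-lived degradations.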
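The arithmetic behind this estimate can be reproduced in a few lines. The durations are copied from the table above; treating the June 8 incident as a 5 to 12 minute range and the June 11 figure as exactly 160 minutes are simplifying assumptions.

```python
WINDOW_MIN = 30 * 24 * 60  # 43,200 minutes in a 30-day window

# (low, high) duration bounds in minutes, from the documented incidents.
durations = {
    "May 27 Git cascade": (69, 69),
    "May 28 Auth deployment": (9, 9),
    "Jun 5 Auth/API/Slack": (70, 70),
    "Jun 6 EU network": (43, 43),
    "Jun 8 GitHub.com/API": (5, 12),
    "Jun 11 Webhooks": (160, 160),   # reported as ~160
    "Jun 15 Feature flags": (44, 44),
    "Jun 18 Auth/API memcached": (80, 80),
}


def availability(downtime_min: float, window_min: int = WINDOW_MIN) -> float:
    """Percentage of the window not covered by the given downtime."""
    return 100.0 * (1 - downtime_min / window_min)


low_total = sum(lo for lo, _ in durations.values())
high_total = sum(hi for _, hi in durations.values())

print(f"documented degradation: {low_total}-{high_total} min")
print(f"availability: {availability(high_total):.2f}%-{availability(low_total):.2f}%")
print(f"using the rounded 500 min: {availability(500):.2f}%")
```

Because the windows hit different services, this is a worst-case figure for a team that depends on all of them at once, not a measure of any single service.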


What This Means for Teams That Depend on GitHub

Don't build pipelines with a single webhook trigger. Webhooks are the most incident-prone non-AI service in this dataset, with five incidents in one month. If a missed webhook blocks a deployment or notification, build a polling fallback.

Model AI feature dependency separately. Copilot, Code Review AI, and AI-powered features depend on upstream model providers that GitHub doesn't control. Design workflows that degrade gracefully when Copilot is unavailable.

Monitor your integration points. The June 5 incident deleted Slack/Teams subscriptions silently. If your GitHub Slack integration had stopped posting notifications, your team might not have noticed for hours. Monitor the output of your GitHub integrations, not just GitHub's status page.

Watch for EU-specific issues. Two incidents in this period specifically affected European infrastructure. If your team routes CI/CD through EU GitHub infrastructure, regional monitoring that checks from inside Europe gives earlier signal than a US-based check.

Watch the GitHub Status API. GitHub publishes machine-readable status at www.githubstatus.com/api/v2/summary.json. Monitor that endpoint programmatically or subscribe to status page notifications so you get the first alert, not the second-hand report from a developer who noticed their PR wasn't building.
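A minimal sketch of programmatic monitoring follows. It assumes the Statuspage-style summary endpoint at https://www.githubstatus.com/api/v2/summary.json and a response containing a `components` list whose entries have `name` and `status` fields, with `operational` meaning healthy. The function names are illustrative, and the alerting step is left to whatever tooling you already use.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: the Statuspage-style summary for githubstatus.com.
SUMMARY_URL = "https://www.githubstatus.com/api/v2/summary.json"


def fetch_summary(url: str = SUMMARY_URL, timeout: float = 10.0) -> dict:
    """Download and parse the status summary JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def degraded_components(summary: dict) -> list[tuple[str, str]]:
    """Return (name, status) for every component that is not operational."""
    return [
        (c.get("name", "unknown"), c.get("status", "unknown"))
        for c in summary.get("components", [])
        if c.get("status") != "operational"
    ]


def overall_indicator(summary: dict) -> str:
    """Return the page-level indicator, or 'unknown' if it is missing."""
    return summary.get("status", {}).get("indicator", "unknown")
```

A scheduled job could call `degraded_components(fetch_summary())` every minute or so and alert when the list is non-empty for a component your pipelines depend on, such as Webhooks or Actions.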


All incident data sourced from githubstatus.com and GitHub's published postmortems. Durations and error rates are taken verbatim from GitHub's own incident reports. This analysis covers the 30-day window available in the public incident feed at time of writing (June 26, 2026).