March 2026 AI Service Reliability Report

Inaugural Issue: This is AIWatch’s first monthly reliability report, establishing the baseline for AI service stability tracking across 27 services.

Source: ai-watch.dev (real-time AI service status monitoring)
Period: March 20–31, 2026
Published: April 2026
Services monitored: 27 (20 API services, 4 coding agents, 3 AI apps)

This report analyzes AI service reliability, uptime, incidents, and recovery time across 27 services from major providers, including OpenAI, Anthropic (Claude), Google (Gemini), Amazon (Bedrock), and others, to help developers make informed infrastructure decisions. As the inaugural issue, it sets the benchmark for ongoing monthly comparisons.

March 2026 showed a clear reliability divide: Cohere and Hugging Face recorded perfect scores with zero incidents, while Anthropic's three services logged 9 to 14 incidents each, counts that are inflated by per-model component reporting. ElevenLabs recorded the lowest uptime at 97.55%, and Deepgram experienced a 74-hour Voice Agent degradation triggered by an upstream OpenAI outage.

[Figure: Daily Service Status]

Recommendations — Which AI should I use?

Use Case	Recommended	Why
Production-critical	OpenAI API, Cohere	OpenAI API: only 2h 56m total downtime; Cohere: zero incidents and 100% uptime
Low latency / cost	Groq Cloud, DeepSeek API	100% uptime, fast recovery
Coding workflows	Cursor, Windsurf	High uptime despite some affected days
Voice / audio	AssemblyAI (with fallback)	99.95% uptime; ElevenLabs and Deepgram had multi-hour outages
General purpose	Gemini, Perplexity	Good scores, but uptime not publicly disclosed — use with monitoring

TL;DR

Most reliable: Cohere, Hugging Face (100/100 — zero incidents, near-perfect uptime)
Best balance (stability + ecosystem): OpenAI API (88/100, only 2h 56m downtime, 99.99% uptime)
Riskiest this month: ElevenLabs (97.55% uptime, 8 affected days), Deepgram (74h single incident)
High incident noise: Anthropic services — counts inflated due to per-model component reporting
Watch out: GitHub Copilot infrastructure instability (18 affected days)

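TL;DR (translated from Korean)

Most reliable: Cohere, Hugging Face (100 points, zero incidents, near-perfect uptime)
Best balance of stability and ecosystem: OpenAI API (88 points, 2h 56m total downtime, 99.99% uptime)
Riskiest this month: ElevenLabs (97.55% uptime, 8 affected days), Deepgram (single 74-hour outage)
Incident count caveat: Anthropic services look inflated because incidents are tallied separately per model (Opus/Sonnet/Haiku)
Watch out: GitHub Copilot infrastructure instability (18 affected days)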

Key Insight

March 2026 reveals three patterns worth noting:

High uptime ≠ low incidents: Anthropic's services held roughly 99% uptime (98.88% to 99.26%) yet logged 9 to 14 incidents each, among the highest counts of the month. The cause is per-model component reporting (Opus/Sonnet/Haiku counted separately), not systemic instability.
Short incidents add up: Together AI had 20 incidents — the most of any service — but averaged just 25 minutes each. Total downtime (8h 37m) was less than a single Replicate outage (9h 38m).
Upstream dependencies matter: Deepgram’s longest incident (74h) was caused by an OpenAI outage affecting its Voice Agent downstream. Services built on other AI providers inherit their reliability risks.
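
The Together AI comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: `parse_duration` is a hypothetical helper written for this example, and it assumes durations are formatted like the ones in this report ("8h 37m", "55m").

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '8h 37m' or '55m' into whole minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

# Figures taken from the Incident Summary table in this report.
together_total = parse_duration("8h 37m")    # total across 20 incidents
replicate_single = parse_duration("9h 38m")  # one incident

average = together_total / 20
print(f"Together AI average: {average:.1f} min per incident")
print("Replicate's single outage exceeds Together AI's total:",
      replicate_single > together_total)
```

The average comes out between 25 and 26 minutes, consistent with the 25m figure quoted in the rankings.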

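Key Insight (translated from Korean)

High uptime ≠ few incidents: Anthropic's services held roughly 99% uptime yet logged some of the highest incident counts. This stems from per-model (Opus/Sonnet/Haiku) reporting, not from instability across the whole system.
Short outages add up: Together AI recorded the most incidents of any service (20), but they averaged 25 minutes each. Its total downtime (8h 37m) was less than Replicate's single outage (9h 38m).
Upstream dependencies matter: Deepgram's longest outage (74 hours) came from an OpenAI outage that affected its Voice Agent downstream. Services built on top of another AI provider cannot avoid the impact of that provider's outages.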

AIWatch Score — March 2026 Reliability Rankings

AIWatch Score (0–100) is designed to answer one question:

“Which AI service is safest to rely on in production?”

Unlike raw uptime %, it incorporates incident frequency (how often things break), recovery time (how fast they fix it), and real downtime impact, making it a more realistic reliability signal for developers. All formulas are publicly documented at ai-watch.dev/#about-score.

Rank	Service	Score	Grade	Confidence	Why
1=	Cohere API	100	Excellent	High	Zero incidents, 100% uptime
1=	Hugging Face	100	Excellent	High	Zero incidents, 99.99% uptime
3	OpenRouter	99	Excellent	High	Zero incidents, 99.89% uptime
4	Groq Cloud	93	Excellent	High	100% uptime, 1 affected day
5	DeepSeek API	92	Excellent	High	100% uptime, single 1h incident
6=	Amazon Bedrock	90	Excellent	Medium	Zero incidents (added Mar 25, partial data)
6=	Stability AI	90	Excellent	Medium	Zero incidents (added Mar 25, partial data)
8	OpenAI API	88	Excellent	High	Only 2h 56m downtime all period
9	Pinecone	87	Excellent	High	99.98% uptime, 3 affected days
10=	Gemini API	86	Excellent	Medium	Zero incidents (uptime not published)
10=	Azure OpenAI	86	Excellent	Medium	Zero incidents (added Mar 25, uptime not published)
10=	Perplexity	86	Excellent	Medium	Zero incidents (uptime not published)
10=	xAI (Grok)	86	Excellent	Medium	Zero incidents (uptime not published)
14=	Together AI	84	Good	High	20 incidents but fast recovery (avg 25m)
14=	AssemblyAI	84	Good	High	Single 5h incident, 99.95% uptime
16	Character.AI	83	Good	High	4 minor incidents (avg 2m), 99.56% uptime
17	Windsurf	82	Good	High	Zero incidents in period, 3 affected days prior
18=	Mistral API	75	Good	Medium	7 incidents but very short (avg 6m)
18=	Cursor	75	Good	High	100% uptime despite 8 affected days
20	ChatGPT	73	Good	High	4 incidents including 19h file-related outage
21	Deepgram	70	Good	Medium	74h Voice Agent degradation (upstream OpenAI)
22	GitHub Copilot	67	Fair	High	18 affected days, infrastructure instability
23	Claude Code	61	Fair	High	Per-model reporting inflates count
24	Claude API	59	Fair	High	Per-model reporting inflates count
25	Replicate	57	Fair	Medium	Single 9h 38m outage
26	claude.ai	56	Fair	High	14 incidents, 21 affected days
27	ElevenLabs	47	Degrading	Medium	8 affected days, lowest uptime (97.55%)

Grade scale: Excellent (85+) · Good (70+) · Fair (55+) · Degrading (40+) · Unstable (<40)
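
The grade bands above translate directly into a lookup. The following is a small sketch of that mapping; the function name `grade` is chosen for this example and is not taken from AIWatch's own code.

```python
def grade(score: float) -> str:
    """Map an AIWatch Score (0-100) to the grade bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Thresholds are inclusive lower bounds, checked from highest to lowest.
    bands = [(85, "Excellent"), (70, "Good"), (55, "Fair"), (40, "Degrading")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "Unstable"

# Scores taken from the rankings table.
for service, score in [("OpenAI API", 88), ("Together AI", 84), ("ElevenLabs", 47)]:
    print(service, score, grade(score))
```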

[Figure: AIWatch Score Rankings]

Confidence reflects data completeness: High = full uptime + incident data available; Medium = uptime not published (industry average assumed) or partial monitoring period. Amazon Bedrock, Azure OpenAI, and Stability AI were added March 25, so their scores reflect 7 days of data with medium confidence. Anthropic services score lower partly because of per-model component reporting: each model tier counts separately toward affected days.

Incident Summary

Note on methodology: Incident counts and downtime reflect all affected components per service (e.g., Claude API counts Opus, Sonnet, and Haiku separately). Official uptime % is based on a single primary component. These two metrics are not directly comparable.

A higher incident count does not necessarily indicate lower reliability. Providers differ in reporting granularity — Anthropic reports per-model incidents (Opus/Sonnet/Haiku each counted separately), while others report at the service level. Direct comparisons should account for this difference.
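
One way to account for this difference when comparing providers is to collapse overlapping per-component incidents into service-level incidents before counting. The sketch below shows that idea with a standard interval merge. The timestamps are hypothetical illustration data, not incidents from this report, and this is not necessarily how AIWatch counts.

```python
from datetime import datetime

def merge_incidents(windows):
    """Merge overlapping (start, end) windows into service-level incidents."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of counting again.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical data: one degradation reported separately for three model
# components, plus one unrelated short incident a few days later.
reported = [
    (datetime(2026, 3, 21, 10, 0), datetime(2026, 3, 21, 12, 0)),   # Opus
    (datetime(2026, 3, 21, 10, 15), datetime(2026, 3, 21, 11, 30)), # Sonnet
    (datetime(2026, 3, 21, 10, 5), datetime(2026, 3, 21, 12, 30)),  # Haiku
    (datetime(2026, 3, 24, 8, 0), datetime(2026, 3, 24, 8, 20)),    # unrelated
]
service_level = merge_incidents(reported)
print(len(reported), "reported ->", len(service_level), "service-level")
```

With this sample data, four reported incidents collapse to two service-level incidents, which is closer to what a provider reporting at the service level would show.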

One Claude API incident (“Elevated connection reset errors in Cowork”) was excluded — a Cowork-specific client issue (resolved by restarting Claude Desktop), not a Claude API outage.

Service	Incidents	Total downtime (longest single incident)
Together AI	20	8h 37m (55m)
claude.ai	14	41h 43m (9h 47m)
Claude API	9	32h 30m (9h 47m)
Claude Code	9	32h 30m (9h 47m)
GitHub Copilot	8	13h 32m (6h 19m)
Mistral API	7	44m (22m)
Cursor	6	14h 45m (4h 4m)
Character.AI	4	11m (8m)
ChatGPT	4	36h 17m (19h 46m)
ElevenLabs	2	4h 57m (4h 47m)
Deepgram	2	74h 2m (74h 1m)
OpenAI API	1	2h 56m
DeepSeek API	1	1h 4m
AssemblyAI	1	5h 14m
Replicate	1	9h 38m

Zero incidents (12 services): Gemini API, Amazon Bedrock, Azure OpenAI, Cohere API, Groq Cloud, Perplexity, xAI (Grok), OpenRouter, Hugging Face, Pinecone, Stability AI, Windsurf

Official Uptime (Primary Component)

Azure OpenAI, Deepgram, Gemini, Mistral, Perplexity, and xAI do not publish accessible uptime metrics on their status pages.

Service	Uptime
Amazon Bedrock	100.00%
Cohere API	100.00%
Groq Cloud	100.00%
DeepSeek API	100.00%
Stability AI	100.00%
Cursor	100.00%
OpenAI API	99.99%
Hugging Face	99.99%
ChatGPT	99.99%
Windsurf	99.99%
Pinecone	99.98%
AssemblyAI	99.95%
OpenRouter	99.89%
GitHub Copilot	99.62%
Together AI	99.60%
Character.AI	99.56%
Claude Code	99.26%
Claude API	99.03%
claude.ai	98.88%
Replicate	98.61%
ElevenLabs	97.55%
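
To make these percentages more tangible, the sketch below converts an uptime figure into the downtime it implies over a window. The 12-day default is an assumption matching this report's period; official uptime figures may cover longer windows, so the result is illustrative and is not comparable to the incident downtime totals above.

```python
def implied_downtime_minutes(uptime_percent: float, period_days: float = 12) -> float:
    """Return the downtime, in minutes, implied by an uptime % over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - uptime_percent / 100)

# Uptime values taken from the table above; the 12-day window is assumed.
for service, uptime in [("OpenAI API", 99.99), ("Claude API", 99.03), ("ElevenLabs", 97.55)]:
    minutes = implied_downtime_minutes(uptime)
    print(f"{service}: about {minutes:.0f} min over 12 days")
```

Over a 12-day window, 97.55% uptime corresponds to roughly seven hours of downtime, while 99.99% corresponds to under two minutes.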

Notable Incidents

1. Deepgram — 74-Hour Voice Agent Degradation (Mar 20–23)

Affected: Voice Agent API (downstream providers) · Duration: 74h 1m

Deepgram’s Voice Agent service experienced a prolonged degradation caused by an upstream OpenAI outage. The incident highlighted the dependency risk of AI services built on third-party LLM providers. Deepgram’s core STT/TTS APIs were unaffected — only the Voice Agent component that routes through OpenAI was impacted.

2. Anthropic — Recurring Per-Model Incidents (Mar 20–31)

Affected: Claude API (9), claude.ai (14), Claude Code (9) · Longest: 9h 47m (“Elevated error rates on Opus 4.6”)

Anthropic’s high incident count reflects its granular per-model reporting. Each model tier (Opus/Sonnet/Haiku) is tracked as a separate component, so a platform-wide degradation registers as multiple simultaneous incidents across claude.ai, Claude API, and Claude Code. The practical impact on any single model was lower than the aggregate numbers suggest.

3. ChatGPT: File-Handling Outages

Affected: Project files, file downloads · Longest: 19h 46m (“Unable to download or preview project files”)

Two of ChatGPT’s four incidents were file-handling related (19h 46m + 6h 59m), affecting project file operations rather than core chat functionality. These are included in the count since ChatGPT is tracked as a consumer app, not just an API.

4. GitHub Copilot — 18 Affected Days

Affected: Copilot Chat, Webhooks, Codespaces, Actions · Longest: 6h 19m

GitHub Copilot had 18 affected days, the most of any service outside Anthropic (claude.ai shows 21, a figure inflated by per-model counting). Disruptions spanned multiple infrastructure components including Webhooks, Codespaces, and Actions. While core AI completions were less impacted, developers relying on full GitHub integration experienced repeated interruptions.

5. Replicate — Single Long Outage (9h 38m)

Affected: Model inference API

A single 9h 38m outage, one of the longest individual incidents of the period, behind only Deepgram (74h 1m), ChatGPT (19h 46m), and Anthropic (9h 47m). There were no other incidents in the period, but the extended recovery time significantly impacted the score.

Observations

If you build on Anthropic

High incident count is mostly a reporting artifact (Opus/Sonnet/Haiku counted separately)
Monitor per-model components individually (e.g., claude-sonnet-4-5)
Longest single incident: 9h 47m — real disruption when it happens

If you build on GitHub Copilot

18 affected days, the highest of any non-Anthropic service
Webhooks and Codespaces disruptions are frequent
Avoid tight CI/CD dependency on these features without fallback handling

If you build on Deepgram

Voice Agent API depends on upstream LLM providers (OpenAI)
The 74h degradation was not a Deepgram infrastructure failure; it was an upstream dependency failure
Configure more than one LLM provider for Voice Agent to mitigate this risk
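
The multi-provider advice above can be sketched at the application level as an ordered fallback. This is a generic pattern, not Deepgram's Voice Agent configuration: `call_with_fallback`, `primary_llm`, and `secondary_llm` are hypothetical names, and the two provider functions are stand-ins for real client calls.

```python
from typing import Callable, Sequence, Tuple

class AllProvidersFailed(Exception):
    """Raised when every configured provider raised an error."""

def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]], prompt: str
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, reply)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

# Hypothetical stand-ins for real LLM client calls.
def primary_llm(prompt: str) -> str:
    raise TimeoutError("upstream outage")

def secondary_llm(prompt: str) -> str:
    return f"echo: {prompt}"

used, reply = call_with_fallback(
    [("primary", primary_llm), ("secondary", secondary_llm)], "hello"
)
print(used, reply)
```

In this sketch the primary provider fails, so the reply comes from the secondary one. A production version would likely add timeouts and limit which exception types trigger a fallback.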

If you build on ElevenLabs

Lowest official uptime at 97.55%
8 affected days in the period
Implement retry logic and cache generated audio for critical flows
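
The retry-and-cache advice above might look like the following sketch. It assumes a generic `synthesize(text)` callable that returns audio bytes; `synthesize_with_retry` and `flaky_tts` are hypothetical names for this example and do not reflect the ElevenLabs SDK.

```python
import time

def synthesize_with_retry(synthesize, text, cache, attempts=3,
                          base_delay=0.5, sleep=time.sleep):
    """Return cached audio if present; otherwise retry with exponential backoff."""
    if text in cache:
        return cache[text]
    last_error = None
    for attempt in range(attempts):
        try:
            audio = synthesize(text)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 0.5s, 1s, 2s, ... between attempts (with default base_delay).
                sleep(base_delay * 2 ** attempt)
        else:
            cache[text] = audio
            return audio
    raise RuntimeError(f"synthesis failed after {attempts} attempts") from last_error

# Hypothetical flaky backend: fails twice, then succeeds.
calls = {"count": 0}
def flaky_tts(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return b"audio:" + text.encode()

cache = {}
first = synthesize_with_retry(flaky_tts, "hello", cache, sleep=lambda s: None)
second = synthesize_with_retry(flaky_tts, "hello", cache, sleep=lambda s: None)
print(first == second, calls["count"])
```

The second call is served from the cache, so the backend is not contacted again. For critical flows, a persistent cache (disk or object storage) would survive process restarts; the in-memory dict here only keeps the example short.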

Generally stable this month

OpenAI API (2h 56m total downtime), Groq Cloud (zero incidents), DeepSeek API (1h 4m) — good candidates for primary or fallback providers.

About This Report

Data Sources: Real-time data is aggregated from official status pages hosted on multiple status-page platforms, including Atlassian Statuspage, incident.io, Google Cloud Status, Better Stack, Instatus, and OnlineOrNot, as well as RSS feeds (Source: ai-watch.dev).
Monitoring Frequency: All 27 services are polled every 5 minutes via Cloudflare Workers. Health check probes measure direct API response times (RTT) at the same interval.
AIWatch Score (0–100): Calculated from three components — Uptime (50%), Incident frequency (30%), and Recovery speed (20%). Full methodology: ai-watch.dev/#about-score
Confidence Levels: High = official uptime + incident data available; Medium = uptime not published (industry average 99.5% assumed) or partial monitoring period. Confidence reflects data completeness, not service quality.
Incident Counting: Incident counts reflect all affected components per service. Providers differ in reporting granularity — Anthropic reports per-model incidents (Opus/Sonnet/Haiku each counted separately), while others report at the service level.
Uptime Metrics: Uptime percentages reflect official single-component figures provided by the status pages. Services marked with “—” do not provide a publicly accessible uptime metric.
Timezone Standard: All timestamps are recorded in UTC.
Data Coverage: AIWatch monitoring began March 20, 2026. This report covers March 20–31 (12 days). Official uptime figures from provider status pages may cover longer periods. Services added March 25 (Amazon Bedrock, Azure OpenAI, Stability AI) have 7 days of data.
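
The 50/30/20 weighting described above can be sketched as a weighted sum. This report does not specify how each component is normalized to a 0-100 sub-score, so the sketch takes sub-scores as inputs; the function name and the sample values are hypothetical, and the full methodology lives at ai-watch.dev/#about-score.

```python
# Weights as stated in this report: Uptime 50%, Incident frequency 30%, Recovery speed 20%.
WEIGHTS = {"uptime": 0.50, "incidents": 0.30, "recovery": 0.20}

def weighted_score(subscores: dict) -> float:
    """Combine three 0-100 sub-scores into a single 0-100 score."""
    if set(subscores) != set(WEIGHTS):
        raise ValueError(f"expected exactly these keys: {sorted(WEIGHTS)}")
    for name, value in subscores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} sub-score must be between 0 and 100")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)

# Hypothetical sub-scores, chosen only to illustrate the weighting.
example = {"uptime": 90, "incidents": 60, "recovery": 80}
print(round(weighted_score(example), 1))
```

Because uptime carries half the weight, a service with frequent but brief incidents can still score well, which matches the Together AI pattern noted earlier in the report.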

Next report: April 2026

Live status — ai-watch.dev
Slack/Discord alerts — ai-watch.dev/#settings
Score methodology — ai-watch.dev/#about-score
All reports — reports.ai-watch.dev

Have feedback or spotted an error? Open an issue
Want us to track a service? Request here