Inaugural Issue — This is AIWatch’s first monthly reliability report, establishing the baseline for AI service stability tracking across 27 providers.

Source: ai-watch.dev — Real-time AI service status monitoring
Period: March 20–31, 2026
Published: April 2026
Services monitored: 27 — 20 API services, 4 coding agents, 3 AI apps

This report analyzes AI service reliability, uptime, incidents, and recovery time across 27 major providers including OpenAI, Anthropic (Claude), Google (Gemini), Amazon (Bedrock), and others — to help developers make informed infrastructure decisions. As the inaugural issue, it sets the benchmark for ongoing monthly comparisons.

March 2026 showed a clear reliability divide: Cohere and Hugging Face recorded perfect scores with zero incidents, while Anthropic's services collectively logged the most incidents, a count inflated by per-model component reporting. ElevenLabs recorded the lowest uptime at 97.55%, and Deepgram experienced a 74-hour Voice Agent degradation triggered by an upstream OpenAI outage.

Daily Service Status


Recommendations — Which AI should I use?

| Use Case | Recommended | Why |
| --- | --- | --- |
| Production-critical | OpenAI API, Cohere | Only 2h 56m / zero downtime, highest stability |
| Low latency / cost | Groq Cloud, DeepSeek API | 100% uptime, fast recovery |
| Coding workflows | Cursor, Windsurf | High uptime despite some affected days |
| Voice / audio | AssemblyAI (with fallback) | 99.95% uptime; ElevenLabs and Deepgram had multi-hour outages |
| General purpose | Gemini, Perplexity | Good scores, but uptime not publicly disclosed — use with monitoring |

TL;DR

  • Most reliable: Cohere, Hugging Face (100/100 — zero incidents, near-perfect uptime)
  • Best balance (stability + ecosystem): OpenAI API (88/100, only 2h 56m downtime, 99.99% uptime)
  • Riskiest this month: ElevenLabs (97.55% uptime, 8 affected days), Deepgram (74h single incident)
  • High incident noise: Anthropic services — counts inflated due to per-model component reporting
  • Watch out: GitHub Copilot infrastructure instability (18 affected days)

Key Insight

March 2026 reveals three patterns worth noting:

  • High uptime ≠ low incidents: Anthropic maintained 99%+ uptime yet recorded the most incidents — driven by per-model component reporting (Opus/Sonnet/Haiku counted separately), not systemic instability.
  • Short incidents add up: Together AI had 20 incidents — the most of any service — but averaged just 25 minutes each. Total downtime (8h 37m) was less than a single Replicate outage (9h 38m).
  • Upstream dependencies matter: Deepgram’s longest incident (74h) was caused by an OpenAI outage affecting its Voice Agent downstream. Services built on other AI providers inherit their reliability risks.

AIWatch Score — March 2026 Reliability Rankings

AIWatch Score (0–100) is designed to answer one question:

“Which AI service is safest to rely on in production?”

Unlike raw uptime %, it incorporates incident frequency (how often things break), recovery time (how fast they fix it), and real downtime impact — making it a more realistic reliability signal for developers. All formulas are publicly documented at ai-watch.dev/#about-score.
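
As a rough illustration of how these three inputs might combine, the sketch below applies the published 50/30/20 weighting from the About section. The component curves are invented for illustration and will not reproduce the published scores.

```typescript
// Illustrative sketch of the AIWatch Score weighting: Uptime 50%,
// Incident frequency 30%, Recovery speed 20% (weights from the
// methodology notes below). The component curves are assumptions,
// not the published formulas; see ai-watch.dev/#about-score.

interface ServiceMonth {
  uptimePct: number;      // e.g. 99.99 (official, primary component)
  incidentCount: number;  // all affected components in the period
  avgRecoveryMin: number; // mean time to resolve, in minutes
}

function aiWatchScoreSketch(s: ServiceMonth): number {
  // Uptime: map 99.00%..100% onto 0..100 (assumed linearization).
  const uptime = Math.max(0, Math.min(100, (s.uptimePct - 99) * 100));
  // Incidents: fewer is better (decay constant chosen arbitrarily).
  const incidents = 100 / (1 + s.incidentCount / 5);
  // Recovery: faster fixes score higher (scale is an assumption).
  const recovery = 100 / (1 + s.avgRecoveryMin / 60);
  return Math.round(0.5 * uptime + 0.3 * incidents + 0.2 * recovery);
}
```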

| Rank | Service | Score | Grade | Confidence | Why |
| --- | --- | --- | --- | --- | --- |
| 1= | Cohere API | 100 | Excellent | High | Zero incidents, 100% uptime |
| 1= | Hugging Face | 100 | Excellent | High | Zero incidents, 99.99% uptime |
| 3 | OpenRouter | 99 | Excellent | High | Zero incidents, 99.89% uptime |
| 4 | Groq Cloud | 93 | Excellent | High | 100% uptime, 1 affected day |
| 5 | DeepSeek API | 92 | Excellent | High | 100% uptime, single 1h incident |
| 6= | Amazon Bedrock | 90 | Excellent | Medium | Zero incidents (added Mar 25, partial data) |
| 6= | Stability AI | 90 | Excellent | Medium | Zero incidents (added Mar 25, partial data) |
| 8 | OpenAI API | 88 | Excellent | High | Only 2h 56m downtime all period |
| 9 | Pinecone | 87 | Excellent | High | 99.98% uptime, 3 affected days |
| 10= | Gemini API | 86 | Excellent | Medium | Zero incidents (uptime not published) |
| 10= | Azure OpenAI | 86 | Excellent | Medium | Zero incidents (added Mar 25, uptime not published) |
| 10= | Perplexity | 86 | Excellent | Medium | Zero incidents (uptime not published) |
| 10= | xAI (Grok) | 86 | Excellent | Medium | Zero incidents (uptime not published) |
| 14= | Together AI | 84 | Good | High | 20 incidents but fast recovery (avg 25m) |
| 14= | AssemblyAI | 84 | Good | High | Single 5h incident, 99.95% uptime |
| 16 | Character.AI | 83 | Good | High | 4 minor incidents (avg 2m), 99.56% uptime |
| 17 | Windsurf | 82 | Good | High | Zero incidents in period, 3 affected days prior |
| 18= | Mistral API | 75 | Good | Medium | 7 incidents but very short (avg 6m) |
| 18= | Cursor | 75 | Good | High | 100% uptime despite 8 affected days |
| 20 | ChatGPT | 73 | Good | High | 4 incidents including 19h file-related outage |
| 21 | Deepgram | 70 | Good | Medium | 74h Voice Agent degradation (upstream OpenAI) |
| 22 | GitHub Copilot | 67 | Fair | High | 18 affected days, infrastructure instability |
| 23 | Claude Code | 61 | Fair | High | Per-model reporting inflates count |
| 24 | Claude API | 59 | Fair | High | Per-model reporting inflates count |
| 25 | Replicate | 57 | Fair | Medium | Single 9h 38m outage |
| 26 | claude.ai | 56 | Fair | High | 14 incidents, 21 affected days |
| 27 | ElevenLabs | 47 | Degrading | Medium | 8 affected days, lowest uptime (97.55%) |

Grade scale: Excellent (85+) · Good (70+) · Fair (55+) · Degrading (40+) · Unstable (<40)
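
Since the grade bands are stated explicitly, mapping a score to its grade is mechanical; a one-function sketch:

```typescript
// Grade bands as published: Excellent (85+), Good (70+), Fair (55+),
// Degrading (40+), Unstable (<40).
type Grade = "Excellent" | "Good" | "Fair" | "Degrading" | "Unstable";

function gradeFor(score: number): Grade {
  if (score >= 85) return "Excellent";
  if (score >= 70) return "Good";
  if (score >= 55) return "Fair";
  if (score >= 40) return "Degrading";
  return "Unstable";
}
// gradeFor(88) === "Excellent" (OpenAI API); gradeFor(47) === "Degrading" (ElevenLabs)
```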


Confidence reflects data completeness: High = full uptime + incident data available; Medium = uptime not published (industry average assumed) or partial monitoring period. Amazon Bedrock, Azure OpenAI, Stability AI were added March 25 — scores reflect 7 days of data with medium confidence. Anthropic services score lower due to per-model component reporting — each model tier counts separately toward affected days.


Incident Summary

Note on methodology: Incident counts and downtime reflect all affected components per service (e.g., Claude API counts Opus, Sonnet, and Haiku separately). Official uptime % is based on a single primary component. These two metrics are not directly comparable.

A higher incident count does not necessarily indicate lower reliability. Providers differ in reporting granularity — Anthropic reports per-model incidents (Opus/Sonnet/Haiku each counted separately), while others report at the service level. Direct comparisons should account for this difference.
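
To see why granularity changes the totals, the sketch below merges overlapping per-component incident windows into service-level episodes. The data shape is hypothetical, not AIWatch's internal model.

```typescript
// Hypothetical illustration: per-component incidents (e.g. Opus,
// Sonnet, Haiku reported separately) collapse into far fewer
// service-level episodes once overlapping windows are merged.

interface Incident { component: string; start: number; end: number } // epoch ms

function serviceLevelEpisodes(incidents: Incident[]): number {
  const sorted = [...incidents].sort((a, b) => a.start - b.start);
  let episodes = 0;
  let currentEnd = -Infinity;
  for (const inc of sorted) {
    if (inc.start > currentEnd) {
      episodes++;                                 // new, non-overlapping episode
      currentEnd = inc.end;
    } else {
      currentEnd = Math.max(currentEnd, inc.end); // extends the current episode
    }
  }
  return episodes;
}
// Three simultaneous per-model incidents => 1 service-level episode.
```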

One Claude API incident (“Elevated connection reset errors in Cowork”) was excluded — a Cowork-specific client issue (resolved by restarting Claude Desktop), not a Claude API outage.

| Service | Incidents | Total Downtime | Longest |
| --- | --- | --- | --- |
| Together AI | 20 | 8h 37m | 55m |
| claude.ai | 14 | 41h 43m | 9h 47m |
| Claude API | 9 | 32h 30m | 9h 47m |
| Claude Code | 9 | 32h 30m | 9h 47m |
| GitHub Copilot | 8 | 13h 32m | 6h 19m |
| Mistral API | 7 | 44m | 22m |
| Cursor | 6 | 14h 45m | 4h 4m |
| Character.AI | 4 | 11m | 8m |
| ChatGPT | 4 | 36h 17m | 19h 46m |
| ElevenLabs | 2 | 4h 57m | 4h 47m |
| Deepgram | 2 | 74h 2m | 74h 1m |
| OpenAI API | 1 | 2h 56m | 2h 56m |
| DeepSeek API | 1 | 1h 4m | 1h 4m |
| AssemblyAI | 1 | 5h 14m | 5h 14m |
| Replicate | 1 | 9h 38m | 9h 38m |

Zero incidents (12 services): Gemini API, Amazon Bedrock, Azure OpenAI, Cohere API, Groq Cloud, Perplexity, xAI (Grok), OpenRouter, Hugging Face, Pinecone, Stability AI, Windsurf


Official Uptime (Primary Component)

Azure OpenAI, Deepgram, Gemini, Mistral, Perplexity, and xAI do not publish accessible uptime metrics on their status pages.

| Service | Uptime | Service | Uptime |
| --- | --- | --- | --- |
| Amazon Bedrock | 100.00% | ChatGPT | 99.99% |
| Cohere API | 100.00% | Windsurf | 99.99% |
| Groq Cloud | 100.00% | Pinecone | 99.98% |
| DeepSeek API | 100.00% | AssemblyAI | 99.95% |
| Stability AI | 100.00% | OpenRouter | 99.89% |
| Cursor | 100.00% | GitHub Copilot | 99.62% |
| OpenAI API | 99.99% | Together AI | 99.60% |
| Hugging Face | 99.99% | Character.AI | 99.56% |
| Claude Code | 99.26% | Claude API | 99.03% |
| claude.ai | 98.88% | Replicate | 98.61% |
| ElevenLabs | 97.55% | | |

Notable Incidents

1. Deepgram — 74-Hour Voice Agent Degradation (Mar 20–23)

Affected: Voice Agent API (downstream providers)
Duration: 74h 1m

Deepgram’s Voice Agent service experienced a prolonged degradation caused by an upstream OpenAI outage. The incident highlighted the dependency risk of AI services built on third-party LLM providers. Deepgram’s core STT/TTS APIs were unaffected — only the Voice Agent component that routes through OpenAI was impacted.


2. Anthropic — Recurring Per-Model Incidents (Mar 20–31)

Affected: Claude API (9), claude.ai (14), Claude Code (9)
Longest: 9h 47m (“Elevated error rates on Opus 4.6”)

Anthropic’s high incident count reflects its granular per-model reporting. Each model tier (Opus/Sonnet/Haiku) is tracked as a separate component, so a platform-wide degradation registers as multiple simultaneous incidents across claude.ai, Claude API, and Claude Code. The practical impact on any single model was lower than the aggregate numbers suggest.


3. ChatGPT — File-Handling Outages

Affected: Project files, file downloads
Longest: 19h 46m (“Unable to download or preview project files”)

Two of ChatGPT’s four incidents were file-handling related (19h 46m + 6h 59m), affecting project file operations rather than core chat functionality. These are included in the count since ChatGPT is tracked as a consumer app, not just an API.


4. GitHub Copilot — 18 Affected Days

Affected: Copilot Chat, Webhooks, Codespaces, Actions
Longest: 6h 19m

GitHub Copilot had the highest number of affected days (18) of any service. Disruptions spanned multiple infrastructure components including Webhooks, Codespaces, and Actions. While core AI completions were less impacted, developers relying on full GitHub integration experienced repeated interruptions.


5. Replicate — Single Long Outage (9h 38m)

Affected: Model inference API

A single 9h 38m outage, one of the longest individual incidents of the period. No other incidents occurred, but the extended recovery time significantly impacted the score.


Observations

If you build on Anthropic

  • High incident count is mostly a reporting artifact (Opus/Sonnet/Haiku counted separately)
  • Monitor per-model components individually (e.g., claude-sonnet-4-5)
  • Longest single incident: 9h 47m — real disruption when it happens

If you build on GitHub Copilot

  • 18 affected days — highest of any service
  • Webhooks and Codespaces disruptions are frequent
  • Avoid tight CI/CD dependency on these features without fallback handling

If you build on Deepgram

  • Voice Agent API depends on upstream LLM providers (OpenAI)
  • 74h degradation was not a Deepgram infrastructure failure — it was an upstream dependency
  • Define multiple LLM providers for Voice Agent to mitigate (see the fallback sketch below)
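
A minimal sketch of that mitigation, assuming an ordered provider list and a hypothetical `callLLM` helper (the provider names and call are placeholders, not Deepgram's API):

```typescript
// Minimal failover sketch: try providers in order until one answers.
// callLLM() is a hypothetical placeholder; wire in your actual SDK
// calls and health checks.

type Provider = "openai" | "anthropic" | "google";

async function callLLM(provider: Provider, prompt: string): Promise<string> {
  // ...provider-specific request goes here (placeholder stub)
  throw new Error(`${provider} unavailable`);
}

async function withFallback(prompt: string, order: Provider[]): Promise<string> {
  let lastError: unknown;
  for (const provider of order) {
    try {
      return await callLLM(provider, prompt);
    } catch (err) {
      lastError = err; // provider degraded, try the next one
    }
  }
  throw lastError;
}

// Usage: await withFallback("hello", ["openai", "anthropic", "google"]);
```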

If you build on ElevenLabs

  • Lowest official uptime at 97.55%
  • 8 affected days in the period
  • Implement retry logic and cache generated audio for critical flows (see the sketch below)
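
A minimal sketch of that pattern, assuming a generic `fetch`-based TTS endpoint (the URL and in-memory cache are illustrative placeholders, not the ElevenLabs SDK):

```typescript
// Sketch: exponential-backoff retry plus an in-memory cache for
// generated audio, so a TTS outage degrades to replaying cached clips.
// The endpoint URL and cache strategy are illustrative assumptions.

const audioCache = new Map<string, ArrayBuffer>();

async function ttsWithRetry(text: string, maxAttempts = 3): Promise<ArrayBuffer> {
  const cached = audioCache.get(text);
  if (cached) return cached; // serve from cache during an outage

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch("https://api.example-tts.dev/v1/speech", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text }),
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const audio = await res.arrayBuffer();
      audioCache.set(text, audio); // cache for future failures
      return audio;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 500)); // backoff
    }
  }
  throw new Error("unreachable");
}
```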

Generally stable this month

OpenAI API (2h 56m total downtime), Groq Cloud (zero incidents), DeepSeek API (1h 4m) — good candidates for primary or fallback providers.


About This Report

  • Data Sources: Real-time data is aggregated from official status pages via multiple frameworks, including Atlassian Statuspage, incident.io, Google Cloud Status, Better Stack, Instatus, OnlineOrNot, and RSS feeds (Source: ai-watch.dev).
  • Monitoring Frequency: All 27 services are polled every 5 minutes via Cloudflare Workers. Health check probes measure direct API response times (RTT) at the same interval (a minimal Worker sketch follows this list).
  • AIWatch Score (0–100): Calculated from three components — Uptime (50%), Incident frequency (30%), and Recovery speed (20%). Full methodology: ai-watch.dev/#about-score
  • Confidence Levels: High = official uptime + incident data available; Medium = uptime not published (industry average 99.5% assumed) or partial monitoring period. Confidence reflects data completeness, not service quality.
  • Incident Counting: Incident counts reflect all affected components per service. Providers differ in reporting granularity — Anthropic reports per-model incidents (Opus/Sonnet/Haiku each counted separately), while others report at the service level.
  • Uptime Metrics: Uptime percentages reflect official single-component figures provided by the status pages. Services marked with “—” do not provide a publicly accessible uptime metric.
  • Timezone Standard: All timestamps are recorded in UTC.
  • Data Coverage: AIWatch monitoring began March 20, 2026. This report covers March 20–31 (12 days). Official uptime figures from provider status pages may cover longer periods. Services added March 25 (Amazon Bedrock, Azure OpenAI, Stability AI) have 7 days of data.
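
For context, a minimal sketch of a scheduled Worker poller under those constraints; the status URL is a placeholder, the 5-minute cadence would come from a `crons = ["*/5 * * * *"]` trigger in wrangler.toml, and the types are from @cloudflare/workers-types:

```typescript
// Minimal scheduled-Worker sketch (types from @cloudflare/workers-types).
// Polls one status endpoint and records round-trip time; the URL and
// logging target are placeholders, not AIWatch's actual pipeline.

export default {
  async scheduled(_event: ScheduledEvent, _env: unknown, _ctx: ExecutionContext): Promise<void> {
    const started = Date.now();
    const res = await fetch("https://status.example-provider.dev/api/v2/status.json");
    const rttMs = Date.now() - started; // health-check RTT, as in the report
    console.log(JSON.stringify({ ok: res.ok, status: res.status, rttMs }));
  },
};
```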

Next report: April 2026