API Monitoring: How to Monitor REST APIs, GraphQL Endpoints, and Webhooks

API monitoring is the practice of automatically checking whether an API endpoint is reachable, returning correct responses, and performing within acceptable latency thresholds. Unlike website monitoring, which checks if a page loads, API monitoring validates response structure, authentication, data correctness, and downstream integrations like webhooks.

A 200 status code does not mean your API is working. An endpoint can return 200 with an empty body, an error message, or missing fields that break every client calling it. Proper API monitoring checks what the response contains, not just whether it arrived.

What to Monitor on an API

Health Check Endpoints

Most well-structured APIs expose a dedicated health endpoint: /health, /status, /ping, or /healthz. This endpoint returns a machine-readable status of the API's critical dependencies.

A minimal health check response:

{
  "status": "ok",
  "version": "2.4.1",
  "database": "connected",
  "cache": "connected"
}

Monitor this endpoint and assert:

HTTP 200 status
"status": "ok" in the response body (keyword match)
Response time under your defined threshold (typically < 500ms)
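The three assertions above can be sketched as one pure function. This is a minimal illustration, not any particular tool's implementation; the function name, the failure messages, and the 500 ms default are assumptions chosen to match the example response.

```python
import json


def evaluate_health_check(status_code, body_text, elapsed_ms, max_ms=500):
    """Return a list of failure reasons; an empty list means the check passed."""
    failures = []
    if status_code != 200:
        failures.append(f"unexpected status {status_code}")

    # Parse the body so we check the actual "status" field, not just a substring.
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = None
        failures.append("body is not valid JSON")

    if isinstance(payload, dict):
        if payload.get("status") != "ok":
            failures.append(f"status field is {payload.get('status')!r}")
    elif payload is not None:
        failures.append("body is not a JSON object")

    if elapsed_ms >= max_ms:
        failures.append(f"response took {elapsed_ms} ms (limit {max_ms} ms)")
    return failures
```

Returning every failure reason rather than stopping at the first one tends to make alerts easier to read: a slow response and a bad status field show up together.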

If your API does not have a health endpoint, add one. A basic version is usually quick to build, and it gives your monitor one explicit signal for dependency health instead of leaving it to infer health from unrelated endpoints.
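As a rough sketch of what such an endpoint might compute, the function below builds the status code and JSON body from a set of dependency probes. It is framework-agnostic, and everything beyond the field names in the example response is an assumption: the probe callables, the "degraded" and "unavailable" values, and the choice to return 503 when a dependency fails.

```python
import json


def build_health_response(version, dependency_checks):
    """Run each dependency probe and return (http_status, json_body).

    dependency_checks maps a name such as "database" to a zero-argument
    callable that returns True when the dependency is reachable.
    """
    body = {"status": "ok", "version": version}
    for name, probe in dependency_checks.items():
        try:
            healthy = bool(probe())
        except Exception:
            healthy = False  # a probe that raises counts as a failed dependency
        body[name] = "connected" if healthy else "unavailable"
        if not healthy:
            body["status"] = "degraded"

    # Assumption: signal failure through the status code as well as the body.
    http_status = 200 if body["status"] == "ok" else 503
    return http_status, json.dumps(body)
```

Wire the returned pair into whatever web framework you use; the probes themselves should be cheap, for example a trivial query against the database.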

Critical Endpoints by Function

Beyond the health check, monitor the endpoints that represent your API's core value:

Endpoint Category	What to Check	Why It Matters
Authentication	`/auth/token`, `/oauth/token`	Every user flow starts here; auth failures block everything
Core resource CRUD	`GET /users`, `GET /products`	Your most-called endpoints; any failure is user-facing
Search	`GET /search?q=test`	Often database-intensive; first to degrade under load
Webhooks	Outbound delivery confirmation	Silent failure mode; no user error, just missed events
Third-party integrations	Payment, email, data APIs	Failures cascade to users; you need to know before they do

Response Validation (Keyword Monitoring)

Status code checks are necessary but not sufficient. Configure keyword assertions that verify the response contains expected content:

Assert: response body contains "id"
Assert: response body contains "created_at"
Assert: response body does NOT contain "error"
Assert: response time < 800ms

A common failure: a database connection pool exhausts, and your API starts returning {"error": "connection pool exhausted"} with HTTP 200 because your error handler has a bug. A status code check sees this as healthy. A keyword check catches it.
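The pool-exhaustion scenario can be reproduced with a small keyword checker. This is an illustrative sketch; the function name and the sample bodies are assumptions, not output from a real API.

```python
def check_keywords(body_text, must_contain=(), must_not_contain=()):
    """Return a list of failed keyword assertions for a response body."""
    failures = []
    for keyword in must_contain:
        if keyword not in body_text:
            failures.append(f"missing expected keyword {keyword!r}")
    for keyword in must_not_contain:
        if keyword in body_text:
            failures.append(f"found forbidden keyword {keyword!r}")
    return failures


# Both bodies arrive with HTTP 200; only the keyword check tells them apart.
healthy_body = '{"id": 42, "created_at": "2024-01-01T00:00:00Z"}'
broken_body = '{"error": "connection pool exhausted"}'

print(check_keywords(healthy_body, ['"id"', '"created_at"'], ['"error"']))
print(check_keywords(broken_body, ['"id"', '"created_at"'], ['"error"']))
```

The keywords here include the surrounding JSON quotes. A bare substring such as error would also match a field like error_count, so matching the quoted key is a little narrower, though it is still a crude check compared with parsing the body.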

Response Time Monitoring

Latency matters independently of availability. An API that responds in 8 seconds is technically "up" but functionally broken for most real-time use cases.

Track three metrics:

Average response time: baseline for normal behavior
P95 response time: the latency that 95% of requests complete within (more meaningful than the average for user experience, because it reflects the slow tail)
Response time trend: gradual increases often precede full outages

A P95 latency that doubles over a week is a signal, even if the average looks fine. Degraded latency is often the first visible symptom of a backing service problem — a slow database query, a memory leak, or a saturated connection pool.
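To see why the average can hide a problem that P95 exposes, here is a small worked example using the nearest-rank percentile method. The sample latencies are invented for illustration, and nearest-rank is only one of several common percentile definitions.

```python
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct% of samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# 18 fast requests and 2 slow ones (milliseconds).
latencies_ms = [120] * 18 + [2400, 2600]

print(mean(latencies_ms))            # 358
print(percentile(latencies_ms, 95))  # 2400
```

An average of 358 ms would pass a 500 ms threshold, while the P95 of 2400 ms shows that one request in ten is painfully slow.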

Monitoring REST APIs

REST APIs respond to standard HTTP methods (GET, POST, PUT, PATCH, DELETE). Most monitoring tools handle REST natively.

GET endpoint monitoring

The simplest case: send a GET request and assert the response.

URL: https://api.example.com/v1/products
Method: GET
Headers: Authorization: Bearer {{test_token}}
Assert: status 200
Assert: body contains "data"
Assert: response time < 500ms

Use a read-only test resource or a dedicated monitoring account so you never modify production data.

POST endpoint monitoring

Testing write endpoints requires more care. Options:

Use a staging or sandbox environment with the same monitoring checks as production
Create a clearly marked test resource, check the response structure, then delete it (or use a dry-run mode, if your API offers one) so no data persists
Monitor an idempotent POST (many APIs allow POST with an idempotency key; sending the same request repeatedly produces no side effects after the first)
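For the idempotency-key option, one possible approach is to derive a stable key from the check's name so every run sends the same key. The sketch below assumes the API reads an Idempotency-Key header, which is a common convention but not universal; the namespace UUID and function name are hypothetical.

```python
import json
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
MONITOR_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def build_idempotent_post(check_name, payload):
    """Return (headers, body_bytes) for a repeatable monitoring POST."""
    # uuid5 is deterministic: the same check name always yields the same key.
    key = str(uuid.uuid5(MONITOR_NAMESPACE, check_name))
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": key,  # assumed header name; check your API's docs
    }
    return headers, json.dumps(payload).encode("utf-8")
```

Some APIs expire stored idempotency keys after a while, in which case a repeated request eventually performs a real write again. Check how your API treats old keys before relying on this.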

Authentication monitoring

If your API requires authentication, monitor the auth flow separately:

Step 1: POST /auth/token with test credentials → Assert 200, assert "access_token" in body
Step 2: GET /users/me with Bearer token → Assert 200, assert "id" in body
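The two steps can be chained in a small function. To keep the sketch independent of any HTTP library, it takes a send callable as a parameter; that callable, its signature, and the function name are assumptions for illustration.

```python
def run_auth_flow(send, base_url, credentials):
    """Run the two-step auth check; return None on success or a failure message.

    send(method, url, headers, json_body) must return
    (status_code, parsed_json_body_as_dict).
    """
    # Step 1: exchange test credentials for a token.
    status, body = send("POST", f"{base_url}/auth/token", {}, credentials)
    if status != 200 or "access_token" not in body:
        return f"step 1 failed: status {status}"

    # Step 2: use the token the way a real client would.
    headers = {"Authorization": f"Bearer {body['access_token']}"}
    status, body = send("GET", f"{base_url}/users/me", headers, None)
    if status != 200 or "id" not in body:
        return f"step 2 failed: status {status}"
    return None
```

Passing the token from step 1 into step 2 is the point of the exercise: a token endpoint that issues tokens the rest of the API rejects would pass step 1 and fail step 2.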

A broken auth endpoint is an invisible outage: your API health check might pass while every logged-in user gets 401 errors.

Rate limit monitoring

Monitor your API's rate limit headers:

Assert: X-RateLimit-Remaining > 10

If your monitoring account is hitting rate limits, you need to either reduce check frequency or use a dedicated monitoring token with higher limits.
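A minimal sketch of the header assertion, assuming the API exposes X-RateLimit-Remaining as an integer. Header names for rate limits vary between APIs, and treating a missing or malformed header as a failure is a design choice made here, not a rule.

```python
def rate_limit_headroom_ok(headers, minimum=10):
    """True when X-RateLimit-Remaining is present and greater than minimum."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-ratelimit-remaining")
    if raw is None:
        return False  # assumption: a missing header is worth flagging
    try:
        return int(raw) > minimum
    except ValueError:
        return False
```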

Monitoring GraphQL APIs

GraphQL uses a single endpoint (typically /graphql) with POST requests, which makes status-code monitoring insufficient. Many GraphQL servers return HTTP 200 with an errors array in the response body even when a query fails, so the status code alone says little about whether the operation succeeded.

GraphQL health check

Send a minimal query that asks only for the __typename meta-field:

{
  "query": "{ __typename }"
}

Assert:

HTTP 200
Body contains "data"
Body does NOT contain "errors"
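These assertions are more reliable when the body is parsed rather than searched as text. The sketch below is one way to do that; the function name and failure messages are assumptions.

```python
import json


def evaluate_graphql_response(status_code, body_text):
    """Return a list of failure reasons for a GraphQL response."""
    failures = []
    if status_code != 200:
        failures.append(f"unexpected status {status_code}")

    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return failures + ["body is not valid JSON"]
    if not isinstance(payload, dict):
        return failures + ["body is not a JSON object"]

    errors = payload.get("errors")
    if errors:
        # Truncate so a large error list does not flood the alert.
        failures.append(f"response contains errors: {json.dumps(errors)[:200]}")
    if payload.get("data") is None:
        failures.append("data is missing or null")
    return failures
```

Checking that data is non-null matters because a response can carry a "data" key whose value is null alongside an errors array.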

Operation-specific monitoring

For production-critical operations, monitor each one with a test query:

{
  "query": "query MonitoringCheck { currentUser { id email } }",
  "variables": {}
}

Use a monitoring-specific user account. Assert the response contains expected fields and does not contain the errors key.

GraphQL latency

GraphQL queries vary enormously in complexity. A query that fetches a single field is not comparable to one that resolves five nested relations. Monitor each critical operation type separately rather than relying on a single latency metric.

Monitoring Webhooks

Webhooks are the most overlooked part of API monitoring. They're outbound — your system sends events to external endpoints — and they fail silently: no user error, no HTTP failure code in your application logs. The receiving system just never gets the event.

Webhook delivery monitoring with heartbeats

Use heartbeat monitoring to verify your webhook consumer is processing events:

Set up a heartbeat monitor in Vantaj with an interval matching your expected event frequency
In your webhook consumer, after successfully processing any event, send a ping to the heartbeat URL
If events stop flowing (or the consumer crashes), the heartbeat goes missing and you get alerted

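# In your webhook handler
import requests

def handle_payment_event(payload):
    process_payment(payload)  # your existing event-processing logic
    # Signal that the consumer is alive and processing
    try:
        requests.get("https://api.vantaj.co/heartbeat/your-heartbeat-id", timeout=5)
    except requests.RequestException:
        pass  # a failed ping must not fail the event; the missing heartbeat triggers the alert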
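On the monitoring side, the "heartbeat goes missing" decision comes down to comparing the last ping time with the expected interval. The sketch below illustrates that idea only; it is not Vantaj's implementation, and the grace period and function name are assumptions.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_missing(last_ping, now, expected_interval, grace=timedelta(seconds=60)):
    """True when no ping has arrived within the expected interval plus a grace period."""
    return now - last_ping > expected_interval + grace


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=5)

recent_ping = now - timedelta(minutes=4)
stale_ping = now - timedelta(minutes=20)

print(heartbeat_missing(recent_ping, now, interval))  # False
print(heartbeat_missing(stale_ping, now, interval))   # True
```

This is why the interval should match your real event frequency: set it too short and quiet periods look like outages, set it too long and a crashed consumer goes unnoticed.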

Webhook endpoint monitoring

Also monitor the endpoints that receive webhook deliveries, including your own:

Monitor your Stripe webhook endpoint (/webhooks/stripe) from outside to confirm it accepts POST requests
Confirm response contains the expected acknowledgment
Set a short check interval (1-2 minutes) so delivery failures are caught quickly

Third-party webhook providers

For inbound webhooks from Stripe, GitHub, Twilio, and similar services, monitor:

Your webhook consumer URL is reachable (POST /webhooks/stripe returns 200)
Events are being processed at the expected rate (heartbeat monitor on your consumer)
No backlog is building in your event queue
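For the backlog item, one simple heuristic is to sample queue depth at a fixed interval and flag a run of consecutive increases. This is a sketch under that assumption; how you read the queue depth depends on your queue, and the window size is arbitrary.

```python
def backlog_is_building(depth_samples, min_samples=5):
    """True when queue depth rose on every step across the most recent samples."""
    recent = depth_samples[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough history to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strictly increasing window is deliberately conservative. A queue that spikes and drains will not trigger it, while a consumer that has stopped keeping up will.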

API Monitoring Across Environments

Most teams have multiple environments: development, staging, and production. Monitor them differently.

Environment	Check Interval	Alert Priority	Notes
Production	30 sec – 1 min	Critical (page on-call)	Full monitoring, all endpoints
Staging	2–5 min	Low (Slack only)	Pre-deployment validation
Development	Manual or CI-triggered	None	Not continuous monitoring

A staging environment that goes down undetected for a week means your next deployment was tested against a broken environment. Monitor staging with at least basic HTTP checks.

Common API Monitoring Mistakes

Monitoring only the health endpoint. A passing /health check does not mean every API operation works. The health endpoint typically only checks database connectivity; it does not run actual queries or validate business logic.

Ignoring authentication in checks. If your monitoring bypasses auth (using IP allowlists or test tokens with elevated permissions), you won't catch auth failures that affect real users.

Not asserting response content. A 200 response containing an error message is a failed check. Always add keyword assertions to your most critical endpoints.

Using 5-minute check intervals in production. For a payment API or authentication endpoint, 5 minutes of undetected downtime is significant. Use 1-minute or 30-second intervals for production APIs.

No webhook monitoring. Webhook failures are the most common silent failure mode in event-driven architectures. Add heartbeat monitors to every critical consumer.

Setting Up API Monitoring in Vantaj

Add an HTTP monitor with your API endpoint URL
Set the method (GET, POST) and any required headers (Authorization: Bearer your-token)
Add keyword assertions to validate response content, not just status codes
Set a response time threshold to catch latency degradation before it becomes an outage
Add a heartbeat monitor for each webhook consumer and background processing job

Vantaj checks from 10 global probe regions with consensus-based alerting, so a single regional network issue does not trigger a false positive. Alerts fire when multiple regions independently confirm the failure.
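The general idea behind consensus-based alerting can be illustrated with a fixed quorum. Vantaj's actual rule is not described here, so treat the function, the quorum value, and the region names below as hypothetical.

```python
def should_alert(region_results, quorum=3):
    """region_results maps a region name to True if the check passed there.

    Alert only when at least `quorum` regions report a failure.
    """
    failing = [region for region, passed in region_results.items() if not passed]
    return len(failing) >= quorum


# One region failing looks like a local network issue.
single_failure = {"us-east": True, "eu-west": False, "ap-south": True, "sa-east": True}
# Several regions failing independently points at the API itself.
broad_failure = {"us-east": False, "eu-west": False, "ap-south": False, "sa-east": True}

print(should_alert(single_failure))  # False
print(should_alert(broad_failure))   # True
```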

Frequently Asked Questions

What is the difference between API monitoring and synthetic monitoring?

API monitoring and synthetic monitoring overlap. Synthetic monitoring typically refers to scripted, multi-step tests that simulate user flows (login, search, checkout). API monitoring focuses on individual endpoint health: reachability, response structure, and latency. Both use automated checks rather than real user traffic. For complex flows, use synthetic monitoring; for endpoint-level health, API monitoring is sufficient and simpler to maintain.

How often should I check my API endpoints?

For production APIs, 1-minute check intervals are the standard minimum. Critical endpoints — authentication, payment, and primary data APIs — benefit from 30-second intervals. Staging environments can use 5-minute intervals.

Do I need to test every endpoint?

No. Focus on endpoints that directly affect users if they fail: authentication, core data operations, and any endpoint called in your primary user journey. Deep coverage of every endpoint is a testing problem, not a monitoring problem.

How do I monitor an API that requires authentication?

Add the authentication header to your monitor's request configuration. Use a dedicated monitoring account with read-only permissions, not an admin account. This way you're testing the same auth flow real users go through, and a misconfigured check can't accidentally modify data.

Can I monitor third-party APIs my application depends on?

Yes. If your application depends on Stripe, Twilio, SendGrid, or any external API, add an HTTP monitor for their public status endpoint or a read-only API endpoint. When a third-party API degrades, you want to know before your users do, and before your team spends an hour debugging a problem that isn't in your code.