
What to Monitor: The Complete Checklist for SaaS, E-commerce, and APIs

47 prioritized checks across HTTP, SSL, domain expiry, heartbeat, TCP, and DNS, organized by business type. Use this when setting up monitoring from scratch or auditing an existing setup.

Vantaj Team · June 26, 2026 · 10 min read

The most common question from teams setting up monitoring for the first time is: what should I actually be watching?

Most guides list monitor types. This one tells you which specific endpoints, certificates, jobs, and records to monitor, organized by priority, so you can set up a complete monitoring stack without missing the things that matter.

Priority key: 🔴 Critical: alert immediately. 🟡 Important: alert within 5 minutes. 🟢 Informational: daily digest is sufficient.


HTTP and Application Monitors

These confirm your application is responding correctly, not just that the server is running.

For Every Product

Monitor | Priority | Why
Homepage / root URL | 🟡 | First thing customers check when something feels wrong
Login / auth endpoint | 🔴 | If users can't log in, the rest of the product is irrelevant
Primary API endpoint | 🔴 | The most-called endpoint your product depends on
Health check endpoint | 🔴 | /health or /ping; your own team uses this to verify recovery
Signup / registration | 🟡 | A broken signup flow means zero new users until someone notices
Password reset | 🟡 | Silent broken state; only surfaces when a user is locked out

Set up the health check endpoint if you don't already have one. A simple GET /health returning {"status": "ok"} with a 200 is enough. During an incident, this is the fastest way to confirm recovery.
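A minimal sketch of such an endpoint, using only Python's standard-library `http.server`. The port, the `HealthHandler` and `make_server` names, and the choice to answer 404 on every other path are illustrative assumptions; in a real service you would add the route to your existing web framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 and {"status": "ok"}; any other path gets 404."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging; a monitor hits this endpoint every minute.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    # Hypothetical port; port=0 asks the OS for any free port (handy in tests).
    return HTTPServer(("127.0.0.1", port), HealthHandler)


# Usage: make_server().serve_forever()
```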

Additional Checks for SaaS Products

Monitor | Priority | Why
Core feature API | 🔴 | The endpoint behind your product's primary value
Webhook delivery endpoint | 🟡 | Webhook failures are silent: customers see nothing, their integrations just stop
Billing / subscription API | 🟡 | A broken billing page blocks upgrades and causes churn at renewal
User dashboard | 🟡 | The page users land on after login; degraded performance is noticed immediately

Additional Checks for E-commerce

Monitor | Priority | Why
Product catalog / listing page | 🔴 | If products don't load, nothing sells
Cart / checkout page | 🔴 | Direct, immediate, measurable revenue loss when broken
Payment processor integration | 🔴 | Stripe, Braintree, or PayPal endpoint; payment failures are the most urgent alert
Order confirmation page | 🔴 | Confirms the full purchase flow completed
Search / product search API | 🟡 | Second most impactful e-commerce failure after checkout

For e-commerce, apply a peak multiplier to your alerting expectations: a 4-hour outage during a period of 10x normal traffic costs roughly 10x as much as the same outage on a normal day. When something breaks during a sale event, check your checkout monitor first.
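The arithmetic behind that multiplier is simple. The sketch below assumes lost revenue scales linearly with traffic, which is a simplification; `hourly_revenue` and the multiplier values are hypothetical inputs, not figures from any real store.

```python
def outage_cost(hours: float, hourly_revenue: float, traffic_multiplier: float = 1.0) -> float:
    """Estimated revenue lost during an outage, assuming loss scales linearly with traffic."""
    return hours * hourly_revenue * traffic_multiplier


# Hypothetical store that normally earns 1,000 per hour.
normal_day = outage_cost(hours=4, hourly_revenue=1_000)
sale_event = outage_cost(hours=4, hourly_revenue=1_000, traffic_multiplier=10)
print(sale_event / normal_day)  # 10.0
```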

Additional Checks for Developer APIs

Monitor | Priority | Why
Primary API base URL | 🔴 | api.yourdomain.com with a lightweight authenticated request
Auth / token endpoint | 🔴 | If auth breaks, all API consumers break simultaneously
Documentation site | 🟡 | docs.yourdomain.com; downtime during an evaluation kills deals
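A sketch of what such a check does, using Python's `urllib`. It assumes bearer-token authentication and a caller-supplied expected status and body fragment; `check_endpoint` is a hypothetical helper, and you would swap in whatever auth scheme your API actually uses. The point is that the check asserts on the response, not merely on the server accepting a connection.

```python
import urllib.error
import urllib.request
from typing import Optional


def check_endpoint(url: str, token: Optional[str] = None, expect_status: int = 200,
                   expect_text: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with the expected status (and body text, if given)."""
    request = urllib.request.Request(url)
    if token is not None:
        # Assumption: bearer tokens; adjust the header for API keys or other schemes.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            status, body = resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we compare against.
        status, body = err.code, err.read().decode("utf-8", "replace")
    except OSError:
        return False  # DNS failure, refused connection, timeout, TLS error
    if status != expect_status:
        return False
    return expect_text is None or expect_text in body
```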

SSL Certificate Monitors

SSL failures block all users immediately. The browser shows a full-page warning; most users don't click through. Set expiry alerts well in advance, because 7 days is too short if renewal requires vendor coordination or a DNS change.

Monitor | Priority | Recommended alert thresholds
Primary domain SSL | 🔴 | 90, 60, 30, 7, 1 day before expiry
API subdomain SSL | 🔴 | Same; expires independently of your main domain
App subdomain SSL | 🔴 | Same
Docs / marketing subdomains | 🟡 | 30, 7, 1 day before expiry
Custom customer domains | 🟡 | If you support CNAME-based custom domains, monitor a sample set; auto-renewal failures are common here

Don't rely on auto-renewal alone. Let's Encrypt, AWS ACM, and commercial CA portals all have failure modes: DNS validation errors, expired billing, misconfigured ACME clients, CDN certificate caching. Monitoring catches silent renewal failures before they cause outages.
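A minimal sketch of the expiry check itself, using Python's standard `ssl` module. The threshold tuple mirrors the critical rows of the table above; `fetch_not_after`, `days_until_expiry`, and `due_alert` are hypothetical helper names, and deduplicating alerts so each threshold fires only once is left out.

```python
import socket
import ssl
import time
from typing import Optional, Sequence

ALERT_DAYS = (90, 60, 30, 7, 1)  # thresholds for critical certificates


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the leaf certificate's notAfter string, e.g. 'Jun 26 12:00:00 2026 GMT'."""
    ctx = ssl.create_default_context()
    # With verification on, an already-expired certificate fails the handshake and
    # raises ssl.SSLCertVerificationError instead of returning a date.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    now = time.time() if now is None else now
    return (expires - now) / 86400


def due_alert(days_left: float, thresholds: Sequence[int] = ALERT_DAYS) -> Optional[int]:
    """Smallest threshold already crossed (the current alert tier), or None."""
    crossed = [t for t in thresholds if days_left <= t]
    return min(crossed) if crossed else None


# Usage (placeholder host):
#   days = days_until_expiry(fetch_not_after("example.com"))
#   print(f"{days:.1f} days left, alert tier: {due_alert(days)}")
```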


Domain Expiry Monitors

Domain expiry is rarer than SSL expiry but more catastrophic. An expired domain takes your entire product offline, including the SSL certificate, DNS, and email. Recovery involves your registrar's support queue.

Monitor | Priority | Recommended alert thresholds
Primary domain | 🔴 | 90, 60, 30, 14 days before expiry
Brand protection domains | 🟡 | .io, .co, .net variants you own; expiry lets squatters take them
Acquired product domains | 🟡 | Alert at 60 days; these often have different registrar accounts
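Registration expiry can be read from RDAP, the JSON-based successor to WHOIS: a domain lookup response carries an `events` array, and the entry whose `eventAction` is `expiration` holds the date. The sketch below assumes the public `rdap.org` redirector and that the registry publishes an expiration event (not all do); the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

ALERT_DAYS = (90, 60, 30, 14)  # primary-domain thresholds from the table above


def fetch_rdap(domain: str, timeout: float = 15.0) -> dict:
    request = urllib.request.Request(
        f"https://rdap.org/domain/{domain}",  # assumption: public RDAP redirector
        headers={"Accept": "application/rdap+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:  # follows redirects
        return json.load(resp)


def rdap_expiration(rdap_doc: dict) -> Optional[datetime]:
    """Return the registration expiry from an RDAP domain response, if present."""
    for event in rdap_doc.get("events", []):
        if event.get("eventAction") == "expiration":
            # RDAP dates are RFC 3339; fromisoformat before Python 3.11 rejects a trailing 'Z'.
            return datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
    return None


def days_left(expires: datetime, now: Optional[datetime] = None) -> int:
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


# Usage (placeholder domain):
#   expires = rdap_expiration(fetch_rdap("example.com"))
#   if expires is not None and days_left(expires) <= max(ALERT_DAYS):
#       print(f"domain expires in {days_left(expires)} days")
```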

Heartbeat Monitors

Heartbeat monitoring inverts the check: instead of you pinging the job, the job pings a URL on each successful run. If the ping stops arriving, the monitor alerts. This is the only reliable way to detect silent cron failures.

Job | Priority | Why
Database backup job | 🔴 | A backup that silently stops running is a disaster waiting for a trigger
Billing renewal / subscription sync | 🔴 | Subscription states diverge from your payment processor; silent revenue loss
Email delivery queue | 🔴 | Transactional emails (receipts, resets, notifications) stop without any error
User notification job | 🟡 | Digest emails, alerts, summaries; users notice when these go missing
Data sync / ETL pipeline | 🟡 | Stale data surfaces as product bugs, not monitoring alerts
Report generation job | 🟡 | Scheduled reports that internal teams rely on
Cleanup / maintenance jobs | 🟢 | Log rotation, temp file cleanup, expired session purge
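On the job side, the inverted check described above comes down to: do the work, and only if it succeeds, request the heartbeat URL. A minimal Python sketch; the URL is a placeholder for whatever ping URL your monitoring tool issues, and `run_with_heartbeat` and `backup_database` are hypothetical names.

```python
import sys
import urllib.request
from typing import Callable

HEARTBEAT_URL = "https://heartbeat.example.com/ping/backup-job"  # placeholder


def http_ping(url: str, timeout: float = 10.0) -> None:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def run_with_heartbeat(job: Callable[[], None], ping_url: str,
                       ping: Callable[[str], None] = http_ping) -> None:
    # If job() raises, we never reach the ping: the monitor sees a missing
    # heartbeat and alerts, which is exactly the point.
    job()
    try:
        ping(ping_url)
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        # Don't mark a successful job as failed because the ping didn't land;
        # the missed heartbeat still raises an alert on the monitor side.
        print(f"heartbeat ping failed: {err}", file=sys.stderr)


def backup_database() -> None:
    ...  # hypothetical job body


# Usage: run_with_heartbeat(backup_database, HEARTBEAT_URL)
```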

Configure heartbeat intervals to match your cron schedule plus a 10–20% grace period. A job that runs every hour should have a heartbeat window of 66–72 minutes, not 60, to account for startup time and processing delays.
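The window arithmetic and the monitor-side overdue check fit in a few lines of Python. The function names are illustrative, and `grace_fraction` values of 0.1 to 0.2 correspond to the 10–20% range above.

```python
from datetime import datetime, timedelta


def heartbeat_window(interval: timedelta, grace_fraction: float = 0.1) -> timedelta:
    """Expected gap between pings, plus grace for startup time and slow runs."""
    return interval * (1 + grace_fraction)


def is_overdue(last_ping: datetime, now: datetime, interval: timedelta,
               grace_fraction: float = 0.1) -> bool:
    """True once the time since the last ping exceeds the grace window."""
    return now - last_ping > heartbeat_window(interval, grace_fraction)


hourly = timedelta(hours=1)
print(heartbeat_window(hourly), heartbeat_window(hourly, 0.2))  # 1:06:00 1:12:00
```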


TCP Port Monitors

Use for services that don't expose HTTP endpoints.

Port | Service | Priority
5432 | PostgreSQL | 🔴
3306 | MySQL | 🔴
27017 | MongoDB | 🔴
6379 | Redis | 🔴
587 / 465 | SMTP | 🟡
22 | SSH | 🟡
3389 | RDP | 🟢

A database host that stops accepting TCP connections causes application failures that surface as HTTP 500 errors, not as "database unavailable." The TCP port monitor tells you the failure is at the infrastructure layer before you spend 30 minutes debugging application code.
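A TCP check is just a connection attempt with a timeout. A minimal Python sketch; `tcp_port_open` is a hypothetical helper and the hostname in the usage line is a placeholder. Note that a successful connect only proves the port is accepting connections, not that the database will answer queries.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


# Usage (placeholder host): tcp_port_open("db.internal.example", 5432)
```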


DNS Monitors

DNS changes are rare, which is exactly why unexpected changes are significant. Alert on any value change rather than setting specific thresholds; the expected value of an NS record should never change without advance planning.

Record | Priority | Alert condition
Primary domain A record | 🔴 | Any IP address change
NS records | 🔴 | Any change; unexpected NS changes are the strongest signal of DNS hijacking
MX records | 🟡 | Any change; stops email delivery for your entire domain
API subdomain A record | 🟡 | Any IP address change
SPF TXT record | 🟢 | Value change; affects email deliverability and spam filter performance
DMARC TXT record | 🟢 | Value change
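Alerting on any value change means storing a baseline and diffing each lookup against it. The sketch below covers A records only, through the system resolver (`socket.getaddrinfo`, which may cache); NS, MX, and TXT lookups need a DNS library such as dnspython and are not shown. The helper names and the baseline address are placeholders.

```python
import socket
from typing import FrozenSet, Tuple


def resolve_a(host: str) -> FrozenSet[str]:
    """IPv4 addresses for host as seen by the system resolver."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return frozenset(info[4][0] for info in infos)


def diff_records(baseline: FrozenSet[str],
                 current: FrozenSet[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (added, removed); a non-empty side is a change worth alerting on."""
    return current - baseline, baseline - current


# Usage (placeholder baseline and host):
#   added, removed = diff_records(frozenset({"203.0.113.10"}), resolve_a("example.com"))
#   if added or removed: alert(...)
```

Sets are compared rather than single values because a record can hold several addresses in any order. Hosts behind a CDN or load balancer may rotate addresses often, which makes this particular check noisy for them.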

If you're starting from zero, this order prioritizes coverage of the most impactful failures:

  1. Login endpoint (HTTP)
  2. Primary API endpoint (HTTP)
  3. Primary domain SSL certificate
  4. Homepage (HTTP)
  5. Checkout or core feature endpoint (HTTP)
  6. Primary domain expiry (WHOIS/RDAP)
  7. Database backup cron (heartbeat)
  8. Billing sync cron (heartbeat)
  9. Database TCP port
  10. NS records (DNS)

These 10 monitors cover the failures most likely to affect users and the silent failures most likely to compound into larger problems. Add the rest of the list once these are stable.

Monitor Settings Reference

Monitor type | Check interval | Alert after
HTTP: critical endpoints | 1 minute | 2 consecutive failures from all regions
HTTP: secondary pages | 5 minutes | 2 consecutive failures
SSL certificate | 12 hours | At 90/60/30/7/1 days before expiry
Domain expiry | Daily | At 90/60/30/14 days before expiry
Heartbeat | Match cron schedule + 10% | 1 missed expected ping
TCP port | 5 minutes | 2 consecutive failures
DNS record | 15 minutes | Any value change

Requiring 2 consecutive failures before alerting eliminates most false positives caused by transient network issues. A monitor checking every minute that requires 2 consecutive failures still alerts within 2 minutes of a real outage, fast enough for any production incident.
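The consecutive-failure rule is a small piece of state per monitor. A Python sketch of it; the class name and the choice to alert once per incident and reset on the next success are assumptions for illustration, not a description of how Vantaj implements it.

```python
class ConsecutiveFailureGate:
    """Fires one alert after `threshold` consecutive failed checks; resets on success."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True only when an alert should fire now."""
        if check_passed:
            self.failures = 0
            self.alerted = False  # incident over; the next outage can alert again
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return True
        return False


gate = ConsecutiveFailureGate(threshold=2)
# One transient blip, then a real outage lasting three checks.
results = [True, False, True, False, False, False, True]
print([gate.record(r) for r in results])
# [False, False, False, False, True, False, False]
```

The single failed check at position 2 never alerts, and the sustained outage alerts exactly once, on its second failed check.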

Frequently Asked Questions

How many monitors do I need?

For a typical SaaS product, 15–25 monitors cover everything: 6–10 HTTP checks, 3–5 SSL certificates, 1–2 domain expiry monitors, 3–5 heartbeat monitors, and a handful of DNS and TCP checks. More monitors add coverage; they don't improve detection speed for the monitors you already have.

Should I monitor staging as well as production?

Monitor production first, completely. Staging monitors are useful for catching deployment issues before they reach production, but they're a secondary concern. A broken staging environment that hasn't been monitored for a week costs nothing; a broken production login endpoint that hasn't been monitored for an hour costs customers.

What check interval should I use?

1 minute for anything customer-facing that generates revenue or blocks access. 5 minutes for secondary pages. Faster than 1 minute is rarely necessary; most outages aren't recovered in under a minute, so additional checks don't change your response time.

Do I need separate tools for each monitor type?

No. Vantaj monitors HTTP endpoints, SSL certificates, domain expiry, heartbeats, TCP ports, and DNS records from a single dashboard. The free tier covers 20 monitors across all types, enough to get full coverage for most small products.

For a deeper look at each monitor type, see ICMP ping monitoring, heartbeat monitoring for cron jobs, and DNS monitoring.