Heartbeat Monitoring for Cron Jobs and Workers

What Is Heartbeat Monitoring?

Traditional uptime monitoring works by sending requests to your service and checking for a response. But not all critical processes are web-facing. Database backups, queue workers, scheduled reports, data pipelines - these are background jobs that run on a schedule, and when they silently stop working, nobody notices until it's too late.

Heartbeat monitoring flips the model. Instead of Vantaj pinging your service, your service pings Vantaj. If we don't receive a ping within the expected interval, plus any grace period you configure, we know something went wrong and alert you.

It's the difference between checking if someone is home (uptime monitoring) and expecting a daily phone call that never comes (heartbeat monitoring).

How It Works

The concept is simple:

1. You create a heartbeat monitor in Vantaj and get a unique endpoint URL
2. Your cron job or worker sends an HTTP request to that URL when it completes successfully
3. Vantaj tracks the timing of each ping
4. If a ping doesn't arrive within the expected window, Vantaj triggers an alert

# Add this to the end of your cron job
curl -fsS --retry 3 https://api.vantaj.co/heartbeat/your-heartbeat-id

// Or in a Node.js worker (Node.js 18 or later, where fetch is built in)
await fetch('https://api.vantaj.co/heartbeat/your-heartbeat-id')

# Or in a Python script (the timeout keeps a stalled request from hanging the job)
import requests
requests.get('https://api.vantaj.co/heartbeat/your-heartbeat-id', timeout=10)

The key detail: you only send the heartbeat after the job completes successfully. If the job crashes mid-execution, no heartbeat is sent, and Vantaj catches the failure.
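
If your job is itself a Python function, the "ping only on success" rule can be wrapped once and reused. The following is a minimal sketch, not Vantaj's official client: `send_heartbeat` and `run_with_heartbeat` are hypothetical helper names, and the URL is the placeholder from the examples above. It uses only the standard library.

```python
import urllib.request
from typing import Callable

# Placeholder URL from the examples above; substitute your monitor's endpoint.
HEARTBEAT_URL = "https://api.vantaj.co/heartbeat/your-heartbeat-id"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    """Send one GET request to the heartbeat endpoint."""
    # urlopen raises on network failures and on 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_with_heartbeat(job: Callable[[], None],
                       ping: Callable[[], None] = send_heartbeat) -> bool:
    """Run `job`; ping only if it returns without raising (hypothetical helper)."""
    try:
        job()
    except Exception as exc:
        # No heartbeat is sent, so the monitor will notice the missing ping.
        print(f"job failed, heartbeat skipped: {exc}")
        return False
    ping()
    return True
```

Calling `run_with_heartbeat(nightly_backup)` at the end of a script would then send the ping only when `nightly_backup` (your own function) finishes cleanly. Passing `ping` as a parameter also makes the wrapper easy to test without touching the network.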

When You Need Heartbeat Monitoring

Any process that runs on a schedule and doesn't have a public endpoint is a candidate. Here are the most common use cases:

Database Backups

Your nightly database backup is one of the most critical jobs in your infrastructure. If it fails silently for a week, you won't know until you actually need to restore - and by then it's a crisis. A heartbeat at the end of the backup script means a missed backup triggers an alert as soon as the grace period expires, not a week later.

Queue and Background Workers

Workers processing jobs from a queue (Sidekiq, Celery, BullMQ) can crash, deadlock, or run out of memory. Periodic heartbeats from each worker confirm they're alive and processing. If a worker goes silent, you can investigate before the queue backs up.
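
For a long-running worker, the ping has to come from inside the processing loop, and it should be rate-limited so a busy worker doesn't ping after every job. Here is one way that might look. `PeriodicHeartbeat` and `work` are hypothetical names, the 300-second default simply mirrors a five-minute monitor interval, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Iterable


class PeriodicHeartbeat:
    """Call `ping` at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, ping: Callable[[], None], interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ping = ping
        self._interval = interval
        self._clock = clock
        self._last_sent = None  # no ping sent yet

    def beat(self) -> bool:
        """Ping if the interval has elapsed since the last ping; report whether we pinged."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._interval:
            return False
        self._ping()
        self._last_sent = now
        return True


def work(jobs: Iterable[Callable[[], None]], heartbeat: PeriodicHeartbeat) -> None:
    """Process jobs one by one, beating after each job that finishes."""
    for job in jobs:
        job()             # a crash or deadlock here stops the beats
        heartbeat.beat()  # only reached while the worker is making progress
```

Because `beat()` sits after `job()`, a deadlocked or crashed worker stops pinging. A real worker would also want to call `beat()` while polling an empty queue, otherwise a quiet period looks the same as a failure.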

Scheduled Reports and Emails

Daily digest emails, weekly analytics reports, monthly invoicing runs - these are jobs your business depends on but rarely thinks about until they break. A heartbeat after each successful send confirms delivery is happening on schedule.

Data Sync and ETL Pipelines

Data pipelines that sync between databases, transform data, or push to warehouses are notoriously brittle. A heartbeat at the end of each pipeline run confirms that the run completed on schedule.

Health Check Scripts

Custom scripts that verify application state - checking disk space, validating configuration, confirming third-party API connectivity - can report their status via heartbeat. If the script itself fails to run, the missing heartbeat catches it.
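
A health check script of this kind could be structured as a set of named checks, with the heartbeat sent only when all of them pass. The sketch below assumes a 10% free-disk threshold and hypothetical helper names (`disk_has_room`, `run_checks`); swap in whatever checks matter for your application.

```python
import shutil
from typing import Callable, Dict


def disk_has_room(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Return True if at least `min_free_fraction` of the disk at `path` is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def run_checks(checks: Dict[str, Callable[[], bool]],
               ping: Callable[[], None]) -> list:
    """Run every check; ping only if all pass. Returns the names that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failed.append(name)
    if not failed:
        ping()
    return failed
```

With this shape, a failed check and a script that never ran look the same to the monitor: no ping arrives, and the alert fires either way.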

Certificate and Domain Renewal Jobs

If you're using automated certificate renewal (certbot, ACME clients), a heartbeat after each renewal confirms the process completed. Combined with Vantaj's SSL monitoring, you get defense in depth.

Setting Up a Heartbeat Monitor

Getting started takes less than a minute:

Step	What to do
1. Create	Add a new heartbeat monitor in your Vantaj dashboard
2. Name it	Give it a descriptive name (e.g., "Nightly Postgres Backup")
3. Set the interval	How often the job should run (every 5 min, hourly, daily, weekly)
4. Set the grace period	How long to wait past the expected time before alerting
5. Copy the URL	Add the heartbeat endpoint to your job's success handler

That's it. No agents to install, no SDKs to integrate, no configuration files to manage. One HTTP request at the end of your job is all it takes.

Grace Periods: Avoiding False Alerts

Not every job runs at exactly the same time. A daily backup that usually finishes at 2:05 AM might occasionally take until 2:20 AM due to database load. Without a grace period, you'd get a false alert every time the job runs a little slow.

The grace period is the buffer time Vantaj waits after the expected window before firing an alert. Set it based on the normal variance of your job:

Job Type	Typical Interval	Suggested Grace Period
Queue worker heartbeat	Every 5 minutes	2–3 minutes
Hourly data sync	Every hour	10–15 minutes
Daily backup	Every 24 hours	30–60 minutes
Weekly report	Every 7 days	2–4 hours

The goal is simple: long enough to account for normal variance, short enough to catch actual failures quickly.
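
To make the arithmetic concrete, here is a small sketch of how an alert deadline could be computed. It assumes the expected window is measured from the last received ping, which is a simplification for illustration; Vantaj's actual scheduling logic may differ. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def alert_deadline(last_ping: datetime, interval: timedelta,
                   grace: timedelta) -> datetime:
    """Latest moment the next ping may arrive before an alert fires (assumed model)."""
    return last_ping + interval + grace


def is_overdue(now: datetime, last_ping: datetime, interval: timedelta,
               grace: timedelta) -> bool:
    """True once `now` is past the deadline."""
    return now > alert_deadline(last_ping, interval, grace)
```

Under this model, a daily backup that last pinged at 2:05 AM with a 30-minute grace period has until 2:35 AM the next day. The slow run that finishes at 2:20 AM is still on time, while a run that has not pinged by 2:36 AM is overdue.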

Best Practices

Only Heartbeat on Success

Send the heartbeat after your job completes successfully, not at the start. If you heartbeat at the beginning, a job that crashes halfway through still looks healthy to your monitor.

# ✅ Correct - heartbeat after success
pg_dump mydb > backup.sql && curl -fsS https://api.vantaj.co/heartbeat/abc123

# ❌ Wrong - heartbeat before the actual work
curl -fsS https://api.vantaj.co/heartbeat/abc123 && pg_dump mydb > backup.sql

Add Retries to the Heartbeat Request

The heartbeat HTTP request itself can fail due to transient network issues. Add retries so a momentary blip doesn't cause a false alert:

curl -fsS --retry 3 --retry-delay 5 https://api.vantaj.co/heartbeat/abc123
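
If the job is a Python script and you'd rather not shell out to curl, a rough equivalent can be written with the standard library. This is a sketch under assumptions: `http_get` and `ping_with_retries` are hypothetical helpers, and unlike curl, which retries only errors it treats as transient, this cruder version retries on any network or HTTP error.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str, timeout: float = 10.0) -> None:
    """One GET; raises on network errors and on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def ping_with_retries(url: str, retries: int = 3, delay: float = 5.0,
                      get: Callable[[str], None] = http_get,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try once, then up to `retries` more times, pausing `delay` seconds between tries."""
    for attempt in range(retries + 1):
        try:
            get(url)
            return True
        except OSError:
            # URLError, HTTPError, timeouts, and connection errors are all OSError subclasses.
            if attempt < retries:
                sleep(delay)
    return False
```

The function returns `False` instead of raising after the last attempt, so a failed ping doesn't turn an otherwise successful job into a crash. The missing heartbeat will still surface as an alert.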

One Heartbeat Per Logical Job

Don't combine multiple jobs into a single heartbeat. If your backup and your report both ping the same heartbeat, a failure in one might be masked by the other succeeding. Create separate heartbeat monitors for each critical job.
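
One lightweight way to keep this rule honest is to hold the job-to-heartbeat mapping in one place and check it for accidental sharing. The job names and heartbeat IDs below are made up for illustration, and `heartbeat_url` and `shared_ids` are hypothetical helpers.

```python
HEARTBEAT_BASE = "https://api.vantaj.co/heartbeat/"

# Hypothetical job names and heartbeat IDs; each job gets its own monitor.
HEARTBEAT_IDS = {
    "nightly-backup": "backup-id",
    "weekly-report": "report-id",
}


def heartbeat_url(job_name: str) -> str:
    """Build the endpoint URL for one job's dedicated heartbeat."""
    return HEARTBEAT_BASE + HEARTBEAT_IDS[job_name]


def shared_ids(ids: dict) -> dict:
    """Return heartbeat IDs that more than one job points at (should be empty)."""
    by_id = {}
    for job, hb_id in ids.items():
        by_id.setdefault(hb_id, []).append(job)
    return {hb_id: jobs for hb_id, jobs in by_id.items() if len(jobs) > 1}
```

Running `shared_ids(HEARTBEAT_IDS)` in a test or at startup flags any two jobs that were pointed at the same monitor by copy-paste.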

Pair Heartbeats with Uptime Monitoring

Heartbeat monitoring complements uptime monitoring. Use uptime checks for your public-facing services and heartbeats for your background jobs. Together, they cover both the services that answer requests and the jobs that never receive one.

Why Vantaj for Heartbeat Monitoring

Vantaj's heartbeat monitoring is designed with the same principles as the rest of the platform: simple setup, sensible defaults, and reliable alerting. There are no agents to install, no complex configurations, and no hidden costs. Create a heartbeat, add a curl command to your job, and you're covered.

When a heartbeat goes missing, Vantaj's alerting pipeline ensures the notification reaches you - via email, Slack, Discord, webhook, or any combination. The same reliable, redundant infrastructure that powers our uptime monitoring backs every heartbeat alert.