Guide

Scheduler

Automate recurring scrapes with cron expressions. AlterLab runs your schedules, collects results, and delivers them via webhook.

AlterLab Schedules dashboard showing active schedules, cron timing, and execution history — Create and manage recurring scrape schedules from the dashboard. View execution history and results for each schedule.

Balance-Based Access

The number of schedules and minimum interval you can use depends on your account balance. See the Balance-Based Limits section below.

How It Works

Create

Define a schedule with one or more URLs, a cron expression, and optional output formats and webhook URL.

Run

At each scheduled time, AlterLab submits a batch scrape for all URLs in the schedule. Cost is debited per run.

Deliver

Results are available via the execution history endpoint or delivered to your webhook URL when the run completes.

Create a Scrape Schedule

POST

/api/v1/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily product prices",
    "urls": [
      "https://store.example.com/product-1",
      "https://store.example.com/product-2"
    ],
    "cron": "0 9 * * *",
    "timezone": "America/New_York",
    "formats": ["json", "markdown"],
    "options": {
      "wait_for": ".price-tag",
      "timeout": 30
    },
    "webhook_url": "https://your-server.com/schedule-webhook"
  }'

Request Body

Parameter	Type	Required	Description
name	string	Yes	Human-readable name (1–255 chars)
urls	string[]	Yes	URLs to scrape each run (1–100)
cron	string	Yes	Standard 5-field cron expression (minute hour day month weekday)
timezone	string	No	IANA timezone (default: `UTC`)
formats	string[]	No	text, markdown, json, html, rag
options	object	No	wait_for, timeout, headers, cookies, proxy_country
webhook_url	string	No	URL to receive results after each run

Cron Syntax

Standard 5-field cron expressions. Only minute-level granularity (no seconds).

Expression	Schedule
0 9 * * *	Every day at 9:00 AM
/15 * * *	Every 15 minutes
0 /6 * *	Every 6 hours
0 8 * * 1-5	Weekdays at 8:00 AM
30 2 1 * *	1st of each month at 2:30 AM

Crawl Schedules

Schedule recurring crawl jobs using POST /api/v1/crawl/schedules. Crawl schedules automatically re-crawl a site on your chosen cron cadence, maintaining fresh snapshots of an entire domain or path prefix.

POST

/api/v1/crawl/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/crawl/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Weekly docs crawl",
    "url": "https://docs.example.com",
    "cron": "0 2 * * 0",
    "timezone": "UTC",
    "options": {
      "max_pages": 500,
      "max_depth": 5,
      "formats": ["markdown"]
    },
    "webhook_url": "https://your-server.com/crawl-results"
  }'

Parameter	Type	Required	Description
name	string	Yes	Human-readable name for the schedule
url	string	Yes	Seed URL for the crawl
cron	string	Yes	Standard 5-field cron expression
timezone	string	No	IANA timezone (default: UTC)
options	object	No	max_pages, max_depth, formats, include_paths, exclude_paths
webhook_url	string	No	URL to receive a `crawl.completed` event when the crawl finishes

Manage Crawl Schedules

Bash

# List crawl schedules
curl https://api.alterlab.io/api/v1/crawl/schedules \
  -H "Authorization: Bearer your_jwt_token"

# Get a specific crawl schedule
curl https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

# Update a crawl schedule
curl -X PATCH https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 3 * * 0" }'

# Pause a crawl schedule
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume a crawl schedule
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

# Trigger a crawl run immediately (outside of cron schedule)
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/run \
  -H "Authorization: Bearer your_jwt_token"

# Delete a crawl schedule
curl -X DELETE https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

Map Schedules

Schedule recurring site-map discovery jobs using POST /api/v1/map/schedules. Map schedules periodically re-discover all URLs under a domain and store the updated link graph, keeping your sitemap fresh for downstream crawls or monitoring.

POST

/api/v1/map/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/map/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily sitemap refresh",
    "url": "https://example.com",
    "cron": "0 1 * * *",
    "timezone": "UTC",
    "options": {
      "include_subdomains": false,
      "limit": 10000
    },
    "webhook_url": "https://your-server.com/map-results"
  }'

Parameter	Type	Required	Description
name	string	Yes	Human-readable name for the schedule
url	string	Yes	Root URL to map
cron	string	Yes	Standard 5-field cron expression
timezone	string	No	IANA timezone (default: UTC)
options	object	No	include_subdomains, limit, include_paths, exclude_paths
webhook_url	string	No	URL to receive the discovered link list on completion

Manage Map Schedules

Bash

# List map schedules
curl https://api.alterlab.io/api/v1/map/schedules \
  -H "Authorization: Bearer your_jwt_token"

# Get a specific map schedule
curl https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

# Update a map schedule
curl -X PATCH https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 2 * * *" }'

# Pause a map schedule
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume a map schedule
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

# Trigger a map run immediately
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/run \
  -H "Authorization: Bearer your_jwt_token"

# Delete a map schedule
curl -X DELETE https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

Map → Crawl Pipeline

A common pattern is to run a map schedule daily to discover new URLs, then feed the results into a crawl job for those URLs. This keeps your crawl efficient — only newly discovered pages are re-scraped.

Balance-Based Limits

Schedule limits scale with your account balance. Add more funds to unlock more schedules and shorter intervals.

Balance	Max Active Schedules	Min Interval
$0 – $50	3	60 minutes
$50 – $200	10	15 minutes
$200 – $500	25	5 minutes
$500+	100	1 minute

Manage Schedules

List Schedules

GET

/api/v1/schedules

Query parameters: limit (1–100, default 20), offset (default 0), active_only (default false).

Bash

curl https://api.alterlab.io/api/v1/schedules?active_only=true \
  -H "Authorization: Bearer your_jwt_token"

Update a Schedule

PATCH

/api/v1/schedules/{schedule_id}

Send only the fields you want to change. Changing cron or timezone recomputes next_run_at.

Bash

curl -X PATCH https://api.alterlab.io/api/v1/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 */2 * * *", "name": "Every 2 hours" }'

Pause & Resume

Bash

# Pause
curl -X POST https://api.alterlab.io/api/v1/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume (recomputes next_run_at, re-checks balance limits)
curl -X POST https://api.alterlab.io/api/v1/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

Delete a Schedule

DELETE

/api/v1/schedules/{schedule_id}

Soft-deletes the schedule (deactivates it). Returns 204 No Content.

Execution History

GET

/api/v1/schedules/{schedule_id}/runs

Returns a paginated list of past executions for a schedule.

JSON

{
  "runs": [
    {
      "id": "run-uuid-...",
      "schedule_id": "schedule-uuid-...",
      "status": "completed",
      "job_ids": ["job-1-...", "job-2-..."],
      "batch_id": "batch-...",
      "urls_total": 2,
      "urls_completed": 2,
      "urls_failed": 0,
      "credits_used": 30000,
      "started_at": "2026-03-03T09:00:01Z",
      "completed_at": "2026-03-03T09:00:14Z",
      "created_at": "2026-03-03T09:00:00Z"
    }
  ],
  "total": 42
}

Webhook Delivery

If your schedule has a webhook_url, results are delivered after each run completes. The webhook payload follows the same format as batch webhooks.

Tip

Combine schedules with webhooks for fully automated data pipelines. Schedule the scrape, receive results at your endpoint, and process them without any polling.

Python Example

Python

import requests

API_URL = "https://api.alterlab.io/api/v1"
HEADERS = {
    "Authorization": "Bearer your_jwt_token",
    "Content-Type": "application/json",
}

# Create a schedule
schedule = requests.post(
    f"{API_URL}/schedules",
    headers=HEADERS,
    json={
        "name": "Competitor prices - daily",
        "urls": [
            "https://competitor-a.com/product",
            "https://competitor-b.com/product",
        ],
        "cron": "0 9 * * *",
        "timezone": "America/New_York",
        "formats": ["json"],
        "webhook_url": "https://your-server.com/prices",
    },
).json()

print(f"Schedule created: {schedule['id']}")
print(f"Next run: {schedule['next_run_at']}")

# List active schedules
schedules = requests.get(
    f"{API_URL}/schedules?active_only=true",
    headers=HEADERS,
).json()

for s in schedules["schedules"]:
    print(f"  {s['name']} — next: {s['next_run_at']}, runs: {s['run_count']}")

# Check execution history
runs = requests.get(
    f"{API_URL}/schedules/{schedule['id']}/runs",
    headers=HEADERS,
).json()

for run in runs["runs"]:
    print(f"  Run {run['id']}: {run['status']} — {run['urls_completed']}/{run['urls_total']}")

Node.js Example

TYPESCRIPT

const API_URL = "https://api.alterlab.io/api/v1";
const headers = {
  Authorization: "Bearer your_jwt_token",
  "Content-Type": "application/json",
};

// Create a schedule
const schedule = await fetch(`${API_URL}/schedules`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Competitor prices - daily",
    urls: [
      "https://competitor-a.com/product",
      "https://competitor-b.com/product",
    ],
    cron: "0 9 * * *",
    timezone: "America/New_York",
    formats: ["json"],
    webhook_url: "https://your-server.com/prices",
  }),
}).then((r) => r.json());

console.log(`Schedule: ${schedule.id}, next run: ${schedule.next_run_at}`);

// List active schedules
const list = await fetch(`${API_URL}/schedules?active_only=true`, {
  headers,
}).then((r) => r.json());

for (const s of list.schedules) {
  console.log(`  ${s.name} — runs: ${s.run_count}`);
}

// Check runs
const runs = await fetch(`${API_URL}/schedules/${schedule.id}/runs`, {
  headers,
}).then((r) => r.json());

for (const run of runs.runs) {
  console.log(`  ${run.status}: ${run.urls_completed}/${run.urls_total}`);
}

Batch Scraping Change Detection

Last updated: March 2026

Guide

Scheduler

Automate recurring scrapes with cron expressions. AlterLab runs your schedules, collects results, and delivers them via webhook.

Balance-Based Access

The number of schedules and minimum interval you can use depends on your account balance. See the Balance-Based Limits section below.

How It Works

Create

Define a schedule with one or more URLs, a cron expression, and optional output formats and webhook URL.

Run

At each scheduled time, AlterLab submits a batch scrape for all URLs in the schedule. Cost is debited per run.

Deliver

Results are available via the execution history endpoint or delivered to your webhook URL when the run completes.

Create a Scrape Schedule

POST

/api/v1/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily product prices",
    "urls": [
      "https://store.example.com/product-1",
      "https://store.example.com/product-2"
    ],
    "cron": "0 9 * * *",
    "timezone": "America/New_York",
    "formats": ["json", "markdown"],
    "options": {
      "wait_for": ".price-tag",
      "timeout": 30
    },
    "webhook_url": "https://your-server.com/schedule-webhook"
  }'

Request Body

Parameter	Type	Required	Description
name	string	Yes	Human-readable name (1–255 chars)
urls	string[]	Yes	URLs to scrape each run (1–100)
cron	string	Yes	Standard 5-field cron expression (minute hour day month weekday)
timezone	string	No	IANA timezone (default: `UTC`)
formats	string[]	No	text, markdown, json, html, rag
options	object	No	wait_for, timeout, headers, cookies, proxy_country
webhook_url	string	No	URL to receive results after each run

Cron Syntax

Standard 5-field cron expressions. Only minute-level granularity (no seconds).

Expression	Schedule
0 9 * * *	Every day at 9:00 AM
/15 * * *	Every 15 minutes
0 /6 * *	Every 6 hours
0 8 * * 1-5	Weekdays at 8:00 AM
30 2 1 * *	1st of each month at 2:30 AM

Crawl Schedules

POST

/api/v1/crawl/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/crawl/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Weekly docs crawl",
    "url": "https://docs.example.com",
    "cron": "0 2 * * 0",
    "timezone": "UTC",
    "options": {
      "max_pages": 500,
      "max_depth": 5,
      "formats": ["markdown"]
    },
    "webhook_url": "https://your-server.com/crawl-results"
  }'

Parameter	Type	Required	Description
name	string	Yes	Human-readable name for the schedule
url	string	Yes	Seed URL for the crawl
cron	string	Yes	Standard 5-field cron expression
timezone	string	No	IANA timezone (default: UTC)
options	object	No	max_pages, max_depth, formats, include_paths, exclude_paths
webhook_url	string	No	URL to receive a `crawl.completed` event when the crawl finishes

Manage Crawl Schedules

Bash

# List crawl schedules
curl https://api.alterlab.io/api/v1/crawl/schedules \
  -H "Authorization: Bearer your_jwt_token"

# Get a specific crawl schedule
curl https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

# Update a crawl schedule
curl -X PATCH https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 3 * * 0" }'

# Pause a crawl schedule
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume a crawl schedule
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

# Trigger a crawl run immediately (outside of cron schedule)
curl -X POST https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id}/run \
  -H "Authorization: Bearer your_jwt_token"

# Delete a crawl schedule
curl -X DELETE https://api.alterlab.io/api/v1/crawl/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

Map Schedules

POST

/api/v1/map/schedules

Bash

curl -X POST https://api.alterlab.io/api/v1/map/schedules \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily sitemap refresh",
    "url": "https://example.com",
    "cron": "0 1 * * *",
    "timezone": "UTC",
    "options": {
      "include_subdomains": false,
      "limit": 10000
    },
    "webhook_url": "https://your-server.com/map-results"
  }'

Parameter	Type	Required	Description
name	string	Yes	Human-readable name for the schedule
url	string	Yes	Root URL to map
cron	string	Yes	Standard 5-field cron expression
timezone	string	No	IANA timezone (default: UTC)
options	object	No	include_subdomains, limit, include_paths, exclude_paths
webhook_url	string	No	URL to receive the discovered link list on completion

Manage Map Schedules

Bash

# List map schedules
curl https://api.alterlab.io/api/v1/map/schedules \
  -H "Authorization: Bearer your_jwt_token"

# Get a specific map schedule
curl https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

# Update a map schedule
curl -X PATCH https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 2 * * *" }'

# Pause a map schedule
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume a map schedule
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

# Trigger a map run immediately
curl -X POST https://api.alterlab.io/api/v1/map/schedules/{schedule_id}/run \
  -H "Authorization: Bearer your_jwt_token"

# Delete a map schedule
curl -X DELETE https://api.alterlab.io/api/v1/map/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token"

Map → Crawl Pipeline

Balance-Based Limits

Schedule limits scale with your account balance. Add more funds to unlock more schedules and shorter intervals.

Balance	Max Active Schedules	Min Interval
$0 – $50	3	60 minutes
$50 – $200	10	15 minutes
$200 – $500	25	5 minutes
$500+	100	1 minute

Manage Schedules

List Schedules

GET

/api/v1/schedules

Query parameters: limit (1–100, default 20), offset (default 0), active_only (default false).

Bash

curl https://api.alterlab.io/api/v1/schedules?active_only=true \
  -H "Authorization: Bearer your_jwt_token"

Update a Schedule

PATCH

/api/v1/schedules/{schedule_id}

Send only the fields you want to change. Changing cron or timezone recomputes next_run_at.

Bash

curl -X PATCH https://api.alterlab.io/api/v1/schedules/{schedule_id} \
  -H "Authorization: Bearer your_jwt_token" \
  -H "Content-Type: application/json" \
  -d '{ "cron": "0 */2 * * *", "name": "Every 2 hours" }'

Pause & Resume

Bash

# Pause
curl -X POST https://api.alterlab.io/api/v1/schedules/{schedule_id}/pause \
  -H "Authorization: Bearer your_jwt_token"

# Resume (recomputes next_run_at, re-checks balance limits)
curl -X POST https://api.alterlab.io/api/v1/schedules/{schedule_id}/resume \
  -H "Authorization: Bearer your_jwt_token"

Delete a Schedule

DELETE

/api/v1/schedules/{schedule_id}

Soft-deletes the schedule (deactivates it). Returns 204 No Content.

Execution History

GET

/api/v1/schedules/{schedule_id}/runs

Returns a paginated list of past executions for a schedule.

JSON

{
  "runs": [
    {
      "id": "run-uuid-...",
      "schedule_id": "schedule-uuid-...",
      "status": "completed",
      "job_ids": ["job-1-...", "job-2-..."],
      "batch_id": "batch-...",
      "urls_total": 2,
      "urls_completed": 2,
      "urls_failed": 0,
      "credits_used": 30000,
      "started_at": "2026-03-03T09:00:01Z",
      "completed_at": "2026-03-03T09:00:14Z",
      "created_at": "2026-03-03T09:00:00Z"
    }
  ],
  "total": 42
}

Webhook Delivery

If your schedule has a webhook_url, results are delivered after each run completes. The webhook payload follows the same format as batch webhooks.

Tip

Combine schedules with webhooks for fully automated data pipelines. Schedule the scrape, receive results at your endpoint, and process them without any polling.

Python Example

Python

import requests

API_URL = "https://api.alterlab.io/api/v1"
HEADERS = {
    "Authorization": "Bearer your_jwt_token",
    "Content-Type": "application/json",
}

# Create a schedule
schedule = requests.post(
    f"{API_URL}/schedules",
    headers=HEADERS,
    json={
        "name": "Competitor prices - daily",
        "urls": [
            "https://competitor-a.com/product",
            "https://competitor-b.com/product",
        ],
        "cron": "0 9 * * *",
        "timezone": "America/New_York",
        "formats": ["json"],
        "webhook_url": "https://your-server.com/prices",
    },
).json()

print(f"Schedule created: {schedule['id']}")
print(f"Next run: {schedule['next_run_at']}")

# List active schedules
schedules = requests.get(
    f"{API_URL}/schedules?active_only=true",
    headers=HEADERS,
).json()

for s in schedules["schedules"]:
    print(f"  {s['name']} — next: {s['next_run_at']}, runs: {s['run_count']}")

# Check execution history
runs = requests.get(
    f"{API_URL}/schedules/{schedule['id']}/runs",
    headers=HEADERS,
).json()

for run in runs["runs"]:
    print(f"  Run {run['id']}: {run['status']} — {run['urls_completed']}/{run['urls_total']}")

Node.js Example

TYPESCRIPT

const API_URL = "https://api.alterlab.io/api/v1";
const headers = {
  Authorization: "Bearer your_jwt_token",
  "Content-Type": "application/json",
};

// Create a schedule
const schedule = await fetch(`${API_URL}/schedules`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Competitor prices - daily",
    urls: [
      "https://competitor-a.com/product",
      "https://competitor-b.com/product",
    ],
    cron: "0 9 * * *",
    timezone: "America/New_York",
    formats: ["json"],
    webhook_url: "https://your-server.com/prices",
  }),
}).then((r) => r.json());

console.log(`Schedule: ${schedule.id}, next run: ${schedule.next_run_at}`);

// List active schedules
const list = await fetch(`${API_URL}/schedules?active_only=true`, {
  headers,
}).then((r) => r.json());

for (const s of list.schedules) {
  console.log(`  ${s.name} — runs: ${s.run_count}`);
}

// Check runs
const runs = await fetch(`${API_URL}/schedules/${schedule.id}/runs`, {
  headers,
}).then((r) => r.json());

for (const run of runs.runs) {
  console.log(`  ${run.status}: ${run.urls_completed}/${run.urls_total}`);
}

Batch Scraping Change Detection

Last updated: March 2026