The /scrapes endpoints let you trigger ad-hoc scrapes and retrieve their metadata. Requests are synchronous: ScrapeNest dispatches the job and waits for a result (subject to timeout).
Submit a scrape
POST /scrapes (auth required)
Payload (JSON):
{
"url": "https://example.com",
"backend": "playwright",
"include_body": true,
"take_screenshot": false,
"timeout": 30,
"wait_until": "networkidle"
}
Fields:
- url (required): URL to fetch.
- backend (required): one of httpx, playwright, scrapling.
- include_body (default true, alias saveHtml): return HTML content.
- take_screenshot (default false, alias takeScreenshot): capture a screenshot.
- timeout (optional): positive seconds; wait budget for the scrape itself.
- wait_until (optional): load | domcontentloaded | networkidle (Playwright-like).
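The field rules above can be checked client-side before submitting. A minimal sketch (the helper name and validation sets are our own, not part of the API):

```python
import json

VALID_BACKENDS = {"httpx", "playwright", "scrapling"}
VALID_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle"}

def build_scrape_payload(url, backend, include_body=True,
                         take_screenshot=False, timeout=None, wait_until=None):
    """Assemble a POST /scrapes payload, mirroring the field rules above."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(VALID_BACKENDS)}")
    if timeout is not None and timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    if wait_until is not None and wait_until not in VALID_WAIT_UNTIL:
        raise ValueError(f"wait_until must be one of {sorted(VALID_WAIT_UNTIL)}")
    payload = {"url": url, "backend": backend,
               "include_body": include_body,
               "take_screenshot": take_screenshot}
    # Optional fields are omitted entirely when unset.
    if timeout is not None:
        payload["timeout"] = timeout
    if wait_until is not None:
        payload["wait_until"] = wait_until
    return payload

print(json.dumps(build_scrape_payload("https://example.com", "playwright",
                                      timeout=30, wait_until="networkidle")))
```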
Response (success example, HTTP 200):
{
"id": "scrape-id",
"jobId": "scrape-id",
"resultId": "result-id",
"resultKey": "s3/key",
"status": "succeeded",
"ok": true,
"statusCode": 200,
"headers": { "content-type": "text/html" },
"html": "<html>...</html>",
"htmlKey": "s3/html/key",
"screenshot": null,
"screenshotKey": null,
"errorCode": null,
"errorMessage": null
}
Failure shapes reuse the same envelope with ok: false, status: "failed", and errorCode / errorMessage. HTTP status codes:
- 422: validation errors, or the scrape completed with a failure.
- 504: scrape result timed out (errorCode: "timeout").
- 502: gateway error (errorCode: "gateway_error").
- 401: missing or invalid token.
When waiting for the result, the service adds 15 seconds to the provided timeout, so the overall budget covers both the fetch itself and result propagation.
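Putting the above together, a hedged sketch of submitting a scrape using only the Python standard library. The base URL and token are placeholders for your deployment; the client-side timeout padding reflects the 15-second server allowance noted above:

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://scrapenest.example.com"  # placeholder deployment URL
TOKEN = "your-api-token"                     # placeholder bearer token

# errorCode values documented for the gateway-level failure statuses
GATEWAY_ERROR_CODES = {504: "timeout", 502: "gateway_error"}

def submit_scrape(payload):
    """POST /scrapes and return the parsed envelope, raising on failure."""
    req = urllib.request.Request(
        f"{BASE_URL}/scrapes",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    # The server waits the scrape timeout plus 15 s for the result, so the
    # client budget should be a little larger than that.
    client_budget = payload.get("timeout", 30) + 20
    try:
        with urllib.request.urlopen(req, timeout=client_budget) as resp:
            return json.load(resp)           # HTTP 200: success envelope
    except urllib.error.HTTPError as err:    # 422 / 504 / 502 / 401
        envelope = json.load(err)            # failures reuse the same envelope
        raise RuntimeError(
            f"{envelope.get('errorCode')}: {envelope.get('errorMessage')}"
        ) from err
```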
List scrapes
GET /scrapes (auth required)
Query params:
- status: pending | succeeded | failed
- backend: httpx | playwright | scrapling
- q: substring match on the URL
- Pagination: page (default 1), perPage (default 20, max 100)
- Sorting: sort (createdAt | completedAt), dir (asc | desc)
Response:
{
"items": [
{
"id": "uuid",
"url": "https://example.com",
"backend": "playwright",
"status": "succeeded",
"statusCode": 200,
"saveHtml": true,
"takeScreenshot": false,
"createdAt": "2024-01-01T12:00:00Z",
"completedAt": "2024-01-01T12:00:05Z",
"screenshotUrl": null,
"errorCode": null
}
],
"total": 1,
"page": 1,
"perPage": 20,
"pageCount": 1
}
Get a scrape
GET /scrapes/{id} (auth required)
Returns the same metadata as the list plus errorMessage. Scrape payloads (HTML, screenshot, headers) are only included in the immediate POST /scrapes response.
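A minimal sketch of fetching one scrape's metadata; as noted above, the response carries no HTML or screenshot body (base URL and token are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://scrapenest.example.com"  # placeholder deployment URL
TOKEN = "your-api-token"                     # placeholder bearer token

def scrape_url(scrape_id):
    """URL for a single scrape's metadata."""
    return f"{BASE_URL}/scrapes/{scrape_id}"

def get_scrape(scrape_id):
    """GET /scrapes/{id}: list metadata plus errorMessage, nothing more."""
    req = urllib.request.Request(
        scrape_url(scrape_id),
        headers={"Authorization": f"Bearer {TOKEN}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```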