Run Extraction
POST /v1/extractions/run
Run an extraction on a PDF document. The document is processed using the specified template, and the extracted data is returned as structured JSON matching the template's field definitions.
Try it
Test this endpoint interactively in the Swagger UI.
Authorization required
Include your API key in the Authorization header as a Bearer token.
Request
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer <token> | Yes |
| Content-Type | application/json | Yes |
Query Parameters
| Param | Type | Required | Description |
|---|---|---|---|
| async | string | No | Set to "true" to return immediately with a processing status instead of waiting for completion. Default behavior (omitted or "false") is synchronous. |
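Because async is a string parameter rather than a boolean, the literal value "true" has to appear in the query string. A minimal sketch of building the request URL, assuming a hypothetical helper named extraction_url (not part of any official SDK):

```python
from urllib.parse import urlencode

# Endpoint URL as shown in this page's examples.
RUN_URL = "https://api.docmap.io/v1/extractions/run"


def extraction_url(run_async: bool = False) -> str:
    """Return the endpoint URL, adding ?async=true only when async mode is requested."""
    if not run_async:
        # Omitting the parameter keeps the default synchronous behavior.
        return RUN_URL
    # The parameter is documented as a string, so send "true", not a boolean.
    return f"{RUN_URL}?{urlencode({'async': 'true'})}"
```

With the requests library, passing params={"async": "true"} to requests.post should produce the same URL.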
Body
| Field | Type | Required | Description |
|---|---|---|---|
| templateId | string | Yes | The ID of the extraction template to use. |
| fileName | string | Yes | Original file name of the document. |
| pdfBase64 | string | Yes | Base64-encoded content of the PDF file. |
| mimeType | string | Yes | MIME type of the file. Accepted values: application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document. |
| runId | string | No | Optional identifier to group related extractions in a batch run. |
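The body fields above can be assembled from a local file in a few lines. The following is a sketch only: build_extraction_payload and the extension-to-MIME mapping are illustrative assumptions, not part of the API. Only the field names and the two accepted MIME types come from the table.

```python
import base64
from pathlib import Path
from typing import Optional

# Assumed mapping from file extension to the two MIME types the API accepts.
MIME_TYPES_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def build_extraction_payload(
    path: str, template_id: str, run_id: Optional[str] = None
) -> dict:
    """Build the JSON body for POST /v1/extractions/run from a local file."""
    file_path = Path(path)
    mime_type = MIME_TYPES_BY_EXTENSION.get(file_path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported file type: {file_path.name}")
    payload = {
        "templateId": template_id,
        "fileName": file_path.name,
        # Base64 output is pure ASCII, so decoding as ASCII is safe.
        "pdfBase64": base64.b64encode(file_path.read_bytes()).decode("ascii"),
        "mimeType": mime_type,
    }
    # runId is optional; include it only when grouping extractions into a batch.
    if run_id is not None:
        payload["runId"] = run_id
    return payload
```

Reusing the same run_id across several calls is one way to group a batch, since the API accepts any caller-chosen string for that field.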
Code Examples
cURL
curl -X POST https://api.docmap.io/v1/extractions/run \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'

JavaScript
// Requires Node.js 18+ (global fetch) and an ES module (top-level await).
import { readFile } from 'node:fs/promises'

const apiKey = process.env.DOCMAP_API_KEY
// Load the PDF so it can be Base64-encoded for the pdfBase64 field.
const pdfBuffer = await readFile('invoice-2024-001.pdf')

const response = await fetch('https://api.docmap.io/v1/extractions/run', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    templateId: 'tmpl_8f3a2b1c4d5e6f7g',
    fileName: 'invoice-2024-001.pdf',
    pdfBase64: pdfBuffer.toString('base64'),
    mimeType: 'application/pdf',
  }),
})

// The result is wrapped in a top-level "data" object.
const { data } = await response.json()
console.log(data.extractedData)

Python
import base64

import requests

api_key = "dm_live_abc123def456ghi789jkl012mno345"

# Read the PDF and Base64-encode it for the pdfBase64 field.
with open("invoice-2024-001.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.docmap.io/v1/extractions/run",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "templateId": "tmpl_8f3a2b1c4d5e6f7g",
        "fileName": "invoice-2024-001.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)

# The result is wrapped in a top-level "data" object.
data = response.json()["data"]
print(data["extractedData"])

Response
Status: 200 OK
The response body is wrapped in a data object.
Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique extraction ID (prefixed with extract_). |
| userId | string | ID of the user who owns this extraction. |
| templateId | string | ID of the template used for extraction. |
| templateName | string | Display name of the template used. |
| fileName | string | Original file name of the uploaded document. |
| status | "processing" \| "completed" \| "failed" | Current extraction status. "processing" when using async mode, "completed" on success, "failed" on error. |
| extractedData | object \| null | Extracted data matching the template fields. null if extraction failed. |
| error | string \| null | Error message describing the failure. null if extraction succeeded. |
| variables | Variable[] | Array of template variable definitions used during extraction. |
| source | "dashboard" \| "api" | How the extraction was triggered. |
| runId | string \| null | Batch run ID, if one was provided in the request. |
| processingTimeMs | number \| null | Total processing duration in milliseconds. |
| createdAt | string | ISO 8601 timestamp of when the extraction was created. |
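Since status, extractedData, and error vary together, client code usually needs to branch on status before touching the data. The sketch below shows one way to do that. ExtractionResult, parse_extraction, and extracted_data_or_raise are hypothetical names chosen for illustration. Only the JSON field names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionResult:
    """A subset of the documented response fields, with Python-style names."""
    id: str
    status: str
    extracted_data: Optional[dict]
    error: Optional[str]
    processing_time_ms: Optional[int]


def parse_extraction(body: dict) -> ExtractionResult:
    """Unwrap the top-level "data" object and pick out the fields used here."""
    data = body["data"]
    return ExtractionResult(
        id=data["id"],
        status=data["status"],
        extracted_data=data.get("extractedData"),
        error=data.get("error"),
        processing_time_ms=data.get("processingTimeMs"),
    )


def extracted_data_or_raise(result: ExtractionResult) -> Optional[dict]:
    """Return the extracted data, raising if the extraction is not completed."""
    if result.status == "failed":
        raise RuntimeError(f"Extraction {result.id} failed: {result.error}")
    if result.status == "processing":
        raise RuntimeError(f"Extraction {result.id} is still processing")
    return result.extracted_data
```

Raising on "processing" is a design choice for synchronous callers. Async callers would normally poll instead.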
Example
{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "completed",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "invoice_number": "INV-2024-001",
      "invoice_date": "2024-11-15",
      "total_amount": 1250.00,
      "currency": "USD",
      "line_items": [
        {
          "description": "Widget A",
          "quantity": 10,
          "unit_price": 125.00
        }
      ]
    },
    "error": null,
    "variables": [
      {
        "name": "vendor_name",
        "type": "string",
        "description": "Name of the vendor or supplier"
      },
      {
        "name": "total_amount",
        "type": "number",
        "description": "Total invoice amount"
      }
    ],
    "source": "api",
    "runId": null,
    "processingTimeMs": 3842,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}

Async Mode
By default, the API waits for the extraction to finish before responding (synchronous). For long-running extractions, you can use async mode to get an immediate response and poll for the result.
Add ?async=true to the URL to enable async mode:
curl -X POST "https://api.docmap.io/v1/extractions/run?async=true" \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'

Response: 202 Accepted
The response has the same shape as a synchronous response, but status will be "processing", extractedData will be null, and processingTimeMs will be null:
{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "processing",
    "extractedData": null,
    "error": null,
    "variables": [...],
    "source": "api",
    "runId": null,
    "processingTimeMs": null,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}

Use the returned id to poll the Get Extraction endpoint until status changes to "completed" or "failed".
WARNING
If an extraction stays in "processing" status for more than 2 minutes, treat it as failed. There is no automatic timeout -- in rare cases (e.g., server restart), a record may remain in "processing" indefinitely.
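The polling loop and the 2-minute cutoff described above can be sketched as follows. This is an illustration under stated assumptions: wait_for_extraction is a hypothetical helper, the 2-second polling interval is an arbitrary choice, and fetch_extraction stands in for a caller-supplied function that calls the Get Extraction endpoint and returns the inner data object (that endpoint's URL is not covered on this page).

```python
import time
from typing import Callable

# Statuses after which the record will no longer change.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_extraction(
    fetch_extraction: Callable[[str], dict],
    extraction_id: str,
    timeout_s: float = 120.0,  # the 2-minute limit from the warning above
    interval_s: float = 2.0,   # assumed polling interval, not an API requirement
) -> dict:
    """Poll until the extraction leaves "processing" or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        record = fetch_extraction(extraction_id)
        if record["status"] in TERMINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            # The API has no automatic timeout, so the client enforces one.
            raise TimeoutError(
                f"{extraction_id} still processing after {timeout_s:.0f}s; "
                "treating it as failed"
            )
        time.sleep(interval_s)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.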
Errors
| Status | Code | Description |
|---|---|---|
| 401 | UNAUTHORIZED | Missing, invalid, or expired API key / token. |
| 404 | NOT_FOUND | The specified template was not found or does not belong to your account. |
| 429 | USAGE_LIMIT_EXCEEDED | You have reached your plan's monthly extraction limit. |
| 500 | INTERNAL_ERROR | An unexpected error occurred during extraction processing. |
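One way to act on the table above is to turn the HTTP status into a typed error before reading the body. The sketch below is illustrative: ExtractionApiError and error_for_status are hypothetical names, and the retry hint (only 500 is treated as possibly transient) is an assumption rather than documented behavior. The shape of the error response body is not described on this page, so the code relies on the status code alone.

```python
from typing import Optional

# Status codes and error codes copied from the Errors table.
DOCUMENTED_ERRORS = {
    401: "UNAUTHORIZED",
    404: "NOT_FOUND",
    429: "USAGE_LIMIT_EXCEEDED",
    500: "INTERNAL_ERROR",
}


class ExtractionApiError(Exception):
    """Raised by callers when the API answers with a non-success status."""

    def __init__(self, status_code: int, code: str, retryable: bool) -> None:
        super().__init__(f"{status_code} {code}")
        self.status_code = status_code
        self.code = code
        self.retryable = retryable


def error_for_status(status_code: int) -> Optional[ExtractionApiError]:
    """Return an error for a failed response, or None for 200 and 202."""
    if status_code in (200, 202):
        return None
    code = DOCUMENTED_ERRORS.get(status_code, "UNKNOWN")
    # Assumption: 429 here is a monthly plan limit, so retrying soon will not help;
    # only an unexpected 500 is treated as worth retrying.
    return ExtractionApiError(status_code, code, retryable=(status_code == 500))
```

A caller using requests could write: err = error_for_status(response.status_code), then raise err if it is not None.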
