Run Extraction

POST /v1/extractions/run

Run an extraction on a PDF document. The document is processed using the specified template, and the extracted data is returned as structured JSON matching the template's field definitions.

Try it

Test this endpoint interactively in the Swagger UI.

Authorization required

Include your API key in the Authorization header.

Request

Headers

Header	Value	Required
`Authorization`	`Bearer <token>`	Yes
`Content-Type`	`application/json`	Yes

Query Parameters

Param	Type	Required	Description
`async`	`string`	No	Set to `"true"` to return immediately with a `processing` status instead of waiting for completion. Default behavior (omitted or `"false"`) is synchronous.

Body

Field	Type	Required	Description
`templateId`	`string`	Yes	The ID of the extraction template to use.
`fileName`	`string`	Yes	Original file name of the document.
`pdfBase64`	`string`	Yes	Base64-encoded content of the PDF file.
`mimeType`	`string`	Yes	MIME type of the file. Accepted values: `application/pdf`, `application/vnd.openxmlformats-officedocument.wordprocessingml.document`.
`runId`	`string`	No	Optional identifier to group related extractions in a batch run.

Code Examples

curlTypeScriptPython

bash

curl -X POST https://api.docmap.io/v1/extractions/run \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'

typescript

const apiKey = process.env.DOCMAP_API_KEY

const response = await fetch('https://api.docmap.io/v1/extractions/run', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    templateId: 'tmpl_8f3a2b1c4d5e6f7g',
    fileName: 'invoice-2024-001.pdf',
    pdfBase64: pdfBuffer.toString('base64'),
    mimeType: 'application/pdf',
  }),
})

const { data } = await response.json()
console.log(data.extractedData)

python

import requests
import base64

api_key = "dm_live_abc123def456ghi789jkl012mno345"

with open("invoice-2024-001.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.docmap.io/v1/extractions/run",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "templateId": "tmpl_8f3a2b1c4d5e6f7g",
        "fileName": "invoice-2024-001.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)

data = response.json()["data"]
print(data["extractedData"])

Response

Status: 200 OK

The response body is wrapped in a data object.

Fields

Field	Type	Description
`id`	`string`	Unique extraction ID (prefixed with `extract_`).
`userId`	`string`	ID of the user who owns this extraction.
`templateId`	`string`	ID of the template used for extraction.
`templateName`	`string`	Display name of the template used.
`fileName`	`string`	Original file name of the uploaded document.
`status`	`"processing"` \| `"completed"` \| `"failed"`	Current extraction status. `"processing"` when using async mode, `"completed"` on success, `"failed"` on error.
`extractedData`	`object` \| `null`	Extracted data matching the template fields. `null` if extraction failed.
`error`	`string` \| `null`	Error message describing the failure. `null` if extraction succeeded.
`variables`	`Variable[]`	Array of template variable definitions used during extraction.
`source`	`"dashboard"` \| `"api"`	How the extraction was triggered.
`runId`	`string` \| `null`	Batch run ID, if one was provided in the request.
`processingTimeMs`	`number` \| `null`	Total processing duration in milliseconds.
`createdAt`	`string`	ISO 8601 timestamp of when the extraction was created.

Example

json

{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "completed",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "invoice_number": "INV-2024-001",
      "invoice_date": "2024-11-15",
      "total_amount": 1250.00,
      "currency": "USD",
      "line_items": [
        {
          "description": "Widget A",
          "quantity": 10,
          "unit_price": 125.00
        }
      ]
    },
    "error": null,
    "variables": [
      {
        "name": "vendor_name",
        "type": "string",
        "description": "Name of the vendor or supplier"
      },
      {
        "name": "total_amount",
        "type": "number",
        "description": "Total invoice amount"
      }
    ],
    "source": "api",
    "runId": null,
    "processingTimeMs": 3842,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}

Async Mode

By default, the API waits for the extraction to finish before responding (synchronous). For long-running extractions, you can use async mode to get an immediate response and poll for the result.

Add ?async=true to the URL to enable async mode:

bash

curl -X POST "https://api.docmap.io/v1/extractions/run?async=true" \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'

Response: 202 Accepted

The response has the same shape as a synchronous response, but status will be "processing", extractedData will be null, and processingTimeMs will be null:

json

{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "processing",
    "extractedData": null,
    "error": null,
    "variables": [...],
    "source": "api",
    "runId": null,
    "processingTimeMs": null,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}

Use the returned id to poll the Get Extraction endpoint until status changes to "completed" or "failed".

WARNING

If an extraction stays in "processing" status for more than 2 minutes, treat it as failed. There is no automatic timeout -- in rare cases (e.g., server restart), a record may remain in "processing" indefinitely.

Errors

Status	Code	Description
`401`	`UNAUTHORIZED`	Missing, invalid, or expired API key / token.
`404`	`NOT_FOUND`	The specified template was not found or does not belong to your account.
`429`	`USAGE_LIMIT_EXCEEDED`	You have reached your plan's monthly extraction limit.
`500`	`INTERNAL_ERROR`	An unexpected error occurred during extraction processing.

Run Extraction ​

Request ​

Headers ​

Query Parameters ​

Body ​

Code Examples ​

Response ​

Fields ​

Example ​

Async Mode ​

Errors ​