Run Extraction

POST /v1/extractions/run

Run an extraction on a PDF or Word (.docx) document. The document is processed with the specified template, and the extracted data is returned as structured JSON matching the template's field definitions.

Try it

Test this endpoint interactively in the Swagger UI.

Authorization required

Include your API key in the Authorization header as a Bearer token.

Request

Headers

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer <token> | Yes |
| Content-Type | application/json | Yes |

Query Parameters

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| async | string | No | Set to "true" to return immediately with a processing status instead of waiting for completion. Default behavior (omitted or "false") is synchronous. |

Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| templateId | string | Yes | The ID of the extraction template to use. |
| fileName | string | Yes | Original file name of the document. |
| pdfBase64 | string | Yes | Base64-encoded content of the PDF file. |
| mimeType | string | Yes | MIME type of the file. Accepted values: application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document. |
| runId | string | No | Optional identifier to group related extractions in a batch run. |
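
The body fields above can be assembled from a local file along these lines. This is a minimal sketch, not an official client: the helper name `build_extraction_body` and the extension-to-MIME mapping are illustrative, and it assumes DOCX content is sent in `pdfBase64` too, since that is the only content field listed.

```python
import base64
from pathlib import Path

# MIME types listed as accepted in the Body table, keyed by file extension.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def build_extraction_body(path, template_id, run_id=None):
    """Build the JSON body for POST /v1/extractions/run from a local file."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")

    body = {
        "templateId": template_id,
        "fileName": file_path.name,
        # Assumption: pdfBase64 carries the file content for every accepted type.
        "pdfBase64": base64.b64encode(file_path.read_bytes()).decode("ascii"),
        "mimeType": MIME_TYPES[suffix],
    }
    if run_id is not None:
        body["runId"] = run_id  # optional: groups extractions into one batch run
    return body
```

Passing the same `run_id` for every file in a batch groups the resulting extractions under one run.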

Code Examples

bash
curl -X POST https://api.docmap.io/v1/extractions/run \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'
typescript
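import { readFile } from 'node:fs/promises'

const apiKey = process.env.DOCMAP_API_KEY

// Read the PDF from disk; the API expects its content base64-encoded
const pdfBuffer = await readFile('invoice-2024-001.pdf')

const response = await fetch('https://api.docmap.io/v1/extractions/run', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    templateId: 'tmpl_8f3a2b1c4d5e6f7g',
    fileName: 'invoice-2024-001.pdf',
    pdfBase64: pdfBuffer.toString('base64'),
    mimeType: 'application/pdf',
  }),
})

// Stop on 4xx/5xx before reading the body
if (!response.ok) {
  throw new Error(`Extraction request failed with status ${response.status}`)
}

// The response body is wrapped in a data object
const { data } = await response.json()
console.log(data.extractedData)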
python
import base64
import os

import requests

# Read the key from the environment rather than hardcoding it
api_key = os.environ["DOCMAP_API_KEY"]

with open("invoice-2024-001.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.docmap.io/v1/extractions/run",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "templateId": "tmpl_8f3a2b1c4d5e6f7g",
        "fileName": "invoice-2024-001.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)

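# Raise on 4xx/5xx before reading the body
response.raise_for_status()

# The response body is wrapped in a data object
data = response.json()["data"]
print(data["extractedData"])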

Response

Status: 200 OK

The response body is wrapped in a data object.

Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique extraction ID (prefixed with extract_). |
| userId | string | ID of the user who owns this extraction. |
| templateId | string | ID of the template used for extraction. |
| templateName | string | Display name of the template used. |
| fileName | string | Original file name of the uploaded document. |
| status | "processing" \| "completed" \| "failed" | Current extraction status. "processing" when using async mode, "completed" on success, "failed" on error. |
| extractedData | object \| null | Extracted data matching the template fields. null if extraction failed. |
| error | string \| null | Error message describing the failure. null if extraction succeeded. |
| variables | Variable[] | Array of template variable definitions used during extraction. |
| source | "dashboard" \| "api" | How the extraction was triggered. |
| runId | string \| null | Batch run ID, if one was provided in the request. |
| processingTimeMs | number \| null | Total processing duration in milliseconds. |
| createdAt | string | ISO 8601 timestamp of when the extraction was created. |

Example

json
{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "completed",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "invoice_number": "INV-2024-001",
      "invoice_date": "2024-11-15",
      "total_amount": 1250.00,
      "currency": "USD",
      "line_items": [
        {
          "description": "Widget A",
          "quantity": 10,
          "unit_price": 125.00
        }
      ]
    },
    "error": null,
    "variables": [
      {
        "name": "vendor_name",
        "type": "string",
        "description": "Name of the vendor or supplier"
      },
      {
        "name": "total_amount",
        "type": "number",
        "description": "Total invoice amount"
      }
    ],
    "source": "api",
    "runId": null,
    "processingTimeMs": 3842,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}
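
Because `extractedData` and `error` are each null in some states, it helps to branch on `status` before reading either. The sketch below shows one way to do that on a parsed response. The function name and the choice of exception types are illustrative, not part of the API.

```python
def extracted_data_or_raise(payload):
    """Return extractedData from a parsed response, or raise if it is not available."""
    record = payload["data"]  # the response body is wrapped in a "data" object
    status = record["status"]
    if status == "completed":
        return record["extractedData"]
    if status == "failed":
        # "error" holds the failure message when status is "failed".
        raise RuntimeError(f"extraction {record['id']} failed: {record['error']}")
    # "processing" is returned in async mode before the result is ready.
    raise LookupError(f"extraction {record['id']} is still {status}")
```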

Async Mode

By default, the API waits for the extraction to finish before responding (synchronous). For long-running extractions, you can use async mode to get an immediate response and poll for the result.

Add ?async=true to the URL to enable async mode:

bash
curl -X POST "https://api.docmap.io/v1/extractions/run?async=true" \
  -H "Authorization: Bearer dm_live_abc123def456ghi789jkl012mno345" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "fileName": "invoice-2024-001.pdf",
    "pdfBase64": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YW...",
    "mimeType": "application/pdf"
  }'
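
For comparison, the same async request can be prepared in Python with only the standard library. This is a sketch under assumptions: `build_run_request` is an illustrative helper name, and the function only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

RUN_URL = "https://api.docmap.io/v1/extractions/run"


def build_run_request(api_key, body, use_async=False):
    """Prepare (but do not send) the POST request, optionally in async mode."""
    url = RUN_URL
    if use_async:
        # The query parameter is the string "true", not a JSON boolean.
        url += "?" + urllib.parse.urlencode({"async": "true"})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The prepared request can be sent with `urllib.request.urlopen(request)`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.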

Response: 202 Accepted

The response has the same shape as a synchronous response, but status will be "processing", extractedData will be null, and processingTimeMs will be null:

json
{
  "data": {
    "id": "extract_9k2m4n6p8q0r1s3t",
    "userId": "uid_a1b2c3d4e5f6",
    "templateId": "tmpl_8f3a2b1c4d5e6f7g",
    "templateName": "Invoice Template",
    "fileName": "invoice-2024-001.pdf",
    "status": "processing",
    "extractedData": null,
    "error": null,
    "variables": [...],
    "source": "api",
    "runId": null,
    "processingTimeMs": null,
    "createdAt": "2024-11-20T14:30:00.000Z"
  }
}

Use the returned id to poll the Get Extraction endpoint until status changes to "completed" or "failed".

WARNING

If an extraction stays in "processing" status for more than 2 minutes, treat it as failed. There is no automatic timeout: in rare cases (e.g., a server restart), a record may remain in "processing" indefinitely.
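
A polling loop with a client-side deadline might look like the sketch below. The Get Extraction URL is not given on this page, so the loop takes a caller-supplied `fetch` callable (hypothetical) that returns the parsed `data` object for an extraction ID. The 2-second poll interval is an assumption; only the 2-minute cutoff comes from the warning above.

```python
import time


def poll_extraction(fetch, extraction_id, timeout_s=120.0, interval_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the extraction leaves "processing" or the deadline passes.

    fetch is a caller-supplied callable (hypothetical here) that takes an
    extraction ID and returns the parsed "data" object from Get Extraction.
    """
    deadline = clock() + timeout_s
    while True:
        record = fetch(extraction_id)
        if record["status"] in ("completed", "failed"):
            return record
        if clock() >= deadline:
            # There is no server-side timeout, so give up on the client side.
            raise TimeoutError(
                f"extraction {extraction_id} still processing after {timeout_s:.0f}s"
            )
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting. In normal use the defaults are fine.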

Errors

| Status | Code | Description |
| --- | --- | --- |
| 401 | UNAUTHORIZED | Missing, invalid, or expired API key / token. |
| 404 | NOT_FOUND | The specified template was not found or does not belong to your account. |
| 429 | USAGE_LIMIT_EXCEEDED | You have reached your plan's monthly extraction limit. |
| 500 | INTERNAL_ERROR | An unexpected error occurred during extraction processing. |
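
One way a client might act on these statuses is sketched below. The retry policy is an assumption rather than documented behavior: only 500 is treated as retryable, because 429 here signals a monthly plan limit, not a short-lived rate limit. The error response body is not described on this page, so the sketch relies on the HTTP status alone.

```python
# Status codes and error codes from the Errors table.
ERROR_CODES = {
    401: "UNAUTHORIZED",
    404: "NOT_FOUND",
    429: "USAGE_LIMIT_EXCEEDED",
    500: "INTERNAL_ERROR",
}


def describe_failure(status_code):
    """Return (error_code, retryable) for a non-2xx HTTP status."""
    code = ERROR_CODES.get(status_code, "UNKNOWN")
    # Assumption: retrying helps only for unexpected server errors.
    retryable = status_code == 500
    return code, retryable
```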

DocMap API Documentation