Running Extractions

Overview

The extraction endpoint accepts a document (base64-encoded), runs it through a template you have defined, and returns structured data matching the template's variables. A single API call handles the entire pipeline -- upload, AI processing, and structured output.

All extraction requests go through POST /v1/extractions/run.

Preparing Your Document

Before sending a document to the API, you need to base64-encode the file contents.

Constraints:

- Maximum request body size: 15 MB
- Supported MIME types:
  - application/pdf (PDF)
  - application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)

bash

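# Base64-encode a PDF file into a text file (macOS syntax)
base64 -i invoice.pdf -o invoice_b64.txt

# Or inline, without line wrapping (GNU coreutils / Linux)
PDF_BASE64=$(base64 -w 0 invoice.pdf)

# Portable inline form (macOS and Linux): strip any line breaks
PDF_BASE64=$(base64 < invoice.pdf | tr -d '\n')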

typescript

import { readFileSync } from "fs";

const pdfBuffer = readFileSync("invoice.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

python

import base64

with open("invoice.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

WARNING

Base64 encoding increases file size by roughly 33%. A 10 MB PDF becomes approximately 13.3 MB after encoding, so keep your source files under ~11 MB to stay within the 15 MB request limit.
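
The size arithmetic in this warning can be checked before sending anything. The following is a minimal sketch, not part of the API: the helper names, the 1 KB allowance for the surrounding JSON fields, and the reading of "15 MB" as 15 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
# Assumption: "15 MB" means 15 * 1024 * 1024 bytes; the docs do not say
# whether the limit is binary or decimal.
MAX_REQUEST_BYTES = 15 * 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    # Every 3 input bytes become 4 output characters; a partial final group is padded.
    return 4 * ((raw_bytes + 2) // 3)


def fits_request_limit(raw_bytes: int, overhead_bytes: int = 1024) -> bool:
    """True if the encoded file plus an assumed JSON overhead stays within the limit.

    overhead_bytes is a hypothetical allowance for templateId, fileName, etc.
    """
    return base64_size(raw_bytes) + overhead_bytes <= MAX_REQUEST_BYTES


if __name__ == "__main__":
    for megabytes in (10, 11, 12):
        raw = megabytes * 1024 * 1024
        print(megabytes, "MB ->", base64_size(raw), "encoded bytes, fits:", fits_request_limit(raw))
```

Under these assumptions a 10 MB or 11 MB file fits and a 12 MB file does not, which matches the ~11 MB guidance.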

Making the Request

Send a POST request to /v1/extractions/run with the following JSON body:

Field	Type	Required	Description
`templateId`	string	Yes	The ID of the template to extract with
`fileName`	string	Yes	Original file name (e.g. `"invoice.pdf"`)
`pdfBase64`	string	Yes	Base64-encoded file content
`mimeType`	string	Yes	MIME type of the file
`runId`	string	No	Optional identifier to group multiple extractions into a batch
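
As an illustration of how these fields fit together, the sketch below assembles the JSON body from a file name and its raw bytes. `build_payload` and `SUPPORTED_MIME_TYPES` are hypothetical names, not part of any DocMap SDK, and choosing the MIME type from the file extension is an assumption made here for convenience.

```python
import base64
from pathlib import Path
from typing import Optional

# The two MIME types listed under "Supported MIME types", keyed by file extension.
# Mapping by extension is an assumption of this sketch, not documented API behavior.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def build_payload(
    template_id: str,
    file_name: str,
    content: bytes,
    run_id: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /v1/extractions/run (hypothetical helper)."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or file_name}")

    payload = {
        "templateId": template_id,
        "fileName": file_name,
        # The field is named pdfBase64 even when the file is a DOCX.
        "pdfBase64": base64.b64encode(content).decode("ascii"),
        "mimeType": SUPPORTED_MIME_TYPES[suffix],
    }
    # runId is optional, so it is only included when the caller provides one.
    if run_id is not None:
        payload["runId"] = run_id
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.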

bash

curl -X POST https://api.docmap.io/v1/extractions/run \
  -H "Authorization: Bearer dm_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_abc123",
    "fileName": "invoice.pdf",
    "pdfBase64": "JVBERi0xLjQKJeLj...",
    "mimeType": "application/pdf"
  }'

typescript

const response = await fetch("https://api.docmap.io/v1/extractions/run", {
  method: "POST",
  headers: {
    Authorization: "Bearer dm_live_your_api_key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    templateId: "tmpl_abc123",
    fileName: "invoice.pdf",
    pdfBase64: pdfBase64,
    mimeType: "application/pdf",
  }),
});

const { data } = await response.json();
console.log(data.extractedData);

python

import requests

response = requests.post(
    "https://api.docmap.io/v1/extractions/run",
    headers={
        "Authorization": "Bearer dm_live_your_api_key",
        "Content-Type": "application/json",
    },
    json={
        "templateId": "tmpl_abc123",
        "fileName": "invoice.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)

data = response.json()["data"]
print(data["extractedData"])

Understanding the Response

A successful extraction returns a response wrapped in a data object:

json

{
  "data": {
    "id": "ext_abc123def456",
    "userId": "user_789",
    "templateId": "tmpl_abc123",
    "templateName": "Invoice Parser",
    "fileName": "invoice.pdf",
    "status": "completed",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "invoice_number": "INV-2024-001",
      "total_amount": 1250.00,
      "line_items": [
        { "description": "Widget A", "quantity": 10, "unit_price": 125.00 }
      ]
    },
    "error": null,
    "variables": [
      { "name": "vendor_name", "type": "string", "description": "Company name of the vendor" },
      { "name": "invoice_number", "type": "string", "description": "Invoice reference number" },
      { "name": "total_amount", "type": "number", "description": "Total invoice amount" },
      { "name": "line_items", "type": "array", "description": "List of line items" }
    ],
    "source": "api",
    "runId": null,
    "processingTimeMs": 3420,
    "createdAt": "2025-07-15T10:30:00.000Z"
  }
}

Field	Description
`id`	Unique extraction ID (prefixed with `ext_`)
`status`	`"processing"` while running (async mode), `"completed"` if data was successfully extracted, `"failed"` if processing encountered an error
`extractedData`	An object whose keys match your template's variable names. `null` if the extraction failed
`error`	Error message string if the extraction failed. `null` on success
`processingTimeMs`	Time the AI took to process the document, in milliseconds
`source`	`"api"` when triggered via API key, `"dashboard"` when triggered from the web UI
`variables`	The template variable definitions that were used for this extraction
`runId`	The batch identifier you provided, or `null` if none was specified
`templateName`	Human-readable name of the template used
`createdAt`	ISO 8601 timestamp of when the extraction was created

TIP

Always check the status field before accessing extractedData. If status is "failed", the error field contains a description of what went wrong.
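
One way to enforce this check is to route every response through a small guard. This is a sketch only: `ExtractionFailed` and `extracted_data_or_raise` are hypothetical names, and the input is assumed to be the inner `data` object described in the table above.

```python
class ExtractionFailed(Exception):
    """Raised when an extraction finished with status "failed" (hypothetical type)."""


def extracted_data_or_raise(data: dict) -> dict:
    """Return extractedData only when the extraction has completed.

    `data` is the inner "data" object of the API response.
    """
    status = data.get("status")
    if status == "completed":
        return data["extractedData"]
    if status == "failed":
        # The error field carries the failure description; fall back to a generic message.
        raise ExtractionFailed(data.get("error") or "extraction failed without an error message")
    # Any other status (for example "processing" in async mode) means no result yet.
    raise RuntimeError(f"Extraction {data.get('id')} is not finished (status: {status!r})")
```

Calling code can then treat a returned value as safe to read and handle the two exceptions separately.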

Batch Extractions

To process multiple files as a logical batch, pass the same runId to each extraction request. This does not change how documents are processed -- each file is still extracted independently -- but it lets you query all results from a batch together.

typescript

import { readFileSync } from "fs";

const runId = "batch-invoices-2025-07";
const files = ["invoice-001.pdf", "invoice-002.pdf", "invoice-003.pdf"];

// Process each file with the same runId
const results = await Promise.all(
  files.map(async (fileName) => {
    const pdfBase64 = readFileSync(fileName).toString("base64");

    const response = await fetch("https://api.docmap.io/v1/extractions/run", {
      method: "POST",
      headers: {
        Authorization: "Bearer dm_live_your_api_key",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        templateId: "tmpl_abc123",
        fileName,
        pdfBase64,
        mimeType: "application/pdf",
        runId,
      }),
    });

    return response.json();
  })
);

Then retrieve all extractions from the batch:

bash

curl "https://api.docmap.io/v1/extractions?runId=batch-invoices-2025-07" \
  -H "Authorization: Bearer dm_live_your_api_key"

typescript

const response = await fetch(
  "https://api.docmap.io/v1/extractions?runId=batch-invoices-2025-07",
  {
    headers: { Authorization: "Bearer dm_live_your_api_key" },
  }
);

const { data } = await response.json();
console.log(`Batch contains ${data.length} extractions`);

python

import requests

response = requests.get(
    "https://api.docmap.io/v1/extractions",
    params={"runId": "batch-invoices-2025-07"},
    headers={"Authorization": "Bearer dm_live_your_api_key"},
)

data = response.json()["data"]
print(f"Batch contains {len(data)} extractions")

TIP

The list endpoint also supports filtering by templateId and a limit parameter (1--100, default 50). You can combine filters: ?runId=batch-001&templateId=tmpl_abc123&limit=100.
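
The combined filters can be assembled programmatically rather than by string concatenation. The sketch below is illustrative: `list_extractions_url` is a hypothetical helper, and rejecting an out-of-range `limit` on the client side is a choice made here, not documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode

LIST_URL = "https://api.docmap.io/v1/extractions"


def list_extractions_url(
    run_id: Optional[str] = None,
    template_id: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build the list-endpoint URL with optional runId and templateId filters."""
    # The documented range for limit is 1 to 100; checking it locally is an assumption.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    params = {}
    if run_id:
        params["runId"] = run_id
    if template_id:
        params["templateId"] = template_id
    params["limit"] = limit

    # urlencode percent-escapes values and keeps the insertion order of the dict.
    return f"{LIST_URL}?{urlencode(params)}"
```

For example, `list_extractions_url(run_id="batch-001", template_id="tmpl_abc123", limit=100)` produces the same query string as the one shown in the tip.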

Async Workflow

By default, extraction requests are synchronous -- the API blocks until processing finishes. For long-running extractions or when you want to avoid HTTP timeouts, use async mode by adding ?async=true to the URL. The API returns immediately with a "processing" status, and you poll a separate endpoint until the result is ready.

Submit + Poll Pattern

typescript

import { readFileSync } from "fs";

const API_BASE = "https://api.docmap.io";
const API_KEY = process.env.DOCMAP_API_KEY!;

// 1. Submit the extraction asynchronously
const submitResponse = await fetch(`${API_BASE}/v1/extractions/run?async=true`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    templateId: "tmpl_abc123",
    fileName: "invoice.pdf",
    pdfBase64: readFileSync("invoice.pdf").toString("base64"),
    mimeType: "application/pdf",
  }),
});

const { data: submitted } = await submitResponse.json();
console.log(`Extraction ${submitted.id} submitted, status: ${submitted.status}`);

// 2. Poll every 2 seconds, up to 30 attempts (about 60 seconds in total)
async function poll(extractionId: string): Promise<any> {
  for (let i = 0; i < 30; i++) {
    const res = await fetch(`${API_BASE}/v1/extractions/${extractionId}`, {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });
    const { data } = await res.json();
    if (data.status !== "processing") return data;
    await new Promise((r) => setTimeout(r, 2000));
  }
  throw new Error("Extraction timed out");
}

const result = await poll(submitted.id);
console.log(`Final status: ${result.status}`);
console.log("Extracted data:", result.extractedData);

python

import base64
import time
import requests

API_BASE = "https://api.docmap.io"
API_KEY = "dm_live_your_api_key"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# 1. Submit the extraction asynchronously
with open("invoice.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

submit_response = requests.post(
    f"{API_BASE}/v1/extractions/run?async=true",
    headers=headers,
    json={
        "templateId": "tmpl_abc123",
        "fileName": "invoice.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)

submitted = submit_response.json()["data"]
print(f"Extraction {submitted['id']} submitted, status: {submitted['status']}")

# 2. Poll every 2 seconds, up to 30 attempts (about 60 seconds in total)
def poll(extraction_id: str):
    for _ in range(30):
        res = requests.get(
            f"{API_BASE}/v1/extractions/{extraction_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        data = res.json()["data"]
        if data["status"] != "processing":
            return data
        time.sleep(2)
    raise TimeoutError("Extraction timed out")

result = poll(submitted["id"])
print(f"Final status: {result['status']}")
print("Extracted data:", result["extractedData"])

When to use async mode

Use async mode when processing large documents, when your HTTP client has short timeouts, or when you want to submit multiple extractions and collect results later. For most single-document extractions, synchronous mode is simpler.
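
For the "submit multiple extractions and collect results later" case, a single loop can poll all pending IDs together. This is a minimal sketch under stated assumptions: `collect_results` is a hypothetical helper, and `fetch` stands for any function that takes an extraction ID and returns the inner `data` object, for example a wrapper around a GET request to /v1/extractions/{id}. The 2-second interval and 30-round cap mirror the polling example above.

```python
import time
from typing import Callable, Dict, Iterable


def collect_results(
    extraction_ids: Iterable[str],
    fetch: Callable[[str], dict],
    interval_s: float = 2.0,
    max_rounds: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Poll several extractions until none is still "processing".

    Returns a dict mapping extraction ID to its final data object.
    Raises TimeoutError if any ID is still processing after max_rounds rounds.
    """
    pending = list(extraction_ids)
    finished: Dict[str, dict] = {}

    for _ in range(max_rounds):
        still_pending = []
        for extraction_id in pending:
            data = fetch(extraction_id)
            if data["status"] == "processing":
                still_pending.append(extraction_id)
            else:
                # "completed" and "failed" are both final states.
                finished[extraction_id] = data
        pending = still_pending
        if not pending:
            return finished
        sleep(interval_s)

    raise TimeoutError(f"Still processing after {max_rounds} rounds: {pending}")
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any HTTP client, so it can be exercised with a stub before pointing it at the real endpoint.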

Complete Workflow Example

Here is a full end-to-end example in TypeScript that reads a PDF from disk, runs an extraction, and handles both success and failure:

typescript

import { readFileSync } from "fs";
import { basename } from "path";

const API_BASE = "https://api.docmap.io";
const API_KEY = process.env.DOCMAP_API_KEY!;

async function extractInvoice(filePath: string) {
  // 1. Read and encode the PDF
  const pdfBuffer = readFileSync(filePath);
  const pdfBase64 = pdfBuffer.toString("base64");
  // File name without directories, using the host OS path rules
  const fileName = basename(filePath);

  console.log(`Processing ${fileName} (${(pdfBuffer.length / 1024).toFixed(0)} KB)...`);

  // 2. Run the extraction
  const response = await fetch(`${API_BASE}/v1/extractions/run`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      templateId: "tmpl_invoice_parser",
      fileName,
      pdfBase64,
      mimeType: "application/pdf",
    }),
  });

  if (!response.ok) {
    const error = await response.json();
    throw new Error(`API error ${response.status}: ${error.error.message}`);
  }

  const { data } = await response.json();

  // 3. Check the extraction result
  if (data.status === "failed") {
    console.error(`Extraction failed: ${data.error}`);
    return null;
  }

  console.log(`Extraction completed in ${data.processingTimeMs}ms`);
  console.log("Extracted data:", JSON.stringify(data.extractedData, null, 2));

  return data.extractedData;
}

// Run it
extractInvoice("./invoices/invoice-001.pdf")
  .then((result) => {
    if (result) {
      console.log(`Vendor: ${result.vendor_name}`);
      console.log(`Total: $${result.total_amount}`);
    }
  })
  .catch(console.error);
