Running Extractions
Overview
The extraction endpoint accepts a document (base64-encoded), runs it through a template you have defined, and returns structured data matching the template's variables. A single API call handles the entire pipeline -- upload, AI processing, and structured output.
All extraction requests go through POST /v1/extractions/run.
Preparing Your Document
Before sending a document to the API, you need to base64-encode the file contents.
Constraints:
- Maximum request body size: 15 MB
- Supported MIME types:
  - application/pdf (PDF)
  - application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
# Base64-encode a PDF into a text file (macOS syntax)
base64 -i invoice.pdf -o invoice_b64.txt
# Or inline into a shell variable (GNU coreutils on Linux; -w 0 disables line wrapping)
PDF_BASE64=$(base64 -w 0 invoice.pdf)

// Node.js
import { readFileSync } from "fs";
const pdfBuffer = readFileSync("invoice.pdf");
const pdfBase64 = pdfBuffer.toString("base64");

# Python
import base64
with open("invoice.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

WARNING
Base64 encoding increases file size by roughly 33%. A 10 MB PDF becomes approximately 13.3 MB after encoding, so keep your source files under ~11 MB to stay within the 15 MB request limit.
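The size arithmetic in the warning can be checked before a file is sent. The sketch below is a minimal pre-flight check, not part of the API: it assumes the 15 MB limit is counted as 15 * 1024 * 1024 bytes and reserves a hypothetical 1 KB for the other JSON fields. Both constants and both function names are illustrative assumptions.

```python
# Assumption: the 15 MB limit means 15 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 15 * 1024 * 1024
# Assumption: rough allowance for templateId, fileName, mimeType and JSON syntax.
JSON_OVERHEAD_BYTES = 1024


def base64_encoded_size(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes.

    Base64 emits 4 output characters for every 3 input bytes, rounding up.
    """
    return 4 * ((raw_size + 2) // 3)


def fits_request_limit(raw_size: int) -> bool:
    """Estimate whether a file of raw_size bytes fits in one request body."""
    return base64_encoded_size(raw_size) + JSON_OVERHEAD_BYTES <= MAX_REQUEST_BYTES
```

A caller could pass `os.path.getsize("invoice.pdf")` to `fits_request_limit` and skip or split oversized files before encoding them.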
Making the Request
Send a POST request to /v1/extractions/run with the following JSON body:
| Field | Type | Required | Description |
|---|---|---|---|
| templateId | string | Yes | The ID of the template to extract with |
| fileName | string | Yes | Original file name (e.g. "invoice.pdf") |
| pdfBase64 | string | Yes | Base64-encoded file content |
| mimeType | string | Yes | MIME type of the file |
| runId | string | No | Optional identifier to group multiple extractions into a batch |
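The fields in the table can be assembled from a local file in one step. The following is a sketch under stated assumptions: `build_extraction_payload` and `SUPPORTED_MIME_TYPES` are hypothetical names, the MIME type is inferred from the file extension, and the `pdfBase64` field is assumed to carry DOCX content as well, since the table describes it as generic file content.

```python
import base64
from pathlib import Path

# The two MIME types listed as supported, keyed by file extension.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def build_extraction_payload(path, template_id, run_id=None):
    """Build the JSON body for POST /v1/extractions/run from a local file."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    # Reject unsupported types before reading the file from disk.
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    payload = {
        "templateId": template_id,
        "fileName": file_path.name,
        # Assumption: pdfBase64 is used for every supported file type.
        "pdfBase64": base64.b64encode(file_path.read_bytes()).decode("ascii"),
        "mimeType": SUPPORTED_MIME_TYPES[suffix],
    }
    if run_id is not None:
        payload["runId"] = run_id  # optional batch identifier
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.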
curl -X POST https://api.docmap.io/v1/extractions/run \
  -H "Authorization: Bearer dm_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "tmpl_abc123",
    "fileName": "invoice.pdf",
    "pdfBase64": "JVBERi0xLjQKJeLj...",
    "mimeType": "application/pdf"
  }'

// Node.js (pdfBase64 comes from the encoding step above)
const response = await fetch("https://api.docmap.io/v1/extractions/run", {
  method: "POST",
  headers: {
    Authorization: "Bearer dm_live_your_api_key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    templateId: "tmpl_abc123",
    fileName: "invoice.pdf",
    pdfBase64: pdfBase64,
    mimeType: "application/pdf",
  }),
});
const { data } = await response.json();
console.log(data.extractedData);

# Python (pdf_base64 comes from the encoding step above)
import requests
response = requests.post(
    "https://api.docmap.io/v1/extractions/run",
    headers={
        "Authorization": "Bearer dm_live_your_api_key",
        "Content-Type": "application/json",
    },
    json={
        "templateId": "tmpl_abc123",
        "fileName": "invoice.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)
data = response.json()["data"]
print(data["extractedData"])

Understanding the Response
A successful extraction returns a response wrapped in a data object:
{
  "data": {
    "id": "ext_abc123def456",
    "userId": "user_789",
    "templateId": "tmpl_abc123",
    "templateName": "Invoice Parser",
    "fileName": "invoice.pdf",
    "status": "completed",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "invoice_number": "INV-2024-001",
      "total_amount": 1250.00,
      "line_items": [
        { "description": "Widget A", "quantity": 10, "unit_price": 125.00 }
      ]
    },
    "error": null,
    "variables": [
      { "name": "vendor_name", "type": "string", "description": "Company name of the vendor" },
      { "name": "invoice_number", "type": "string", "description": "Invoice reference number" },
      { "name": "total_amount", "type": "number", "description": "Total invoice amount" },
      { "name": "line_items", "type": "array", "description": "List of line items" }
    ],
    "source": "api",
    "runId": null,
    "processingTimeMs": 3420,
    "createdAt": "2025-07-15T10:30:00.000Z"
  }
}

| Field | Description |
|---|---|
| id | Unique extraction ID (prefixed with ext_) |
| status | "processing" while running (async mode), "completed" if data was successfully extracted, "failed" if processing encountered an error |
| extractedData | An object whose keys match your template's variable names. null if the extraction failed |
| error | Error message string if the extraction failed. null on success |
| processingTimeMs | Time the AI took to process the document, in milliseconds |
| source | "api" when triggered via API key, "dashboard" when triggered from the web UI |
| variables | The template variable definitions that were used for this extraction |
| runId | The batch identifier you provided, or null if none was specified |
| templateName | Human-readable name of the template used |
| createdAt | ISO 8601 timestamp of when the extraction was created |
TIP
Always check the status field before accessing extractedData. If status is "failed", the error field contains a description of what went wrong.
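The status check described in this tip can be wrapped in a small helper so that callers never read `extractedData` from a failed or unfinished extraction. This is a minimal sketch: `unwrap_extraction` and `ExtractionError` are hypothetical names, and the input is assumed to be the `data` object shown in the response above.

```python
class ExtractionError(RuntimeError):
    """Raised when an extraction has no usable extractedData (hypothetical helper type)."""


def unwrap_extraction(data):
    """Return extractedData for a completed extraction, or raise ExtractionError."""
    status = data.get("status")
    extraction_id = data.get("id")
    if status == "completed":
        return data["extractedData"]
    if status == "failed":
        # On failure, the error field describes what went wrong.
        raise ExtractionError(f"Extraction {extraction_id} failed: {data.get('error')}")
    if status == "processing":
        # Only seen in async mode; the caller should poll again later.
        raise ExtractionError(f"Extraction {extraction_id} is still processing")
    raise ExtractionError(f"Extraction {extraction_id} has unexpected status {status!r}")
```

Raising on every non-completed status keeps the success path free of null checks; a caller that polls can catch `ExtractionError` and decide whether to retry.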
Batch Extractions
To process multiple files as a logical batch, pass the same runId to each extraction request. This does not change how documents are processed -- each file is still extracted independently -- but it lets you query all results from a batch together.
import { readFileSync } from "fs";

const runId = "batch-invoices-2025-07";
const files = ["invoice-001.pdf", "invoice-002.pdf", "invoice-003.pdf"];
// Process each file with the same runId
const results = await Promise.all(
  files.map(async (fileName) => {
    const pdfBase64 = readFileSync(fileName).toString("base64");
    const response = await fetch("https://api.docmap.io/v1/extractions/run", {
      method: "POST",
      headers: {
        Authorization: "Bearer dm_live_your_api_key",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        templateId: "tmpl_abc123",
        fileName,
        pdfBase64,
        mimeType: "application/pdf",
        runId,
      }),
    });
    return response.json();
  })
);

Then retrieve all extractions from the batch:
curl "https://api.docmap.io/v1/extractions?runId=batch-invoices-2025-07" \
  -H "Authorization: Bearer dm_live_your_api_key"

// Node.js
const response = await fetch(
  "https://api.docmap.io/v1/extractions?runId=batch-invoices-2025-07",
  {
    headers: { Authorization: "Bearer dm_live_your_api_key" },
  }
);
const { data } = await response.json();
console.log(`Batch contains ${data.length} extractions`);

# Python
import requests
response = requests.get(
    "https://api.docmap.io/v1/extractions",
    params={"runId": "batch-invoices-2025-07"},
    headers={"Authorization": "Bearer dm_live_your_api_key"},
)
data = response.json()["data"]
print(f"Batch contains {len(data)} extractions")

TIP
The list endpoint also supports filtering by templateId and a limit parameter (1--100, default 50). You can combine filters: ?runId=batch-001&templateId=tmpl_abc123&limit=100.
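Combining these filters is easy to get wrong by hand, so the query string can be built programmatically. The sketch below uses only the parameters named above (`runId`, `templateId`, `limit`); the function name `list_extractions_url` is an illustrative assumption, and the range check simply mirrors the documented 1 to 100 bounds on the client side.

```python
from urllib.parse import urlencode

API_BASE = "https://api.docmap.io"


def list_extractions_url(run_id=None, template_id=None, limit=50):
    """Build the list-endpoint URL with optional runId and templateId filters."""
    # Mirror the documented bounds so an invalid limit fails before the request.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {}
    if run_id is not None:
        params["runId"] = run_id
    if template_id is not None:
        params["templateId"] = template_id
    params["limit"] = limit
    # urlencode percent-escapes any characters that are unsafe in a query string.
    return f"{API_BASE}/v1/extractions?{urlencode(params)}"
```

The resulting URL can be passed to `requests.get` together with the usual `Authorization` header.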
Async Workflow
By default, extraction requests are synchronous -- the API blocks until processing finishes. For long-running extractions or when you want to avoid HTTP timeouts, use async mode by adding ?async=true to the URL. The API returns immediately with a "processing" status, and you poll a separate endpoint until the result is ready.
Submit + Poll Pattern
// TypeScript
import { readFileSync } from "fs";

const API_BASE = "https://api.docmap.io";
const API_KEY = process.env.DOCMAP_API_KEY!;

// 1. Submit the extraction asynchronously
const submitResponse = await fetch(`${API_BASE}/v1/extractions/run?async=true`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    templateId: "tmpl_abc123",
    fileName: "invoice.pdf",
    pdfBase64: readFileSync("invoice.pdf").toString("base64"),
    mimeType: "application/pdf",
  }),
});
const { data: submitted } = await submitResponse.json();
console.log(`Extraction ${submitted.id} submitted, status: ${submitted.status}`);

// 2. Poll until complete (up to 30 attempts, 2 seconds apart)
async function poll(extractionId: string): Promise<any> {
  for (let i = 0; i < 30; i++) {
    const res = await fetch(`${API_BASE}/v1/extractions/${extractionId}`, {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });
    const { data } = await res.json();
    if (data.status !== "processing") return data;
    await new Promise((r) => setTimeout(r, 2000));
  }
  throw new Error("Extraction timed out");
}
const result = await poll(submitted.id);
console.log(`Final status: ${result.status}`);
console.log("Extracted data:", result.extractedData);

# Python
import base64
import time
import requests

API_BASE = "https://api.docmap.io"
API_KEY = "dm_live_your_api_key"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# 1. Submit the extraction asynchronously
with open("invoice.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")
submit_response = requests.post(
    f"{API_BASE}/v1/extractions/run?async=true",
    headers=headers,
    json={
        "templateId": "tmpl_abc123",
        "fileName": "invoice.pdf",
        "pdfBase64": pdf_base64,
        "mimeType": "application/pdf",
    },
)
submitted = submit_response.json()["data"]
print(f"Extraction {submitted['id']} submitted, status: {submitted['status']}")

# 2. Poll until complete (up to 30 attempts, 2 seconds apart)
def poll(extraction_id: str):
    for _ in range(30):
        res = requests.get(
            f"{API_BASE}/v1/extractions/{extraction_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        data = res.json()["data"]
        if data["status"] != "processing":
            return data
        time.sleep(2)
    raise TimeoutError("Extraction timed out")

result = poll(submitted["id"])
print(f"Final status: {result['status']}")
print("Extracted data:", result["extractedData"])

When to use async mode
Use async mode when processing large documents, when your HTTP client has short timeouts, or when you want to submit multiple extractions and collect results later. For most single-document extractions, synchronous mode is simpler.
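For the "submit several, collect later" case, one polling loop can track many extraction IDs at once. The sketch below is one possible shape, not an official client: `collect_results` is a hypothetical name, and `fetch_extraction` is an injected callable assumed to return the `data` object from GET /v1/extractions/{id}, which keeps the loop testable without network access.

```python
import time


def collect_results(extraction_ids, fetch_extraction, interval_s=2.0,
                    max_rounds=30, sleep=time.sleep):
    """Poll several extractions until none of them is still "processing".

    Returns a dict mapping extraction ID to its final data object.
    Raises TimeoutError if any ID is still processing after max_rounds.
    """
    pending = list(extraction_ids)
    finished = {}
    for _ in range(max_rounds):
        still_pending = []
        for extraction_id in pending:
            data = fetch_extraction(extraction_id)
            if data["status"] == "processing":
                still_pending.append(extraction_id)
            else:
                # "completed" and "failed" are both final states.
                finished[extraction_id] = data
        pending = still_pending
        if not pending:
            return finished
        sleep(interval_s)  # wait before the next polling round
    raise TimeoutError(f"Still processing after {max_rounds} rounds: {pending}")
```

In practice `fetch_extraction` could be a thin wrapper around `requests.get` that returns `response.json()["data"]`. Finished IDs are dropped from later rounds, so each extraction is fetched only until it reaches a final state.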
Complete Workflow Example
Here is a full end-to-end example in TypeScript that reads a PDF from disk, runs an extraction, and handles both success and failure:
import { readFileSync } from "fs";

const API_BASE = "https://api.docmap.io";
const API_KEY = process.env.DOCMAP_API_KEY!;

async function extractInvoice(filePath: string) {
  // 1. Read and encode the PDF
  const pdfBuffer = readFileSync(filePath);
  const pdfBase64 = pdfBuffer.toString("base64");
  const fileName = filePath.split("/").pop()!;
  console.log(`Processing ${fileName} (${(pdfBuffer.length / 1024).toFixed(0)} KB)...`);

  // 2. Run the extraction
  const response = await fetch(`${API_BASE}/v1/extractions/run`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      templateId: "tmpl_invoice_parser",
      fileName,
      pdfBase64,
      mimeType: "application/pdf",
    }),
  });
  if (!response.ok) {
    const error = await response.json();
    throw new Error(`API error ${response.status}: ${error.error.message}`);
  }
  const { data } = await response.json();

  // 3. Check the extraction result
  if (data.status === "failed") {
    console.error(`Extraction failed: ${data.error}`);
    return null;
  }
  console.log(`Extraction completed in ${data.processingTimeMs}ms`);
  console.log("Extracted data:", JSON.stringify(data.extractedData, null, 2));
  return data.extractedData;
}

// Run it
extractInvoice("./invoices/invoice-001.pdf")
  .then((result) => {
    if (result) {
      console.log(`Vendor: ${result.vendor_name}`);
      console.log(`Total: $${result.total_amount}`);
    }
  })
  .catch(console.error);