
Data Models

This page documents the data models returned by the DocMap API. All responses are JSON.

ExtractionRecord

An extraction record represents a single document extraction: the result of processing a PDF with a template.

typescript
interface ExtractionRecord {
  id: string              // Unique extraction ID
  userId: string          // Owner user ID
  templateId: string      // Template used for extraction
  templateName: string    // Human-readable template name
  fileName: string        // Original uploaded file name
  status: 'processing' | 'completed' | 'failed'
  extractedData: Record<string, unknown> | null  // Structured data or null if failed
  error: string | null    // Error message if failed, null if successful
  variables: Variable[]   // Template variable definitions used
  source: 'dashboard' | 'api'  // How the extraction was initiated
  runId: string | null    // Batch run ID, null if not part of a batch
  processingTimeMs: number | null  // Processing duration in milliseconds
  createdAt: string       // ISO 8601 timestamp
}

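Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the extraction record. |
| userId | string | ID of the user who owns this extraction. |
| templateId | string | ID of the template that defines the extraction schema. |
| templateName | string | Display name of the template at the time of extraction. |
| fileName | string | Original name of the uploaded file (for example, "invoice-march.pdf"). |
| status | string | "processing" while the extraction is running (async mode), "completed" on success, "failed" on failure. |
| extractedData | object \| null | Structured data extracted from the document, matching the template's variable definitions. null if the extraction failed. |
| error | string \| null | Error description. null if the extraction succeeded. |
| variables | Variable[] | Template variable definitions used for this extraction. See Variable. |
| source | string | "dashboard" when run from the web interface, "api" when run through the API. |
| runId | string \| null | Shared run ID if the extraction is part of a batch run. null for a standalone extraction. |
| processingTimeMs | number \| null | Extraction duration in milliseconds. null if not recorded. |
| createdAt | string | ISO 8601 timestamp of when the extraction was created. |

Example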

json
{
  "id": "ext_kp92mf7x",
  "userId": "u_abc123",
  "templateId": "tpl_inv001",
  "templateName": "Standard Invoice",
  "fileName": "acme-invoice-2024-003.pdf",
  "status": "completed",
  "extractedData": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-003",
    "invoice_date": "2024-11-15",
    "due_date": "2024-12-15",
    "total_amount": 4750.00,
    "currency": "USD",
    "line_items": [
      {
        "description": "Cloud Hosting - Pro Plan",
        "quantity": 1,
        "unit_price": 2500.00
      },
      {
        "description": "SSL Certificate (Wildcard)",
        "quantity": 3,
        "unit_price": 750.00
      }
    ]
  },
  "error": null,
  "variables": [
    { "key": "vendor_name", "type": "string", "context": "Name of the vendor or supplier", "required": true },
    { "key": "invoice_number", "type": "string", "context": "Invoice or reference number" },
    { "key": "invoice_date", "type": "date", "context": "Date the invoice was issued" },
    { "key": "due_date", "type": "date", "context": "Payment due date" },
    { "key": "total_amount", "type": "number", "context": "Total amount due" },
    { "key": "currency", "type": "string", "context": "Currency code (e.g. USD, EUR)" },
    {
      "key": "line_items",
      "type": "array",
      "context": "Individual line items on the invoice",
      "arrayType": "object",
      "fields": [
        { "key": "description", "type": "string", "context": "Item description" },
        { "key": "quantity", "type": "number", "context": "Quantity ordered" },
        { "key": "unit_price", "type": "number", "context": "Price per unit" }
      ]
    }
  ],
  "source": "api",
  "runId": null,
  "processingTimeMs": 3842,
  "createdAt": "2024-11-20T14:30:00.000Z"
}
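
Because status determines which of extractedData and error is populated, client code usually branches on it first. The sketch below is one way to do that; summarizeExtraction and the trimmed ExtractionRecordLike type are hypothetical names introduced for illustration and are not part of the DocMap API.

```typescript
// Trimmed copy of the documented ExtractionRecord, limited to the fields used here.
type ExtractionStatus = 'processing' | 'completed' | 'failed';

interface ExtractionRecordLike {
  id: string;
  status: ExtractionStatus;
  extractedData: Record<string, unknown> | null;
  error: string | null;
  processingTimeMs: number | null;
}

// Hypothetical helper: builds a one-line summary by branching on status.
function summarizeExtraction(record: ExtractionRecordLike): string {
  switch (record.status) {
    case 'processing':
      return `${record.id}: still processing`;
    case 'failed':
      // error is documented as a description on failure; fall back defensively.
      return `${record.id}: failed (${record.error ?? 'no error message'})`;
    case 'completed': {
      const fieldCount = Object.keys(record.extractedData ?? {}).length;
      const duration =
        record.processingTimeMs === null ? 'unknown time' : `${record.processingTimeMs} ms`;
      return `${record.id}: completed with ${fieldCount} top-level fields in ${duration}`;
    }
  }
}
```

Typing status as a union lets the compiler flag any branch that is forgotten if a new status value is ever added to the type.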

ApiKeyPublic

Public projection of an API key. The actual key value and its hash are never returned after creation.

typescript
interface ApiKeyPublic {
  id: string              // Key ID
  userId: string          // Owner user ID
  name: string            // Label for the key
  prefix: string          // First 16 characters of the key (e.g., "dm_live_a1b2c3d4")
  expiresAt: string | null  // ISO 8601 expiration, null if never expires
  lastUsedAt: string | null // ISO 8601 last usage, null if never used
  createdAt: string       // ISO 8601 timestamp
  revoked: boolean        // Whether the key has been revoked
}

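Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the API key. |
| userId | string | ID of the user who owns the key. |
| name | string | Human-readable label for the key (for example, "Production Server"). |
| prefix | string | First 16 characters of the key, used to identify which key is in use without exposing the full value. |
| expiresAt | string \| null | ISO 8601 timestamp of when the key expires. null if no expiration was set at creation. |
| lastUsedAt | string \| null | ISO 8601 timestamp of the last time the key authenticated a request. null if the key has never been used. |
| createdAt | string | ISO 8601 timestamp of when the key was created. |
| revoked | boolean | Whether the key has been revoked. List responses filter out revoked keys, so this is normally false. |

Example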

json
{
  "id": "key_m3x9kq72",
  "userId": "u_abc123",
  "name": "Production Server",
  "prefix": "dm_live_a1b2c3d4",
  "expiresAt": "2025-03-15T00:00:00.000Z",
  "lastUsedAt": "2025-01-10T09:22:14.000Z",
  "createdAt": "2024-12-15T10:00:00.000Z",
  "revoked": false
}
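
A dashboard or CLI that lists keys may want to flag ones that can no longer authenticate. The sketch below combines revoked and expiresAt for that purpose. isKeyUsable is a hypothetical helper, and it assumes a key stops working exactly at expiresAt; the server remains the authority on whether a key is accepted.

```typescript
// Trimmed copy of the documented ApiKeyPublic, limited to the fields used here.
interface ApiKeyPublicLike {
  expiresAt: string | null; // ISO 8601, null if the key never expires
  revoked: boolean;
}

// Hypothetical client-side check. Assumption: the key is valid strictly before expiresAt.
function isKeyUsable(key: ApiKeyPublicLike, now: Date = new Date()): boolean {
  if (key.revoked) return false;
  if (key.expiresAt === null) return true;
  return new Date(key.expiresAt).getTime() > now.getTime();
}
```

Passing now explicitly keeps the function deterministic in tests; callers can omit it to compare against the current time.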

WebhookPublic

Public projection of a webhook. The signing secret is never returned after creation.

typescript
interface WebhookPublic {
  id: string              // Webhook ID
  userId: string          // Owner user ID
  url: string             // Registered endpoint URL
  events: WebhookEvent[]  // Subscribed events
  active: boolean         // Whether the webhook is active
  createdAt: string       // ISO 8601 timestamp
}

type WebhookEvent = 'extraction.completed' | 'extraction.failed'

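Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the webhook. |
| userId | string | ID of the user who owns the webhook. |
| url | string | HTTPS endpoint that receives the webhook POST requests. |
| events | string[] | Events the webhook subscribes to. Allowed values: "extraction.completed", "extraction.failed". |
| active | boolean | Whether the webhook is active. List responses return only active webhooks, so this is always true there. |
| createdAt | string | ISO 8601 timestamp of when the webhook was created. |

Example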

json
{
  "id": "webhook-abc123def456",
  "userId": "uid_a1b2c3d4e5f6",
  "url": "https://your-app.com/webhooks/docmap",
  "events": ["extraction.completed", "extraction.failed"],
  "active": true,
  "createdAt": "2025-01-15T10:00:00.000Z"
}
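
When auditing a webhook configuration, it can help to see which endpoints would be notified for a given event. The following is a minimal sketch over a list of WebhookPublic objects; webhooksFor is a hypothetical helper, and the active check is defensive, since list responses are documented to contain only active webhooks.

```typescript
type WebhookEvent = 'extraction.completed' | 'extraction.failed';

// Mirrors the documented WebhookPublic interface.
interface WebhookPublic {
  id: string;
  userId: string;
  url: string;
  events: WebhookEvent[];
  active: boolean;
  createdAt: string;
}

// Hypothetical helper: returns the active webhooks subscribed to the given event.
function webhooksFor(event: WebhookEvent, webhooks: WebhookPublic[]): WebhookPublic[] {
  return webhooks.filter((hook) => hook.active && hook.events.indexOf(event) !== -1);
}
```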

Variable

A variable defines a single field in an extraction template. Variables tell the AI what data to look for and what type to return.

typescript
interface Variable {
  key: string              // Field name in extracted data
  type: 'string' | 'number' | 'date' | 'boolean' | 'object' | 'array'
  context: string          // Description/hint for the AI extractor
  required?: boolean       // Whether the field is required
  fields?: Variable[]      // Nested fields (for type: 'object' or arrayType: 'object')
  arrayType?: 'string' | 'number' | 'object'  // Item type (for type: 'array')
}

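Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| key | string | Field name that appears as a key in the extracted data. At least 1 character. |
| type | string | Expected data type. Allowed values: string, number, date, boolean, object, array. |
| context | string | Natural-language description that helps the AI understand what to extract for this field. More specific context produces better results. |
| required | boolean | (Optional) If true, the AI always attempts to return a value for this field. |
| fields | Variable[] | (Optional) For type: "object", defines the nested fields inside the object. Recursive: nested objects can contain their own fields. |
| arrayType | string | (Optional) For type: "array", the type of the array's items. Allowed values: string, number, object. When arrayType is "object", use fields to define the structure of each item. |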

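Recursive Structure

Variables support nesting to arbitrary depth through the fields property. An object variable contains child variables, and an array variable with arrayType: "object" also uses fields to describe each array item.

This lets you model complex document structures such as invoices with line items, contracts with multiple parties, or any hierarchical data.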
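
The recursive shape of Variable maps naturally onto a recursive check of extracted data. The sketch below is one possible client-side validator, not DocMap's own validation logic. matchesVariable and its helpers are hypothetical, and it assumes that date values arrive as strings (as in the ExtractionRecord example) and that a missing or null value is acceptable for fields not marked required.

```typescript
interface Variable {
  key: string;
  type: 'string' | 'number' | 'date' | 'boolean' | 'object' | 'array';
  context: string;
  required?: boolean;
  fields?: Variable[];
  arrayType?: 'string' | 'number' | 'object';
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Checks every declared child field against the matching key of obj.
function matchesFields(obj: Record<string, unknown>, fields: Variable[]): boolean {
  return fields.every((field) => matchesVariable(obj[field.key], field));
}

// Checks one array item against the variable's arrayType.
function matchesItem(item: unknown, variable: Variable): boolean {
  switch (variable.arrayType) {
    case 'string':
      return typeof item === 'string';
    case 'number':
      return typeof item === 'number';
    case 'object':
      return isPlainObject(item) && matchesFields(item, variable.fields ?? []);
    default:
      return true; // arrayType omitted: item type is unspecified, so accept anything
  }
}

// Hypothetical validator. Assumptions: dates are strings; null/undefined is
// only rejected when the variable is marked required.
function matchesVariable(value: unknown, variable: Variable): boolean {
  if (value === null || value === undefined) return variable.required !== true;
  switch (variable.type) {
    case 'string':
    case 'date':
      return typeof value === 'string';
    case 'number':
      return typeof value === 'number';
    case 'boolean':
      return typeof value === 'boolean';
    case 'object':
      return isPlainObject(value) && matchesFields(value, variable.fields ?? []);
    case 'array':
      return Array.isArray(value) && value.every((item) => matchesItem(item, variable));
  }
}
```

The recursion bottoms out at primitive types, so the same three functions handle an object nested inside an array item inside another object without special cases.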

Example: invoice template with nested variables

json
[
  {
    "key": "vendor_name",
    "type": "string",
    "context": "Name of the vendor or supplier company",
    "required": true
  },
  {
    "key": "invoice_number",
    "type": "string",
    "context": "Invoice or reference number"
  },
  {
    "key": "invoice_date",
    "type": "date",
    "context": "Date the invoice was issued (YYYY-MM-DD)"
  },
  {
    "key": "total_amount",
    "type": "number",
    "context": "Total amount due including tax"
  },
  {
    "key": "billing_address",
    "type": "object",
    "context": "The billing address on the invoice",
    "fields": [
      { "key": "street", "type": "string", "context": "Street address" },
      { "key": "city", "type": "string", "context": "City name" },
      { "key": "state", "type": "string", "context": "State or province" },
      { "key": "zip", "type": "string", "context": "Postal or ZIP code" },
      { "key": "country", "type": "string", "context": "Country name or code" }
    ]
  },
  {
    "key": "line_items",
    "type": "array",
    "context": "Individual line items listed on the invoice",
    "arrayType": "object",
    "fields": [
      { "key": "description", "type": "string", "context": "Item description" },
      { "key": "quantity", "type": "number", "context": "Quantity ordered" },
      { "key": "unit_price", "type": "number", "context": "Price per unit" },
      { "key": "total", "type": "number", "context": "Line total (quantity x unit_price)" }
    ]
  },
  {
    "key": "tags",
    "type": "array",
    "context": "Keywords or categories that describe this invoice",
    "arrayType": "string"
  }
]

In this example:

  • billing_address is an object with five nested string fields.
  • line_items is an array of objects, each containing description, quantity, unit_price, and total.
  • tags is a plain array of strings with no nested fields.
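
To see how a nested template flattens out, the sketch below walks a Variable tree and lists the dotted path of every leaf field, marking array levels with []. listPaths is a hypothetical helper written for illustration; the path notation is a convention chosen here, not something the DocMap API defines.

```typescript
interface Variable {
  key: string;
  type: 'string' | 'number' | 'date' | 'boolean' | 'object' | 'array';
  context: string;
  required?: boolean;
  fields?: Variable[];
  arrayType?: 'string' | 'number' | 'object';
}

// Hypothetical helper: returns one dotted path per leaf field, in template order.
function listPaths(variables: Variable[], prefix = ''): string[] {
  const paths: string[] = [];
  for (const variable of variables) {
    const isObjectArray = variable.type === 'array' && variable.arrayType === 'object';
    const path = prefix + variable.key + (variable.type === 'array' ? '[]' : '');
    const hasChildren = variable.fields !== undefined && variable.fields.length > 0;
    if ((variable.type === 'object' || isObjectArray) && hasChildren) {
      // Recurse into nested fields, extending the path with a dot separator.
      paths.push(...listPaths(variable.fields ?? [], path + '.'));
    } else {
      paths.push(path);
    }
  }
  return paths;
}
```

Applied to the template above, billing_address contributes paths such as billing_address.street, line_items contributes paths such as line_items[].quantity, and tags contributes the single path tags[].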

UsageResponse

Returned by the GET /v1/usage endpoint. Shows the authenticated user's current plan and extraction usage for the current billing period.

typescript
interface UsageResponse {
  plan: 'free' | 'starter' | 'core' | 'pro'
  usage: number            // Extractions used this period
  limit: number            // Maximum for current plan
  periodKey: string        // Current period (e.g., "2025-01")
}

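Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| plan | string | The user's current plan. Allowed values: free, starter, core, pro. |
| usage | number | Number of extractions consumed in the current billing period. |
| limit | number | Maximum number of extractions allowed by the user's plan. |
| periodKey | string | Current billing period in YYYY-MM format (for example, "2025-01"). |

Example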

json
{
  "data": {
    "plan": "starter",
    "usage": 142,
    "limit": 500,
    "periodKey": "2025-01"
  }
}

TIP

The usage endpoint wraps its payload in a data object, as shown above.
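
Since the payload sits one level down, client code needs to unwrap data before reading the counters. A minimal sketch, assuming the response body has already been parsed from JSON; UsageEnvelope and remainingExtractions are hypothetical names used only for illustration.

```typescript
interface UsageResponse {
  plan: 'free' | 'starter' | 'core' | 'pro';
  usage: number;
  limit: number;
  periodKey: string;
}

// Hypothetical name for the { data: ... } wrapper shown in the example above.
interface UsageEnvelope {
  data: UsageResponse;
}

// Returns how many extractions are left this period, never below zero.
function remainingExtractions(body: UsageEnvelope): number {
  const { usage, limit } = body.data;
  return Math.max(0, limit - usage);
}
```

For the example response above (usage 142, limit 500), this returns 358.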

ErrorResponse

All API errors follow a consistent structure with a machine-readable code and a human-readable message.

typescript
interface ErrorResponse {
  error: {
    code: string           // Machine-readable error code
    message: string        // Human-readable description
  }
}

Example

json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "Extraction not found"
  }
}
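
Because every error shares this shape, a single type guard can separate error bodies from successful payloads. The sketch below is one way to write it; isErrorResponse is a hypothetical helper, not part of any DocMap SDK.

```typescript
interface ErrorResponse {
  error: {
    code: string;
    message: string;
  };
}

// Hypothetical type guard: true only when body has an error object
// carrying string code and message properties.
function isErrorResponse(body: unknown): body is ErrorResponse {
  if (typeof body !== 'object' || body === null) return false;
  const error = (body as { error?: unknown }).error;
  if (typeof error !== 'object' || error === null) return false;
  const { code, message } = error as { code?: unknown; message?: unknown };
  return typeof code === 'string' && typeof message === 'string';
}
```

Note that an ExtractionRecord also has an error property, but there it is a string or null rather than an object, so a failed extraction record does not pass this guard.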

For the full list of error codes, HTTP status codes, and resolution steps, see the Error Codes reference.
