Document Processing Implementation

Verify income and employment from documents your users already have — pay stubs, W-2s, 1099s, and tax returns — by sending them to the Truv API. Truv validates each file, classifies it with AI, extracts the data, and returns the same income and employment report schema as a live payroll connection. Use Document Processing as a fallback when a borrower can’t connect their payroll provider, or as a standalone way to turn a pile of uploaded files into structured, verified data. Everything on this page is server-side and language-agnostic — you upload Base64-encoded files over JSON and poll or listen for results.

This guide covers the server-side API, where your backend uploads files and finalizes the collection. It’s a separate integration path from Truv Bridge — the client-side widget where users connect a payroll or bank account directly.

Run the Document Processing demo first to watch the full upload → validate → classify → finalize → report flow end to end, then come back here to build it.

Intelligent intake

Document Processing handles whatever your users actually have. Instead of forcing them to label each file or upload one document at a time, you collect everything they send and Truv sorts out what counts.

Upload anything, get feedback up front. Users submit any mix of supported files, including phone photos. Truv checks each file on upload and reports problems through the file’s validations flags and status, both of which you poll in Step 4: an unreadable file returns is_readable: false, an oversized one returns is_viable_size: false, and in either case the file’s status becomes invalid. Surface that feedback so a user can retake a blurry photo before they finish, instead of failing verification hours later.
Truv recognizes what each file is. Every uploaded file is classified into recognized documents, each with a document_type. Files that aren’t a verification document are classified as OTHER and left out of finalization. The user doesn’t label anything.
Proceed as soon as the minimum is met. Because every document is classified, your app can check the recognized documents against your requirement, finalize only the documents that matter, and let the user continue once it’s satisfied.

Example — SNAP income verification. A requirement of two recent pay stubs is met even if the applicant uploads five files. Truv recognizes the two pay stubs among them; you finalize those two document_ids and let the applicant proceed, ignoring the rest.
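The requirement check in this example can be sketched as a small filter over the collection response. This is an illustration only, not Truv's own logic: the function names are hypothetical, and the field names (documents, document_id, document_type, status) are taken from the example response in Step 4. It does not check how recent a pay stub is.

```python
def matching_document_ids(collection, document_type):
    """Return the document_ids of successfully recognized documents of one type."""
    return [
        doc["document_id"]
        for doc in collection.get("documents", [])
        if doc.get("document_type") == document_type
        and doc.get("status") == "successful"
    ]


def requirement_met(collection, document_type, minimum):
    """True when at least `minimum` usable documents of the given type exist."""
    return len(matching_document_ids(collection, document_type)) >= minimum


# Hypothetical collection: five recognized documents, two usable pay stubs.
collection = {
    "documents": [
        {"document_id": "d1", "document_type": "PAYSTUB", "status": "successful"},
        {"document_id": "d2", "document_type": "OTHER", "status": "successful"},
        {"document_id": "d3", "document_type": "PAYSTUB", "status": "successful"},
        {"document_id": "d4", "document_type": "W2", "status": "successful"},
        {"document_id": "d5", "document_type": "PAYSTUB", "status": "failed"},
    ]
}

if requirement_met(collection, "PAYSTUB", minimum=2):
    ids_to_finalize = matching_document_ids(collection, "PAYSTUB")
```

The resulting list is what you would pass as document_ids when finalizing in Step 5.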

How document processing works

Document Processing is a collection-based workflow. You group a user’s files into a collection, Truv recognizes what each file is, and you finalize the documents you want into a report.

Object	What it is
Collection	The container for one or more users’ uploaded files. Created once, then reused for every call.
Uploaded file	A raw file you sent, along with its validation results (`is_valid`, `is_readable`, size and type checks). One file can contain several documents.
Recognized document	An AI-classified document found inside a file, such as a `PAYSTUB` or a `W2`, with its page range and the user it belongs to.
Link	The result of finalizing documents. Processed asynchronously into the standard income/employment report.

The lifecycle is always the same: create a collection → monitor validation and classification → finalize → retrieve the report. The steps below walk through each call.

Step 1: Identify or create a user [Server-side]

Retrieve an existing user_id or create a new user.

If you plan to use AIM Check, include at least the last four digits of the user’s SSN when you create the user.

Find an existing user:

curl -X GET https://prod.truv.com/v1/users/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json"

Or create a new user:

curl -X POST https://prod.truv.com/v1/users/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "external_user_id": "your-internal-id",
    "first_name": "John",
    "last_name": "Doe",
    "ssn": "1234"
  }'

Save the returned id. You’ll need it in subsequent steps.

Step 2: Create a document collection [Server-side]

Initialize the collection and upload documents in a single call. Base64-encode each file before you send it; Truv queues every uploaded file for validation and AI classification automatically.

curl -X POST https://prod.truv.com/v1/documents/collections/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "mime_type": "application/pdf",
        "filename": "most-recent-paystub.pdf",
        "content": "{{base64_encoded_file}}",
        "user_id": "{{user_id}}"
      },
      {
        "mime_type": "image/jpeg",
        "filename": "prior-paystub-photo.jpg",
        "content": "{{base64_encoded_file}}",
        "user_id": "{{user_id}}"
      }
    ]
  }'

The response includes a collection_id. Save this for all subsequent calls.
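As a rough sketch of how the request body above might be assembled, the helper below Base64-encodes raw file bytes and builds one documents entry. The function names are hypothetical; the JSON field names come from the curl example.

```python
import base64
import json


def build_document(filename, content, mime_type, user_id):
    """Build one entry for the "documents" array from raw file bytes."""
    return {
        "mime_type": mime_type,
        "filename": filename,
        # The API expects Base64 text, so encode the bytes and decode to str.
        "content": base64.b64encode(content).decode("ascii"),
        "user_id": user_id,
    }


def build_collection_payload(documents):
    """Serialize the request body for the create-collection call."""
    return json.dumps({"documents": documents})


# Placeholder bytes stand in for a real file read with open(path, "rb").
doc = build_document("most-recent-paystub.pdf", b"%PDF-1.4 sample", "application/pdf", "user-123")
payload = build_collection_payload([doc])
```

Send payload as the body of the POST request with your usual HTTP client and the headers shown above.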

Accepted formats and limits

Users rarely have a clean PDF. Truv accepts photos taken on a phone alongside scanned documents:

Constraint	Value
File formats (`mime_type`)	`application/pdf`, `image/jpeg`, `image/png`, `image/tiff`, `image/webp`, `image/x-ms-bmp`, `image/heic`, `image/heif`
File size	1 KB – 10 MB per file
Files per request	Up to 10

A single file can hold more than one document — a multi-page PDF of two pay stubs and a W-2 is split into three recognized documents automatically.
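You can catch some problems before uploading by checking files against these limits locally. The sketch below is an optional pre-check, not a replacement for Truv's validation. It assumes 1 KB means 1,024 bytes, 10 MB means 10 × 1,024 × 1,024 bytes, and that the limit applies to the file before Base64 encoding; none of these details is stated on this page. The names are hypothetical.

```python
ACCEPTED_MIME_TYPES = {
    "application/pdf",
    "image/jpeg",
    "image/png",
    "image/tiff",
    "image/webp",
    "image/x-ms-bmp",
    "image/heic",
    "image/heif",
}
MIN_BYTES = 1024              # assumption: 1 KB = 1,024 bytes
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB = 10,485,760 bytes
MAX_FILES_PER_REQUEST = 10


def precheck_file(mime_type, size_bytes):
    """Return a list of problem codes; an empty list means the file looks uploadable."""
    problems = []
    if mime_type not in ACCEPTED_MIME_TYPES:
        problems.append("unsupported_type")
    if not MIN_BYTES <= size_bytes <= MAX_BYTES:
        problems.append("size_out_of_range")
    return problems


def fits_in_one_request(file_count):
    """True when the number of files fits in a single request."""
    return 0 < file_count <= MAX_FILES_PER_REQUEST
```

Truv still runs its own checks on upload, so treat the validations flags in Step 4 as the source of truth.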

Assign documents to users

Each document can be tied to a user with either user_id (Truv ID) or external_user_id (your system’s ID) — never both on the same document. To create or update users in the same call, pass a users array; Truv can also match documents to those users by name.

curl -X POST https://prod.truv.com/v1/documents/collections/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "users": [
      { "external_user_id": "case-12345", "full_name": "John Doe", "ssn": "1234" }
    ],
    "documents": [
      {
        "mime_type": "application/pdf",
        "filename": "household-docs.pdf",
        "content": "{{base64_encoded_file}}",
        "external_user_id": "case-12345"
      }
    ]
  }'
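Because a document must not carry both identifiers, a quick check before sending can catch the mistake early. This is a minimal sketch with a hypothetical function name. It only enforces the "never both" rule and allows documents with neither identifier, since Truv can also match documents to users by name.

```python
def conflicting_assignments(documents):
    """Return the filenames of documents that set both user_id and external_user_id."""
    return [
        doc.get("filename", "<unnamed>")
        for doc in documents
        if "user_id" in doc and "external_user_id" in doc
    ]


documents = [
    {"filename": "ok-truv-id.pdf", "user_id": "abc"},
    {"filename": "ok-external-id.pdf", "external_user_id": "case-12345"},
    {"filename": "conflict.pdf", "user_id": "abc", "external_user_id": "case-12345"},
    {"filename": "matched-by-name.pdf"},
]
conflicts = conflicting_assignments(documents)
```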

Step 3: Add files incrementally (Optional) [Server-side]

For save-and-resume workflows, add up to 10 more files to an existing collection:

curl -X POST https://prod.truv.com/v1/documents/collections/{collection_id}/upload/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "mime_type": "application/pdf",
        "filename": "w2-2023.pdf",
        "content": "{{base64_encoded_file}}",
        "user_id": "{{user_id}}"
      }
    ]
  }'
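If a user has more than 10 files, one approach is to split them into batches and send each batch to the upload endpoint in turn. The helper below is a hypothetical sketch of that batching step only; it makes no network calls.

```python
def batches(items, size=10):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


# 23 placeholder file names become three requests of 10, 10, and 3 files.
pending_files = [f"file-{n}.pdf" for n in range(23)]
upload_batches = batches(pending_files)
```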

Step 4: Monitor validation and classification [Server-side]

Document processing is asynchronous. Poll the collection until every uploaded file reaches a terminal status.

curl -X GET https://prod.truv.com/v1/documents/collections/{collection_id}/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json"

The response has two sections: uploaded_files and documents.

uploaded_files: file-level validation. Each file carries a validations object:

Field	Meaning
`is_valid`	Overall validation status
`is_unique`	Not a duplicate of another file in the collection
`is_readable`	File can be opened and parsed
`is_accessible`	File is not password-protected
`is_viable_size`	File is within the 1 KB – 10 MB range
`is_supported_type`	MIME type is on the accepted list
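One way to surface these flags to users is to map each failed check to a message. The sketch below assumes the validations object shown in the example response; the function name and message wording are placeholders you would adapt to your own product.

```python
# Message text is illustrative only. is_valid is omitted because it summarizes the others.
FLAG_MESSAGES = {
    "is_unique": "This file was already uploaded.",
    "is_readable": "We couldn't read this file. Try retaking the photo or exporting it again.",
    "is_accessible": "This file is password-protected. Remove the password and upload it again.",
    "is_viable_size": "This file must be between 1 KB and 10 MB.",
    "is_supported_type": "This file type isn't supported.",
}


def feedback_for(uploaded_file):
    """Return user-facing messages for every validation flag that is explicitly false."""
    validations = uploaded_file.get("validations", {})
    return [
        message
        for flag, message in FLAG_MESSAGES.items()
        if validations.get(flag) is False
    ]


blurry_photo = {
    "filename": "paystub-photo.jpg",
    "status": "invalid",
    "validations": {
        "is_valid": False,
        "is_unique": True,
        "is_readable": False,
        "is_accessible": True,
        "is_viable_size": True,
        "is_supported_type": True,
    },
}
messages = feedback_for(blurry_photo)
```

Flags that are missing from the response are ignored rather than treated as failures, so the helper stays quiet while a file is still being validated.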

A file’s status moves through these values:

Status	Meaning
`pending` / `validating` / `processing`	Still in flight — keep polling
`validated`	Passed validation, classification in progress
`successful`	Validated and classified — ready to finalize
`invalid`	Failed one or more validation checks (see `validations`)
`duplicate`	Matches a file already in the collection
`failed`	Could not be processed
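A polling loop built on this table might look like the sketch below. It treats successful, invalid, duplicate, and failed as terminal and everything else as still in flight. The fetch argument is a hypothetical callable that performs the GET request above and returns the parsed JSON; the interval and timeout values are arbitrary defaults, not Truv recommendations.

```python
import time

TERMINAL_FILE_STATUSES = {"successful", "invalid", "duplicate", "failed"}


def all_files_settled(collection):
    """True when the collection has files and every one has a terminal status."""
    files = collection.get("uploaded_files", [])
    return bool(files) and all(
        f.get("status") in TERMINAL_FILE_STATUSES for f in files
    )


def wait_until_settled(fetch, interval_s=2.0, timeout_s=120.0, sleep=time.sleep):
    """Call fetch() repeatedly until every file is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        collection = fetch()
        if all_files_settled(collection):
            return collection
        if time.monotonic() >= deadline:
            raise TimeoutError("collection did not settle before the timeout")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to test without real delays.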

documents: AI-recognized records, each with a document_type (and a document_subtype where relevant), the borrower’s name, page range, and the user it’s matched to. A recognized document’s status is successful, failed, or rejected.

Example response:

{
  "collection_id": "f1f63754e72a44b4805a4d16380cd833",
  "uploaded_files": [
    {
      "file_id": "9efadd7fb43c4598b02fe79bd7e31fd7",
      "filename": "most-recent-paystub.pdf",
      "mime_type": "application/pdf",
      "validations": {
        "is_valid": true,
        "is_unique": true,
        "is_readable": true,
        "is_accessible": true,
        "is_viable_size": true,
        "is_supported_type": true
      },
      "status": "successful",
      "user_id": "c87e695081704310b15d59ee1640b1aa"
    }
  ],
  "documents": [
    {
      "document_id": "b1091a2ddc5d428c9c8af66dbc8b0556",
      "file_id": "9efadd7fb43c4598b02fe79bd7e31fd7",
      "document_type": "PAYSTUB",
      "document_subtype": null,
      "status": "successful",
      "first_name": "John",
      "last_name": "Doe",
      "start_page": 1,
      "end_page": 1
    }
  ]
}

How duplicate detection works

Truv flags duplicates by file content, not file name. A file is marked is_unique: false with status: "duplicate" when its content matches another file already in the same collection.

Identical content, different file names — flagged as duplicates.
Same file name, different content — both treated as unique.
Detection is scoped to a single collection: Truv compares each file against the others in that collection only.

Truv keeps one file as the original (is_unique: true) and flags every additional copy as a duplicate. A duplicate is not an error — the file is returned in the normal response, and you choose which documents to process by passing their document_ids to finalize.

This is production behavior. In the sandbox, duplicate detection is file-name-driven — the test scenario is keyed off the filename and file content is not parsed. Sandbox results for renamed or identically-named files won’t match production’s content-based matching, so validate duplicate handling against production data.
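To build intuition for content-based matching, or to warn users about repeats before uploading, you could compare file hashes locally. The sketch below is an approximation under stated assumptions: it flags only byte-for-byte identical files using SHA-256, and this page does not say how Truv itself compares content. The function name is hypothetical.

```python
import hashlib


def duplicate_indexes(files):
    """Given (filename, content) pairs, return indexes of files whose bytes repeat an earlier file."""
    seen_digests = set()
    duplicates = []
    for index, (_filename, content) in enumerate(files):
        digest = hashlib.sha256(content).digest()
        if digest in seen_digests:
            duplicates.append(index)   # a later copy of content already seen
        else:
            seen_digests.add(digest)   # the first copy counts as the original
    return duplicates


files = [
    ("paystub.pdf", b"pay stub bytes"),
    ("paystub-copy.pdf", b"pay stub bytes"),  # same content, different name
    ("paystub.pdf", b"other bytes"),          # same name, different content
]
repeats = duplicate_indexes(files)
```

Only the second file is flagged, which mirrors the rules above: file names play no part in the comparison.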

Step 5: Finalize the collection [Server-side]

Once documents are validated, finalize to trigger report generation. Finalize the whole collection, or pass document_ids to select only the documents you need. Selecting documents is how you proceed with the valid ones and ignore everything else a user uploaded.

Use all recognized documents:

curl -X POST https://prod.truv.com/v1/documents/collections/{collection_id}/finalize/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "product_type": "income"
  }'

Or select specific documents:

curl -X POST https://prod.truv.com/v1/documents/collections/{collection_id}/finalize/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "document_ids": [
      "b1091a2ddc5d428c9c8af66dbc8b0556",
      "1cbb5f9bb6f345059af491917179c80b"
    ],
    "product_type": "income"
  }'

product_type is income (default) or employment. The response groups results by user and returns a link_id for each:

{
  "users": [
    {
      "id": "21ae68826be042e5ab410e531dc40889",
      "links": [
        {
          "link_id": "1995e2fc5e374beb955ae81276294815",
          "status": "new",
          "documents": [
            { "id": "ad568bb5c117439e9c00df738052c653", "document_type": "PAYSTUB", "document_subtype": null },
            { "id": "769014511ed748d0a650a73759d06127", "document_type": "PAYSTUB", "document_subtype": null }
          ]
        }
      ]
    }
  ]
}

Save the link_id.
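Since the response nests links under users, a small helper can collect the link_id values to store. This sketch assumes the response shape shown above; the function name is hypothetical.

```python
def link_ids_by_user(finalize_response):
    """Map each user id to the list of link_ids created for that user."""
    return {
        user["id"]: [link["link_id"] for link in user.get("links", [])]
        for user in finalize_response.get("users", [])
    }


finalize_response = {
    "users": [
        {
            "id": "21ae68826be042e5ab410e531dc40889",
            "links": [
                {"link_id": "1995e2fc5e374beb955ae81276294815", "status": "new"}
            ],
        }
    ]
}
links = link_ids_by_user(finalize_response)
```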

Step 6: Retrieve verified data [Server-side]

Truv processes the Link asynchronously. Wait for the Task to complete before fetching the report.

Recommended: Listen for a webhook

Wait for a task-status-updated webhook with status: done and the matching link_id, then fetch the report.

Alternative: Poll finalization status

curl -X GET https://prod.truv.com/v1/documents/collections/{collection_id}/finalize/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json"

A Link status of done means the report is ready. The other terminal states are no_data, config_error, error, and unavailable.

Fetch the report:

Income & Employment:

curl -X GET https://prod.truv.com/v1/links/{link_id}/income/report/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json"

Employment Only:

curl -X GET https://prod.truv.com/v1/links/{link_id}/employment/report/ \
  -H "X-Access-Client-Id: YOUR_TRUV_CLIENT_ID" \
  -H "X-Access-Secret: YOUR_TRUV_CLIENT_SECRET" \
  -H "Content-Type: application/json"

The report schema is identical to a live payroll connection.
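The decision logic for this step can be sketched as three small helpers: one that interprets a Link status, one that checks whether a webhook event is the one you are waiting for, and one that picks the report URL. The webhook payload field names (event_type, status, link_id) are assumptions here, so confirm them against the Webhooks guide; the function names are hypothetical, and the URLs follow the curl examples above.

```python
OTHER_TERMINAL_STATUSES = {"no_data", "config_error", "error", "unavailable"}


def link_outcome(status):
    """Classify a Link status as "ready", "stopped" (terminal, no report), or "pending"."""
    if status == "done":
        return "ready"
    if status in OTHER_TERMINAL_STATUSES:
        return "stopped"
    return "pending"


def is_ready_event(event, link_id):
    """True for a task-status-updated event with status done for the given link."""
    # Field names below are assumed, not confirmed by this page.
    return (
        event.get("event_type") == "task-status-updated"
        and event.get("status") == "done"
        and event.get("link_id") == link_id
    )


def report_url(link_id, product_type, base_url="https://prod.truv.com"):
    """Build the report URL for an income or employment report."""
    if product_type not in ("income", "employment"):
        raise ValueError("product_type must be 'income' or 'employment'")
    return f"{base_url}/v1/links/{link_id}/{product_type}/report/"
```

Keeping the status check separate from the HTTP call lets the same logic serve both the webhook path and the polling path.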

Supported document types

AI classification returns a document_type for each recognized document. Income and employment documents drive verification reports; the remaining types are recognized for routing and identity workflows.

`document_type`	Description
`PAYSTUB`	Pay statements / pay stubs
`W2`	W-2 wage and tax statements
`F1099`	1099 forms (see subtypes below)
`F1040`	Individual income tax return
`BANK_STATEMENT`	Bank statements
`DRIVER_LICENSE`	Driver’s licenses
`PASSPORT`	Passports
`GREEN_CARD`	Permanent resident cards
`UTILITY_BILL`	Utility bills
`LEASE_AGREEMENT`	Lease agreements
`INSURANCE_HOME_POLICY`	Home insurance policies
`INSURANCE_AUTO_POLICY`	Auto insurance policies
`LETTER_OF_VERIFICATION`	Letters of verification
`VOLUNTEER_LETTER`	Volunteer / community service letters
`OTHER`	Recognized file that isn’t a verification document

Document subtypes

When Truv recognizes a more specific variant, the recognized document also carries a document_subtype:

`document_subtype`	Description
`F1099_MISC`	1099-MISC — miscellaneous income
`F1099_NEC`	1099-NEC — nonemployee compensation
`F1099_DIV`	1099-DIV — dividends and distributions
`F1099_INT`	1099-INT — interest income
`F1099_G`	1099-G — government payments
`F1099_R`	1099-R — retirement distributions
`F_SSA1099`	SSA-1099 — Social Security benefit statement
`VOL_TRANSCRIPT`	Volunteer transcript
`VOL_HOURS_LOG`	Volunteer hours log

For sandbox testing, use the test documents — the sandbox uses file names to determine results rather than actual file contents.

Detect suspicious documents

Truv runs fraud detection on every document and may set is_suspicious: true on the report while still returning results, or fail the document outright for high-confidence fraud. See Fraud and Manual Review for how to handle flagged and failed documents.

API Reference

Create Collection

Initialize a document collection

Upload Files

Add files to an existing collection

Get Collection

Check validation and AI classification status

Finalize Collection

Trigger report generation

Next steps

Test Documents

Sample PDFs for sandbox testing

Income Data Guide

Interpret income and employment data from reports

Webhooks

Set up notifications for processing completion

Document Processing Demo

Upload pay stubs, W-2s, and tax returns. See structured data extraction in action.
