Classify & Extract Documents

Productized intelligent document processing for Swiss back offices: sort incoming PDFs, scans, and emails into the right buckets, then extract the fields your team needs, with confidence scores, human review where it matters, and a full audit trail.

Intelligent Document Processing Suite

Production IDP, not a model demo

We have built an end-to-end Intelligent Document Processing (IDP) pipeline that fits into your technology stack. Classify any document, extract information from it, and surface it in a review environment for your approvers. The output is high-quality data that your ERP, PIM, and other downstream systems can trust.

Multi-format document ingestion

Inbound documents come in all shapes and sizes. PDFs, scanned images, .xlsx and .doc files, supplier datasheets, and email bodies all enter the same pipeline. For image-based documents such as scanned invoices and handwritten Belege (receipts), Mistral OCR runs a two-pass process: raw text first, then structured output. Multi-column German invoices parse cleanly.

AI classification by your taxonomy

Rather than rule sets per format, classification is driven by an admin-defined Category model. For each Category the admin defines Document Types, Field Blocks, and the Downstream Roles tasked with review. Adding another family — Leistungsabrechnung, Lieferschein — is one admin change. Document classification AI follows the schema.
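
A minimal sketch of how such a Category model could be represented, assuming plain Python dataclasses. The class names, field names, and the `warehouse_clerk` role are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass, field


@dataclass
class FieldBlock:
    """A named group of fields to extract (hypothetical shape)."""
    name: str
    fields: list[str]


@dataclass
class Category:
    """One document family in the admin-defined taxonomy."""
    name: str
    document_types: list[str]
    field_blocks: list[FieldBlock]
    downstream_roles: list[str]


@dataclass
class Taxonomy:
    categories: dict[str, Category] = field(default_factory=dict)

    def add(self, category: Category) -> None:
        # Adding a family is a data change, not a code change.
        self.categories[category.name] = category

    def reviewers_for(self, category_name: str) -> list[str]:
        # Roles tasked with reviewing documents of this category.
        return self.categories[category_name].downstream_roles


taxonomy = Taxonomy()
taxonomy.add(Category(
    name="Lieferschein",
    document_types=["delivery_note"],
    field_blocks=[FieldBlock("header", ["supplier", "delivery_date"])],
    downstream_roles=["warehouse_clerk"],
))
```

In this sketch, registering a new family such as Leistungsabrechnung would be one more `taxonomy.add(...)` call, which mirrors the "one admin change" idea above.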

Structured extraction in JSON mode

After classification, a document is routed to the extraction schema for its category. The schema defines field blocks, types, and required vs optional flags. The model is forced into OpenAI JSON mode, so document data extraction outputs a typed JSON payload — ready for downstream systems without additional parsing.
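
JSON mode yields syntactically valid JSON, but a schema check on top is still a sensible safeguard. The sketch below shows one way such a check against required and optional fields could look. The schema layout, field names, and `validate_payload` helper are assumptions for illustration, and the sample response is made up.

```python
import json

# Hypothetical extraction schema for one category:
# field name -> (expected Python type, required flag).
INVOICE_SCHEMA = {
    "invoice_number": (str, True),
    "total_chf": (float, True),
    "due_date": (str, False),
}


def validate_payload(raw_json: str, schema: dict) -> tuple[dict, list[str]]:
    """Parse model output and check it against the schema.

    Returns the parsed payload and a list of problems (empty if clean).
    """
    payload = json.loads(raw_json)
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in payload or payload[name] is None:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}")
    return payload, problems


# Made-up example of what a JSON-mode response body might contain.
raw = '{"invoice_number": "2024-0917", "total_chf": 1280.5}'
payload, problems = validate_payload(raw, INVOICE_SCHEMA)
```

A payload with problems would be a natural candidate for the review queue rather than a downstream write.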

Role-gated human-in-the-loop review

Anything the pipeline cannot extract cleanly, such as partial tables, misclassified documents, or low-confidence fields, routes to a HITL review queue for the right role. Approved data flows downstream; rejections feed the prompt registry as a training signal. AI document extraction is verified, not assumed.
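
The routing rule can be sketched as a simple threshold check. This is an illustration under stated assumptions: the `0.85` threshold, the `ExtractedField` shape, and the queue names are hypothetical, and a real deployment would tune the threshold per category.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def route(fields: list[ExtractedField], reviewer_role: str) -> dict:
    """Send a document to review if any field is below the threshold."""
    low = [f.name for f in fields if f.confidence < CONFIDENCE_THRESHOLD]
    if low:
        return {"queue": "review", "role": reviewer_role, "flagged": low}
    return {"queue": "downstream", "role": None, "flagged": []}


doc = [
    ExtractedField("invoice_number", "2024-0917", 0.98),
    ExtractedField("total_chf", "1280.50", 0.61),
]
decision = route(doc, reviewer_role="accounts_payable")
```

Here the low-confidence `total_chf` field sends the document to the review queue for the assumed `accounts_payable` role, while a document whose fields all clear the threshold would go straight downstream.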

Swiss-multilingual: EN, DE, FR, IT

All categories, field blocks and review surfaces carry EN and DE locales out of the box. French and Italian extend via the prompt registry without code changes. A Romandie supplier sheet, a Zurich Versicherung claim and a Ticino contract all run through one end-to-end intelligent document processing pipeline.

Swiss data residency on request

For FINMA, MDR or IVDR-sensitive content, deploy on Swiss-resident hosting or on customer premises. Apertus offers a sovereign LLM track — document content never leaves Swiss or EU jurisdiction. All classification and extraction decisions are logged at audit-grade level, surfaced transparently to the customer.

How we deliver it

We start with your documents — sample sets per family, the downstream system that will consume the data, and the roles tasked with review. AI document automation begins with concrete schemas, not abstractions.

The pipeline runs end-to-end for one family: ingestion, classification, extraction, HITL review, downstream update. We have used this pipeline to process hundreds of thousands of SKUs at Weita and Sanitas Troesch.

Customize the review experience to fit the roles of individuals within your organization — who can see and edit which fields, what triggers an escalation, and where each handoff lands.

Once the first family is stable, we extend the rest of the taxonomy — new categories, field blocks, downstream systems. Because document classification follows the admin-defined schema, every new family is a configuration sprint.

We hand over the prompt registry so your team can update prompts, along with role guides and a thin support engagement to monitor for accuracy drift. Some customers move operation entirely to their in-house team.

Why this engine, not generic IDP

Schema you control, not a vendor template

Most IDP products ship with their own document taxonomy and force the customer to fit it. We flip that: your Category and FieldBlock model is the spec, and the AI binds to it. When operations adds a new document family in month six, you configure it yourself, with no vendor ticket.

Built on Swiss delivery, not retrofitted

Weita, Sanitas Troesch, and the insurance AI POC all run the same engine in Swiss back-office operations. Bilingual EN/DE is baked in. Mistral OCR handles the two-pass text extraction. Reverb provides the editing surface for reviewing extracted data. Same architecture, three verticals.

HITL where it matters, automation where it doesn't

We don't pretend the model is 100% right. Low-confidence extractions and unmapped categories surface in the role-gated review queue; approvals flow back into the prompt registry. High-confidence routine documents skip the queue entirely. AI document automation is verified, not assumed.

One engine, three verticals already proven

The same architecture covers Swiss wholesale PIM at Weita, construction-supply onboarding at Sanitas Troesch, and insurance claims in the insurance AI POC. New verticals inherit the proven pattern: they extend the engine rather than rebuild it.

Audit-grade by construction

Every step is logged — category, prompt version, model ID, approver. The audit trail ships standard, not retrofitted. For FINMA, MDR and IVDR-sensitive customers, GDPR posture is part of the deployment template, including document classification provenance.
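
As a rough illustration of what one logged step could carry, the sketch below builds a JSON audit record with the fields named above. The `AuditEntry` class, the `log_step` helper, and the extra `document_id`, `step`, and `timestamp` fields are assumptions, not the actual log format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    document_id: str
    step: str                # e.g. "classification" or "extraction"
    category: str
    prompt_version: str
    model_id: str
    approver: Optional[str]  # None until a human has reviewed the step
    timestamp: str           # UTC, ISO 8601


def log_step(document_id: str, step: str, category: str,
             prompt_version: str, model_id: str,
             approver: Optional[str] = None) -> str:
    """Return one audit record as a JSON line."""
    entry = AuditEntry(
        document_id=document_id,
        step=step,
        category=category,
        prompt_version=prompt_version,
        model_id=model_id,
        approver=approver,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(entry), sort_keys=True)


line = log_step("doc-1", "classification", "Lieferschein", "v3", "model-x")
```

Keeping the prompt version and model ID on every record is what would let a reviewer trace a given classification back to the exact configuration that produced it.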

Frequently Asked Questions

  • OCR turns document images into characters. Intelligent document automation goes further: it classifies the document (invoice, order, delivery note), then extracts structured fields. Fields that fall below the confidence threshold route to a reviewer; corrected data flows back into the pipeline.

Interested in a solution?

We are glad to show you various options without any obligation.

Roland Kurmann

CEO, SAPIENTROQ

Book a call