Automatically Splitting Multi-page Invoices Using the Mindee Client Libraries
The Node.js library implementation differs from those of our other supported languages; see the dedicated Node.js tutorial instead.
Overview
Just want the full script? Jump to the Full Script section.
The Invoice Splitter Auto-Extraction feature lets you process multi-page invoice files: it automatically splits them into individual invoices and extracts data from each one. This guide shows how to do this with the Mindee client libraries in several programming languages.
When to Use This Feature
Use this feature when you have:
A single file containing multiple invoices
Invoices from the same or different providers in one document
The need to process each invoice individually without manual separation
Note: This API is distinct from the Multi Receipts Detector API, which isolates receipts within individual pages.
Prerequisites
Before you begin, ensure you have:
A valid and up-to-date Mindee API key
An active subscription to the Invoice Splitter API
An active subscription to the Invoice OCR API
The Mindee client library installed for your programming language
Sample File
For this tutorial, we'll use the following sample multi-page invoice file:
https://github.com/mindee/client-lib-test-data/blob/249c253aa93cd9dd10e235dc5ec0cacb2536f7ed/products/invoice_splitter/default_sample.pdf
Basic Setup
Import the necessary classes from the Mindee library.
Initialize the Mindee client with your API key.
Create an input source from your file path.
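Here is a minimal Python sketch of the setup steps. The helpers below read the API key from an environment variable instead of hard-coding it, and they load the input file's bytes. The variable name `MINDEE_API_KEY` and both helper names are assumptions for illustration. Pass the resulting key and file to your client library's own client and input-source constructors, as described in its documentation.

```python
import os
from pathlib import Path


def load_api_key(env_var: str = "MINDEE_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it.

    The default variable name is an assumption; use whatever your deployment sets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Mindee API key.")
    return key


def load_input_file(path: str) -> bytes:
    """Read the raw bytes of the file to process, failing early if it is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    return file_path.read_bytes()
```

Failing early on a missing key or file gives a clearer error than a rejected API call later on.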
Processing the Input
Check File Format and Page Count
Check if the file is a PDF.
If it's a PDF, check if it has multiple pages.
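The routing decision above can be sketched in Python without any Mindee calls. `looks_like_pdf` and `needs_splitting` are hypothetical helpers. The PDF check looks for the `%PDF-` signature near the start of the file. The page count is assumed to come from elsewhere, such as your client library's input source or a PDF library, because counting pages reliably requires parsing the PDF.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF signature appears in the first 1024 bytes.

    Some PDF readers tolerate a few leading bytes before the header, so this
    searches a small prefix rather than requiring the signature at offset 0.
    """
    return PDF_SIGNATURE in data[:1024]


def needs_splitting(data: bytes, page_count: int) -> bool:
    """Decide whether a file should go through the Invoice Splitter first.

    Only multi-page PDFs are split; images and single-page PDFs go straight
    to Invoice OCR. `page_count` is supplied by the caller (an assumption).
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return looks_like_pdf(data) and page_count > 1
```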
Process Multi-Page Documents
Use the Invoice Splitter API to get page groups.
Extract individual invoices using the page groups.
Process each extracted invoice with the Invoice OCR API.
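The page-group step can be illustrated with plain Python lists. This sketch assumes the Invoice Splitter result has already been reduced to a list of zero-based page indexes per invoice, for example `[[0], [1, 2]]`. `split_by_page_groups`, `process_split_invoices`, and the `parse_invoice` callback are hypothetical stand-ins; the callback takes the place of the Invoice OCR call.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


def split_by_page_groups(
    pages: Sequence[Page], page_groups: Sequence[Sequence[int]]
) -> List[List[Page]]:
    """Split a document's pages into one list of pages per invoice.

    `page_groups` is assumed to hold zero-based page indexes per invoice,
    e.g. [[0], [1, 2]] for a three-page file containing two invoices.
    """
    invoices: List[List[Page]] = []
    for group in page_groups:
        if not group:
            raise ValueError("Each page group must contain at least one page index.")
        for index in group:
            if not 0 <= index < len(pages):
                raise IndexError(f"Page index {index} is out of range for {len(pages)} pages.")
        invoices.append([pages[index] for index in group])
    return invoices


def process_split_invoices(
    pages: Sequence[Page],
    page_groups: Sequence[Sequence[int]],
    parse_invoice: Callable[[List[Page]], Result],
) -> List[Result]:
    """Run `parse_invoice` (a stand-in for the Invoice OCR call) on each invoice."""
    return [parse_invoice(invoice) for invoice in split_by_page_groups(pages, page_groups)]
```

In a real script, each group of pages would become a new input source and be sent to Invoice OCR. The client libraries may provide their own extraction helper for this, so check their reference documentation before writing your own.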
Process Single-Page Documents
For single-page documents or non-PDFs, process the document directly with the Invoice OCR API.
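The two branches fit together as in the sketch below. The splitter and OCR calls are passed in as hypothetical callables (`get_page_groups`, `parse_invoice`), which lets the control flow be shown and tested without network access. In this sketch, `parse_invoice(data, None)` stands for processing the whole file directly.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Result = TypeVar("Result")


def process_document(
    data: bytes,
    is_pdf: bool,
    page_count: int,
    get_page_groups: Callable[[bytes], Sequence[Sequence[int]]],
    parse_invoice: Callable[[bytes, Optional[List[int]]], Result],
) -> List[Result]:
    """Return one parsed result per invoice found in `data`.

    Multi-page PDFs are sent to the splitter first, then each page group is
    parsed separately. Everything else is parsed once, as a whole file.
    """
    if is_pdf and page_count > 1:
        return [parse_invoice(data, list(group)) for group in get_page_groups(data)]
    # Single-page PDFs and non-PDF files (such as images) skip the splitter.
    return [parse_invoice(data, None)]
```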
Example Output
After processing, you'll receive detailed information about each invoice. Here's a sample output:
Full Script
Best Practices
Handle potential errors and exceptions in your code.
Implement retry logic for API calls to handle temporary network issues.
Store extracted data securely and in compliance with relevant data protection regulations.
When uploading files, ensure that:
Invoices are clear, unstained, and properly unfolded
Extra pages (e.g., terms and conditions) are kept to a minimum
Pages from any single invoice are all oriented in the same direction
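The retry advice above can be sketched in Python as exponential backoff with jitter. Which exceptions count as transient is an assumption. The defaults below (`ConnectionError`, `TimeoutError`) are Python built-ins, but your client library may raise its own exception types, so adjust `retry_on` to match. Avoid retrying errors that will not clear on their own, such as an invalid API key.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay doubles on each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable. The jitter spreads out retries when many clients fail at the same moment.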
Troubleshooting
If you encounter issues:
Verify your API key and subscription status for both Invoice Splitter and Invoice OCR APIs.
Check the input file format and ensure it's supported.
Review the API response for any error messages.
Consult the Mindee API documentation for more detailed information.
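When checking the input format, the file's leading bytes are more reliable than its extension. The sketch below recognizes PDF, JPEG, PNG, WEBP, TIFF, and HEIC signatures. That list is an assumption about which formats are accepted, so confirm it against the Mindee API documentation. `sniff_format` and `describe_format_problem` are hypothetical helper names.

```python
from typing import Optional


def sniff_format(data: bytes) -> Optional[str]:
    """Guess a file's format from its leading bytes; return None if unrecognized."""
    if b"%PDF-" in data[:1024]:
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "tiff"
    # HEIF/HEIC: an ISO base media "ftyp" box followed by a HEIF brand.
    if data[4:8] == b"ftyp" and data[8:12] in (b"heic", b"heix", b"mif1"):
        return "heic"
    return None


def describe_format_problem(data: bytes, filename: str) -> Optional[str]:
    """Return a human-readable problem description, or None if the file looks usable."""
    if not data:
        return f"{filename} is empty"
    if sniff_format(data) is None:
        return f"{filename} is not a recognized PDF or image file"
    return None
```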