Automatically Splitting Multi-page Invoices Using the Mindee Client Libraries

The Node.js library implementation differs from that of our other supported languages; see the dedicated Node.js tutorial instead.

Overview

The Invoice Splitter Auto-Extraction feature allows you to process multi-page invoice files, automatically split them into individual invoices, and extract data from each one. This guide demonstrates how to use the Mindee library to accomplish this task across various programming languages.

When to Use This Feature

Use this feature when you have:

  • A single file containing multiple invoices

  • Invoices from the same or different providers in one document

  • The need to process each invoice individually without manual separation

Note: This API is distinct from the Multi Receipts Detector API, which isolates receipts within individual pages.

Prerequisites

Before you begin, ensure you have:

Sample File

For this tutorial, we'll use the following sample multi-page invoice file:

https://github.com/mindee/client-lib-test-data/blob/249c253aa93cd9dd10e235dc5ec0cacb2536f7ed/products/invoice_splitter/default_sample.pdf

Basic Setup

  1. Import the necessary classes from the Mindee library.

  2. Initialize the Mindee client with your API key.

  3. Create an input source from your file path.
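Before the client is created, the two inputs it needs (the API key and the file path) can be validated up front. The following is a minimal standard-library sketch, not the Mindee library's own setup code: the helper names are hypothetical, and reading the key from an environment variable called MINDEE_API_KEY is an assumption made for illustration. Creating the client and the input source themselves is left to the library calls for your language.

```python
import os
from pathlib import Path


def load_api_key(env=None):
    """Read the API key from the environment (variable name is an assumption)."""
    env = os.environ if env is None else env
    api_key = env.get("MINDEE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set MINDEE_API_KEY before running the script.")
    return api_key


def resolve_input_file(path):
    """Return the input path as a Path, failing early if the file is missing."""
    file_path = Path(path).expanduser()
    if not file_path.is_file():
        raise FileNotFoundError(f"Input file not found: {file_path}")
    return file_path
```

Failing early here keeps configuration mistakes separate from errors returned by the API.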

Processing the Input

Check File Format and Page Count

  1. Check if the file is a PDF.

  2. If it's a PDF, check if it has multiple pages.
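The two checks above can be sketched without any third-party dependency. This is an illustrative approximation only: the function names are hypothetical, and the page counter simply counts page objects in the raw bytes, which works for simple uncompressed PDFs but can undercount files that store objects in compressed streams. In a real script, prefer the page-count helper provided by the client library or a dedicated PDF library.

```python
import re

PDF_SIGNATURE = b"%PDF-"
# Matches "/Type /Page" but not "/Type /Pages" (the page tree node).
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def is_pdf(data: bytes) -> bool:
    """Return True when the PDF signature appears near the start of the file."""
    return PDF_SIGNATURE in data[:1024]


def naive_page_count(data: bytes) -> int:
    """Count page objects in the raw bytes (naive; see the caveat above)."""
    return len(_PAGE_OBJECT.findall(data))


def needs_splitting(data: bytes) -> bool:
    """A file is a candidate for splitting only if it is a multi-page PDF."""
    return is_pdf(data) and naive_page_count(data) > 1
```

For example, needs_splitting(open(path, "rb").read()) returns False for images and single-page PDFs, which can then be sent straight to invoice extraction.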

Process Multi-Page Documents

  1. Use the Invoice Splitter API to get page groups.

  2. Extract individual invoices using the page groups.

  3. Process each extracted invoice with the Invoice OCR API.
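To make the role of page groups concrete, here is a small sketch of steps 2 and 3. It assumes that the splitter result has already been reduced to a list of groups, each a list of zero-based page indexes; that shape, and the names group_pages, process_invoices, and parse_invoice, are assumptions for illustration rather than the library's API. The parse_invoice argument stands in for whatever call sends one extracted invoice to the Invoice OCR API.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


def group_pages(
    pages: Sequence[Page], page_groups: Sequence[Sequence[int]]
) -> List[List[Page]]:
    """Build one list of pages per invoice from zero-based page indexes."""
    total = len(pages)
    invoices: List[List[Page]] = []
    for group in page_groups:
        for index in group:
            if not 0 <= index < total:
                raise IndexError(f"Page index {index} is outside 0..{total - 1}")
        invoices.append([pages[index] for index in group])
    return invoices


def process_invoices(
    pages: Sequence[Page],
    page_groups: Sequence[Sequence[int]],
    parse_invoice: Callable[[List[Page]], Result],
) -> List[Result]:
    """Extract each invoice, then hand it to the (hypothetical) parser."""
    return [parse_invoice(invoice) for invoice in group_pages(pages, page_groups)]
```

With five pages and the groups [[0, 1], [2], [3, 4]], this yields three invoices of two, one, and two pages, each parsed independently.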

Process Single-Page Documents

For single-page documents or non-PDFs, process the document directly with the Invoice OCR API.
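The choice between the two paths can be expressed as a single routing function. This is a sketch under stated assumptions: process_document, split_and_parse, and parse_directly are hypothetical names, and the two callables stand in for the multi-page flow and the direct Invoice OCR call respectively.

```python
from typing import Callable, List, TypeVar

Source = TypeVar("Source")
Result = TypeVar("Result")


def process_document(
    source: Source,
    is_pdf: bool,
    page_count: int,
    split_and_parse: Callable[[Source], List[Result]],
    parse_directly: Callable[[Source], Result],
) -> List[Result]:
    """Route multi-page PDFs through the splitter and everything else directly."""
    if is_pdf and page_count > 1:
        return split_and_parse(source)
    # Single-page PDFs and non-PDF files skip the splitting step entirely.
    return [parse_directly(source)]
```

Returning a list in both cases lets the calling code handle one invoice or many in the same loop.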

Example Output

After processing, you'll receive detailed information about each invoice. Here's a sample output:

Full Script

Best Practices

  • Handle potential errors and exceptions in your code.

  • Implement retry logic for API calls to handle temporary network issues.

  • Store extracted data securely and in compliance with relevant data protection regulations.

  • When uploading files, ensure that:

    • Invoices are clear, unstained, and properly unfolded

    • There are minimal extra pages (e.g., terms & conditions)

    • Pages from any single invoice are all oriented in the same direction
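As one way to implement the retry recommendation above, the sketch below wraps any callable in exponential backoff with a little jitter. The function name, the default attempt count, and the choice of ConnectionError and TimeoutError as retryable errors are assumptions; adjust them to the exception types your client library actually raises for transient failures.

```python
import random
import time


def call_with_retry(
    operation,
    max_attempts=4,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call operation(), retrying transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

A call such as call_with_retry(lambda: send_request(path)), where send_request is your own function, retries only the listed error types; other exceptions, such as an invalid API key, propagate immediately.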

Troubleshooting

If you encounter issues:

  1. Verify your API key and subscription status for both Invoice Splitter and Invoice OCR APIs.

  2. Check the input file format and ensure it's supported.

  3. Review the API response for any error messages.

  4. Consult the Mindee API documentation for more detailed information.
