Load and Adjust a File
Reference documentation on loading and manipulating files using Mindee client libraries.
Requirements
In most cases, you'll load a source file for use with the Mindee client; see the Configure the Client section for more information.
However, these features don't require an initialized client, only the installed client library.
Overview
Sending a file involves the following steps:
Load a source file.
Optional: adjust the source file before sending.
Use the Mindee client instance to send the file.
Load a Source File
You can load a source file from a path, from raw bytes, from a bytes stream, or from a language-specific object. Choose the appropriate type based on your application requirements.
If you're unsure of which to use, we recommend loading from a path.
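All of these input types carry the same file content. Raw bytes are used directly, and a base-64 string is decoded back into bytes internally by the library. If you want to try the bytes or base-64 inputs with a file that is already on disk, the Python standard library can produce both forms. This is a minimal sketch with no Mindee calls; `load_forms` is a hypothetical helper name.

```python
import base64
import tempfile
from pathlib import Path


def load_forms(path: Path) -> tuple[bytes, str]:
    """Hypothetical helper: return a file's raw bytes and its base-64 string."""
    raw = path.read_bytes()
    encoded = base64.b64encode(raw).decode("ascii")
    return raw, encoded


if __name__ == "__main__":
    # Write a small sample file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "file.ext"
        sample.write_bytes(b"%PDF-1.7 sample content")
        raw, encoded = load_forms(sample)
        # Decoding the base-64 string gives back exactly the original bytes.
        assert base64.b64decode(encoded) == raw
        print(encoded)
```

The raw bytes can then be passed to BytesInput and the string to Base64Input, each with a filename.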
To load a file, you'll need to import the corresponding input class from the mindee module.
To load a path string, use PathInput.
from mindee import PathInput
input_path = "/path/to/the/file.ext"
input_source = PathInput(input_path)
To load a Path instance, use PathInput.
from pathlib import Path
from mindee import PathInput
input_path = Path("/path/to/the/file.ext")
input_source = PathInput(input_path)
To load raw bytes, use BytesInput.
from pathlib import Path
from mindee import BytesInput
input_path = Path("/path/to/the/file.ext")
with input_path.open("rb") as fh:
    input_bytes = fh.read()
input_source = BytesInput(
    input_bytes,
    filename="file.ext",
)
To load a base-64 string, use Base64Input.
The string will be decoded into bytes internally.
from mindee import Base64Input
input_base64 = "iVBORw0KGgoAAAANSUhEUgAAABgAAA ..."
input_source = Base64Input(
    input_base64,
    filename="base64_file.txt",
)
To load a file handle, use FileInput.
It must be opened in binary mode, as a BinaryIO.
from pathlib import Path
from mindee import FileInput
input_path = Path("/path/to/the/file.ext")
with input_path.open("rb") as fh:
    input_source = FileInput(fh)
    # IMPORTANT:
    # Continue all operations inside the 'with' statement,
    # since the file handle is closed when the block exits.
    mindee_client.enqueue_and_get_inference(
        input_source, params
    )
To load a file, you'll need to import the corresponding input class and instantiate it.
Make sure to import the needed classes:
const fs = require("fs");
const mindee = require("mindee");
// for TS or modules:
// import * as fs from "fs";
// import * as mindee from "mindee";
To load a path string, use PathInput.
const filePath = "/path/to/the/file.ext";
const inputSource = new mindee.PathInput({ inputPath: filePath });
To load a Buffer instance, use BufferInput.
const buffer = Buffer.from(
await fs.promises.readFile("/path/to/the/file.ext")
);
const inputSource = new mindee.BufferInput({
buffer: buffer,
filename: "file.ext",
});
To load raw bytes, use BytesInput.
const inputBytes = await fs.promises.readFile("/path/to/the/file.ext");
const inputSource = new mindee.BytesInput({
inputBytes: inputBytes,
filename: "file.ext",
});
To load a Stream, use StreamInput.
const stream = fs.createReadStream("/path/to/the/file.ext");
const inputSource = new mindee.StreamInput({
inputStream: stream,
filename: "file.ext",
});
To load a base-64 string, use Base64Input.
const b64String = "iVBORw0KGgoAAAANSUhEUgAAABgAAA ...";
const inputSource = new mindee.Base64Input({
inputString: b64String,
filename: "base64_file.txt",
});
To load a file, you'll need to import the corresponding input class from the Mindee\Input namespace.
To load a path string, use PathInput.
use Mindee\Input\PathInput;
$filePath = "/path/to/the/file.ext";
$inputSource = new PathInput($filePath);
To load a file resource, use FileInput.
use Mindee\Input\FileInput;
$handle = fopen("/path/to/the/file.ext", "rb");
$inputSource = new FileInput($handle);
To load raw bytes, use BytesInput.
use Mindee\Input\BytesInput;
$filePath = "/path/to/the/file.ext";
$handle = fopen($filePath, "rb");
$contents = fread($handle, filesize($filePath));
fclose($handle);
$inputSource = new BytesInput($contents, "file.ext");
To load a base-64 string, use Base64Input.
The string will be decoded into bytes internally.
use Mindee\Input\Base64Input;
$inputBase64 = "iVBORw0KGgoAAAANSUhEUgAAABgAAA ...";
$inputSource = new Base64Input($inputBase64, "base64_file.txt");
To load a path string, use the PathInputSource class.
input_path = '/path/to/the/file.ext'
input_source = Mindee::Input::Source::PathInputSource.new(input_path)
To load raw bytes, use the BytesInputSource class.
input_bytes = File.binread('/path/to/the/file.ext')
input_source = Mindee::Input::Source::BytesInputSource.new(input_bytes, 'file.ext')
To load a base-64 string, use Base64InputSource.
The string will be decoded into bytes internally.
input_base64 = 'iVBORw0KGgoAAAANSUhEUgAAABgAAA ...'
input_source = Mindee::Input::Source::Base64InputSource.new(
input_base64, 'file.ext'
)
To load a file handle, use FileInputSource.
It must be opened in binary mode.
file = File.open('/path/to/the/file.ext', 'rb')
input_source = Mindee::Input::Source::FileInputSource.new(file, 'file.ext')
To load a file, initialize it using the LocalInputSource class.
This class has different constructors to allow for opening various types of inputs.
To load a path string:
String filePath = "/path/to/the/file.ext";
LocalInputSource inputSource = new LocalInputSource(filePath);
To load a Path instance:
Path filePath = Paths.get("/path/to/the/file.ext");
LocalInputSource inputSource = new LocalInputSource(filePath);
To load a File instance:
File file = new File("/path/to/the/file.ext");
LocalInputSource inputSource = new LocalInputSource(file);
To load a byte array:
byte[] fileBytes = Files.readAllBytes(Paths.get("/path/to/the/file.ext"));
String filename = "file.ext";
LocalInputSource inputSource = new LocalInputSource(fileBytes, filename);
To load an InputStream instance:
InputStream fileStream = new FileInputStream(
new File("/path/to/the/file.ext")
);
String filename = "file.ext";
LocalInputSource inputSource = new LocalInputSource(fileStream, filename);
To load a base-64 string:
String inputBase64 = "iVBORw0KGgoAAAANSUhEUgAAABgAAA ...";
String filename = "file.ext";
LocalInputSource inputSource = new LocalInputSource(inputBase64, filename);
To load a file, initialize it using the LocalInputSource class.
This class has different constructors to allow for opening various types of inputs.
To load a path string:
string filePath = "/path/to/the/file.ext";
var inputSource = new LocalInputSource(filePath);
To load a FileInfo instance:
FileInfo fileInfo = new FileInfo("/path/to/the/file.ext");
var inputSource = new LocalInputSource(fileInfo);
To load a byte array:
byte[] fileBytes = File.ReadAllBytes("/path/to/the/file.ext");
string filename = "file.ext";
var inputSource = new LocalInputSource(fileBytes, filename);
To load a Stream instance:
Stream fileStream = File.Open(
"/path/to/the/file.ext", FileMode.Open);
string filename = "file.ext";
var inputSource = new LocalInputSource(fileStream, filename);
Source File Metadata
Once a source file is loaded, various metadata can be accessed.
This can be useful for applying business rules based on the input file, for example:
Send PDFs to one model, images to another
Don't send PDFs with too many pages
Save the filename to a database
...
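For instance, the first two rules could be combined into a small routing function. The sketch below is plain Python with no Mindee calls: `SourceInfo` is a hypothetical stand-in for the `filename`, `is_pdf`, and `page_count` attributes a Python input source exposes, and the model identifiers and page limit are placeholders, not real values.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values for illustration only.
PDF_MODEL = "pdf-model-id"
IMAGE_MODEL = "image-model-id"
MAX_PDF_PAGES = 10


@dataclass
class SourceInfo:
    """Hypothetical stand-in for an input source's metadata."""
    filename: str
    is_pdf: bool
    page_count: int


def choose_model(info: SourceInfo) -> Optional[str]:
    """Pick a model for the file, or return None if it should not be sent."""
    if not info.is_pdf:
        return IMAGE_MODEL  # images go to one model
    if info.page_count > MAX_PDF_PAGES:
        return None  # PDF has too many pages: don't send it
    return PDF_MODEL  # PDFs go to another model
```

With a loaded source, you would build the stand-in as `SourceInfo(input_source.filename, input_source.is_pdf, input_source.page_count)`.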
Here are some code samples, using an input source instance.
filename: str = input_source.filename
is_pdf: bool = input_source.is_pdf
number_of_pages: int = input_source.page_count
// make sure to initialize the source first
await inputSource.init();
const filename = inputSource.filename;
const isPdf = inputSource.isPdf();
const numberOfPages = await inputSource.getPageCount();
$filename = $inputSource->fileName;
$isPdf = $inputSource->isPdf();
$numberOfPages = $inputSource->getPageCount();
filename = input_source.filename
is_pdf = input_source.pdf?
number_of_pages = input_source.page_count
String filename = inputSource.getFilename();
boolean isPdf = inputSource.isPdf();
int numberOfPages = inputSource.getPageCount();
string filename = inputSource.Filename;
bool isPdf = inputSource.IsPdf();
int numberOfPages = inputSource.GetPageCount();
Adjust the Source File
Optionally, adjust the source file before sending it.
Fix PDF Headers
Some PDFs have corrupt or invalid headers. The server cannot process such files, so sending one returns a 4xx HTTP error.
You can try to fix the headers using the provided functions.
Note: this feature is not yet available for all languages.
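Before calling the fix function, you may want a quick local check of whether a file's header looks wrong at all. A well-formed PDF begins with a `%PDF-` header line. The sketch below assumes that a missing marker, or stray bytes in front of it, indicates a damaged header. It is a stdlib-only heuristic, not how the Mindee libraries detect or repair headers; `header_offset`, `looks_like_broken_header`, and the 1024-byte search window are hypothetical choices.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"


def header_offset(data: bytes, search_limit: int = 1024) -> Optional[int]:
    """Return where the %PDF- marker starts within the first bytes, or None."""
    index = data.find(PDF_MAGIC, 0, search_limit)
    return None if index == -1 else index


def looks_like_broken_header(data: bytes) -> bool:
    """Heuristic: True if the marker is missing or not at the very start."""
    return header_offset(data) != 0
```

In Python, you might read the file with `Path.read_bytes()` and only call `input_source.fix_pdf()` when `looks_like_broken_header(...)` returns True.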
Here are some code samples, using an input source instance.
input_source.fix_pdf()
$inputSource->fixPDF();
input_source.fix_pdf!
Compress Files
There is no need to send excessively large files to the Mindee API.
Unfortunately, many modern smartphones take very high-resolution images, resulting in large files.
We provide a way to compress images before sending to the API.
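When a maximum width and height are given, the aspect ratio is preserved, so the output size follows from a single scale factor applied to both sides. The sketch below computes the resulting dimensions to illustrate that arithmetic; it is not taken from the Mindee libraries. It assumes that images are only ever shrunk, never enlarged, and that sizes are rounded down. `fit_within` is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import Optional


def fit_within(width: int, height: int,
               max_width: Optional[int] = None,
               max_height: Optional[int] = None) -> tuple[int, int]:
    """Return a new (width, height) that fits the limits at the same aspect ratio.

    Assumptions: images are never enlarged, and sizes are rounded down.
    """
    scale = Fraction(1)  # exact arithmetic avoids float rounding surprises
    if max_width is not None:
        scale = min(scale, Fraction(max_width, width))
    if max_height is not None:
        scale = min(scale, Fraction(max_height, height))
    # One shared scale factor keeps width and height proportional.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

For a 4032x3024 photo with a 1920x1920 limit, the width is the binding constraint, so `fit_within(4032, 3024, 1920, 1920)` gives `(1920, 1440)`.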
Here are some code samples, using an input source instance.
Basic usage is very simple, and can be applied to both images and PDFs:
input_source.compress(quality=85)
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
input_source.compress(
    quality=85, max_width=1920, max_height=1920
)
Basic usage is very simple, and can be applied to both images and PDFs:
await inputSource.compress(85);
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
await inputSource.compress(85, 1920, 1920);
Basic usage is very simple, and can be applied to both images and PDFs:
$inputSource->compress(quality: 85);
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
$inputSource->compress(
quality: 85, maxWidth: 1920, maxHeight: 1920
);
Basic usage is very simple, and can be applied to both images and PDFs:
input_source.compress!(quality: 85)
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
input_source.compress!(
quality: 85, max_width: 1920, max_height: 1920
)
Basic usage is very simple, and can be applied to both images and PDFs:
inputSource.compress(85);
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
inputSource.compress(85, 1920, 1920);
Basic usage is very simple, and can be applied to both images and PDFs:
inputSource.Compress(quality: 85);
For images, you can also set a maximum height and/or width. The aspect ratio will always be preserved.
For example, to compress and resize to no greater than 1920x1920 pixels:
inputSource.Compress(
quality: 85, maxWidth: 1920, maxHeight: 1920);
Manipulate PDF Pages
Some PDFs contain superfluous pages, for example a cover page or terms and conditions that are not useful for the desired data extraction.
These extra pages count towards your billing and slow down processing.
It is therefore in your best interest to remove them before sending.
Parameters:
"Page Indexes" is required and is a list of 0-based page indexes. Use negative values to count from the end, e.g. -1 for the last page.
"Operation" specifies whether to keep only the specified pages or to remove them. One of "Keep Only" or "Remove".
"On Min Pages" is optional and specifies the minimum number of pages a document must have for the operation to take place. A value of 0 means any number of pages.
Exact naming of parameters will depend on the language.
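To make the three parameters concrete, here is a plain-Python sketch of the behavior described above: it computes which 0-based pages would remain after the operation. It is not the library's implementation. The function name `resolve_pages`, the string constants, and the choice to silently ignore out-of-range indexes (the real libraries may raise an error instead) are assumptions.

```python
from typing import Sequence

KEEP_ONLY = "KEEP_ONLY"
REMOVE = "REMOVE"


def resolve_pages(page_count: int, page_indexes: Sequence[int],
                  operation: str, on_min_pages: int = 0) -> list[int]:
    """Return the 0-based pages that would remain after applying the options."""
    all_pages = list(range(page_count))
    # Documents below the threshold are left untouched (0 means always apply).
    if page_count < on_min_pages:
        return all_pages
    # Negative indexes count from the end: -1 is the last page.
    selected = {i + page_count if i < 0 else i for i in page_indexes}
    # Assumption: out-of-range indexes are ignored rather than rejected.
    selected = {i for i in selected if 0 <= i < page_count}
    if operation == KEEP_ONLY:
        return [p for p in all_pages if p in selected]
    if operation == REMOVE:
        return [p for p in all_pages if p not in selected]
    raise ValueError(f"unknown operation: {operation!r}")
```

For a 5-page PDF, keeping indexes `[0, -2, -1]` with a minimum of 3 pages leaves pages 0, 3 and 4. A 2-page PDF is returned unchanged.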
Here are some code samples, using an input source instance.
from mindee import PageOptions
# Set the options as follows:
# For all documents, keep only the first page
page_options = PageOptions(
operation="KEEP_ONLY",
page_indexes=[0],
)
# Apply in-memory
input_source.apply_page_options(page_options)
Some other examples:
# Only for documents having 3 or more pages:
# Keep only these pages: first, penultimate, last
PageOptions(
operation="KEEP_ONLY",
on_min_pages=3,
page_indexes=[0, -2, -1],
)
# For all documents:
# Remove the first page
PageOptions(
operation="REMOVE",
page_indexes=[0],
)
# Only for documents having 10 or more pages:
# Remove the first 5 pages
PageOptions(
operation="REMOVE",
on_min_pages=10,
page_indexes=list(range(5)),
)
// Set the options as follows:
// For all documents, keep only the first page
const pageOptions: mindee.PageOptions = {
operation: mindee.PageOptionsOperation.KeepOnly,
pageIndexes: [0],
};
// Apply in-memory
await inputSource.applyPageOptions(pageOptions);
Some other examples:
// Only for documents having 3 or more pages:
// Keep only these pages: first, penultimate, last
const keepFirstAndLastTwoPages: mindee.PageOptions = {
operation: mindee.PageOptionsOperation.KeepOnly,
onMinPages: 3,
pageIndexes: [0, -2, -1],
};
// For all documents:
// Remove the first page
const removeFirstPage: mindee.PageOptions = {
operation: mindee.PageOptionsOperation.Remove,
pageIndexes: [0],
};
// Only for documents having 10 or more pages:
// Remove the first 5 pages
const removeFirstFivePages: mindee.PageOptions = {
operation: mindee.PageOptionsOperation.Remove,
onMinPages: 10,
pageIndexes: [0, 1, 2, 3, 4],
};
use Mindee\Input\PageOptions;
use const Mindee\Input\KEEP_ONLY;
// Set the options as follows:
// For all documents, keep only the first page
$pageOptions = new PageOptions(
pageIndexes: [0],
operation: KEEP_ONLY
);
// Apply in-memory
$inputSource->applyPageOptions($pageOptions);
Some other examples:
use Mindee\Input\PageOptions;
use const Mindee\Input\KEEP_ONLY;
use const Mindee\Input\REMOVE;
// Only for documents having 3 or more pages:
// Keep only these pages: first, penultimate, last
$pageOptions = new PageOptions(
pageIndexes: [0, -2, -1],
operation: KEEP_ONLY,
onMinPage: 3
);
// For all documents:
// Remove the first page
$pageOptions = new PageOptions(
pageIndexes: [0],
operation: REMOVE
);
// Only for documents having 10 or more pages:
// Remove the first 5 pages
$pageOptions = new PageOptions(
pageIndexes: [0, 1, 2, 3, 4],
operation: REMOVE,
onMinPage: 10
);
# Set the options as follows:
# For all documents, keep only the first page
page_options = Mindee::PageOptions.new(
operation: :KEEP_ONLY,
page_indexes: [0],
)
# Apply in-memory
input_source.apply_page_options(page_options)
Note: the method is named apply_page_options rather than apply_page_options!, even though the operation is in-place, to harmonize with the other client libraries.
Some other examples:
# Only for documents having 3 or more pages:
# Keep only these pages: first, penultimate, last
Mindee::PageOptions.new(
operation: :KEEP_ONLY,
on_min_pages: 3,
page_indexes: [0, -2, -1],
)
# For all documents:
# Remove the first page
Mindee::PageOptions.new(
operation: :REMOVE,
page_indexes: [0],
)
# Only for documents having 10 or more pages:
# Remove the first 5 pages
Mindee::PageOptions.new(
operation: :REMOVE,
on_min_pages: 10,
page_indexes: (0..4).to_a
)
import com.mindee.input.PageOptions;
import com.mindee.input.PageOptionsOperation;
// Set the options as follows:
// For all documents, keep only the first page
PageOptions pageOptions = new PageOptions.Builder()
.pageIndexes(new Integer[]{ 0 })
.operation(PageOptionsOperation.KEEP_ONLY)
.build();
Some other examples:
import com.mindee.input.PageOptions;
import com.mindee.input.PageOptionsOperation;
// Only for documents having 3 or more pages:
// Keep only these pages: first, penultimate, last
PageOptions keepFirstAndLastTwoPages = new PageOptions.Builder()
.pageIndexes(new Integer[]{ 0, -2, -1 })
.operation(PageOptionsOperation.KEEP_ONLY)
.onMinPages(3)
.build();
// For all documents:
// Remove the first page
PageOptions removeFirstPage = new PageOptions.Builder()
.pageIndexes(new Integer[]{ 0 })
.operation(PageOptionsOperation.REMOVE)
.build();
// Only for documents having 10 or more pages:
// Remove the first 5 pages
PageOptions removeFirstFivePages = new PageOptions.Builder()
.pageIndexes(new Integer[]{ 0, 1, 2, 3, 4 })
.operation(PageOptionsOperation.REMOVE)
.onMinPages(10)
.build();
// Set the options as follows:
// For all documents, keep only the first page
var pageOptions = new PageOptions(
operation: PageOptionsOperation.KeepOnly
, pageIndexes: [ 0 ]);
// Apply in-memory
inputSource.ApplyPageOptions(pageOptions);
Some other examples:
// Only for documents having 3 or more pages:
// Keep only these pages: first, penultimate, last
new PageOptions(
operation: PageOptionsOperation.KeepOnly
, onMinPages: 3
, pageIndexes: new short[] { 0, -2, -1 }
);
// For all documents:
// Remove the first page
new PageOptions(
operation: PageOptionsOperation.Remove
, pageIndexes: new short[] { 0 }
);
// Only for documents having 10 or more pages:
// Remove the first 5 pages
new PageOptions(
operation: PageOptionsOperation.Remove
, onMinPages: 10
, pageIndexes: new short[] { 0, 1, 2, 3, 4 }
);
Send the File
Now that your file is ready, you'll want to send it to the Mindee servers for processing.
Head on over to the Send for Processing section for details on the next step.