Integrating Mindee
Once you have your model set up, you'll want to start using it!
API key
Make sure you've created your API Key before continuing.
To create and manage your API keys, go to the "API Keys" section on the Mindee Platform.
Click on "Create API Key", choose a name for the key, and confirm.
You're now ready to go!
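Rather than pasting the key into source code, one common approach is to read it from an environment variable. The sketch below assumes a variable named MINDEE_API_KEY. That name and the load_api_key helper are illustrative choices, not something the Mindee Platform requires.

```python
import os


def load_api_key(env_var: str = "MINDEE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return api_key
```

The returned string can then be passed wherever an API key is expected, for example as the api_key argument of send_file_with_polling in the next section.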
Sending a File
To process your document using Mindee, simply send the file using the REST API.
Make a note of your model's ID for use in the API.
When getting started, we recommend using the polling method, which is the quickest to set up.
The following code example is self-contained and can be run as-is.
It requires Python 3.9 or later and the requests library.
import json
import time
import requests
from pathlib import Path


def send_file_with_polling(
    file_path: str,
    model_id: str,
    api_key: str,
    max_retries: int = 30,
    polling_interval: int = 2,
) -> dict:
    file = Path(file_path)
    headers = {"Authorization": api_key}
    form_data = {"model_id": model_id, "rag": False}

    with open(file_path, "rb") as fh:
        files = {"file": (file.name, fh)}
        print(f"Enqueuing file: {file_path}")
        response = requests.post(
            url="https://api-v2.mindee.net/v2/inferences/enqueue",
            files=files,
            data=form_data,
            headers=headers,
        )
    response.raise_for_status()

    job_data = response.json().get("job")
    polling_url = job_data.get("polling_url")

    # Important to wait before attempting to poll
    time.sleep(3)

    # Poll for completion
    for attempt in range(max_retries):
        print(f"Polling on: {polling_url}")
        poll_response = requests.get(polling_url, headers=headers, allow_redirects=False)
        poll_data = poll_response.json()
        job_status = poll_data.get("job", {}).get("status")

        if poll_response.status_code == 302 or job_status == "Processed":
            result_url = poll_data.get("job", {}).get("result_url")
            print(f"Get result from: {result_url}")
            result_response = requests.get(result_url, headers=headers)
            result_data = result_response.json()
            return result_data

        # still processing, wait before next poll
        time.sleep(polling_interval)

    # If we've exhausted all retries
    raise TimeoutError(f"Polling timed out after {max_retries} attempts")


result = send_file_with_polling(
    file_path="/path/to/file.pdf",
    model_id="MY_MODEL_ID",
    api_key="MY_API_KEY",
)
print(json.dumps(result, indent=2))
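The example above keeps polling until the job is processed or the retries run out. A possible refinement is to stop early when the job reports a failure. The sketch below isolates the polling loop so it can be exercised without network access. The helper name wait_for_job, the fetch_status callable, the "Failed" status value, and the "error" key are all assumptions made for illustration, so check the API reference for the statuses your jobs actually return.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    max_retries: int = 30,
    polling_interval: float = 2.0,
) -> dict:
    """Call fetch_status until the job is processed, fails, or retries run out.

    fetch_status is a hypothetical callable returning the decoded JSON of one
    polling request, e.g. {"job": {"status": "Processing"}}.
    """
    for _ in range(max_retries):
        job = fetch_status().get("job", {})
        status = job.get("status")
        if status == "Processed":
            return job
        # "Failed" and the "error" key are assumed names, not confirmed by this page.
        if status == "Failed":
            raise RuntimeError(f"Job failed: {job.get('error')}")
        # Still processing: wait before the next poll.
        time.sleep(polling_interval)
    raise TimeoutError(f"Polling timed out after {max_retries} attempts")


# Usage with canned responses instead of real HTTP calls.
responses = iter(
    [
        {"job": {"status": "Processing"}},
        {"job": {"status": "Processed", "result_url": "https://example.invalid/result"}},
    ]
)
job = wait_for_job(lambda: next(responses), polling_interval=0)
print(job["result_url"])
```

In real use, fetch_status would wrap the requests.get call on the polling URL. Keeping the HTTP call outside the loop makes the retry logic easy to test on its own.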
Processing the Results
Once you've sent the file and retrieved the results, you can start extracting data from the JSON payload.
The model's fields will be in the result.fields object in the returned JSON.
Each key in the fields object corresponds to the field's name in your model configuration.
You'll want to adapt your processing depending on the type of field, for example looping over lists.
Accessing a simple value, where my_simple_field is the name of the field in the Model:
my_simple_field = result["inference"]["fields"]["my_simple_field"]
field_value = my_simple_field["value"]
Accessing a list of values, where my_list_field is the name of the field in the Model:
my_list_field = result["inference"]["fields"]["my_list_field"]
# access a value at a given position
field_first_value = my_list_field[0]["value"]
# loop over all values in the list
for list_item in my_list_field:
    item_value = list_item["value"]
Accessing an object field, where my_object_field is the name of the field in the Model.
In this hypothetical case, the object has a sub-field named sub_field.
object_field = result["inference"]["fields"]["my_object_field"]
sub_field_value = object_field["sub_field"]["value"]
Accessing a list of objects, where my_object_list_field is the name of the field in the Model:
object_list_field = result["inference"]["fields"]["my_object_list_field"]
# access an object at a given position
object_item_0 = object_list_field[0]
sub_field_0_value = object_item_0["sub_field"]["value"]
# loop over object lists
for object_item in object_list_field:
    sub_field_value = object_item["sub_field"]["value"]
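The four access patterns above can be folded into one recursive helper. This is a minimal sketch that assumes fields have exactly the shapes used in the snippets above: a simple field is a dict with a "value" key, a list field is a Python list, and an object field is a dict of sub-fields. The function name extract_values and the sample data are hypothetical.

```python
from typing import Any


def extract_values(field: Any) -> Any:
    """Reduce a field to plain Python values, recursing into lists and objects."""
    if isinstance(field, list):
        # List field: convert every item in turn.
        return [extract_values(item) for item in field]
    if isinstance(field, dict):
        # Simple field: the payload sits under the "value" key.
        if "value" in field:
            return field["value"]
        # Object field (or the fields mapping itself): recurse into each sub-field.
        return {name: extract_values(sub) for name, sub in field.items()}
    return field


# Made-up data following the shapes assumed above.
sample_fields = {
    "my_simple_field": {"value": "ACME"},
    "my_list_field": [{"value": "a"}, {"value": "b"}],
    "my_object_field": {"sub_field": {"value": 12.5}},
    "my_object_list_field": [{"sub_field": {"value": 1}}, {"sub_field": {"value": 2}}],
}
print(extract_values(sample_fields))
```

One limitation of this shortcut: an object whose sub-field happens to be named value would be mistaken for a simple field, so explicit access by name remains the safer choice when you know your model's layout.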