Document pre-processing

Hi, I am using a Document AI custom processor to process large volumes of structured documents. However, the majority of the documents are phone images or scans. Does Document AI apply any pre-processing to images before the documents go through the actual processors? A further question: how are the image quality metrics created within the OCR model?

Hello! Dealing with phone scans and raw camera images is definitely one of the biggest challenges in structured document extraction. To answer your questions:

1. Pre-processing in Document AI
Yes. Document AI's underlying OCR layer applies some pre-processing automatically before the data reaches your Custom Processor's extraction logic. The page image it works from is corrected for rotation and skew, and, as far as I know, it also gets basic noise reduction/binarization, although that part is not spelled out in the public docs.

However, Document AI does not currently offer a configurable “pre-processing UI” to manually adjust contrast or fix severe page warping before extraction. For high-reliability pipelines, if the phone images are heavily distorted, the best practice is to place a lightweight pre-processing microservice (using OpenCV or ImageMagick for perspective correction and adaptive thresholding) before sending the payload to the Document AI API.

2. How Image Quality Metrics are Created
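When image quality analysis is enabled on the request (the enableImageQualityScores option in the OCR config), each page in the Document AI response payload carries an imageQualityScores object. It contains an overall qualityScore for the page plus a list of detectedDefects, each with a type (for example quality/defect_blurry, quality/defect_glare, quality/defect_dark, or quality/defect_faint) and a confidence.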
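Note that the scores have to be requested. Below is a rough sketch of a REST request body for the `:process` method with image quality analysis switched on. The field names follow the v1 REST API as I understand it. The helper name `build_process_request` is made up for illustration, and whether your Custom Processor honors `ocrConfig` (the option is documented for the OCR processor) is an assumption you should verify.

```python
import base64
import json


def build_process_request(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a :process request body that asks for image quality scores."""
    return {
        "rawDocument": {
            # The REST API expects the file content base64-encoded.
            "content": base64.b64encode(image_bytes).decode("ascii"),
            "mimeType": mime_type,
        },
        "processOptions": {
            "ocrConfig": {"enableImageQualityScores": True},
        },
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice read the phone image from disk.
    body = build_process_request(b"placeholder image bytes")
    print(json.dumps(body, indent=2))
```

You would POST this JSON to your processor's `:process` endpoint with an OAuth bearer token, then look at `document.pages[].imageQualityScores` in the response.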

Google does not publish the internals, so take the following as my understanding rather than documented behavior: the metrics come from a dedicated vision classifier (likely a lightweight CNN) that runs alongside the OCR pipeline. It does not measure the recognized text itself but evaluates the page image for visual artifacts, and it assigns a confidence score to each defect category.

A Tip on Data Integrity:
From an architectural standpoint, we always recommend using these imageQualityScores as a gatekeeper. If a page shows high confidence for defects such as glare or blur, it is much safer to route it to a Human-in-the-Loop (HITL) review queue. Forcing a Custom Processor to extract data from severely degraded pixels is a common cause of hallucinated values and corrupted data.
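A gatekeeper along these lines could look roughly like the sketch below. It works on the JSON form of a response page, where, as far as I know, the defect types are strings such as `quality/defect_glare` and `quality/defect_blurry`. The thresholds and the `route_page` / `route_document` names are hypothetical; tune the numbers against pages you have already reviewed by hand.

```python
from typing import Any

# Hypothetical thresholds, not values recommended by Google.
MIN_QUALITY_SCORE = 0.5
MAX_DEFECT_CONFIDENCE = {
    "quality/defect_glare": 0.6,
    "quality/defect_blurry": 0.6,
}


def route_page(page: dict[str, Any]) -> str:
    """Return "extract" or "review" for one page of a Document AI JSON response."""
    scores = page.get("imageQualityScores")
    if not scores:
        # No scores present (for example, the option was not enabled): be conservative.
        return "review"
    if scores.get("qualityScore", 0.0) < MIN_QUALITY_SCORE:
        return "review"
    for defect in scores.get("detectedDefects", []):
        limit = MAX_DEFECT_CONFIDENCE.get(defect.get("type"))
        if limit is not None and defect.get("confidence", 0.0) >= limit:
            return "review"
    return "extract"


def route_document(document: dict[str, Any]) -> str:
    """Send the whole document to review if any single page fails the gate."""
    pages = document.get("pages", [])
    return "review" if any(route_page(p) == "review" for p in pages) else "extract"
```

Routing at the document level is a design choice: one unreadable page is usually enough to make the extracted record untrustworthy, so failing the whole document keeps the review queue simple.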

Hope this helps clarify the pipeline for your architecture!