Looking for a document scanning API that matches Google Drive native scan quality

Hi everyone,

I’m looking for a reliable API or managed service that can perform high-quality document scanning on photos of physical field forms — specifically automatic cropping, perspective correction, and image enhancement — similar to what Google Drive’s native mobile scanning produces.

Use case:
Field workers photograph paper forms on-site. These raw photos are then uploaded to a processing pipeline where I need them to be automatically transformed into clean, scannable document images before being stored in Google Drive.

What I’ve tried:
I tested both Google Vision API and Google Drive’s scanning capabilities via API, with AI assistance. While the APIs worked, the output quality — particularly in terms of perspective correction and contrast — didn’t match what Google Drive produces when scanning natively through its mobile app.

What I’m looking for:

  • An API or service that produces Drive-quality document scans from raw photos
  • Ideally a Google Cloud-native solution (e.g. Document AI, Vision API) but open to third-party options
  • The processed image should be returned as a clean JPEG or PDF, ready to store in Drive

Has anyone solved a similar problem? Any recommended APIs, Cloud Functions patterns, or third-party services (e.g. Microsoft Azure Document Intelligence, Adobe PDF Services) would be very helpful.

Thanks!

Try Mistral OCR

An apprentice of mine is a construction business owner, and they receive all sorts of invoices from suppliers, transporters, etc., so no two are alike.

  • And we’re not talking simple invoices… we’re talking things with hundreds of line items, smudged with dirt and grease, wrinkled, and “scanned” by taking a photo while holding it in your hand in the field with terrible lighting conditions.

So he was looking for something VERY robust: something that could take any type of file (PDFs, whether image or code, text files, images, scans, etc.) and, no matter how simple or complicated the document, extract its contents accurately and reliably.

Mistral OCR consistently handled most of the weirdness we threw at it:

  • Optical problems
  • PDF file layout issues (inserts and comment boxes)

It wasn’t perfect, but it was the one that was accurate roughly 95% of the time, and the few times it failed, a look at the document made it clear why: the document itself had obvious problems.


And they operate on a zero data retention policy by default, so no data is retained or used for training. :wink:


I have a doctor client, and we use this to process incoming files on our fax line. The fax machine automatically scans each fax and emails it to a mailbox. That mailbox is enrolled in Google Cloud Pub/Sub, which posts the event to a Cloud Run endpoint we’ve got. If the event matches our criteria, the endpoint fetches the email and its attachment, extracts the contents with Mistral, and sends them to AI to determine which patient the document belongs to and what type of document it is. We then save the document into the Google Drive folder for that patient and send the contents off for further processing by AI (to extract the nugget of info we need, compare it with what we already have, add or update records, etc.). The result is a completely automatic document ingestion loop.
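As an illustration only (a sketch, not the code from that setup), here is roughly what the first step on the Cloud Run side could look like: decoding the Pub/Sub push envelope and checking whether the event is one to act on. It assumes the mailbox is Gmail and that the events are Gmail push notifications, which the description above does not confirm. `decode_push_envelope`, `should_process`, and the example address are hypothetical names chosen for this example.

```python
import base64
import json
from typing import Any, Dict, Optional


def decode_push_envelope(envelope: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the JSON payload carried by a Pub/Sub push request body.

    Pub/Sub push delivery wraps the payload as
    {"message": {"data": "<base64>", ...}, "subscription": "..."}.
    Returns None when the envelope is malformed or the payload is not a JSON object.
    """
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        return None
    try:
        payload = json.loads(base64.b64decode(message["data"]))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    return payload if isinstance(payload, dict) else None


def should_process(payload: Dict[str, Any], watched_address: str) -> bool:
    """Assumption: Gmail-style notification with emailAddress and historyId."""
    return payload.get("emailAddress") == watched_address and "historyId" in payload


if __name__ == "__main__":
    # Hypothetical notification for a hypothetical fax mailbox.
    body = {"emailAddress": "fax-inbox@example.com", "historyId": 12345}
    envelope = {
        "message": {"data": base64.b64encode(json.dumps(body).encode()).decode()},
        "subscription": "projects/example/subscriptions/example-sub",
    }
    decoded = decode_push_envelope(envelope)
    print(decoded, should_process(decoded, "fax-inbox@example.com"))
```

In a real handler, a rejected or malformed event would typically still be acknowledged with a 2xx response so Pub/Sub does not keep redelivering it; fetching the email, calling the OCR service, and filing into Drive would follow from here.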

Hope it helps!

Thank you for the suggestion! Mistral OCR looks really powerful and we will definitely keep it in mind for future use cases.

However, what we are looking for is slightly different. Our goal is not to extract text from documents at this stage — we need the actual image of the document to look like a proper scan.

Here is our exact use case:

Field workers photograph paper forms on-site using AppSheet. These raw photos often contain unwanted backgrounds, angles, and surrounding objects. What we need is an automated pipeline that:

  1. Detects the document boundaries in the photo
  2. Applies perspective correction (removes skew/tilt)
  3. Crops out everything except the paper itself
  4. Enhances contrast and brightness
  5. Saves the result as a clean, scan-quality image to Google Drive
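
To make steps 2 and 4 concrete, here is a minimal plain-Python sketch of the math involved. It assumes step 1 has already produced the four corners of the page (the coordinates below are made up), computes the perspective transform that maps them onto an upright rectangle, and applies a simple percentile-based contrast stretch to grayscale values. In practice an image library would do this on whole images (OpenCV, for example, provides `cv2.getPerspectiveTransform` and `cv2.warpPerspective`); the function names here are hypothetical and this is not a claim about how Google Drive's scanner works.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corners are degenerate (collinear or repeated)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return the 8 coefficients of the projective map sending src[i] to dst[i]."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def warp_point(h: Sequence[float], p: Point) -> Point:
    """Apply the projective map to one point."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def stretch_contrast(pixels: Sequence[int], low_pct: float = 0.02,
                     high_pct: float = 0.98) -> List[int]:
    """Rescale grayscale values so the chosen percentiles map to 0 and 255."""
    if not pixels:
        return []
    ordered = sorted(pixels)
    lo = ordered[int(low_pct * (len(ordered) - 1))]
    hi = ordered[int(high_pct * (len(ordered) - 1))]
    if hi == lo:
        return list(pixels)  # flat image, nothing to stretch
    return [min(255, max(0, round((p - lo) * 255 / (hi - lo)))) for p in pixels]


if __name__ == "__main__":
    # Made-up corners of a tilted page in the photo: TL, TR, BR, BL.
    photo_corners = [(120.0, 80.0), (900.0, 140.0), (860.0, 1300.0), (60.0, 1200.0)]
    # Target: an upright 800 x 1100 pixel page.
    page_corners = [(0.0, 0.0), (800.0, 0.0), (800.0, 1100.0), (0.0, 1100.0)]
    h = homography(photo_corners, page_corners)
    print([warp_point(h, c) for c in photo_corners])
    print(stretch_contrast([90, 100, 120, 180, 200]))
```

Cropping (step 3) falls out of the warp: once the page corners are mapped to the rectangle's corners, everything outside the rectangle is simply not rendered. The hard part in practice is usually step 1, finding the corners reliably against cluttered backgrounds.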

Essentially, we want to replicate what Google Drive’s native mobile scanner does — but triggered automatically after the user saves the form in AppSheet, without any manual steps.

The reason image quality matters here is that these are signed physical documents. At the end of the year, they are submitted digitally to government authorities, so they need to look clean, professional, and clearly legible — not like casual phone snapshots.