Hi All,
We are currently using Document AI to parse forms in some PDF documents, and half of the time the default Form Parser processor either misses a column or garbles the column structure.
Let’s say the expected file header is:
Sales | Dollar Volume | Average Price
For example, I have seen cases like these:
- Missing Header
Sales|Average Price
- Wrong structure
SalesDollar|Volume|Average Price
The contents of the first two columns are garbled as well: a cell can have a missing or incomplete value.
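To make the two failure cases concrete, here is a minimal sketch in plain Python (independent of the Document AI client) of how they could be detected. It assumes the header cell texts have already been pulled out of the parser response into a list of strings; the names `EXPECTED_HEADER` and `diagnose_header` are made up for illustration.

```python
# Hypothetical check, not part of Document AI: compare an extracted
# header row against the header we expect for this document type.
EXPECTED_HEADER = ["Sales", "Dollar Volume", "Average Price"]


def _norm(text):
    """Lowercase and drop all whitespace so 'Dollar Volume' == 'dollarvolume'."""
    return "".join(text.split()).lower()


def diagnose_header(extracted, expected=EXPECTED_HEADER):
    """Return a list of problems found in an extracted header row."""
    extracted_norm = [_norm(cell) for cell in extracted]
    expected_norm = [_norm(cell) for cell in expected]

    if extracted_norm == expected_norm:
        return []

    # Same characters overall, but split at the wrong places:
    # the column boundaries were misdetected.
    if "".join(extracted_norm) == "".join(expected_norm):
        return ["wrong structure: column boundaries are misplaced"]

    # Otherwise report every expected column that is absent.
    return [
        f"missing column: {name}"
        for name, key in zip(expected, expected_norm)
        if key not in extracted_norm
    ]


if __name__ == "__main__":
    print(diagnose_header(["Sales", "Average Price"]))
    print(diagnose_header(["SalesDollar", "Volume", "Average Price"]))
```

A check like this only flags the bad pages so they can be counted or routed for review; it does not repair the cell contents.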
Any recommendations to improve this? If there is no easy way, is there any guidance, with examples, on training and deploying one’s own form processor? PS: the documents all have the same structure.