Document AI (Custom Document Extractor) - Low F1 scores and poor results with model-based training

We encountered some problems when training a Custom Document Extractor with model-based training on Google Cloud. The basic facts are set out below, and we would appreciate any help:

  • We used the Custom Document Extractor to extract information from a form, and provided about 50 samples in total for training and testing with model-based training.

  • After training, some of the fields received an F1 score of 0.000, which means the trained extractor failed to detect or predict anything for those fields (see the note after this list). This surprised us, because the same fields received higher F1 scores with the foundation model (i.e. the default pretrained model Google provides at the start).

  • The auto-labelling function in Document AI was able to pick up the information in those fields before we ran model-based training, so it was a surprise that the results got worse after training.
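
For context on the 0.000 figure: the per-field F1 score is the harmonic mean of precision and recall, so it only collapses to 0.000 when a field gets no correct predictions at all on the test documents. A minimal sketch of the calculation (the field counts below are made up purely for illustration):

```python
# Per-field F1 from true positives (tp), false positives (fp) and false negatives (fn).
# The counts are hypothetical; they just show why F1 becomes 0.000 when a field
# receives no correct predictions at all.
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f1(tp=8, fp=1, fn=1))   # healthy field, F1 ≈ 0.889
print(f1(tp=0, fp=0, fn=10))  # field the model never predicted correctly, F1 = 0.000
```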

In this regard, we would like to know how we could improve the F1 score (specifically, why the F1 score for some fields would drop, or fall to 0.000, after model-based training), or any other solutions you would advise to increase the F1 score or extraction accuracy.
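
In case it helps anyone reproduce the comparison, a rough sketch of how the trained version and the foundation-model version can be run against the same document with the Python client is shown below; all project, processor and version IDs are placeholders, not the real values, and the exact pretrained version ID would need to be taken from the processor's "Manage versions" page.

```python
# Sketch: process one document against a specific Document AI processor version and
# print the extracted entities, so the trained custom version can be compared with
# the pretrained foundation-model version on the same file. All IDs are placeholders.
from google.api_core.client_options import ClientOptions
from google.cloud import documentai


def extract(project_id: str, location: str, processor_id: str, version_id: str, pdf_path: str) -> None:
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    name = client.processor_version_path(project_id, location, processor_id, version_id)
    with open(pdf_path, "rb") as f:
        raw = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw)
    )
    print(f"--- {version_id} ---")
    for entity in result.document.entities:
        print(f"{entity.type_}: {entity.mention_text!r} (confidence {entity.confidence:.2f})")


# Same document, two processor versions: the trained custom version and the
# pretrained foundation-model version (placeholder IDs).
extract("my-project", "us", "my-processor-id", "my-trained-version-id", "sample_form.pdf")
extract("my-project", "us", "my-processor-id", "my-pretrained-version-id", "sample_form.pdf")
```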

Thank you!
