Document AI "Internal error encountered" on custom model training with some labels

Hello!

We’ve been trying Document AI for some time. Previously we could train a custom model without problems using around 10 documents, and it worked pretty well.

Since Friday, however, we’ve been trying to train another custom model using around 50 labelled documents, and after 10 minutes or so the following error appears:

{
  "name": "projects/[REDACTED]/locations/[REDACTED]/operations/[REDACTED]",
  "done": true,
  "result": "error",
  "response": {},
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.documentai.v1.TrainProcessorVersionMetadata",
    "commonMetadata": {
      "state": "FAILED",
      "createTime": "2024-09-15T20:29:29.927297Z",
      "updateTime": "2024-09-15T20:38:02.040543Z",
      "resource": "projects/[REDACTED]/locations/[REDACTED]/processors/[REDACTED]/processorVersions/[REDACTED]"
    },
    "trainingDatasetValidation": {},
    "testDatasetValidation": {}
  },
  "error": {
    "code": 13,
    "message": "Internal error encountered.",
    "details": []
  }
}
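For anyone reading the response above, here is a minimal Python sketch that pulls out the parts that matter. The function name `summarize_operation` and the trimmed `sample` payload are illustrative assumptions, not part of any Document AI client library. The code-to-name mapping follows the standard gRPC status codes, where 13 is INTERNAL.

```python
import json
from datetime import datetime

# A few standard gRPC status codes (google.rpc.Code); not an exhaustive list.
GRPC_CODES = {3: "INVALID_ARGUMENT", 8: "RESOURCE_EXHAUSTED", 13: "INTERNAL"}

def parse_ts(value):
    # Older Python versions reject a trailing "Z", so replace it with an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def summarize_operation(raw):
    """Summarize a long-running operation response (hypothetical helper)."""
    op = json.loads(raw)
    common = op.get("metadata", {}).get("commonMetadata", {})
    error = op.get("error", {})
    code = error.get("code")
    elapsed = parse_ts(common["updateTime"]) - parse_ts(common["createTime"])
    return {
        "state": common.get("state"),
        "code_name": GRPC_CODES.get(code, str(code)),
        "message": error.get("message"),
        "seconds": elapsed.total_seconds(),
        "has_details": bool(error.get("details")),
    }

# Trimmed copy of the response above, keeping only the fields used here.
sample = """
{
  "done": true,
  "metadata": {
    "commonMetadata": {
      "state": "FAILED",
      "createTime": "2024-09-15T20:29:29.927297Z",
      "updateTime": "2024-09-15T20:38:02.040543Z"
    }
  },
  "error": {"code": 13, "message": "Internal error encountered.", "details": []}
}
"""

summary = summarize_operation(sample)
print(summary)
```

For this response the summary reports state FAILED, code name INTERNAL, an empty `details` list, and a run time of roughly eight and a half minutes, so the payload itself gives no hint about which input triggered the failure.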

This model also made use of the parent label feature, which is in preview, so we tried again without the use of that feature (making each child of the parent label its own label), but the error persists.

The only way we found to train successfully is to disable the labels that were children of the parent label (this holds both before and after we stopped using the parent label), but with that workaround we lose a lot of labels we need.

We checked the logs of when the error happens and it shows this:

The replica workerpool0-0 exited with a non-zero status of 1. Termination reason: Error. To find out more about why your job exited please check the logs: [REDACTED]

The URL links to a premade Logs Explorer query in an unknown project that we don’t have access to, so we can’t see more details.
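Since the linked query is not accessible, a filter limited to the failure window can at least be run against a project one does have access to. The sketch below builds such a filter string from the operation's `createTime` and `updateTime`. The helper name `failure_window_filter` and the five-minute margin are assumptions. The filter uses only the `timestamp` and `severity` fields of the Logging query language, and whether the training job writes anything to the caller's own project is not known.

```python
from datetime import datetime, timedelta

def failure_window_filter(create_time, update_time, margin_minutes=5):
    """Build a Logs Explorer filter around a failed run (hypothetical helper)."""
    def parse(value):
        # Replace the trailing "Z" so older Python versions can parse the value.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))

    margin = timedelta(minutes=margin_minutes)
    start = parse(create_time) - margin
    end = parse(update_time) + margin
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # both timestamps are already in UTC
    return (
        f'timestamp >= "{start.strftime(fmt)}" AND '
        f'timestamp <= "{end.strftime(fmt)}" AND '
        "severity >= WARNING"
    )

log_filter = failure_window_filter(
    "2024-09-15T20:29:29.927297Z",
    "2024-09-15T20:38:02.040543Z",
)
print(log_filter)
```

The resulting string can be pasted into Logs Explorer as-is and narrowed further by resource type once it is clear which resources, if any, emit entries in that window.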

We also have another dataset of around 91 documents that has the same issue.

What could be causing this problem or how could we see more details about the error?

Hi @capaths,

Welcome to Google Cloud Community!

Error code 13 in the response is the gRPC INTERNAL status. It indicates that the failure happened server-side, in the training process itself, and it does not say which part of your input triggered it.

Here are some steps and considerations that may help you troubleshoot and potentially resolve the problem:

  1. Check for Document Quality: Ensure that the documents you are using for training are properly formatted and labeled. Sometimes issues arise from inconsistencies in the data, such as missing labels, poor image quality, or unsupported file formats.
  2. Logs and Monitoring: Since the logs you’re directed to are from an unknown project, consider checking your own project’s logs for any related errors. If you have access to Cloud Logging, filter logs by the Document AI service and check for any warnings or errors around the time your training job fails.
  3. Review Dataset Size and Labels: Since you mentioned using the parent label feature in preview, make sure your dataset complies with any specific requirements or limitations associated with it. If this feature is causing problems, it might be best to avoid using it until it’s fully supported. Consider waiting for it to be officially released and more stable before incorporating it into your training process.
  4. Incremental Training: To find out if a specific document or label is causing the problem, try training your model with a smaller dataset. Start by removing documents or labels one by one and see if the error persists. This can help you identify any problematic elements in your data.
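  5. API Quotas and Limits: Check whether your project has reached any quota or limit for Document AI. Quota problems are usually reported as RESOURCE_EXHAUSTED (code 8) rather than as an internal error, so this is a less likely cause, but it is quick to rule out.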
  6. Documentation and Updates: Keep an eye on the official documentation and any release notes regarding Document AI features. If the parent label feature is in preview, there might be known issues or updates that address the error you’re encountering.
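Because each training run takes several minutes, step 4 can be done with fewer runs by halving the set of suspect labels instead of removing them one at a time. The following is a minimal sketch under two assumptions: exactly one label triggers the failure, and `trains_ok` is a hypothetical callback you supply that starts a training run with the given labels enabled and returns whether it succeeded. The label names in the usage example are made up.

```python
def find_failing_label(suspects, always_on, trains_ok):
    """Bisect a list of suspect labels down to the one that breaks training.

    Assumes exactly one suspect causes the failure. `trains_ok(labels)` is a
    user-supplied callback, not a Document AI API.
    """
    candidates = list(suspects)
    if not candidates:
        raise ValueError("no suspect labels given")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if trains_ok(list(always_on) + first_half):
            # Training succeeded, so the culprit is in the other half.
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0]

# Usage with a stand-in for real training runs (label names are invented).
runs = []

def fake_trains_ok(labels):
    runs.append(labels)
    return "line_item_total" not in labels

culprit = find_failing_label(
    suspects=["line_item_name", "line_item_qty", "line_item_price", "line_item_total"],
    always_on=["invoice_id", "invoice_date"],
    trains_ok=fake_trains_ok,
)
print(culprit, len(runs))
```

If more than one label is involved, or the failure only appears when certain labels are combined, the single-culprit assumption breaks down and a one-by-one pass is the safer choice.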

Additional tips:

  • For your reference, I came across an article/blog that addresses the same error message you encountered. It may provide additional insights and potential solutions.
  • If the issue persists, you can contact Google Cloud Support, who can look into the failed operation and give you more specific insights.

I hope the above information is helpful.