Hi @KT-K,
The missing gray label suggests that the checkbox definitions in your template are not accurate, so the model can’t properly align your annotations with the actual checkbox elements in the document.
Moreover, adding more samples without the gray label won’t help the model understand what a checkbox is. It will simply learn to treat those regions as empty or undefined. The key is to ensure your existing samples have correctly labeled checkboxes with the gray label. This teaches the model what a checkbox looks like.
Here are the factors influencing checkbox detection:
1. Template Accuracy:
- Precise Bounding Boxes: Ensure the bounding boxes for your checkboxes are tight and accurate.
- Correct Annotation: Use the correct annotation type (e.g., “CHECKBOX”) in your template.
- Consistency: Maintain consistent checkbox definitions across all series.
2. Training Data Quality:
- Clarity and Consistency: Use clear, well-scanned documents with consistently marked checkboxes. Avoid blurry or smudged checkboxes.
- Variety: Include a diverse set of checkboxes with different sizes, shapes, and even partially filled-in checkboxes.
3. Document Structure:
- Spacing and Alignment: Ensure checkboxes are evenly spaced and aligned with one another.
- Text Proximity: Keep checkboxes clear of neighboring text. If a checkbox touches or overlaps text, the model might struggle to distinguish the two.
4. Model Training:
- Sufficient Data: Provide enough training data to cover the various checkbox styles and patterns in your surveys.
- Training Duration: Train the model for a sufficient amount of time to allow it to learn effectively.
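If you can export your annotations, a quick script can help catch the template problems listed above before you retrain. The following is only a minimal sketch and does not use the Document AI API: the `Box` fields, the `audit_checkboxes` helper, and the `size_tolerance` threshold are all hypothetical, and you would need to map your own exported labels into this shape. It flags checkbox boxes that are much larger or smaller than the typical one (a sign of loose bounding boxes) and checkbox boxes that overlap other annotations.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    # Hypothetical fields; adapt them to however you export your annotations.
    label: str   # name of the annotation, e.g. "q1_yes"
    kind: str    # annotation type, e.g. "CHECKBOX" or "TEXT"
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any interior area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def audit_checkboxes(boxes, size_tolerance=0.5):
    """Return (label, problem) pairs for suspicious checkbox annotations.

    size_tolerance is an assumed threshold: a checkbox is flagged when its
    area differs from the median checkbox area by more than this fraction.
    """
    checkboxes = [b for b in boxes if b.kind == "CHECKBOX"]
    others = [b for b in boxes if b.kind != "CHECKBOX"]
    if not checkboxes:
        return [("<document>", "no CHECKBOX annotations found")]

    typical = median(b.area for b in checkboxes)
    problems = []
    for box in checkboxes:
        if box.area == 0:
            problems.append((box.label, "degenerate box"))
        elif abs(box.area - typical) > size_tolerance * typical:
            problems.append((box.label, "size differs from typical checkbox"))
        for other in others:
            if overlaps(box, other):
                problems.append((box.label, f"overlaps {other.label}"))
    return problems
```

Calling `audit_checkboxes(list_of_boxes)` on the annotations of one sample returns an empty list when nothing looks off. Any flagged label is a good candidate to redraw in the template before adding more training samples.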
By focusing on template accuracy, data quality, and proper labeling, you should see significantly better checkbox detection in your Document AI model.
I hope this clarifies your concern.