Issue with Layout Parser: Individual items not detected as separate blocks

I am testing the Layout Parser (v1beta3) with the latest processor version (updated Jan 2026).

Goal: To extract individual illustrations (e.g., chairs in a catalog) along with their specific coordinates.

Current Problem: Although the processor returns a boundingPoly with 4 coordinates, it only covers the entire page or a very large section. The individual elements inside are merged into one large Block.

  • Processor: Layout Parser (v1beta3)

  • Attempted: Toggling enableLlmLayoutParsing (true/false)

  • Result: blocks_count is very low (e.g., 2 blocks for a page with 10+ items).
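To make the symptom concrete, here is a minimal sketch of how the block count and the share of the page each block covers could be measured. It assumes the response has already been converted to plain dicts in which each block carries a `boundingPoly` with four `normalizedVertices` (values between 0 and 1). That dict layout, the sample data, and the helper names `poly_area` and `summarize_blocks` are assumptions for illustration, not the confirmed Layout Parser response schema.

```python
from typing import Dict, List


def poly_area(vertices: List[Dict[str, float]]) -> float:
    """Area of a polygon (shoelace formula) from [{'x': ..., 'y': ...}, ...].

    With normalized vertices the result is the fraction of the page covered.
    Missing 'x' or 'y' keys are treated as 0, since zero-valued fields may be
    omitted when a response is serialized to JSON.
    """
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1 = vertices[i].get("x", 0.0)
        y1 = vertices[i].get("y", 0.0)
        x2 = vertices[(i + 1) % n].get("x", 0.0)
        y2 = vertices[(i + 1) % n].get("y", 0.0)
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def summarize_blocks(blocks: List[dict]) -> List[dict]:
    """Return one {'block_id', 'coverage'} entry per block (hypothetical shape)."""
    return [
        {
            "block_id": block.get("blockId", ""),
            "coverage": poly_area(block["boundingPoly"]["normalizedVertices"]),
        }
        for block in blocks
    ]


# Hypothetical sample mirroring the problem: two blocks, one spanning the page.
sample_blocks = [
    {
        "blockId": "1",
        "boundingPoly": {
            "normalizedVertices": [{}, {"x": 1.0}, {"x": 1.0, "y": 1.0}, {"y": 1.0}]
        },
    },
    {
        "blockId": "2",
        "boundingPoly": {
            "normalizedVertices": [{}, {"x": 1.0}, {"x": 1.0, "y": 0.5}, {"y": 0.5}]
        },
    },
]

for entry in summarize_blocks(sample_blocks):
    print(entry["block_id"], f"{entry['coverage']:.0%} of page")
```

A coverage close to 100% for a page that holds 10+ items is a quick way to confirm that the elements were merged rather than segmented.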

Is this a known limitation of the current preview version, or is there a specific configuration to improve the segmentation of visual elements? Any advice on how to get individual bounding boxes for small elements would be appreciated.


Hi @k.k, this is expected behavior. Layout Parser is designed to detect document structure such as paragraphs and tables, not small visual objects such as individual chairs. That is why it merges many elements into one large block.

The enableLlmLayoutParsing setting will not improve fine object segmentation.

If you need separate bounding boxes for each illustration, you should use an object detection model such as Vertex AI Vision or AutoML Vision. Layout Parser is not intended for that level of visual detection.


@a_aleinikov

Thank you for your reply. This is extremely helpful as I was stuck on this issue.

Regarding the layout parser, is it possible to extract the coordinate information for the text as well?

I also tried using Vertex AI Studio (on the console), and while I was able to get the BBOX information for the images, the accuracy was unfortunately quite low.
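For anyone checking the accuracy of such boxes, a common source of apparent error is the coordinate convention. The sketch below assumes the model returned each box as `[ymin, xmin, ymax, xmax]` scaled to a 0-1000 range, which is a convention Gemini models commonly use; whether that applies to the output described above is an assumption, and the helper name `box_to_pixels` is made up for illustration.

```python
from typing import Sequence, Tuple


def box_to_pixels(
    box: Sequence[float],
    image_width: int,
    image_height: int,
    scale: int = 1000,
) -> Tuple[int, int, int, int]:
    """Convert [ymin, xmin, ymax, xmax] on a 0..scale grid to pixel coordinates.

    Returns (left, top, right, bottom). The input order (y first) and the
    0-1000 scale are assumptions about the model output, not confirmed facts.
    """
    ymin, xmin, ymax, xmax = box
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


# Hypothetical box on a 2000 x 1000 pixel catalog page.
print(box_to_pixels([100, 200, 500, 600], image_width=2000, image_height=1000))
```

If the boxes look shifted or transposed when drawn on the image, swapping the assumed axis order or scale is worth trying before concluding that the detection itself is inaccurate.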

I will go ahead and try “Vertex AI Vision” and “AutoML Vision” as you suggested. Thank you again!
