Large PDFs on Vertex AI Generative Model

Hi,

I'm trying to work with larger PDFs on Gemini 1.5 Flash, and I can't seem to find a solution. One of the main practices suggested in the documentation is to split the file into multiple PDFs. However, with my code the model seems to only provide a summary of the final split PDF in my list, rather than a combined summary of all the PDF pieces.

```python
import os

import vertexai
from vertexai.generative_models import GenerativeModel, Part

pdf_paths = ["splitPdfOne.pdf", "splitPdfTwo.pdf"]

vertexai.init(project=os.getenv("PROJECT_ID"), location="us-central1")
model = GenerativeModel(model_name="gemini-1.5-flash-001")
prompt = "Please provide a summary of the text in these documents."

# Create a list of Part objects from the PDF files
parts = []
for pdf_path in pdf_paths:
    parts.append(
        Part.from_data(
            data=open(pdf_path, "rb").read(),
            mime_type="application/pdf",
        )
    )

contents = parts + [prompt]
response = model.generate_content(contents)
print(response.text)
```

I’d appreciate any help or advice for this issue, thank you!

Hi @sd45 ,

You can check this sample GitHub repository on Document Processing with Gemini. You can use that code as a baseline for your integration; it shows how to split PDFs into smaller chunks.
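For reference, splitting can be done locally before sending anything to the model. Here is a minimal sketch assuming the `pypdf` library (`pip install pypdf`); the function names, file naming scheme, and chunk size are illustrative, not from the linked repository:

```python
# Hypothetical sketch: split a large PDF into smaller PDFs of at most
# `pages_per_chunk` pages each, using pypdf (an assumption -- any library
# with page-level PDF access would work the same way).

def chunk_ranges(n_pages, pages_per_chunk):
    """Compute (start, end) page-index ranges, one per output chunk."""
    return [(start, min(start + pages_per_chunk, n_pages))
            for start in range(0, n_pages, pages_per_chunk)]

def split_pdf(src_path, pages_per_chunk=10):
    """Write src_path out as several smaller PDFs; return their paths."""
    from pypdf import PdfReader, PdfWriter  # assumes pypdf is installed

    reader = PdfReader(src_path)
    out_paths = []
    for i, (start, end) in enumerate(
        chunk_ranges(len(reader.pages), pages_per_chunk)
    ):
        writer = PdfWriter()
        for page in reader.pages[start:end]:
            writer.add_page(page)
        # Illustrative output naming: source name plus a chunk index.
        out_path = f"{src_path[:-4]}_chunk{i}.pdf"
        with open(out_path, "wb") as f:
            writer.write(f)
        out_paths.append(out_path)
    return out_paths
```

You can then feed the returned paths into your existing `pdf_paths` list. Note that each chunk still counts toward the request's total token budget, so for very large documents you may need to summarize chunks in separate requests.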

Hope this helps.