Large PDFs on Vertex AI Generative Model

Hi,

I'm trying to work with larger PDFs on Gemini 1.5 Flash, and I can't seem to find a solution. One of the main practices suggested in the documentation is to split the file into multiple PDFs. However, with my code the model seems to only provide a summary of the final split PDF in my list, rather than a combined summary of all the PDF pieces.

```python
import os

import vertexai
from vertexai.generative_models import GenerativeModel, Part

pdf_paths = ["splitPdfOne.pdf", "splitPdfTwo.pdf"]

vertexai.init(project=os.getenv("PROJECT_ID"), location="us-central1")
model = GenerativeModel(model_name="gemini-1.5-flash-001")
prompt = "Please provide a summary of the text in these documents."

# Create a list of Part objects from the PDF files
parts = []
for pdf_path in pdf_paths:
    parts.append(
        Part.from_data(
            data=open(pdf_path, "rb").read(),
            mime_type="application/pdf",
        )
    )

contents = parts + [prompt]
response = model.generate_content(contents)
print(response.text)
```

I’d appreciate any help or advice for this issue, thank you!

Hi @sd45 ,

You can check this sample GitHub repository on Document Processing with Gemini. You can use that code as a baseline for your integration; it shows how to split PDFs into smaller chunks.
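For reference, splitting can be done locally before sending anything to the model. Here is a minimal sketch assuming the `pypdf` library (`pip install pypdf`); the function names, file naming scheme, and chunk size are illustrative, not from the linked repository:

```python
# Hypothetical sketch: split a large PDF into smaller PDFs of at most
# `pages_per_chunk` pages each, using pypdf (an assumption -- any library
# with page-level PDF access would work the same way).

def chunk_ranges(n_pages, pages_per_chunk):
    """Compute (start, end) page-index ranges, one per output chunk."""
    return [(start, min(start + pages_per_chunk, n_pages))
            for start in range(0, n_pages, pages_per_chunk)]

def split_pdf(src_path, pages_per_chunk=10):
    """Write src_path out as several smaller PDFs; return their paths."""
    from pypdf import PdfReader, PdfWriter  # assumes pypdf is installed

    reader = PdfReader(src_path)
    out_paths = []
    for i, (start, end) in enumerate(
        chunk_ranges(len(reader.pages), pages_per_chunk)
    ):
        writer = PdfWriter()
        for page in reader.pages[start:end]:
            writer.add_page(page)
        # Illustrative output naming: source name plus a chunk index.
        out_path = f"{src_path[:-4]}_chunk{i}.pdf"
        with open(out_path, "wb") as f:
            writer.write(f)
        out_paths.append(out_path)
    return out_paths
```

You can then feed the returned paths into your existing `pdf_paths` list. Note that each chunk still counts toward the request's total token budget, so for very large documents you may need to summarize chunks in separate requests.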

Hope this helps.