Batch reduce high latency in bucket gcs

dmontielg · September 29, 2023, 9:41am

Hi,

I have a Docker image that I run with Batch. It reads the resources it needs from a Cloud Storage bucket mounted as a gcs volume, as in the example here: https://cloud.google.com/batch/docs/create-run-job-storage#use-bucket

The issue is the very high latency when reading and generating the intermediate files in the bucket: the job takes more than 10 hours, whereas locally it takes ~2 hours to produce the same files on the same machine type. I see that one option is to use persistent disks to reduce latency, but I don't know how to attach and mount a persistent disk while still using the resources from the bucket. Would it make sense to copy the needed resources from the bucket to the persistent disk, generate the intermediate files there, and finally copy the output back to a bucket?

Thanks in advance for any help!

Diego

robertcarlos · September 29, 2023, 9:30pm

Hi @dmontielg ,

Welcome to Google Cloud Community!

Based on the documentation you linked, the Cloud Storage bucket is mounted to your VM automatically using Cloud Storage FUSE. One of its documented disadvantages is the following:

Performance: Cloud Storage FUSE has much higher latency than a local file system, and as such, should not be used as the backend for storing a database. Throughput may be reduced when reading or writing one small file at a time. Using larger files and/or transferring multiple files at a time will help increase throughput.
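
As a rough illustration of the "larger files" advice above, here is a minimal Python sketch that packs many small intermediate files into a single archive, so that only one large object crosses the FUSE mount. The function name `bundle_directory` and the paths in the usage comment are hypothetical, not part of Batch or Cloud Storage FUSE, and whether bundling helps depends on how your tool reads its files.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: Path, archive_path: Path) -> int:
    """Pack every file under src_dir into one gzip-compressed tar archive.

    Returns the number of files added. archive_path should live outside
    src_dir so the archive does not try to include itself.
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
                count += 1
    return count


# Hypothetical usage: build the archive on local disk, then copy the single
# file to the bucket mount point (both paths are assumptions).
# bundle_directory(Path("/tmp/intermediate"), Path("/tmp/intermediate.tar.gz"))
```

Reading works the same way in reverse: copy the one archive from the bucket to local disk and extract it there with `tarfile.open(...).extractall(...)`.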

If you prefer a persistent disk, you first need to add it to your VM. You may also want to check the restrictions that apply to all persistent disks.

There are also other options like local SSD and network file storage that you may use for storage volumes.
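
Whichever faster volume you choose, the copy-in, compute, copy-out workflow described in the question can be sketched roughly as follows. This is only a sketch under assumptions: the bucket and the scratch disk are taken to be already mounted as ordinary directories, and `run_staged` and the `process` callback are hypothetical names standing in for your own pipeline.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_staged(bucket_in: Path, bucket_out: Path, scratch: Path,
               process: Callable[[Path, Path], None]) -> None:
    """Copy inputs to fast local storage, process there, copy results back."""
    local_in = scratch / "input"
    local_out = scratch / "output"

    # Stage in: one bulk copy from the bucket mount to the scratch disk.
    shutil.copytree(bucket_in, local_in, dirs_exist_ok=True)
    local_out.mkdir(parents=True, exist_ok=True)

    # All intermediate reads and writes now hit the scratch disk only.
    process(local_in, local_out)

    # Stage out: one bulk copy of the final results to the bucket mount.
    shutil.copytree(local_out, bucket_out, dirs_exist_ok=True)


# Hypothetical usage; the mount paths are assumptions, not Batch defaults:
# run_staged(Path("/mnt/disks/bucket/resources"),
#            Path("/mnt/disks/bucket/results"),
#            Path("/mnt/disks/scratch"),
#            my_pipeline)
```

The point of the design is that the bucket is touched only twice, in bulk, while the many small intermediate reads and writes stay on the low-latency disk.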

Hope this helps.
