I wish the official GCP documentation had a comparison table for Cloud Run services vs. Cloud Run jobs. I found one via Google search, but it is not part of the official GCP documentation. Anyway, coming back to my question: does a Cloud Run service NOT offer parallelism at all? I see a concurrency parameter as part of the configuration. I have 2 questions:
- Do Cloud Run jobs support attaching GPUs?
- Is it a good idea to run ML inference jobs using Cloud Run jobs?
Hi @dheerajpanyam,
Cloud Run services offer parallelism through the concurrency setting, which is a bit different from the parallelism in jobs.
In Cloud Run services, the concurrency setting dictates how many requests a single container instance can handle at once. When incoming traffic exceeds what the running instances can absorb, Cloud Run automatically adds more instances.
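As a rough illustration (the service name, image path, and numbers are placeholders, not from this thread), deploying a service with a concurrency cap looks like this:

```
# Each instance handles up to 80 concurrent requests; under sustained
# load, Cloud Run adds instances up to the configured maximum.
gcloud run deploy my-service \
  --image=us-docker.pkg.dev/my-project/my-repo/app:latest \
  --region=us-central1 \
  --concurrency=80 \
  --max-instances=10
```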
Cloud Run jobs, however, use parallelism to specify how many identical tasks should run at the same time to process a single job, making it perfect for splitting up large batch workloads.
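For comparison, here is a minimal job sketch (again with placeholder names and illustrative numbers). It creates a job that fans a workload out over 100 identical tasks, running at most 10 of them at a time:

```
# 100 identical tasks total, up to 10 running in parallel.
gcloud run jobs create my-batch-job \
  --image=us-docker.pkg.dev/my-project/my-repo/batch:latest \
  --region=us-central1 \
  --tasks=100 \
  --parallelism=10

# Kick off an execution of the job.
gcloud run jobs execute my-batch-job --region=us-central1
```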
As for your other questions:
- Do Cloud Run jobs support attaching GPUs? Yes. Cloud Run services and jobs both support GPUs; there is an example command right after this list.
- Is it a good idea to run ML inference jobs using Cloud Run jobs? Yes. For batch inference, where you’re processing a bunch of data that doesn’t require an immediate, real-time response, Cloud Run jobs are a perfect fit. Using jobs lets you process large datasets in parallel (see the task-sharding sketch below), leverage GPUs for acceleration, and have the instances shut down automatically when the job is done. For real-time inference serving a web application, though, you’d want to stick with a Cloud Run service. This post might be able to help you.
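On GPUs, here is a hedged sketch of attaching one to a job (names are placeholders; the GPU flags have moved between the beta and GA gcloud tracks and are only available in certain regions, so check the current docs for your project):

```
# Request one NVIDIA L4 GPU per task; the beta component may be
# required depending on your gcloud version and region.
gcloud beta run jobs create my-inference-job \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4
```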
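And for batch inference specifically, each task can pick its own slice of the data using the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables that Cloud Run injects into every task of a job. A minimal container entrypoint might look like this (the bucket paths and run_inference.py script are hypothetical):

```
#!/bin/sh
# Cloud Run sets these automatically for each task in a job.
echo "Processing shard ${CLOUD_RUN_TASK_INDEX} of ${CLOUD_RUN_TASK_COUNT}"

# Pull this task's shard of inputs, run inference, write results back.
mkdir -p /tmp/inputs
gsutil -m cp "gs://my-bucket/shards/shard-${CLOUD_RUN_TASK_INDEX}/*" /tmp/inputs/
python run_inference.py \
  --input-dir /tmp/inputs \
  --output "gs://my-bucket/outputs/shard-${CLOUD_RUN_TASK_INDEX}/"
```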
Beautiful, very well explained. Thank you so much, @mcbsalceda!