Workflow failed. Causes: The minimum amount of memory of a Dataflow worker instance is 1740 MB. The machine type selected (f1-micro) only has 614 MB of memory.
In my defense, the UI allowed me to select that machine type.
In other words, a Dataflow worker instance needs at least 1740 MB of memory, and the f1-micro machine type provides only 614 MB.
To fix the error, choose a machine type with at least 1740 MB of memory. The default machine type for Dataflow batch jobs is n1-standard-1, which has 3.75 GB of memory. Larger options include n1-standard-2 (7.5 GB of memory) and n1-standard-4 (15 GB of memory).
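As a quick sanity check before submitting a job, the comparison above can be sketched in a few lines of Python. This is only an illustration: the memory figures are the ones quoted in this post (converted at 1 GB = 1024 MB), and the names `MACHINE_MEMORY_MB`, `meets_minimum`, and `usable_machine_types` are hypothetical helpers, not part of any Dataflow API.

```python
from typing import List

# Minimum worker memory quoted in the Dataflow error message.
MIN_WORKER_MEMORY_MB = 1740

# Memory per machine type, using the figures quoted in this post
# (assumption: 1 GB = 1024 MB for the conversion).
MACHINE_MEMORY_MB = {
    "f1-micro": 614,
    "n1-standard-1": int(3.75 * 1024),  # 3840 MB
    "n1-standard-2": int(7.5 * 1024),   # 7680 MB
    "n1-standard-4": 15 * 1024,         # 15360 MB
}


def meets_minimum(machine_type: str) -> bool:
    """Return True if the machine type has enough memory for a Dataflow worker."""
    return MACHINE_MEMORY_MB[machine_type] >= MIN_WORKER_MEMORY_MB


def usable_machine_types() -> List[str]:
    """List the machine types from the table that satisfy the minimum."""
    return [name for name in MACHINE_MEMORY_MB if meets_minimum(name)]
```

Calling `meets_minimum("f1-micro")` returns `False`, which mirrors the workflow failure, while every n1-standard type in the table passes.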
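To change the machine type for your Dataflow job, set the worker machine type option when submitting your job. The flag name depends on the tool: the Java SDK uses --workerMachineType, while gcloud uses --worker-machine-type. Because gcloud dataflow jobs run launches a job from a template, it also needs --gcs-location. For example, to use the n1-standard-1 machine type (the template path below is a placeholder):
gcloud dataflow jobs run my_job --gcs-location=gs://YOUR_BUCKET/path/to/template --worker-machine-type=n1-standard-1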
In Python, you can specify the machine type in your Dataflow pipeline code:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
# Pass the worker machine type as a pipeline option.
pipeline = beam.Pipeline(options=PipelineOptions(worker_machine_type="n1-standard-1"))
Once you’ve adjusted the machine type for your Dataflow job, you should be able to submit it successfully.
Additional tips for selecting the right machine type:
Data size and complexity: If you’re processing a large dataset or running a complex pipeline, you may need to use a machine type with more memory.
Worker count: Dataflow automatically scales the number of workers as needed. Adding workers does not increase the memory available to each one, so if individual workers run out of memory, a machine type with more memory is the better fix.
Budget: Dataflow charges based on the amount of CPU and memory that your job uses. If you’re on a tight budget, you may want to use a smaller machine type.
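To make the budget trade-off concrete, here is a rough Python sketch that compares machine types by CPU and memory cost alone. The vCPU counts and memory sizes are those of the n1-standard types mentioned above. The rates are deliberately left as caller-supplied placeholders: look up the current Dataflow pricing for your region rather than trusting any number hard-coded in a blog post. The function name `estimate_compute_cost` is hypothetical, and the estimate ignores other charges such as disk.

```python
# (vCPUs, memory in GB) for the machine types mentioned in this post.
MACHINE_SPECS = {
    "n1-standard-1": (1, 3.75),
    "n1-standard-2": (2, 7.5),
    "n1-standard-4": (4, 15.0),
}


def estimate_compute_cost(machine_type, workers, hours, vcpu_rate, memory_gb_rate):
    """Rough CPU + memory cost for a job.

    vcpu_rate and memory_gb_rate are per-hour prices supplied by the caller
    (placeholders here, not real Dataflow prices).
    """
    vcpus, memory_gb = MACHINE_SPECS[machine_type]
    per_worker_hour = vcpus * vcpu_rate + memory_gb * memory_gb_rate
    return workers * hours * per_worker_hour
```

Running the function for each machine type with the same worker count, duration, and rates shows how quickly a larger type multiplies the bill, which is the main reason to pick the smallest type that still clears the memory minimum.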