Yes, you are correct. In order to run the pipeline as described, you would need to create a custom template.
Dataflow supports two types of templates: Flex templates and classic templates.
Classic templates contain a JSON serialization of a Dataflow job graph and require runtime parameters to be wrapped in the ValueProvider interface. This interface allows users to specify parameter values when they run the template.
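For context, here is a minimal sketch (in Java) of what a classic-template options interface tends to look like; the option names `inputFile` and `outputTable` are made up for illustration:

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

// Hypothetical options for a classic template: runtime parameters are
// declared as ValueProvider so their values can be supplied when the
// template is launched, not when the template (and job graph) is built.
public interface MyClassicTemplateOptions extends PipelineOptions {
  @Description("Cloud Storage path of the input file")
  ValueProvider<String> getInputFile();
  void setInputFile(ValueProvider<String> value);

  @Description("BigQuery output table, in the form project:dataset.table")
  ValueProvider<String> getOutputTable();
  void setOutputTable(ValueProvider<String> value);
}
```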
Flex templates, on the other hand, package the pipeline as a Docker image, along with a template specification file in Cloud Storage. When you run the template, the Dataflow service starts a launcher VM, pulls the Docker image, and runs the pipeline. The execution graph is dynamically built based on runtime parameters provided by the user.
There are several advantages of Flex templates over classic templates:
- Flex templates do not require the ValueProvider interface for input parameters. Not all Dataflow sources and sinks support ValueProvider.
- While classic templates have a static job graph, Flex templates can dynamically construct the job graph. For example, the template might select a different I/O connector based on input parameters (see the sketch after this list).
- Flex templates can perform preprocessing on a virtual machine (VM) during pipeline construction. For example, the template might validate input parameter values.
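To illustrate the first two points, here is a minimal sketch, assuming a Java Beam pipeline with the GCP I/O dependency on the classpath; the option names (`sourceType`, `inputFile`, `inputTopic`) are made up for illustration. Because a Flex template builds its graph at launch time, parameters are ordinary values and the pipeline can pick a different connector per run:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class FlexTemplatePipeline {

  // Hypothetical options: plain types, no ValueProvider needed, because the
  // launcher VM constructs the job graph after the parameters are known.
  public interface Options extends PipelineOptions {
    @Description("Input source: 'gcs' or 'pubsub'")
    @Default.String("gcs")
    String getSourceType();
    void setSourceType(String value);

    @Description("Cloud Storage path of the input file")
    String getInputFile();
    void setInputFile(String value);

    @Description("Pub/Sub topic to read from")
    String getInputTopic();
    void setInputTopic(String value);
  }

  public static void main(String[] args) {
    Options options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);

    // Parameter values are plain strings at construction time, so the
    // pipeline can choose a different I/O connector for each launch.
    PCollection<String> input;
    if ("pubsub".equals(options.getSourceType())) {
      input = pipeline.apply("ReadFromPubSub",
          PubsubIO.readStrings().fromTopic(options.getInputTopic()));
    } else {
      input = pipeline.apply("ReadFromGCS",
          TextIO.read().from(options.getInputFile()));
    }

    // ... downstream transforms would go here ...

    pipeline.run();
  }
}
```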
Given these advantages, if you are creating a new Dataflow template, it is recommended to create it as a Flex template.
For more details, see:
https://cloud.google.com/dataflow/docs/concepts/dataflow-templates