I am using argument settings to populate as many properties in my pipeline as possible. The pipeline reads files from GCS, processes them with Wrangler, and writes the output to GCS and BigQuery.
I am able to use macros everywhere except in the GCS source. If I set a macro for “format” or “schema”, I get a null pointer exception. The same approach causes no problem in the GCS sink steps, so this appears to be a bug.
Here are the relevant settings in my arguments JSON:
{
  "arguments": [
    {
      "name": "gcs.source.path",
      "value": "gs://org-usssa-dw-data-lake-dev/source/legacy/athleteAward.csv"
    },
    {
      "name": "gcs.source.format",
      "value": "delimited"
    },
    {
      "name": "gcs.source.delimiter",
      "value": "~"
    },
    {
      "name": "gcs.source.schema",
      "type": "schema",
      "value": [
        { "name": "mvp_id", "type": "int", "nullable": true },
        { "name": "player_id", "type": "int", "nullable": true },
        { "name": "tournament_id", "type": "int", "nullable": true },
        { "name": "mvp_name", "type": "string", "nullable": true },
        { "name": "StatureID", "type": "int", "nullable": true },
        { "name": "season_id", "type": "int", "nullable": true }
      ]
    }
  ]
}
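
To rule out the arguments file itself as the cause, it may help to confirm that it parses as strict JSON before the pipeline reads it. The following is a minimal sketch, not part of Data Fusion or CDAP: the function name `check_arguments` and the checks it performs are hypothetical, and it only assumes the "arguments" / "name" / "value" layout shown above. Whether the plugin's own parser tolerates trailing commas or typographic quotes is not something this sketch assumes either way.

```python
import json

# Hypothetical helper for illustration; it is not a Data Fusion or CDAP API.
def check_arguments(text):
    """Return a list of problems found in an arguments JSON payload."""
    # Typographic quotes often sneak in when JSON is copied through a
    # word processor or forum editor; strict JSON needs straight quotes.
    if "\u201c" in text or "\u201d" in text:
        return ["typographic quotes found; JSON requires straight double quotes"]

    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        # Trailing commas and similar syntax errors are reported here.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    if not isinstance(doc, dict) or not isinstance(doc.get("arguments"), list):
        return ['top level must be an object with an "arguments" list']

    problems = []
    for i, arg in enumerate(doc["arguments"]):
        if not isinstance(arg, dict) or "name" not in arg or "value" not in arg:
            problems.append(f'arguments[{i}] needs both "name" and "value"')
        elif arg.get("type") == "schema" and not isinstance(arg["value"], list):
            problems.append(f'arguments[{i}] has type "schema" but "value" is not a list')
    return problems


if __name__ == "__main__":
    sample = '{"arguments": [{"name": "gcs.source.format", "value": "delimited"}]}'
    for problem in check_arguments(sample) or ["no problems found"]:
        print(problem)
```

If the file passes a check like this and the null pointer exception still occurs only for the source's format and schema properties, that points toward the plugin rather than the arguments.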