Null pointer exception when using macros in source GCS

I am using argument settings to populate as many settings in my pipeline as possible. The pipeline consumes files from GCS, processes with Wrangler, and writes out to GCS and BigQuery.

I am able to use the macros everywhere EXCEPT in the GCS source. If I try to set a macro for "format" or "schema", I get a null pointer exception. The same macros work fine in the GCS sink steps. This definitely appears to be a bug.

Here are the relevant settings in my arguments JSON:

{
  "arguments": [
    {
      "name": "gcs.source.path",
      "value": "gs://org-usssa-dw-data-lake-dev/source/legacy/athleteAward.csv"
    },
    {
      "name": "gcs.source.format",
      "value": "delimited"
    },
    {
      "name": "gcs.source.delimiter",
      "value": "~"
    },
    {
      "name": "gcs.source.schema",
      "type": "schema",
      "value": [
        { "name": "mvp_id", "type": "int", "nullable": true },
        { "name": "player_id", "type": "int", "nullable": true },
        { "name": "tournament_id", "type": "int", "nullable": true },
        { "name": "mvp_name", "type": "string", "nullable": true },
        { "name": "StatureID", "type": "int", "nullable": true },
        { "name": "season_id", "type": "int", "nullable": true }
      ]
    }
  ]
}
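For context, in the GCS source plugin properties I reference these arguments with the usual macro syntax (this is how the failing fields are wired up; the mapping below is a sketch of my configuration, not an exact export):

    path:      ${gcs.source.path}
    format:    ${gcs.source.format}
    delimiter: ${gcs.source.delimiter}
    schema:    ${gcs.source.schema}

The "path" and "delimiter" macros resolve fine at runtime; only "format" and "schema" trigger the null pointer exception.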


It seems that it responds to publicly accessible objects. Can you try setting your GCS bucket to public? For more information, see the documentation here:

https://cloud.google.com/data-fusion/docs/tutorials/reusable-pipeline#set-macros
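If you want to test that quickly, one way to grant public read access to the bucket's objects is with gsutil (the bucket name below is taken from the original post; substitute your own):

    gsutil iam ch allUsers:objectViewer gs://org-usssa-dw-data-lake-dev

Note this makes every object in the bucket world-readable, so it is only a diagnostic step, not something to leave enabled on production data.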


I created a new bucket and made it public. I uploaded my arg settings files and pointed the data pipeline arg settings plugin to the new location. I still get the same results.

To clarify, the arg settings were working fine even when the bucket wasn't public. The problem occurs only when attempting to use arguments/macros on the GCS source, and only for the "format" and "schema" properties.