We are using EXPORT DATA OPTIONS to export tables to newline delimited JSON files.
We are not able to export table rows correctly which any column with NULL value, column values with NULL value is getting removed from JSON.
A | B | C
-
- 3
4 NULL 6
{“A”: 1, “B”: 2, “C”:3}
{“A”: 4, “C”:6}
What we need :
{“A”: 1, “B”: 2, “C”:3}
{“A”: 4, “B”: None, “C”:6} or {“A”: 4, “B”: NULL, “C”:6}
Hi @souravsingh13 ,
Welcome to Google Cloud Community!
As mentioned by @Betjens in a similar case, There are two options to handle null values in exporting data in BigQuery to JSON format.
-
Calling directly from python using BigQuery client library
from google.cloud import bigquery
import json
client = bigquery.Client()
query = "select null as field1, null as field2"
query_job = client.query(query)
json_list = {}
for row in query_job:
json_row = {'field1':row[0],'field2':row[1]}
json_list.update(json_row)
with open('test.json','w+') as file:
file.write(json.dumps(json_list))
-
You can also use Apache Beam Dataflow with python and BigQuery to handle the null values
import apache_beam as beam
from apache_beam.io import BigQuerySource
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
def add_null_field(row, field):
if field!='skip':
row.update({field: row.get(field, None)})
return row
def run(argv=None, save_main_session=True):
parser = argparse.ArgumentParser()
parser.add_argument(
'--output',
dest='output',
required=True,
help='Output file to write results to.')
known_args, pipeline_args = parser.parse_known_args(argv)
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
with beam.Pipeline(options=pipeline_options) as p:
(p
| beam.io.Read(beam.io.BigQuerySource(query='SELECT null as field1, null as field2'))
| beam.Map(add_null_field, field='skip')
| beam.Map(json.dumps)
| beam.io.Write(beam.io.WriteToText(known_args.output, file_name_suffix='.json')))
if __name__ == '__main__':
run()
I hope the above information is helpful.