Bug in Python BigQuery API with adding policy tags to a table when using WRITE_TRUNCATE

Hi there

We got an issue with a python Cloud Function that writes data to a BigQuery table using the google-cloud-bigquery Python API.

First to give you some context:

We got a Python 3.9 Cloud Function running to read some data from a spreadsheet and to put that data into a BigQuery table. Since the data that’s collected from the spreadsheet is very sensitive, we would like to add Data Catalog Policy Tags to the BigQuery table to make sure the data isn’t visible to all the developers of our organisation. To add those policy tags, we’ve added them to some specific fields in our SchemaField in the JobConfig in our Python code. To write the data to the table we use the WRITE_TRUNCATE option to overwrite the existing data and add the new data to the table.The problem we have is the following:
Every time we run the Cloud Function, the Policy Tags are completely removed from every field in the table and we have to manually put them back to secure our sensitive data. After days of debugging and research we found that it has probably to do with the fact that WRITE_TRUNCATE overwrites the complete table before adding the new data. This shouldn’t be a problem if the Cloud Function adds them back after adding the data but this never happens. It looks like this is a known issue and that we are not the only ones who have this problem. Here you can find a blogpost on StackOverflow with some developers who have the same problem: https://stackoverflow.com/questions/64440753/bigqueryoperator-changes-the-table-schema-and-column-modes-when-write-dispositio

For now we first Truncate all our data and then add our new data with the WRITE_APPEND option since this does not overwrites the whole table, just add the data. But this off course is not ideal because if the Cloud Function crashes or times-out after truncating all the data but did not have the chance to add the new data back, our table will be left empty until the next run which is far from ideal.

Is there any way we can get some help to find a good and working solution to this problem? And if anyone from Google reads this, is it possible to check if the problem lies with the backend code/API from Google?

Thanks in advance

3 Likes

Yes, it’s a known issue. There is an issue created already in the Public Issue Tracker [1], which you can follow for further update. I did not find any workaround as such.

[1] https://issuetracker.google.com/187435090

1 Like

Thanks a lot for the answer. I wasn’t aware there was already an issue made about this bug. I set some mail notifications to keep me updated on the issue. Since there is no other workaround we will keep using our solution until the issue is fixed.

1 Like

Thanks. Happy New Year 2022.

1 Like