How to de-identify data before logging to a BigQuery table

Hi Everyone,

Could anyone help me de-identify PII (personal information such as credit card numbers) before it gets logged to a BigQuery table?

I have been thinking of creating a Pub/Sub topic that receives a message on every conversation request. That message would trigger a Cloud Function that de-identifies the user and agent utterances and inserts them into a BigQuery table. However, I am unable to find a way to trigger an event on every conversation.

Please share any resources or documentation on this.

Thanks.

I would recommend checking Google’s Cloud Data Loss Prevention API:

Fully managed service designed to help you discover, classify, and protect your most sensitive data.
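
As a rough illustration of what a de-identification call could look like, here is a minimal Python sketch using the google-cloud-dlp client library. It replaces every detected credit card number with the name of its infoType. The function names, the project ID, and the choice of the replace-with-infoType transformation are assumptions made for this example, not something taken from your setup.

```python
def build_deidentify_request(project_id, text, info_types=("CREDIT_CARD_NUMBER",)):
    """Build a request dict for DLP's deidentify_content method.

    project_id and info_types are placeholders; use your own values.
    """
    return {
        "parent": f"projects/{project_id}/locations/global",
        # Which kinds of sensitive data to look for.
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # How to transform each finding: replace it with its infoType name.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        "item": {"value": text},
    }


def deidentify_text(project_id, text):
    """Send the text to Cloud DLP and return the de-identified string."""
    # Imported here so the request builder can be used without the library.
    from google.cloud import dlp_v2  # pip install google-cloud-dlp

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, text)
    )
    return response.item.value
```

Calling deidentify_text("my-project", utterance) needs valid Google Cloud credentials and the DLP API enabled on the project. Other transformations (masking, tokenization) are available if replacing with the infoType name does not suit your logs.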

Also, here is documentation that describes a pipeline for de-identifying large datasets, which might be helpful for your use case:

https://cloud.google.com/architecture/de-identification-re-identification-pii-using-cloud-dlp
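
If you go with the Pub/Sub plus Cloud Function idea from the question instead, the function body might look roughly like the sketch below. It assumes something already publishes each utterance to the topic as a JSON message with "speaker" and "text" fields, and that the function uses the background-function signature (event, context). The message format, the helper names, and the table ID are all hypothetical. It does not address how to get an event published for every conversation.

```python
import base64
import json


def parse_utterance(event):
    """Decode the base64 Pub/Sub payload into a dict (assumed JSON format)."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


def make_row(utterance, redact):
    """Build a BigQuery row, de-identifying only the free-text field."""
    return {"speaker": utterance["speaker"], "text": redact(utterance["text"])}


def make_handler(redact, table_id):
    """Return a Cloud Function entry point bound to a redaction function.

    redact:   any callable that takes a string and returns de-identified text,
              for example a wrapper around Cloud DLP's deidentify_content.
    table_id: hypothetical, e.g. "my-project.my_dataset.conversations".
    """

    def handle_message(event, context):
        # Imported here so the helpers above work without the library.
        from google.cloud import bigquery  # pip install google-cloud-bigquery

        row = make_row(parse_utterance(event), redact)
        # insert_rows_json returns a list of per-row errors (empty on success).
        errors = bigquery.Client().insert_rows_json(table_id, [row])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")

    return handle_message
```

The point of this layout is that the raw utterance only ever exists in memory inside the function, so the table receives the redacted text and nothing else.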