We are facing two issues in a specific Dataflow pipeline. First, the pipeline is not generating logs at any stage of the Apache Beam job: neither the logs from the processing itself nor the ones we manually inserted into the steps. Additionally, we have no information about input and output elements in the “Step Information” tab. However, we know the data is arriving, as the “Data Update by Steps” chart confirms it is being received.
The second issue relates to sending data to a Pub/Sub topic. The “Push” subscription receives messages, but the subscription configured to write data to BigQuery does not work. Some messages are redirected to the dead-letter topic, which also has a BigQuery subscription writing to the same table as the original topic’s, as a fallback to ensure the messages reach the database. Ideally, however, we would have a single topic with a fully functional subscription.
Regarding the integration of the Dataflow pipeline with Pub/Sub delivery: although the data is theoretically formatted for delivery, we cannot be sure the formatting is correct, as there is no confirmation of arrival in the logs, the topic, or the subscription.
Issues like these in an Apache Beam-based Dataflow pipeline can stem from several causes. Here are some suggestions that may help resolve them:
Issue #1: Pipeline not generating logs:
Log Exclusion Filters: Ensure that your Logs Router has no exclusion filters matching resource.type="dataflow_step".
_Default sink: Ensure that the _Default sink is enabled in Cloud Logging and has not been disabled or restricted for Dataflow logs.
Log Levels: Check that worker log levels are properly configured in your pipeline options, and that your log messages are emitted at or above the configured threshold (for example, a message logged at DEBUG is dropped when the worker level is INFO).
Roles and Permissions: Ensure the service account running your Dataflow jobs has an IAM role that can write log entries, such as roles/logging.logWriter.
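On the log-levels point above: the Beam Python SDK routes messages from the standard logging module to Cloud Logging, so a quick local check is to emit messages at explicit levels from your step code. A minimal sketch, assuming a Python pipeline (ParseMessage is a hypothetical step name; in a real pipeline it would subclass beam.DoFn):

```python
import logging

# Beam's Python workers forward messages from the standard logging module
# to Cloud Logging, so level configuration is ordinary logging setup.
logging.getLogger().setLevel(logging.INFO)

class ParseMessage:  # hypothetical step; would subclass beam.DoFn in a real pipeline
    def process(self, element):
        # INFO and above will appear; DEBUG is filtered by the level set above.
        logging.info("Processing element: %s", element)
        logging.debug("This line is suppressed at INFO level.")
        yield element

if __name__ == "__main__":
    for out in ParseMessage().process({"id": 1}):
        print(out)
```

If INFO messages appear when you run the step locally but never show up in Dataflow, the worker-side log level itself may be the culprit; recent Beam Python releases expose it as the --default_sdk_harness_log_level pipeline option (verify against your SDK version).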
Issue #2: Messages not reaching BigQuery:
Dead-letter Topic: Since some of your messages are being redirected to the dead-letter topic, there is likely a mismatch between the message payload and the BigQuery table. Ensure the message JSON matches the table schema: consistent field names, compatible data types, and no missing required fields.
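On the schema-mismatch point: a BigQuery subscription writes each message body straight into the table, so the JSON must line up with the table's column names and types, or Pub/Sub dead-letters the message. A minimal pre-publish validation sketch you could adapt (the events schema below is a hypothetical example, not your actual table):

```python
import json

# Hypothetical BigQuery table schema: column name -> accepted Python types.
EVENTS_SCHEMA = {
    "event_id": str,
    "amount": (int, float),
    "created_at": str,  # BigQuery TIMESTAMP accepts ISO-8601 strings
}

def validate_payload(payload: dict) -> list:
    """Return a list of schema problems; an empty list means the payload matches."""
    problems = []
    for name, types in EVENTS_SCHEMA.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], types):
            problems.append(f"wrong type for {name}: {type(payload[name]).__name__}")
    for name in payload:
        if name not in EVENTS_SCHEMA:
            problems.append(f"unknown field (not in table): {name}")
    return problems

def to_pubsub_bytes(payload: dict) -> bytes:
    """Serialize exactly what the BigQuery subscription will try to insert."""
    problems = validate_payload(payload)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(payload).encode("utf-8")

if __name__ == "__main__":
    ok = {"event_id": "e-1", "amount": 9.99,
          "created_at": "2024-01-01T00:00:00+00:00"}
    print(to_pubsub_bytes(ok))
```

Running every outgoing payload through a check like this surfaces format problems at publish time, instead of discovering them later as dead-lettered messages.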
If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.