Sentiment analysis with BigQuery Data Preparation and BigQuery Pipelines

Understanding customer sentiment is vital for marketing teams, but unstructured reviews make large-scale analysis difficult. This blog post walks you through an end-to-end BigQuery workflow that takes you from raw product reviews to actionable sentiment insights using Google Cloud’s BigQuery Data Preparation and BigQuery Pipelines.

Why sentiment analysis?

Product reviews are a treasure trove of unstructured text data. By applying sentiment analysis, you can automatically determine the emotional tone (positive, negative, neutral) of these reviews at scale. This allows businesses to quickly grasp customer satisfaction levels, identify pain points, and prioritize areas for improvement.

Why BigQuery Data Preparation and Pipelines?

According to Google’s internal user research on AI and ML scenarios, data preparation is often seen as the single most painful stage in the AI/ML lifecycle. Analysts spend more time fighting with complex SQL and moving data between tools than they do on generating insights.

While a query editor allows for direct data manipulation, BigQuery Data Preparation and BigQuery Pipelines offer you significant advantages, particularly for complex tasks like sentiment analysis on semi-structured data:

  • Visual and Intuitive Data Transformation: Instead of writing long, complex SQL statements for every step, BigQuery Data Preparation provides a visual, no-code/low-code interface to clean, shape, and enrich data. You can:
    • Apply transformations in a point-and-click manner.
    • Preview the impact of each step in real time to reduce errors and improve accuracy.

This is particularly valuable for semi-structured data (e.g., JSON, nested columns) where manual SQL often becomes cumbersome.

  • Handling Semi-Structured Data with Ease: Modern datasets often include nested or hierarchical JSON, such as API responses, clickstream logs, or event payloads embedded within a single column. Extracting and flattening this data into a usable relational format with raw SQL can be time-consuming and error-prone. BigQuery Data Preparation simplifies this with one-click JSON flattening, turning JSON data into analytics-ready tables in a single click and eliminating the pain of manual SQL gymnastics.
  • Operationalizing and Automating Data Workflows: This is a key differentiator. BigQuery Pipelines allow you to string together data preparation steps and BigQuery ML model execution into a cohesive, automated workflow. Instead of manually running individual queries in a query editor, you can:
    • Schedule Runs: Set your pipelines to run on a regular schedule (e.g., daily, weekly) to process new data.
    • Handle Errors: Pipelines provide mechanisms for monitoring and managing errors, making it easier to maintain and troubleshoot your data processes.
  • Integration with BigQuery ML: The pipeline can integrate your cleansed data with BigQuery ML models. This means you’re not just preparing data; you’re directly feeding it into an AI-powered analysis engine within the same workflow.
  • Focus on Insights, Not Code: By visually defining data transformations and automating the pipeline, you can spend less time writing and debugging complex SQL queries and more time extracting valuable insights from the results.

In essence, BigQuery Data Preparation and BigQuery Pipelines provide an end-to-end solution for robust, scalable, and automated data engineering and analysis, which is especially crucial when dealing with valuable data that is often hidden in semi-structured formats.

Step-by-step walkthrough

Let’s dive into the process using a hypothetical dataset of product reviews. Assume we have a table that contains product reviews with the following schema:

Sample rows look like this:
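The original schema screenshot is not reproduced here; as a working assumption based on the columns referenced later in this walkthrough (a nested `review` JSON payload containing fields such as `text`, `rating`, and `review_date`), the table might look roughly like this. All names and the JSON shape are illustrative:

```sql
-- Hypothetical schema for the product reviews table (names are placeholders).
CREATE TABLE `my_project.reviews_dataset.products_with_reviews` (
  product_id   STRING,
  product_name STRING,
  -- Nested review payload, e.g.
  -- {"text": "Great sound!", "rating": 5, "review_date": "2024-05-01"}
  review       JSON
);
```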

Now let’s prepare our data using Data Preparation and build a BigQuery pipeline.

  1. Connect to a Model

Connect to a model using BigQuery connections (see the BigQuery documentation for the steps to create one). For this walkthrough, we created a BigQuery connection to the gemini-2.0-flash model.
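As a sketch, once the connection exists you can register a remote model over it with SQL like the following. The project, dataset, connection, and model names here are placeholders, not values from this walkthrough:

```sql
-- Create a remote model that routes requests through a BigQuery connection
-- to the gemini-2.0-flash endpoint (all identifiers are placeholders).
CREATE OR REPLACE MODEL `my_project.reviews_dataset.gemini_model`
  REMOTE WITH CONNECTION `my_project.us.gemini_connection`
  OPTIONS (ENDPOINT = 'gemini-2.0-flash');
```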

  2. Setting up Your Data Pipeline

Now, we’ll create a new Data Pipeline to orchestrate our data preparation and sentiment analysis tasks.

  3. Data Transformations with BigQuery Data Preparation

Our first step in the pipeline is to prepare the raw product review data.

  • From the pipeline canvas, click Add task and select Data preparation.
  • In the Data preparation task details, click Edit data preparation. This will launch the BigQuery Data Preparation interface.
  • Use the keyword search to locate your product reviews dataset. For this example, we’ll assume a table named products_with_reviews.
  • Click on the table and then Add as source.

Add the product reviews table as a source in Data Preparation

  4. Cleaning and Transforming Review Data

When working with raw data, it’s common to find entire JSON objects stored inside a single column—such as a review column containing detailed feedback in nested format. While this is great for storage, it’s not ideal for analysis.

To unlock the value hidden inside, we first need to flatten the JSON so that each key-value pair becomes an accessible, top-level column. This makes it easy to apply transformations, run filters, and create meaningful insights without wrestling with complex SQL.

Once flattened, you can:

  • Transform and Clean Data: Standardize date formats, normalize text, or enrich fields.

  • Apply Filters Easily: For example, filter by review_date or rating without digging through nested structures.

  • Prepare for Advanced Analytics: Feed clean, structured data directly into dashboards or machine learning models.

With tools like BigQuery Data Preparation, flattening nested JSON becomes a point-and-click operation, saving hours of manual query writing.
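Under the hood, the flattening step is roughly equivalent to SQL like the following. The JSON paths and column names are illustrative assumptions, since the exact payload shape depends on your data:

```sql
-- Flatten the nested `review` JSON payload into top-level columns
-- (paths, column names, and the table path are illustrative).
SELECT
  product_id,
  product_name,
  JSON_VALUE(review, '$.text') AS review_text,
  SAFE_CAST(JSON_VALUE(review, '$.rating') AS INT64) AS rating,
  SAFE_CAST(JSON_VALUE(review, '$.review_date') AS DATE) AS review_date
FROM `my_project.reviews_dataset.products_with_reviews`;
```

The point of the visual tool is that you don’t have to write this by hand: the same transformation is generated for you with one click.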

  5. Saving the Cleaned Data

Once your data is cleaned and transformed, you need to save it to a new BigQuery table.

  • Click on Destination in the right-hand panel of BigQuery Data Preparation.
  • Name your new table, for example, cleansed_sentiments.
  • Click Save.
  • Finally, click Save in the BigQuery Data Preparation interface to save your data preparation steps.
  6. Integrating Sentiment Analysis with a BigQuery ML Model

Now that our data is clean, we can integrate the sentiment analysis. We’ll use a pre-trained BigQuery ML model for this.

  • Back in your Data Pipeline, click Add task and select Query.

  • In the query task details, click Edit query.
  • Select your cleansed_sentiments table as the input for this query.
  • Enter a BigQuery SQL query that uses a pre-trained sentiment analysis model. A common approach uses the ML.GENERATE_TEXT function with a Gemini model. You can kick-start the query with Gemini in BigQuery using a prompt such as “Use the ML.GENERATE_TEXT function with a text-sentiment-analysis model to analyze sentiment in the cleansed_sentiments table”. Your query could look something like this:
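A minimal sketch, assuming the remote model and the `cleansed_sentiments` table from the previous steps; the model path and the `review_text` column name are assumptions you should adjust to your own names:

```sql
-- Classify each cleaned review with the Gemini remote model.
-- `flatten_json_output` surfaces the model reply as the
-- `ml_generate_text_llm_result` column.
SELECT
  review_text,
  TRIM(ml_generate_text_llm_result) AS sentiment
FROM ML.GENERATE_TEXT(
  MODEL `my_project.reviews_dataset.gemini_model`,
  (
    SELECT
      review_text,
      CONCAT(
        'Classify the sentiment of the following product review as ',
        'Positive, Negative, or Neutral. Respond with only the label. ',
        'Review: ', review_text
      ) AS prompt
    FROM `my_project.reviews_dataset.cleansed_sentiments`
  ),
  STRUCT(
    0.0 AS temperature,          -- deterministic labels
    10 AS max_output_tokens,     -- a single label needs few tokens
    TRUE AS flatten_json_output
  )
);
```

Write the result to a destination table such as analyzed_sentiments so the next step can explore it.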

  • Click Save to save your query.
  7. Running and Monitoring Your Pipeline

Once both tasks are configured, you can run your pipeline.

  • Click Run at the top of the pipeline canvas.
  • Monitor the execution status in the Executions tab. You’ll see the data preparation and query tasks run sequentially.
  8. Exploring the Results

After the pipeline successfully completes, you can explore your analyzed sentiment data.

  • Navigate to your new analyzed_sentiments table in BigQuery Studio.
  • Go to the Preview tab to see the product reviews with their associated sentiment labels (e.g., Neutral, Positive, Negative).
  • You can also explore the schema and other details in the Details tab.
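Beyond the Preview tab, a quick aggregate query gives you a first read on the distribution. This assumes the output table and a `sentiment` label column as sketched above:

```sql
-- Count reviews per sentiment label (table and column names assumed).
SELECT
  sentiment,
  COUNT(*) AS review_count
FROM `my_project.reviews_dataset.analyzed_sentiments`
GROUP BY sentiment
ORDER BY review_count DESC;
```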

With this automated pipeline, you’ve gone far beyond running ad-hoc queries—you’ve built a system that delivers continuous intelligence. Here’s how to turn your pipeline into real business impact:

1. Visualize the results

Connect your analyzed_sentiments table to Looker Studio and bring your insights to life. Build dashboards to:

  • Track sentiment trends over time.
  • Compare sentiment across different products or regions.
  • Quickly spot emerging issues or positive themes.
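A trend query like the following can back such a dashboard; it assumes the `analyzed_sentiments` table carries a `review_date` DATE column and a `sentiment` label:

```sql
-- Monthly share of positive reviews (column names are assumptions).
SELECT
  DATE_TRUNC(review_date, MONTH) AS month,
  COUNTIF(sentiment = 'Positive') / COUNT(*) AS positive_share,
  COUNT(*) AS total_reviews
FROM `my_project.reviews_dataset.analyzed_sentiments`
GROUP BY month
ORDER BY month;
```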

2. Go deeper than sentiment

Don’t stop at positive vs. negative scores. Refine your pipeline to extract key topics or features mentioned in reviews. For example, identify recurring themes like battery life, connectivity, or price in negative feedback. This adds context and makes your insights truly actionable.
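One way to sketch this, reusing the same hypothetical remote model: add a second ML.GENERATE_TEXT task to the pipeline that asks for the features mentioned in negative reviews. Model path, table, and column names are the same assumptions as before:

```sql
-- Extract recurring product features from negative reviews.
SELECT
  review_text,
  TRIM(ml_generate_text_llm_result) AS key_topics
FROM ML.GENERATE_TEXT(
  MODEL `my_project.reviews_dataset.gemini_model`,
  (
    SELECT
      review_text,
      CONCAT(
        'List up to three product features mentioned in this review, ',
        'comma-separated (e.g. battery life, price): ', review_text
      ) AS prompt
    FROM `my_project.reviews_dataset.analyzed_sentiments`
    WHERE sentiment = 'Negative'
  ),
  STRUCT(0.0 AS temperature, 64 AS max_output_tokens, TRUE AS flatten_json_output)
);
```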

3. Turn insights into action

Translate raw data into clear narratives for stakeholders. For instance:

“85% of reviews for the new speaker are positive, with customers praising sound quality. However, 10% highlight poor battery life as a major concern.”

This kind of insight doesn’t just inform—it guides marketing campaigns, product improvements, and customer success strategies.

The bottom line

By combining BigQuery Data Preparation and BigQuery Pipelines, you’ve created a scalable, automated workflow for extracting insights from semi-structured data like product reviews. The result? Faster decisions, better products, and happier customers.


Thank you for the article!

I am currently using ML.GENERATE_TEXT to generate insights, with my source being a BQ table. However, I have noticed that the output is not consistent each time I run the query. Despite trying to adjust the temperature and top-p parameters, this has not resolved the issue.

I am sharing a sample query with you and would greatly appreciate any insights or suggestions you may have regarding this scenario. Your expertise and assistance would be invaluable in helping me understand and address this inconsistency.

Thank you in advance for your help.

SELECT
  ml_generate_text_result
FROM
  ML.GENERATE_TEXT(
    MODEL `<>`,
    (
      SELECT
        CONCAT(
          'You are the research agent. Create a summary of the client''s case history with the Bank over the last 180 days …..etc',
          STRING_AGG(CONCAT(Column1, Column2, Column3), '')
        ) AS prompt
      FROM `Table 1` C
      LEFT JOIN `Table2` CM ON C.CASE = CM.CASE
      LEFT JOIN `Table 3` T ON C.ID = T.CHILD_ID
      LEFT JOIN `Table 4` Q ON C.CASE = Q.CASE AND Q.DT = '2025-08-12'
      LEFT JOIN `Table 5` E ON C.CASE = E.CASE AND ASSIGN IN ('Esc', 'Oper')
      WHERE
        C.ID = '71'
        OR C.ID IN (
          SELECT CHILD_ID
          FROM (
            SELECT CHILD_ID,
                   ROW_NUMBER() OVER (PARTITION BY CHILD_ID ORDER BY SNAPSHOT_DT DESC) AS rn
            FROM `Table 6`
            WHERE ULT_ID = '71'
          ) sub
          WHERE rn = 1
        )
        AND DATE(SUBSTR(C.DT, 1, 10)) > '2024-12-31'
      -- ORDER BY C.DT DESC
    ),
    STRUCT(
      0.5 AS temperature,
      1024 AS max_output_tokens,
      0.8 AS top_p,
      40 AS top_k
    )
  );

Wow, this is super easy to follow! I didn’t realize data prep could be this tricky if done manually. The visual tools and pipelines in BigQuery seem like they’d make analyzing data way easier. Really cool!
