Data rarely arrives in the perfect format. As data professionals, we spend a huge amount of our time cleaning, parsing, and transforming messy data into a structured, usable state. In BigQuery SQL, this often involves applying multiple functions to a single piece of data, which can lead to deeply nested queries that are difficult to read, debug, and maintain.
But what if there were a more intuitive way? Let’s explore a complex, real-world scenario and see how chained function calls can transform a tangled mess into an elegant, readable pipeline.
The scenario: Parsing a JSON payload stored as a string
Imagine we have a table with a single text column, event_payload, which contains a full JSON object stored as a string. Our goal is to parse this string to:
Extract the user_id from a messy free-text string.
Clean, standardize, and alphabetize a list of user-generated tags.
Find the timestamp of the very last user action from a nested array.
Here’s our raw data in JSON:
{
  "event_id": "evt_123",
  "user_data": " email: john.doe@example.com, user_id: usr_456 ",
  "actions": [
    {"type": "view", "timestamp": "2025-08-14T14:10:00Z"},
    {"type": "click", "timestamp": "2025-08-14T14:12:30Z"}
  ],
  "raw_tags": " data-science, BigQuery , SQL , analysis, "
}
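Before looking at the SQL, it can help to see the three goals spelled out imperatively. Here is a minimal Python sketch (an illustration, not part of the BigQuery solution) that applies the same three transformations to a payload string like the one above; the `parse_event` helper is a hypothetical name:

```python
import json
import re

# One of the sample payloads, stored as a single JSON string.
payload = (
    '{"event_id": "evt_123",'
    ' "user_data": " email: john.doe@example.com, user_id: usr_456 ",'
    ' "actions": [{"type": "view", "timestamp": "2025-08-14T14:10:00Z"},'
    ' {"type": "click", "timestamp": "2025-08-14T14:12:30Z"}],'
    ' "raw_tags": " data-science, BigQuery , SQL , analysis, "}'
)

def parse_event(payload: str) -> dict:
    doc = json.loads(payload)
    # Goal 1: pull user_id out of the messy user_data string.
    m = re.search(r"user_id:\s*(\w+)", doc.get("user_data", ""))
    user_id = m.group(1) if m else None
    # Goal 2: trim, lowercase, drop empties, and alphabetize the tags.
    tags = sorted(
        t.strip().lower()
        for t in doc.get("raw_tags", "").split(",")
        if t.strip()
    )
    # Goal 3: latest action timestamp (ISO-8601 strings sort chronologically).
    last_ts = max(a["timestamp"] for a in doc["actions"])
    return {
        "event_id": doc["event_id"],
        "user_id": user_id,
        "clean_tags": ", ".join(tags),
        "last_action_ts": last_ts,
    }
```

For the payload above, this yields `usr_456`, the tag list `analysis, bigquery, data-science, sql`, and the click timestamp as the last action.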
The “hard way”: A tangle of nested functions
Without chained functions, we’re forced to nest function calls one inside the other. This requires reading the logic from the inside out and can quickly become a headache.
WITH raw_events AS (
  SELECT
    """
    {
      "event_id": "evt_123",
      "user_data": " email: john.doe@example.com, user_id: usr_456 ",
      "actions": [
        {"type": "view", "timestamp": "2025-08-14T14:10:00Z"},
        {"type": "click", "timestamp": "2025-08-14T14:12:30Z"}
      ],
      "raw_tags": " data-science, BigQuery , SQL , analysis, "
    }
    """ AS event_payload
  UNION ALL
  SELECT
    """
    {
      "event_id": "evt_124",
      "user_data": "user_id: usr_789",
      "actions": [
        {"type": "purchase", "timestamp": "2025-08-14T15:00:00Z"}
      ],
      "raw_tags": "sql, a/b testing, conversion"
    }
    """
  UNION ALL
  SELECT
    """
    {
      "event_id": "evt_125",
      "user_data": "email: jane.doe@example.com",
      "actions": [
        {"type": "view", "timestamp": "2025-08-14T16:20:00Z"},
        {"type": "view", "timestamp": "2025-08-14T16:21:00Z"},
        {"type": "abandon_cart", "timestamp": "2025-08-14T16:25:00Z"}
      ],
      "raw_tags": " , marketing, campaign, "
    }
    """
),
cleaned_tags AS (
  SELECT
    JSON_EXTRACT_SCALAR(event_payload, '$.event_id') AS event_id,
    LOWER(TRIM(raw_tag)) AS cleaned_tag
  FROM
    raw_events,
    UNNEST(
      SPLIT(
        TRIM(
          JSON_EXTRACT_SCALAR(event_payload, '$.raw_tags')
        ),
        ','
      )
    ) AS raw_tag
  WHERE
    LOWER(TRIM(raw_tag)) != ''
)
SELECT
  JSON_EXTRACT_SCALAR(e.event_payload, '$.event_id') AS event_id,
  REGEXP_EXTRACT(
    JSON_EXTRACT_SCALAR(e.event_payload, '$.user_data'),
    r'user_id:\s*(\w+)'
  ) AS user_id,
  (
    SELECT MAX(JSON_EXTRACT_SCALAR(action, '$.timestamp'))
    FROM UNNEST(JSON_QUERY_ARRAY(e.event_payload, '$.actions')) AS action
  ) AS last_action_ts,
  ARRAY_TO_STRING(ARRAY_AGG(t.cleaned_tag ORDER BY t.cleaned_tag), ', ') AS clean_tags
FROM raw_events AS e
LEFT JOIN cleaned_tags AS t
  ON JSON_EXTRACT_SCALAR(e.event_payload, '$.event_id') = t.event_id
GROUP BY
  e.event_payload;
This query works, but it isn’t friendly. The deeply nested logic in the cleaned_tags CTE is particularly confusing: to follow it, you have to start at the innermost JSON_EXTRACT_SCALAR and work your way outward.
The “clean way”: A readable pipeline with chained functions
Now, let’s refactor the exact same query using chained functions. The logic is identical, but the query is written as a clear, step-by-step pipeline that is read from left to right.
WITH raw_events AS (
  SELECT
    """
    {
      "event_id": "evt_123",
      "user_data": " email: john.doe@example.com, user_id: usr_456 ",
      "actions": [
        {"type": "view", "timestamp": "2025-08-14T14:10:00Z"},
        {"type": "click", "timestamp": "2025-08-14T14:12:30Z"}
      ],
      "raw_tags": " data-science, BigQuery , SQL , analysis, "
    }
    """ AS event_payload
  UNION ALL
  SELECT
    """
    {
      "event_id": "evt_124",
      "user_data": "user_id: usr_789",
      "actions": [
        {"type": "purchase", "timestamp": "2025-08-14T15:00:00Z"}
      ],
      "raw_tags": "sql, a/b testing, conversion"
    }
    """
  UNION ALL
  SELECT
    """
    {
      "event_id": "evt_125",
      "user_data": "email: jane.doe@example.com",
      "actions": [
        {"type": "view", "timestamp": "2025-08-14T16:20:00Z"},
        {"type": "view", "timestamp": "2025-08-14T16:21:00Z"},
        {"type": "abandon_cart", "timestamp": "2025-08-14T16:25:00Z"}
      ],
      "raw_tags": " , marketing, campaign, "
    }
    """
),
cleaned_tags AS (
  SELECT
    (event_payload).JSON_EXTRACT_SCALAR('$.event_id') AS event_id,
    (raw_tag).TRIM().LOWER() AS cleaned_tag
  FROM
    raw_events,
    UNNEST((event_payload).JSON_EXTRACT_SCALAR('$.raw_tags').TRIM().SPLIT(',')) AS raw_tag
  WHERE
    (raw_tag).TRIM().LOWER() != ''
)
SELECT
  (e.event_payload).JSON_EXTRACT_SCALAR('$.event_id') AS event_id,
  (e.event_payload).JSON_EXTRACT_SCALAR('$.user_data').REGEXP_EXTRACT(r'user_id:\s*(\w+)') AS user_id,
  (
    SELECT MAX(JSON_EXTRACT_SCALAR(action, '$.timestamp'))
    FROM UNNEST(JSON_QUERY_ARRAY(e.event_payload, '$.actions')) AS action
  ) AS last_action_ts,
  ARRAY_TO_STRING(ARRAY_AGG(t.cleaned_tag ORDER BY t.cleaned_tag), ', ') AS clean_tags
FROM raw_events AS e
LEFT JOIN cleaned_tags AS t
  ON (e.event_payload).JSON_EXTRACT_SCALAR('$.event_id') = t.event_id
GROUP BY e.event_payload;
The difference is night and day. The flow of data is explicit and intuitive.
The verdict: Why chained functions are better
Both queries produce the exact same result, but the chained version is vastly superior for several key reasons:
Linear readability: Chained functions are read from left to right, mirroring how we read a sentence. You start with your data and apply transformations one after another. Nested functions force you into an unnatural “inside-out” reading order.
Easier debugging: When a chained query fails, it’s much easier to pinpoint the problematic step. You can simply comment out parts of the chain from right to left to see where the logic breaks. Debugging a nested query often involves painstakingly checking each set of parentheses.
Simplified maintenance: If you need to add, remove, or change a step in your transformation, chaining makes it trivial. With nesting, the same change requires carefully rewriting the function calls and rearranging parentheses, which is highly error-prone.
Reduced cognitive load: Ultimately, chained functions are just easier on the brain. They reduce the mental effort required to understand the flow of data, allowing you and your teammates to grasp the query’s intent much faster.
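This readability argument isn’t specific to SQL; the same contrast shows up in any language with first-class functions. As a small illustration (plain Python, not BigQuery), here is the tag-cleaning step written both ways:

```python
raw_tags = " data-science, BigQuery , SQL , analysis, "

# Nested style: read inside-out, starting from the innermost call.
nested = sorted(
    filter(None, map(str.lower, map(str.strip, raw_tags.strip().split(","))))
)

# Pipeline style: read left to right, one transformation per step.
chained = sorted(
    tag.strip().lower()
    for tag in raw_tags.strip().split(",")
    if tag.strip()
)

# Both produce the same alphabetized, cleaned tag list.
assert nested == chained == ["analysis", "bigquery", "data-science", "sql"]
```

Both expressions are equivalent, but only one of them can be skimmed top to bottom without mentally unwinding parentheses.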
By adopting chained function calls for your complex transformations, you’re not just writing SQL—you’re crafting clean, maintainable, and collaborative data pipelines.