I am debugging a query - Expected end of input but got keyword CROSS at [118:5]

sql_query = “”"
CREATE TABLE {prodid}.{datasetid}.{tableid}
PARTITION BY DATE(event_timestamp)
CLUSTER BY metric_name
AS(
SELECT CURRENT_TIMESTAMP() AS last_updated, ga_session_id,
IF(
EXISTS(SELECT 1 FROM UNNEST(events) AS e
WHERE e.event_name = ‘first_visit’),‘New user’, ‘Returning user’) AS user_type,
IF((
SELECT
MAX(session_engaged) FROM UNNEST(events)) > 0, ‘Engaged’, ‘Not engaged’) AS session_engagement, evt.* EXCEPT (session_engaged, event_name),
event_name AS metric_name,
#1 START
CASE metric_rating
WHEN ‘good’ THEN ‘Good’
WHEN ‘ni’ THEN ‘Needs Improvement’
WHEN ‘needs-improvement’ THEN ‘Needs Improvement’
WHEN ‘poor’ THEN ‘Poor’
ELSE metric_rating
END AS metric_status
#1 END
FROM (
SELECT ga_session_id, ARRAY_AGG(custom_event) AS events
FROM(
SELECT ga_session_id,
STRUCT(
country,
call_timestamp,
call_sequence,
page_timestamp,
device_category,
device_os,
traffic_medium,
traffic_name,
traffic_source,
page_path,
#support long debug_target
IF(debug_target2 IS NULL, debug_target, CONCAT(debug_target, debug_target2)) AS debug_target,
event_timestamp,
event_date,
event_name,
metric_id,
#support TTFB and FCP
IF(event_name = ‘LCP’ OR event_name = ‘TTFB’ OR event_name = ‘FCP’, metric_value / 1000, metric_value) AS metric_value,
user_pseudo_id,
session_engaged,
session_revenue,
#2 START
metric_rating,
page_location,
page_type,
continent,
region,
device_browser,
effective_connection_type,
save_data,
width,
height
#2 END
) AS custom_event
FROM (
SELECT(
SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘ga_session_id’) AS ga_session_id, # can be null in consent mode
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘metric_id’) AS metric_id,
SAFE.TIMESTAMP_MILLIS(ANY_VALUE((SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘call_timestamp’))) AS call_timestamp, # not sure what use this has. We also have event_timestamp
ANY_VALUE((SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘call_sequence’)) AS call_sequence, # not sure what use this has especially when set to ANY_VALUE
SAFE.TIMESTAMP_MILLIS(ANY_VALUE((SELECT CAST(COALESCE(value.double_value, value.int_value) AS INT64) FROM UNNEST(event_params) WHERE key = ‘page_timestamp’))) AS page_timestamp,
ANY_VALUE(device.category) AS device_category,
ANY_VALUE(device.operating_system) AS device_os,
ANY_VALUE(traffic_source.medium) AS traffic_medium,
ANY_VALUE(traffic_source.name) AS traffic_name,
ANY_VALUE(traffic_source.source) AS traffic_source,
ANY_VALUE(
REGEXP_SUBSTR(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘page_location’),
r’^[^?]+')) AS page_path,
ANY_VALUE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘debug_target’)) AS debug_target,

support long debug_target values (over 100 characters)

ANY_VALUE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘debug_target2’)) AS debug_target2,
ANY_VALUE(user_pseudo_id) AS user_pseudo_id,
ANY_VALUE(geo.country) AS country,
ANY_VALUE(event_name) AS event_name,
SUM(ecommerce.purchase_revenue) AS session_revenue,
MAX(
(SELECT
COALESCE(value.double_value, value.int_value, CAST(value.string_value AS NUMERIC))
FROM UNNEST(event_params)
WHERE key = ‘session_engaged’)) AS session_engaged,TIMESTAMP_MICROS(MAX(event_timestamp)) AS event_timestamp,
MAX(PARSE_DATE(‘%Y%m%d’, event_date)) AS event_date, MAX(
(SELECT COALESCE(value.double_value, value.int_value)
FROM UNNEST(event_params)
WHERE key = ‘metric_value’)) AS metric_value,
#3 START
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘metric_rating’)) AS metric_rating,
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘page_location’)) AS page_location,
ANY_VALUE((SELECT COALESCE(value.string_value, CAST(value.int_value AS STRING)) FROM UNNEST(event_params) WHERE key = ‘page_type’)) AS page_type,
ANY_VALUE(geo.continent) AS continent,
ANY_VALUE(geo.region) AS region,
ANY_VALUE(device.web_info.browser) AS device_browser,
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘effective_connection_type’)) AS effective_connection_type,
ANY_VALUE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘save_data’)) AS save_data,
ANY_VALUE((SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘width’)) AS width,
ANY_VALUE((SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘height’)) AS height
#3 END
FROM prodID.datasetid.events_*
#Replace DatasetID with your Dataset ID
WHERE event_name IN (‘LCP’, ‘FID’, ‘CLS’, ‘TTFB’, ‘FCP’, ‘INP’, ‘first_visit’, ‘purchase’)
GROUP BY 1, 2
)
)
WHERE ga_session_id IS NOT NULL
GROUP BY ga_session_id # can be null in consent mode ?
)
)
CROSS JOIN UNNEST(events) AS evt
WHERE evt.event_name NOT IN (‘first_visit’, ‘purchase’)
“”"
df = client.query(sql_query).to_dataframe()

The error message “Expected end of input but got keyword CROSS at [118:5]” indicates that there is a syntax issue in your SQL query, specifically around the use of the CROSS JOIN clause. Let’s analyze the structure of your query to identify and correct the problem.

From the structure of your query, it seems that the CROSS JOIN UNNEST(events) AS evt is intended to flatten the array of events for each session. However, the placement of this clause might be causing the syntax error.

In BigQuery SQL, a CROSS JOIN should be used within the FROM clause of a SELECT statement. It looks like your CROSS JOIN is outside of any SELECT statement, which is likely causing the syntax error.

To fix this, you need to ensure that the CROSS JOIN is part of a SELECT statement. Here’s a revised version of your query with the CROSS JOIN correctly placed:


CREATE TABLE `{prodid}.{datasetid}.{tableid}`
PARTITION BY DATE(event_timestamp)
CLUSTER BY metric_name
AS (
SELECT 
CURRENT_TIMESTAMP() AS last_updated, 
ga_session_id,
IF(EXISTS(SELECT 1 FROM UNNEST(events) AS e WHERE e.event_name = 'first_visit'), 'New user', 'Returning user') AS user_type,
IF((SELECT MAX(session_engaged) FROM UNNEST(events)) > 0, 'Engaged', 'Not engaged') AS session_engagement, 
evt.* EXCEPT (session_engaged, event_name),
event_name AS metric_name,
CASE metric_rating
WHEN 'good' THEN 'Good'
WHEN 'ni' THEN 'Needs Improvement'
WHEN 'needs-improvement' THEN 'Needs Improvement'
WHEN 'poor' THEN 'Poor'
ELSE metric_rating
END AS metric_status
FROM (
SELECT 
ga_session_id, 
ARRAY_AGG(custom_event) AS events
FROM (
-- Your inner SELECT statements here
)
WHERE ga_session_id IS NOT NULL
GROUP BY ga_session_id
)
CROSS JOIN UNNEST(events) AS evt
WHERE evt.event_name NOT IN ('first_visit', 'purchase')
)

In this revised query, the CROSS JOIN UNNEST(events) AS evt is now correctly placed within the FROM clause of the outermost SELECT statement. This should resolve the syntax error you were encountering.

2 Likes

Thanks very much for this! works wonderfully :heart: