We are designing a generic audit framework in BigQuery to support multiple event types where searchable identifiers vary per audit type.
Table Contract (High Level)
audits (Immutable Source of Truth)
Purpose:
Stores complete audit event including metadata and raw payload. No updates allowed (append-only).
Contract:
-
One row per event
-
event_idmust be globally unique -
org_idis mandatory -
event_timestampis mandatory and used for partitioning -
payloadcontains full JSON event data -
Data is immutable after insertion
Physical Design (BigQuery):
-
Partitioned by:
DATE(event_timestamp) -
Clustered by:
org_id,event_type
audit_search_identifiers (Search/Index Table)
Purpose:
Stores extracted, queryable identifiers to enable efficient filtering without scanning payload JSON.
Contract:
-
One row per identifier per event
-
event_idreferencesaudits.event_id -
org_idmust match parent event -
identifier_name- name of the search identifier -
identifier_string_value- string value if the identifier is string type -
identifier_numeric_value - numeric value if the identifier is number/int/double -
identifier_timestamp_value - timestamp val -
event_timestampcopied from audits for partition pruning -
Only one typed identifier value column should be populated per row
Physical Design (BigQuery):
-
Partitioned by:
DATE(event_timestamp) -
Clustered by:
org_id,identifier_name
From a BigQuery performance and cost perspective:
-
Is this two-table contract (audits + search identifiers) an efficient and scalable model for high-volume, multi-identifier audit events?
-
Would you recommend:
-
A single wide table,
-
Nested/repeated identifiers inside audits,
-
Or any alternative structure?
-