BigQuery Data Model for Audit System

We are designing a generic audit framework in BigQuery to support multiple event types where searchable identifiers vary per audit type.

Table Contract (High Level)

:one: audits (Immutable Source of Truth)

Purpose:
Stores complete audit event including metadata and raw payload. No updates allowed (append-only).

Contract:

  • One row per event

  • event_id must be globally unique

  • org_id is mandatory

  • event_timestamp is mandatory and used for partitioning

  • payload contains full JSON event data

  • Data is immutable after insertion

Physical Design (BigQuery):

  • Partitioned by: DATE(event_timestamp)

  • Clustered by: org_id, event_type


:two: audit_search_identifiers (Search/Index Table)

Purpose:
Stores extracted, queryable identifiers to enable efficient filtering without scanning payload JSON.

Contract:

  • One row per identifier per event

  • event_id references audits.event_id

  • org_id must match parent event

  • identifier_name - name of the search identifier

  • identifier_string_value - string value if the identifier is string type

  • identifier_numeric_value - numeric value if the identifier is number/int/double

  • identifier_timestamp_value - timestamp val

  • event_timestamp copied from audits for partition pruning

  • Only one typed identifier value column should be populated per row

Physical Design (BigQuery):

  • Partitioned by: DATE(event_timestamp)

  • Clustered by: org_id, identifier_name

From a BigQuery performance and cost perspective:

  1. Is this two-table contract (audits + search identifiers) an efficient and scalable model for high-volume, multi-identifier audit events?

  2. Would you recommend:

    • A single wide table,

    • Nested/repeated identifiers inside audits,

    • Or any alternative structure?