Built a tool to automate JSON parsing & normalization in BigQuery (generates dbt models)

Hi everyone,

I’ve spent way too much time manually writing JSON_VALUE, UNNEST, and SAFE_CAST queries just to get nested event data into a usable state in BigQuery. It feels like 90% of my data engineering time goes to fixing pipelines that break when a schema changes.
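For context, this is the kind of boilerplate I mean. A sketch, assuming a hypothetical `raw_events` table with a JSON `payload` column containing a nested `items` array (table and field names are made up for illustration):

```sql
-- Hypothetical raw_events table with a JSON payload column;
-- flattening one nested array by hand looks something like this.
SELECT
  JSON_VALUE(payload, '$.user.id')                        AS user_id,
  SAFE_CAST(JSON_VALUE(payload, '$.amount') AS NUMERIC)   AS amount,
  JSON_VALUE(item, '$.sku')                               AS sku
FROM raw_events,
  UNNEST(JSON_QUERY_ARRAY(payload, '$.items')) AS item;
```

Multiply that by every event type and every nesting level, and then rewrite it whenever the upstream payload changes.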

So my team and I built a tool called Forge to automate the messy part.

What it does:

  • Automated Normalization: It takes raw, nested JSON (webhooks, event streams) and flattens it into relational tables.

  • Handles Schema Drift: If a new field is added to the source, Forge detects it and updates the table schema automatically instead of breaking the pipeline.

  • Generates dbt Code: It runs on dbt Core and generates the actual SQL models for you, so you get full lineage and docs without writing the boilerplate yourself. dbt docs are regenerated with each run, so you always have an up-to-date view of all of your data.
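To give a feel for the output, a generated staging model looks roughly like this (a simplified sketch with illustrative names, not Forge's exact output):

```sql
-- models/staging/stg_orders.sql (illustrative sketch)
{{ config(materialized='view') }}

SELECT
  JSON_VALUE(payload, '$.order_id')                            AS order_id,
  SAFE_CAST(JSON_VALUE(payload, '$.total') AS NUMERIC)         AS total,
  SAFE_CAST(JSON_VALUE(payload, '$.created_at') AS TIMESTAMP)  AS created_at
FROM {{ source('raw', 'orders') }}
```

Because the output is plain dbt models, you can version them, test them, and drop them into an existing dbt project like any hand-written model.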

We focused heavily on transparency—you can inspect the generated SQL for every table, so it’s not a black box.

Would love any feedback on the generated models or how you’re currently handling JSON schema evolution!