Hi everyone,
I’ve spent way too much time manually writing JSON_VALUE, UNNEST, and SAFE_CAST queries just to get nested event data into a usable state in BigQuery. It feels like 90% of my data engineering time is just fixing broken pipelines when a schema changes.
So my team and I built a tool called Forge to automate the messy part.
What it does:
- Automated Normalization: It takes raw, nested JSON (webhooks, event streams) and automatically flattens it into relational tables.
- Handles Schema Drift: If a new field is added to the source, Forge detects it and updates the table schema automatically instead of breaking the pipeline.
- Generates dbt Code: It runs on dbt Core and generates the actual SQL/models for you, so you get full lineage and docs without writing the boilerplate yourself. dbt docs are regenerated on every run, so lineage and documentation stay in sync with your data.
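To make the first two bullets concrete, here's a minimal Python sketch of the idea: recursively flatten nested JSON into dotted column names, and treat unknown columns as schema additions instead of failures. The names `flatten` and `merge_schema` are illustrative only, not Forge's actual API.

```python
import json

def flatten(record, prefix=""):
    """Recursively flatten a nested dict into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat

def merge_schema(known_columns, record):
    """Schema drift: return the widened column set plus the new columns,
    rather than raising on an unexpected field."""
    new = [c for c in flatten(record) if c not in known_columns]
    return known_columns | set(new), new

event = json.loads('{"user": {"id": 7, "plan": "pro"}, "ts": "2024-01-01"}')
columns, added = merge_schema({"user.id", "ts"}, event)
# added == ["user.plan"]: the new field widens the table instead of breaking the load
```

In the real tool this decision point is where an ALTER TABLE / dbt model regeneration would be emitted; the sketch only shows the detection step.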
We focused heavily on transparency: you can inspect the generated SQL for every table, so it’s not a black box.
Would love any feedback on the generated models or how you’re currently handling JSON schema evolution!


