Add 'include as CTE' boolean to joins

Problem
We have star model explores (start with fact) and “configuration” model explores (start with dimension)
The reason is that you want to see all configured e.g. suppliers even if they have 0 rows in the fact. Users then want to use this explore to always see all suppliers, regardless if they had 0 orders in a selected period.

E.g. from: supplier
left join: orders

So you want to filter on orders join specifically, e.g. only orders last x days.
Right now, we need to make a derived table with templated filters to do this.

Solution
But the solution seems very easy:
add a boolean field to the view join that says “include as CTE” where filters applied to its dimensions are added inside the CTE instead of at the bottom of the entire SQL query.

Any other ideas?
Ofc in each star model, I could add to each fact table an empty row for each dimension we join, but that seems messy too.

3 Likes

The simplest approach here would be to just build an explore with your suppliers table as the base table, and join the other relevant tables to that.

But then if someone filters on a dimension of orders. The suppliers without orders will dissappear. And we specifically want to keep seeing all suppliers and have a 0 for orders.

Gotcha. How would you write that query in SQL?

So what we do now is

view: orders_per_supplier_paid_period {
derived_table: {
sql:
select
supplier_id,
coalesce(count(distinct order_id),0) AS num_orders
from orders
where {% condition filter_paid_period %} paid_at {% endcondition %}
group by 1
}

measure: num_orders { }

explore: suppliers
from: suppliers

join: orders_per_supplier_paid_period {
type: left_outer
relationship: one_to_one
}

Then if you select supplier_id and num_orders your SQL will turn out:
select
suppliers.id,
sum(orders_per_supplier_paid_period.num_orders)
FROM suppliers
LEFT JOIN (
select
supplier_id,
coalesce(count(distinct order_id),0) AS num_orders
from orders
where {% condition filter_paid_period %} paid_at {% endcondition %}
group by 1
) as orders_per_supplier_paid_period on orders_per_supplier_paid_period.supplier_id = suppliers.id

It’s great as you can filter on the paid period of the orders, but you still see all suppliers because the derived table gets wrapped in a CTE ()/subquery.

But for each dimension you want to add, you need to add a conditional filter in the derived table—a lot of work.

What would just be as easy is:
explore: suppliers
from: suppliers

join: orders (no derived table needed)
type: left_outer
relationship: one_to_one/one_to_many
include_as_cte: yes
)

Which would generate the same SQL as above.

Edit: I see an issue with this approach as what do you do if someone selects dimensions from the join. I think still you would want to allow this behaviour, which does lead to a fanout. But If you don’t want that then you shouldn’t allow dimensions in the join.
This is also why i said it can be one_to_one one_to_many, depending on how many dimensions you let people select from the join view.

Ran in to the same issue and would love to hear if you find a workaround for this!

Nope nothing yet. We now just rely on the ‘limited’ liquid derived table joins.

You could do a merge result.
From a explore with all suppliers, and then merge in some order information already aggregated to supplier_id. But merge results are tough to work with.

Hey folks, thanks for the post, checking if I understand correctly, you want the where condition to not be at the end of the query? Wouldn’t adding it to the join solve this?

We can do this by creating a lookml filter field and passing that through in your sql_on join:

e.g.
filter: date_filter {
type: date_time
}

sql_on: ${order_items.user_id} = ${users.id} and {% condition date_filter %} users.created_at {% endcondition %} ;;
}

Yeah but if I want to filter my orders by more than just 1 dimension this is not scalable. You end up with something similar to my derived table where you have to define this for each dimension, rather than it working automatically.

E.g.

  • created_at
  • country name
  • city name
  • demand channel
  • source
1 Like

That makes sense. Maybe you could create multiple CTEs and inject them in with a similar approach as this: https://www.googlecloudcommunity.com/gc/Technical-Tips-Tricks/Aggregate-Awareness-using-in-query/ta-p/587376 but again it’s not extremely scalable for you but depending on the use case it might help.

The proposed request to add the parameter include_as_cte: yes would be the best.
Would be nice to not have to write a full screen worth of code as yet another workaround.

The issue with include_as_cte: yes would be that it’s going to cause fanouts as you would have no way of controlling the granularity of the CTE i.e. users dimensions would be added on the fly based on their selections. I think the only way it would work is if you only allowed dimensions in the join to be added to the CTE.

You can raise a FR from within Looker (Question mark top right hand corner and then product idea), this would be the best course of action and then following up with your account representative.

Yes true, it just means that an explore user has to know that adding a dimension leads to a fanout… but they need to know this in any case when you make an explore that has a one_to_many join. That’s not what this feature request would solve, but it would solve the filtering aspect of it.

The one_to_many issue should be solved with more feedback towards users on what they are doing. Or as we try, to avoid one_to_many all together, but for some questions its unavoidable.

Ill raise it as FR.

An explore user would not need to know this as symmetric aggregates in Looker avoid the fanout issue assuming the model is correctly set up by the developer. (https://cloud.google.com/looker/docs/best-practices/understanding-symmetric-aggregates).

A FR is definitely the best course of action :slightly_smiling_face:

Conceptually I don’t really agree. Yes Looker corrects some of the data this way.

But the following scenario still leads to problems.
model:
from order
left join order_items (one order can have many order items)

user:
user selects a dimension from order item (lower granularity) and adds a measure from orders. (higher granularity)
Now the sum of all order counts in the table is not equal to unique amount of orders. Users mostly spot this if they add a table total, and then the total (which calculates it ‘correctly’) deviates from summing the individual values.

This means a user still has to know what it is doing / needs to know the table is a one_to_many join. Or explore makers need to avoid adding any measures on the orders table. And this is exactly like the ‘fanout’ issue as mentioned above.

For anyone else running into the CTE problem.
Heres the FR
https://portal.feedback.us.pendo.io/app/#/case/427728?currentProductId=53ee6715-9dcb-471f-bd41-7846ac4d9e41