Problem
We have star model explores (start with fact) and “configuration” model explores (start with dimension)
The reason is that you want to see all configured e.g. suppliers even if they have 0 rows in the fact. Users then want to use this explore to always see all suppliers, regardless if they had 0 orders in a selected period.
E.g. from: supplier
left join: orders
So you want to filter on orders join specifically, e.g. only orders last x days.
Right now, we need to make a derived table with templated filters to do this.
Solution
But the solution seems very easy:
add a boolean field to the view join that says “include as CTE” where filters applied to its dimensions are added inside the CTE instead of at the bottom of the entire SQL query.
Any other ideas?
Ofc in each star model, I could add to each fact table an empty row for each dimension we join, but that seems messy too.
But then if someone filters on a dimension of orders. The suppliers without orders will dissappear. And we specifically want to keep seeing all suppliers and have a 0 for orders.
Then if you select supplier_id and num_orders your SQL will turn out:
select
suppliers.id,
sum(orders_per_supplier_paid_period.num_orders)
FROM suppliers
LEFT JOIN (
select
supplier_id,
coalesce(count(distinct order_id),0) AS num_orders
from orders
where {% condition filter_paid_period %} paid_at {% endcondition %}
group by 1
) as orders_per_supplier_paid_period on orders_per_supplier_paid_period.supplier_id = suppliers.id
It’s great as you can filter on the paid period of the orders, but you still see all suppliers because the derived table gets wrapped in a CTE ()/subquery.
But for each dimension you want to add, you need to add a conditional filter in the derived table—a lot of work.
What would just be as easy is:
explore: suppliers
from: suppliers
Edit: I see an issue with this approach as what do you do if someone selects dimensions from the join. I think still you would want to allow this behaviour, which does lead to a fanout. But If you don’t want that then you shouldn’t allow dimensions in the join.
This is also why i said it can be one_to_one one_to_many, depending on how many dimensions you let people select from the join view.
Nope nothing yet. We now just rely on the ‘limited’ liquid derived table joins.
You could do a merge result.
From a explore with all suppliers, and then merge in some order information already aggregated to supplier_id. But merge results are tough to work with.
Hey folks, thanks for the post, checking if I understand correctly, you want the where condition to not be at the end of the query? Wouldn’t adding it to the join solve this?
We can do this by creating a lookml filter field and passing that through in your sql_on join:
Yeah but if I want to filter my orders by more than just 1 dimension this is not scalable. You end up with something similar to my derived table where you have to define this for each dimension, rather than it working automatically.
The proposed request to add the parameter include_as_cte: yes would be the best.
Would be nice to not have to write a full screen worth of code as yet another workaround.
The issue with include_as_cte: yes would be that it’s going to cause fanouts as you would have no way of controlling the granularity of the CTE i.e. users dimensions would be added on the fly based on their selections. I think the only way it would work is if you only allowed dimensions in the join to be added to the CTE.
You can raise a FR from within Looker (Question mark top right hand corner and then product idea), this would be the best course of action and then following up with your account representative.
Yes true, it just means that an explore user has to know that adding a dimension leads to a fanout… but they need to know this in any case when you make an explore that has a one_to_many join. That’s not what this feature request would solve, but it would solve the filtering aspect of it.
The one_to_many issue should be solved with more feedback towards users on what they are doing. Or as we try, to avoid one_to_many all together, but for some questions its unavoidable.
Conceptually I don’t really agree. Yes Looker corrects some of the data this way.
But the following scenario still leads to problems.
model:
from order
left join order_items (one order can have many order items)
user:
user selects a dimension from orderitem (lower granularity) and adds a measure from orders. (higher granularity)
Now the sum of all order counts in the table is not equal to unique amount of orders. Users mostly spot this if they add a table total, and then the total (which calculates it ‘correctly’) deviates from summing the individual values.
This means a user still has to know what it is doing / needs to know the table is a one_to_many join. Or explore makers need to avoid adding any measures on the orders table. And this is exactly like the ‘fanout’ issue as mentioned above.