Sort by Pivots, Rank with Other Bucket, and Percentile Tail Analysis

The Problem

A common issue users run into while modeling is sorting by pivots. For example, I might want to pivot my top 10 brands by total sales.

The most elegant solution I’ve found happens to work well for several other use cases. In this article I’ll show how we can satisfy each in turn, by progressively adding to an ephemeral derived table.

The Solution

In this example we’ll use a simple e-commerce model. Here I’ve pulled my top brands by revenue:

Now, if I’d like to see how these top 10 perform over time, a simple pivot won’t help. It will sort alphanumerically by default:

I would have to filter my brand by each of the brand names that were in the top 10. This can work for a one-off analysis, but won’t scale effectively for my team.

Sort by Pivots

To accomplish this, we can have Looker rank our brands for us, using an ephemeral derived table.

The idea is to dynamically calculate statistics at a brand level and then join them back into our model.

The first step is to create the query we’ll use for the ranking. I’ll often write these by hand, but a quick shortcut is to assemble the query in an Explore, then punch it out to SQL Runner, make any needed modifications, and then bring it into my project as a derived table:

SELECT
    products.brand AS "brand",
    COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue",   # Included for clarity
    RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"
FROM
    public.order_items AS order_items
    LEFT JOIN public.orders AS orders ON ( orders.id = order_items.order_id)
    LEFT JOIN public.inventory_items AS inventory_items ON ( order_items.inventory_item_id = inventory_items.id )
    LEFT JOIN public.products AS products ON ( inventory_items.product_id = products.id )
WHERE
    1=1
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20

The query above groups by the dimension by which we wanted to pivot/sort, and establishes a rank function on total revenue. Now, we can bring it into our model as a derived table. You can run this in SQL Runner and then have Looker automatically generate the LookML.

Looker prompted me for a view name and I called our new view brand_rank_by_sales:

view: brand_rank_by_sales {
    derived_table: {
        sql:
        SELECT
         products.brand AS "brand",
         COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue",
         RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK"
        FROM
         public.order_items AS order_items
         LEFT JOIN public.orders AS orders ON orders.id = order_items.order_id
         LEFT JOIN public.inventory_items AS inventory_items ON order_items.inventory_item_id = inventory_items.id
         LEFT JOIN public.products AS products ON inventory_items.product_id = products.id
         WHERE
         1=1

        # This is a "templated filter" - any filter I place on my orders created date will be mirrored here

         AND {% condition orders.created_date %} orders.created_at {% endcondition %}
         GROUP BY 1
         ORDER BY 2 DESC
        ;;
        }

    measure: count {
        type: count
        drill_fields: [detail*]
    }

    dimension: brand {
        type: string
        sql: ${TABLE}.brand ;;
    }

    dimension: total_revenue {
        type: number
        sql: ${TABLE}.total_revenue ;;
    }

    dimension: rnk {
        ype: number
        sql: ${TABLE}.rnk ;;
    }

    set: detail {
        fields: [brand, total_revenue, rnk]
    }
}

Now, we can join it into the rest of our e-commerce Explore, as a many-to-one relationship joined on brand:

explore: order_items {
    join: orders {
        relationship: many_to_one
        sql_on: ${orders.id} = ${order_items.order_id} ;;
    }

    join: users {
        relationship: many_to_one
        sql_on: ${users.id} = ${orders.user_id} ;;
    }

    join: inventory_items {
        type: left_outer
        sql_on: ${order_items.inventory_item_id} = ${inventory_items.id} ;;
        relationship: one_to_one
    }

    join: products {
        type: left_outer
        sql_on: ${inventory_items.product_id} = ${products.id} ;;
        relationship: many_to_one
    }

    join: brand_rank_by_sales {
        type: left_outer
        sql_on: ${products.brand} = ${brand_rank_by_sales.brand} ;;
        relationship: many_to_one
    }
}

Now, we have the tools we need to look at the performance of our top 10 brands’ revenue over time.

I can now pivot by my brand and simply filter the brand_rank_by_sales.rnk field, on the condition of less than or equal to 10.

To allow clear sorting by our pivot field, we can combine the rank and the brand name by adding a new dimension that will support being sorted alphanumerically. The CASE WHEN statement will simply prepend a 0 to ranks less than 10, for proper sorting.

dimension: ranked_brand {
    type: string
    sql:
        CASE
        WHEN
            ${rnk} < 10 THEN '0'|| ${rnk} || ') ' || ${brand}
            ELSE ${rnk} || ') ' || ${brand}
        END ;;
}

Now, we can pivot by our brand_rank_by_sales.ranked_brand field and filter for the top 10 rank:

Group Our Long Tail into a Single “Other” Bucket

Adding on to the pattern above, I may want to take everything below a certain rank and place it into a long tail or “other” bucket. This can be used for time-series analysis, but is especially helpful in pie charts. You can find more information on creating “other” buckets using table calculationsor using charts.

I’ll simply add the following two fields:

filter: other_bucket_threshold {
    type: number
}

dimension: ranked_brand_with_tail {
    type: string
    sql:
     CASE
        WHEN {% condition other_bucket_threshold %} ${rnk} {% endcondition %}
        THEN ${ranked_brand}
        ELSE 'x) Other'
    END ;;
}

The benefit of this approach is that the end-user can dynamically adjust the tail threshold, which may vary or need to be tuned for the subset under consideration.

Drill into Your Outliers Using Percentile Rank

Add the following field to your derived table:

,PERCENT_RANK () OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0)) AS "percent_rank"

And the following dimensions to the view:

dimension: percent_rank {
    type: number
    value_format_name: percent_2
    sql: ${TABLE}.percent_rank ;;
}

dimension: percentile_tiers {
    type: tier
    tiers: [0.25,0.5,0.75,0.9,0.95]
    sql: ${percent_rank} ;;
}

Now, we can drill into our outliers:

1 Like