Hello,
I’m trying to create a table which stores all the events from two other table with some aggregations for the past 120 days. I tried to solve the problem with materialized view, but it seems I hit some limitations.
Because of the data volume - each query will be 60-70 TB, I’m trying to figure out how to optimize the solution.
What I’ve done so far:
- I created a view - ‘events_view’
CREATE VIEW events_view AS
SELECT
colA,
colB,
date_column,
count(*)
FROM table a
WHERE date_column >= current_date - 120
GROUP BY colA,
colB,
date_column
UNION ALL
SELECT
colA,
colB,
date_column,
count(*)
FROM table b
WHERE date_column >= current_date - 120
GROUP BY colA,
colB,
date_column;
- Then I’m using the view to create a table ‘events_table’
CREATE TABLE 'events_table' AS
PARTITION BY date_column
SELECT * FROM 'events_view';
If I write a procedure where I add the following:
- Delete data older than 120 days.
- Insert data which is from current date.
DELETE * FROM 'events_table'
WHERE date_column < current_date - 120;
INSERT INTO 'events_table'
SELECT * FROM 'events_view' WHERE date_column = current_date;
Will this solve my issue or there is a far more elegant way and optimal way to do that. And my other concern is that the view will run all this data for past 120 days every time even if there is a WHERE date_column = current_date clause.