Hi @asurajpai,
Q1> Do you need to run Purge script to be executed on both master and slave?
Answer: No - Only running on master is sufficient; assuming replication is working fine between master and slave.
Q2> Will clean up of childfactables retrieve disk space?
Answer: Yes - Cleanup is essentially dropping the old childfactables, which would release space.
Q3> What are childfactables?
Answer: Childfactables are daily-partitioned fact data. Every day new partitions are created and data gets ingested into the daily partitioned tables. So at a later point in time, when the old fact data will not be required, we can purge the respective childfactables. The listing of childfactables against the respective dates [in epoch] is present in “analytics.childfactables”.
Q4> Purging not releasing space - Issue?
Answer: We had a bug in our code, where data got inserted into the main fact table instead of respective child-fact table [because of a timezone bug]. In that case out 100 records in a day, a high percentage of data could have landed in main fact table instead of child-fact. So after purge - very less amount of disk space is released. The bug had been resolved in future OPDK release. Kindly confirm which version you are using. @sukruth / @Sanjoy Bose can confirm the version in which the fix was made.
Q5> How is data stored?
Answer: Explained earlier.
To fix your current solution: you need to DELETE data from analytics.“..fact” for the unwanted period [where client_received_start_timestamp <= now() - interval ‘X days’]. Then you need to run VACUUM FULL. These operations will take a considerable amount of time and is a function of how much data needs to be cleared. There are other alternatives which are slightly faster - but still will take time.
Q6> Job to Purge
Answer: We have an updated version of purging script which deletes records from fact table apart from dropping the child-fact tables. But don’t run it on the first go before attempting the above recommended fix, as it can run for a long time. @sukruth / @Sanjoy Bose can give details of the script and the location.
Q7> Master-Slave replication type
Answer: Stream WAL records
Q8> Documentation on DB Maintenance
Answer: We recommend keeping the default PG settings. In case the aggregate tables have grown considerably in size, we may need to reindex them. We usually do not recommend VACUUM on main fact table - as the child-fact tables ideally should have almost all records and dropping them should be sufficient. In your case you would require to DELETE and VACUUM FULL on the main fact table. @sukruth / @Sanjoy Bose can give details if there is any such documentation.