Alex Lourie05/21/2023, 10:56 AM
, the ID is an indexed symbol and TIMESTAMP is a dedicated timestamp. The data is partitioned by DAY. We use a system with 128G of memory and 16 vCPUs, with 10 workers. What can we do to improve the performance significantly?
SELECT column1, column2 WHERE id=X ORDER BY TIMESTAMP
Gabriel Mocan05/21/2023, 11:20 AM
If you’re only interested on latest values, check out
SELECT column1, column2 WHERE id=X SAMPLE BY 30m
Alex Lourie05/21/2023, 11:23 AM
provide any significant benefits compared to not using it at all?
Bolek Ziobrowski05/22/2023, 7:27 AM
to have a visible impact because the number of rows to sort is small . I think you can try the following : • switch to device that's better at random IOs • use a compressed filesystem to fit more data in ram • manually shard the table by id (e.g. id %100) and remove the index
Alex Lourie05/22/2023, 7:30 AM
Bolek Ziobrowski05/22/2023, 7:48 AM
Alex Lourie05/22/2023, 8:02 AM
Bolek Ziobrowski05/22/2023, 8:24 AM
Alex Lourie05/22/2023, 8:24 AM
Bolek Ziobrowski05/22/2023, 8:26 AM
Alex Lourie05/22/2023, 8:26 AM
Bolek Ziobrowski05/22/2023, 8:30 AM
Alex Lourie05/22/2023, 8:31 AM
Bolek Ziobrowski05/22/2023, 9:04 AM
Fetching data from old table would be much faster because IOs wouldn't really be random but sequential . Fetching data from new table would be random but the scope would be much smaller. Every month you'd move data from data_latest to data_old ordered by device_id. This way for old data you'd get sequential reads and way less random reads for new data as it would be stored in 30 days * 48 = 1440 row batches. This approach assumes the query is the most important in the workload and you don't need timeseries optimisations/partitioning for data_old .
select * from ( SELECT column1, column2 FROM data_old WHERE id=X union all SELECT column1, column2 FROM data_latest WHERE id=X ) order by timestamp
Alex Lourie05/22/2023, 12:41 PM