#users-public

Elina Kuldkepp

10/27/2022, 12:45 PM
Hello. We have a lot of IoT data flowing in, and some of our developers are concerned about performance and want to take some preemptive measures. Do you always recommend running avg, min, max, etc. on the DB directly, or should we in some cases pre-calculate (i.e. store in the QuestDB database) the respective function results? Is there some row count above which you recommend pre-calculating? Is there also a recommended way to prune data? Or could this be improved with some DB optimizations? For now we take the avg over about 15 million rows in 600 ms, but we expect this to increase by a lot.
Alex Pelagenko

10/27/2022, 1:20 PM
Hi. Pre-calculation is a good idea if you don’t need the raw data. There is no hard limit, but once your data set no longer fits in RAM you become bound by disk speed and performance can drop significantly.
1:23 PM
What’s your anticipated “hot” data size? E.g. row count, column count you want to query with low latency
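On the pruning question from the first message: QuestDB removes old data by dropping whole time partitions rather than deleting row by row, so a common approach is to partition the table by time and drop partitions past a retention window. A minimal sketch, assuming a hypothetical `readings` table (not from this conversation) with designated timestamp `ts`:

```sql
-- Hypothetical schema: one row per sensor reading, partitioned by day
CREATE TABLE readings (
  sensor_id SYMBOL,
  value     DOUBLE,
  ts        TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY;

-- Prune everything older than 90 days by dropping whole partitions
ALTER TABLE readings DROP PARTITION
  WHERE ts < dateadd('d', -90, now());
```

Dropping partitions is a metadata operation on the partition directories, so it is far cheaper than a row-level delete over millions of rows.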
e

Elina Kuldkepp

10/27/2022, 1:34 PM
Mostly I need to fetch either 1 or 2 columns, but the row count could grow really fast. Currently I have 700 sensors reporting every 5 to 15 minutes.
1:36 PM
But if I were to pre-calculate average, min, max, count, last, first, etc. for each of the 700+ sensors, wouldn't that get quite messy? I would end up with 700 sets of min, max, avg, and so on.
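One way to keep per-sensor pre-aggregation from getting messy is to let the database key the aggregates: a single `SAMPLE BY` query can produce min/max/avg per sensor per time bucket, stored in one rollup table instead of 700 separate ones. A sketch, assuming a hypothetical `readings(sensor_id, value, ts)` table with `ts` as the designated timestamp:

```sql
-- One rollup table holds hourly aggregates for all sensors
CREATE TABLE readings_1h AS (
  SELECT
    ts,                        -- start of the hourly bucket
    sensor_id,
    avg(value)   avg_value,
    min(value)   min_value,
    max(value)   max_value,
    first(value) first_value,
    last(value)  last_value,
    count()      cnt
  FROM readings
  SAMPLE BY 1h
) TIMESTAMP(ts) PARTITION BY DAY;
```

New buckets can then be appended periodically with an `INSERT INTO readings_1h SELECT ...` covering only the latest time window, so the rollup stays small and hot while the raw table grows.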
Bolek Ziobrowski

10/27/2022, 2:12 PM
create table aggtest as (select rnd_double() a, rnd_double() b from long_sequence(100000000));

select avg(a), min(a), max(a), avg(b), min(b), max(b) from aggtest;
takes:
• cold - 1.1 s
• hot - 0.16 s
on my desktop (AMD 12-core, 64 GB RAM, NVMe SSD). If you add predicates on the designated timestamp then QuestDB will only scan that time range. If you'd like more concrete advice, please share your schema, query, and data stats.
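The point about timestamp predicates can be sketched as follows; `aggtest` above has no timestamp column, so assume a hypothetical `readings` table with designated timestamp `ts`, partitioned by day:

```sql
-- Only partitions overlapping October 2022 are opened and scanned;
-- all other partitions are skipped entirely
SELECT avg(value), min(value), max(value)
FROM readings
WHERE ts IN '2022-10';
```

Because the table is physically laid out by partition on the designated timestamp, a bounded query like this stays fast even as the total row count grows far beyond 15 million.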