#users-public
Bolek Ziobrowski

10/05/2022, 1:35 PM
Hi. Most files are allocated and expanded in fixed-size chunks.

xrust01

10/05/2022, 1:39 PM
Any chance to shrink the size then? One day takes 11 GB, so if we want one year that's 3 TB of data..

javier ramirez

10/05/2022, 2:20 PM
I am assuming where it makes sense you are using Symbol columns rather than Strings, to save a few bytes per record
2:21 PM
Other than that, reducing size right now boils down to storing less data
xrust01

10/05/2022, 2:21 PM
well, it seems that even boolean is the same size as string so symbol would be the same?
2:22 PM
and also given that no matter how many rows are there the size on the disk is the same..?
javier ramirez

10/05/2022, 2:23 PM
As Bolek explained, files are allocated/expanded in chunks, but when storing a column, a Symbol takes only the storage of an int, vs. storing a full String
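To illustrate why a boolean column and a larger column can look the same size on disk, here is a rough sketch of chunked allocation. It assumes the 16 MB chunk size mentioned later in this thread, 1 byte per boolean row and 4 bytes per Symbol row (an int); `allocated_bytes` is a hypothetical helper, not a QuestDB API, and the real on-disk layout may differ.

```python
import math

# Assumption: 16 MB chunk size, as mentioned later in this thread.
PAGE_BYTES = 16 * 1024 * 1024


def allocated_bytes(used_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Round the used size up to a whole number of chunks (at least one)."""
    pages = max(1, math.ceil(used_bytes / page_bytes))
    return pages * page_bytes


# With few rows, both columns fit in a single chunk and look identical on disk.
small_rows = 100_000
small_bool = allocated_bytes(small_rows * 1)    # assumption: 1 byte per boolean
small_symbol = allocated_bytes(small_rows * 4)  # 4 bytes per Symbol (an int)

# With many rows, the difference in bytes per row starts to show.
large_rows = 10_000_000
large_bool = allocated_bytes(large_rows * 1)
large_symbol = allocated_bytes(large_rows * 4)

print(small_bool, small_symbol, large_bool, large_symbol)
```

Under these assumptions the two small columns occupy one chunk each, so the per-row saving only becomes visible once a column outgrows a single chunk.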
2:24 PM
For saving storage on historical data, if you don’t need to query the whole year, you can always DETACH partitions and keep the data somewhere more cost-efficient, like cloud object storage or your offline storage
2:27 PM
If you need to run statistical queries over historical data, but you don’t really need the fine-grained resolution, you could always have a historical table where you store downsampled data. Let’s say for example you are getting values every second, but for historical queries you are OK with 5-minute resolution. You could store roughly 300x fewer rows by doing something like
SELECT symbol, avg(col1), avg(col2) from main_table where timestamp > dateadd('d', -30, now()) sample by 5m
2:28 PM
You could insert the result of that into the sampled table and then detach the same partitions from the main table
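As a quick check of the 300x figure, here is a minimal sketch of the arithmetic, assuming one reading per second per symbol value and a 5-minute sample interval. `downsample_factor` is a hypothetical helper, not part of QuestDB.

```python
def downsample_factor(source_interval_s: int, target_interval_s: int) -> float:
    """How many raw rows collapse into one sampled row."""
    if source_interval_s <= 0 or target_interval_s <= 0:
        raise ValueError("intervals must be positive")
    return target_interval_s / source_interval_s


SECONDS_PER_DAY = 24 * 60 * 60

# One reading per second, downsampled to 5-minute buckets.
factor = downsample_factor(1, 5 * 60)

raw_rows_per_day = SECONDS_PER_DAY // 1
sampled_rows_per_day = SECONDS_PER_DAY // (5 * 60)

print(factor, raw_rows_per_day, sampled_rows_per_day)
```

The saving applies to row count. Actual disk savings also depend on how many columns the sampled table keeps and on chunked file allocation.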
Bolek Ziobrowski

10/05/2022, 2:44 PM
"and also given that no matter how many rows are there the size on the disk is the same..?"
If column data takes more than 16MB then qdb will allocate another 16MB ... I think you can influence these sizes via:
cairo.writer.data.append.page.size
cairo.writer.misc.append.page.size
But it doesn't make sense to partition by day if there's going to be so little data per day. If 1 day takes 11GB using the 16MB default, then the table has around 700 columns. If most of this space is wasted then I'd recommend partitioning by month. Wasted disk space is one thing, but you also have to consider the number of files: 700*365 == 256k files!

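The estimate above can be reproduced with a short sketch. It assumes every column occupies exactly one 16 MB chunk per partition and one file per column per partition, which is a simplification (some column types may use extra files).

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

partition_bytes = 11 * GIB  # one day of data, as reported in this thread
page_bytes = 16 * MIB       # default chunk size mentioned above

# Assumption: each column takes exactly one chunk in a daily partition.
estimated_columns = partition_bytes // page_bytes


def files_per_year(columns: int, partitions_per_year: int) -> int:
    """Assumption: one file per column per partition."""
    return columns * partitions_per_year


daily_files = files_per_year(estimated_columns, 365)
monthly_files = files_per_year(estimated_columns, 12)

print(estimated_columns, daily_files, monthly_files)
```

Under these assumptions, switching from daily to monthly partitions cuts the yearly file count by about 30x, which is the motivation for partitioning by month here.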
xrust01

10/05/2022, 2:57 PM
Ok, thank you very much. We will investigate how big the data really is and set the size accordingly