Super Richman

05/12/2023, 9:13 AM
2023-05-12T09:12:00.131416Z E i.q.s.BytecodeAssembler Too much input to generate io.questdb.griffin.engine.groupby.GroupByFunctionsUpdater. Bytecode is too long

Jaromir Hamala

05/12/2023, 11:31 AM
hi, do you have a reproducer?

Super Richman

05/12/2023, 11:40 AM
mmm, in cloud? no, but I can try to make one.. I opened a ticket here

Jaromir Hamala

05/12/2023, 11:55 AM
ok, I see. 6000 selected columns explains it: QuestDB generates Java bytecode to speed up the query, but there is a limit on the maximum size of the bytecode. with 6000 columns you exceed this limit.

Super Richman

05/12/2023, 12:01 PM
😞 6000 is just the beginning 😉 can it be increased somehow?

Jaromir Hamala

05/12/2023, 12:07 PM
in theory - yes. but it would be more complicated than just increasing a constant, as the limit is inherited from the JVM platform. chances are there is a better solution for what you are trying to achieve. I understand you treat AVG as some sort of a checksum. are these 6000 columns really physical? or are they the result of many, many nested LT JOINs?

Super Richman

05/12/2023, 12:07 PM
Is this limit set by the number of characters in the select query? what is the limit exactly? I will try to construct a query that doesn't exceed the limit
they are results of calculations such as (col1+col2)

Jaromir Hamala

05/12/2023, 12:12 PM
no, the limit is not directly related to SQL text length. each grouped column contributes some number of bytes to the generated bytecode. by repeatedly halving the column count you could find how many columns you can group in one go. I have to say you are really good at stressing QuestDB and coming up with unusual testing scenarios:))
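
The halving approach can be sketched as a binary search. This is only an illustration: `fits` is a hypothetical callback that would run the GROUP BY query with the first n columns and report whether it succeeded, and `fake_fits` with its threshold of 1500 is a made-up stand-in, not a real QuestDB limit. It also assumes that if n columns fit, any smaller number fits too.

```python
from typing import Callable, List, Sequence


def max_columns_that_fit(total: int, fits: Callable[[int], bool]) -> int:
    """Return the largest n in [0, total] for which fits(n) is True.

    Assumes fits is monotonic: once a column count fails, every
    larger count fails as well.
    """
    lo, hi = 0, total  # zero columns is treated as trivially fitting
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def chunk(columns: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split the column list into groups of at most `size` columns."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def fake_fits(n: int) -> bool:
    """Hypothetical stand-in for running the query; 1500 is an invented limit."""
    return n <= 1500


limit = max_columns_that_fit(6000, fake_fits)
groups = chunk([f"col{i}" for i in range(6000)], limit)
print(limit, len(groups))
```

Each group could then be aggregated in its own query, with the results combined on the client side.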

Super Richman

05/12/2023, 12:18 PM
I know ;] you guys should start paying me 😉
Since QDB is uber fast I am moving any logic that I can to it
I would love to know how you made it so fast

Jaromir Hamala

05/12/2023, 12:25 PM
one consideration: sum() (and hence avg()) on floating-point numbers is not guaranteed to be associative. in other words: (1.0 + 1.1) + 1.2 does not necessarily yield precisely the same result as 1.0 + (1.1 + 1.2), as anyone with basic math knowledge might expect:) (well, it does in this case, but you get the point). this is not QuestDB-specific, it's due to how floating-point numbers are represented internally inside a computer. this may or may not be an issue for your use-case.
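
A small sketch of the effect in plain Python, using the values 0.1, 0.2 and 0.3 (chosen here for illustration, not taken from the thread) because they are a well-known case where the grouping changes the result. It says nothing about the order in which QuestDB itself adds values.

```python
import math

# Same three values, two different groupings of the additions.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)  # False: the rounding error depends on the grouping

# An averaging "checksum" inherits the same sensitivity to ordering.
avg_left = left / 3
avg_right = right / 3
print(avg_left == avg_right)

# Comparing with a tolerance is the usual workaround for such checks.
print(math.isclose(left, right, rel_tol=1e-12))  # True
```

If two result sets are compared via avg(), an exact equality test may therefore report a difference that comes only from summation order.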

Super Richman

05/12/2023, 12:26 PM
hmm, as long as it's doing it consistently across all calculations I think it's ok
you think I can use stddev_samp?

Jaromir Hamala

05/12/2023, 1:01 PM
to calculate “column similarity”? what would be the advantage compared to avg()?

Super Richman

05/12/2023, 1:11 PM
that it also captures some of the variation within the results of each column?
such as 1,5,5,5,5,5,5,5,9 and 1,2,3,4,5,6,7,8,9 avg = same result, stddev = different result. no?
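
The two series above can be checked with Python's standard `statistics` module. `stdev` here is the sample standard deviation, which is assumed to correspond to what stddev_samp computes.

```python
import math
from statistics import mean, stdev

a = [1, 5, 5, 5, 5, 5, 5, 5, 9]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Both series sum to 45 over 9 values, so the averages agree.
print(mean(a), mean(b))

# The sample standard deviations differ: sqrt(32 / 8) vs sqrt(60 / 8).
print(stdev(a), stdev(b))
```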

Jaromir Hamala

05/12/2023, 1:15 PM
yes. but you can find another example where stddev will match while average will be different.
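
One such counterexample, as a sketch with made-up values: shifting every value by a constant changes the average but leaves the sample standard deviation untouched.

```python
import math
from statistics import mean, stdev

c = [1, 2, 3]
d = [11, 12, 13]  # c shifted by +10

# The averages differ by exactly the shift.
print(mean(c), mean(d))

# The spread around the mean is identical, so the standard deviations match.
print(math.isclose(stdev(c), stdev(d)))
```

Neither statistic on its own, nor the pair together, identifies a column uniquely; they only rule out some mismatches.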

Super Richman

05/12/2023, 1:20 PM
yeah right 😕 there is no hash function?

Jaromir Hamala

05/12/2023, 1:59 PM
nope. feel free to open a feature request, that could be a nice addition!