Title
#users-public
s

Sam Shiles

10/26/2022, 6:38 AM
I'm seeing really slow queries in questdb vs similar sized RDS mysql instance. I mentioned this the other day but have only just found the time to put together some examples as @Pei requested. https://github.com/Recodify/quest-db-performance/blob/master/SampleByQueries.md
6:39 AM
NB I'm using 6.4.2 due to this issue: https://github.com/questdb/questdb/pull/2683
Bolek Ziobrowski

Bolek Ziobrowski

10/26/2022, 8:00 AM
Hi Sam. It'd be good to migrate to latest 6.5.4 because it includes a number of performance and stability improvements . I think results might be affected by : • questdb having only 8GB (vs 16GB for mysql) • cursors keeping memory while in query cache and web console potentially using multiple cursors . Latest qdb version retains very little memory for cursors kept in query cache . Web console limits results to first 1k rows but counts the remaining ones . Count time metric shows how long it took to iterate over those rows . Could you also run the following ?
SELECT
    count(*)
FROM
    device d 
    INNER JOIN property p ON p.id = d.propertyid
WHERE
    AND p.reference = 'TestProperty123'
SELECT
    readingTypeId , count(*)
FROM
    reading r
WHERE
    readingDate BETWEEN '2022-01-01 00:00:00.000' AND '2022-10-25 04:59:59.000'
GROUP BY readingTypeId
s

Sam Shiles

10/26/2022, 8:16 AM
Hi Bolek! Apologies, that was my mistake, I thought xlarge would have the same cpu and ram, I'll bump that machine up to match the rds box. • first query: 1 row in 213ms Execute: 122.73msNetwork: 90.27msTotal: 213ms Count: 0Compile: 32.73ms • second query: 30 rows in 6.55s Execute: 6.51sNetwork: 36.67msTotal: 6.55s Count: 0Compile: 9.25ms Is it possible to update the questdb binaries whilst retaining the same data dirs? If so, I can try that now, otherwise I'll have to wait for that fix so I can reimport
Bolek Ziobrowski

Bolek Ziobrowski

10/26/2022, 8:18 AM
I was asking about query result actually , not response time .
8:19 AM
Yes, it's possible . It might trigger data/metadata migration on startup though but I don't think it matters in this case .
s

Sam Shiles

10/26/2022, 8:20 AM
ah ok
8:20 AM
first query is 18
8:20 AM
Bolek Ziobrowski

Bolek Ziobrowski

10/26/2022, 8:49 AM
Could you copy results as text ?
8:51 AM
Could you also check this query ?
SELECT count(*)
FROM
    reading r
    INNER JOIN device d ON d.id = r.deviceid
    INNER JOIN property p ON p.id = d.propertyid
WHERE
    readingDate BETWEEN '2022-01-01 00:00:00.000' AND '2022-10-25 04:59:59.000'
    AND r.readingTypeId = 'environment.temperature.indoor'
    AND p.reference = 'TestProperty123';
s

Sam Shiles

10/26/2022, 8:54 AM
8:57 AM
third query: 178081
8:58 AM
66.84s
Bolek Ziobrowski

Bolek Ziobrowski

10/26/2022, 9:18 AM
ok, thanks .
Alex Pelagenko

Alex Pelagenko

10/26/2022, 9:40 AM
I read though the queries, it seems that you don’t need any joins all the data in SELECT part comes from `
reading
table
9:40 AM
did you try queries without joins?
s

Sam Shiles

10/26/2022, 9:42 AM
Hi @Alex Pelagenko some of the queries don't require joins, some do. Example without and with joins are both covered in the repo I linked
Alex Pelagenko

Alex Pelagenko

10/26/2022, 9:42 AM
SELECT
    readingDate,
    AVG(r.value),
    r.deviceId
FROM
    reading r
    INNER JOIN device d ON d.id = r.deviceid
    INNER JOIN property p ON p.id = d.propertyid
WHERE
    readingDate BETWEEN '2022-01-01 00:00:00.000' AND '2022-10-25 04:59:59.000'
    AND r.readingTypeId = 'environment.temperature.indoor'
    AND p.reference = 'TestProperty123'
    SAMPLE BY 1d;
9:43 AM
ah, join is for filtering
s

Sam Shiles

10/26/2022, 9:44 AM
yep
Alex Pelagenko

Alex Pelagenko

10/26/2022, 9:45 AM
this can be deleted, cannot it?
INNER JOIN device d ON d.id = r.deviceid
9:45 AM
sorry
s

Sam Shiles

10/26/2022, 9:45 AM
on some of the queries, yes, others filter on serialnumber
Alex Pelagenko

Alex Pelagenko

10/26/2022, 9:58 AM
could you try this query instead?
SELECT
    readingDate,
    AVG(r.value),
    r.deviceId
FROM
    reading r
WHERE
    readingDate BETWEEN '2022-01-01 00:00:00.000' AND '2022-10-25 04:59:59.000'
    AND r.readingTypeId = 'environment.temperature.indoor'
    AND r.deviceid in (
      select d.deviceid from device d join property p ON p.id = d.propertyid WHERE p.reference = 'TestProperty123'
    )
    SAMPLE BY 1d;
s

Sam Shiles

10/26/2022, 10:02 AM
subquery should be
select d.id
10:02 AM
even with that fixed though I'm getting an error: constant expected
10:02 AM
sql looks good to me though
Bolek Ziobrowski

Bolek Ziobrowski

10/26/2022, 10:10 AM
That's because qdb doesn't support "IN (subquery)" yet .
Alex Pelagenko

Alex Pelagenko

10/26/2022, 10:18 AM
IDs as strings looks like a killer, join on strings are not efficient
10:18 AM
what’s the time range on the data you compare?
s

Sam Shiles

10/26/2022, 10:20 AM
Do you mean in general, as a product/business?
Alex Pelagenko

Alex Pelagenko

10/26/2022, 10:21 AM
on the test data
10:22 AM
is it same as query data e.g. ‘2022-01-01 00:00:00.000’ AND ‘2022-10-25 04:59:59.000’
s

Sam Shiles

10/26/2022, 10:24 AM
Not sure I follow. I use two time ranges in the test queries. YTD, as above. and MTD 2022-10-01 to 22-10-25. The data is a copy from a production db. In our existing product we allow various "canned" ranges. This month, last month, today, this year etc
Alex Pelagenko

Alex Pelagenko

10/26/2022, 10:26 AM
I mean is this number of reading
444258950
comes from YTD or is it across multiple years?
s

Sam Shiles

10/26/2022, 10:29 AM
ah, sorry. with you now. 2020-01 to now but with a weighting towards now. One moment. I'll get the distribution data stats for you
Alex Pelagenko

Alex Pelagenko

10/26/2022, 10:29 AM
nw, that’s enough
s

Sam Shiles

10/26/2022, 10:32 AM
had already done it 🙂
Alex Pelagenko

Alex Pelagenko

10/26/2022, 10:34 AM
Your device ids look like UUIDs. We are planning to add 128bit UUID type, that will help the queries and storage.
10:34 AM
It’s also very good example of how indexes can help, thank you for putting it down
s

Sam Shiles

10/26/2022, 10:41 AM
yeah, that UUID actually has real world meaning as it comes from the device however I could always store that "off-reading". If I were to switch the "PK" of the device to another type, the most "natural" choice would be the serialnumber, which is actually an 8 digit hex. I could use that raw as a string hex. Or convert it to a decimal and have an integer Id. Do you think making that switch would help perf and if so, I'm guessing the decimal int would be preferable? Happy to try it out if that helps
Vlad

Vlad

10/26/2022, 10:49 AM
it feels to me that questdb optimiser could do a better job here, those joins are one-to-one with IDs, right ?
s

Sam Shiles

10/26/2022, 10:50 AM
many to one
10:50 AM
device has many readings
10:51 AM
but yes, const refs
Vlad

Vlad

10/26/2022, 10:51 AM
oh yeah, that's what i meant
s

Sam Shiles

10/26/2022, 10:51 AM
I thouhht it was 😃
Vlad

Vlad

10/26/2022, 10:52 AM
could you try removing joins entirely to gauge raw sample by speed ? e.g. provide device ids via
=
or
or
?
s

Sam Shiles

10/26/2022, 10:52 AM
done that
10:52 AM
Last example on the readme in the repo
Vlad

Vlad

10/26/2022, 10:54 AM
perfect, this makes sense
s

Sam Shiles

10/26/2022, 10:55 AM
I've actually just updated that with stats from 6.5.4, which are about the same a 6.4.2 part from the top one (YTD) which takes twice as long and returns more rows when running in docker vs metal pointing to the same datadir. I triple checked that as all the other queries were about the same whether 6.5.4, 6.4.2 docker or metal
10:57 AM
With the no joins query, I'm just about to bump the RAM from 8 to 16 so it's properly like for like with the mysql RDS box. I'll let you know the results
Vlad

Vlad

10/26/2022, 10:59 AM
we'll change optimiser to produce better plan for these joins
s

Sam Shiles

10/26/2022, 11:03 AM
Amazing stuff. Let me know if you want any assistance building out an IOT showcase. Currently quest seems to be crypto/markets focused but I guess tsdb's three big areas are crypto, metrics and IOT. We have almost 200k IOT devices deployed in the wild and hope to have many more soon
Alex Pelagenko

Alex Pelagenko

10/26/2022, 11:41 AM
The part with different record count is very interesting.
11:42 AM
So top query in 6.5.4 docker returns different number than 6.5.4 binaries build?
s

Sam Shiles

10/26/2022, 11:43 AM
yep! Hard to believe I know
Alex Pelagenko

Alex Pelagenko

10/26/2022, 11:44 AM
do you use
questdb-6.5.4-rt-linux-amd64.tar.gz
binary?
s

Sam Shiles

10/26/2022, 11:45 AM
I've not got a binary a for 6.5.4
11:45 AM
6.4.2 binary and docker and 6.5.4 docker only
11:45 AM
So top query in 6.4.2 docker returns different number than 6.4.2 binaries build?
11:46 AM
binary for 6.4.2 was from the aws ami: questdb-6.4.2-no-jre-bin.tar.gz
Alex Pelagenko

Alex Pelagenko

10/26/2022, 11:49 AM
There were few bugs fixed since 6.4.2 which could result in incorrect filtering
11:52 AM
do you mind updating binaries to 6.5.4 and check the count is same as docker please? here are the steps to update ami
sudo -i
export Q_VERSION="6.5.4" 
wget <https://github.com/questdb/questdb/releases/download/${Q_VERSION}/questdb-${Q_VERSION}-no-jre-bin.tar.gz>
tar xzvf questdb-${Q_VERSION}-no-jre-bin.tar.gz
cp questdb-${Q_VERSION}-no-jre-bin/questdb.jar /usr/local/bin/questdb.jar
cp questdb-${Q_VERSION}-no-jre-bin/questdb.jar /usr/local/bin/questdb-${Q_VERSION}.jar
systemctl restart questdb.service
systemctl status questdb.service
Pei

Pei

10/26/2022, 12:15 PM
Amazing stuff. Let me know if you want any assistance building out an IOT showcase. Currently quest seems to be crypto/markets focused but I guess tsdb’s three big areas are crypto, metrics and IOT. We have almost 200k IOT devices deployed in the wild and hope to have many more soon
hey Sam, we’d love that. Perhaps we could chat about it? 🙂
s

Sam Shiles

10/26/2022, 12:49 PM
You're correct. The faster 6.4.2 metal install is wrong. The two slower docker ones have the correct row count. I'll try doing that update a bit later
12:49 PM
I guess perhaps that AMI is a bit broken?
1:07 PM
@Vlad Here is the simplest possible repro of the problem in case it helps with the optimiser change. https://github.com/Recodify/quest-db-performance/blob/master/SimpleMatchedRam.md
Alex Pelagenko

Alex Pelagenko

10/26/2022, 1:44 PM
Do you fetch whole result set in MySql execution of just top X rows?
s

Sam Shiles

10/26/2022, 1:44 PM
full result
Alex Pelagenko

Alex Pelagenko

10/26/2022, 1:44 PM
these Fetch time look faster than speed of light
s

Sam Shiles

10/26/2022, 1:45 PM
Would you like to see a query plan?
Alex Pelagenko

Alex Pelagenko

10/26/2022, 1:46 PM
no, just curios what can be transferred in 14 micros
s

Sam Shiles

10/26/2022, 1:47 PM
ah, that fetch number probably doesn't mean much. I'm not sure how workbench calculates it and it might not be that reliable
1:47 PM
The duration seems to include some of the network time as it's slightly shorter if you run via mysql cli
Vlad

Vlad

10/27/2022, 11:02 PM
thank you Sam 👍 I’ll try to hassle up and find time to reproduce and fix the performance issues. Next week most likely.