Matthew Halko

02/23/2023, 9:51 PM
Is questdb designed to consume all memory on the host machine?

Imre

02/24/2023, 10:12 AM
Hi @Matthew Halko, QuestDB is designed to map as many files into memory as it has to in order to carry out all ingestion and queries. Could you, please, share a bit more detail about your use case/architecture?
How many tables do you have? How much data is stored in the database? How are your tables partitioned? Do you use symbol fields with high cardinality? What kind of queries do you run, and do you run a lot of different queries? How are you ingesting data into the DB? What triggers the memory issue: is all memory consumed already at startup, or does usage grow slowly? Is it related to queries or ingestion?
Knowing more about your setup will help us answer your question. Quick tips:
• symbol fields with high cardinality -> disabling the symbol cache can help
• lots of different queries -> the query cache can be disabled
• opening a lot of tables/partitions at the same time -> more granular partitioning may help
Please share a bit more detail.

Matthew Halko

02/24/2023, 3:03 PM
I have a very basic use case. I am using QuestDB to store time and sales data in a single table with only one symbol. When I perform queries, the memory usage keeps growing. I would like to limit the amount of memory that can be consumed, simple as that.
I run basically the same query over and over again to get trades within a timespan

Imre

02/24/2023, 4:39 PM
tested this quickly. i have been running a query in a loop for about 20 mins (python via PG wire connection). this is the query i was running:
SELECT symbol, sum(amount) as volume, min(price) as minimum, max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, timestamp as ts FROM trades sample by 1m ALIGN to CALENDAR
i have 1.3 million rows in the table, and the query returns 4320 rows. mem usage seems to be constant (watching with jconsole and htop). java heap usage is 200-800 MB. htop shows the server using 1.2-1.8 GB, moving together with the java heap. when i press ‘Perform GC’ in jconsole the mem usage drops to 1.2 GB.
do you run many different queries? you could try disabling the query cache.
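(For reference, a minimal sketch of the loop test described above, assuming psycopg2 over QuestDB's default PG wire port 8812 with the default admin/quest credentials; names and pacing are illustrative, not the exact script used.)
# minimal sketch of the loop test described above -- assumptions: psycopg2,
# QuestDB's PG wire listener on 127.0.0.1:8812 with default credentials,
# and an existing 'trades' table with a designated timestamp
import time
import psycopg2

QUERY = (
    "SELECT symbol, sum(amount) as volume, min(price) as minimum, "
    "max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, "
    "timestamp as ts FROM trades SAMPLE BY 1m ALIGN TO CALENDAR"
)

conn = psycopg2.connect(host="127.0.0.1", port=8812,
                        user="admin", password="quest", dbname="qdb")
conn.autocommit = True
try:
    deadline = time.time() + 20 * 60          # run for about 20 minutes
    while time.time() < deadline:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            rows = cur.fetchall()              # ~4320 rows in the test above
        print(len(rows))
        time.sleep(1)                          # pause between iterations
finally:
    conn.close()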

Matthew Halko

02/24/2023, 4:41 PM
I ended up changing cairo.idle.check.interval to check more often (every 10s)
My queries were much larger; a couple hundred thousand rows were being returned per query.
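(For anyone following along: that setting lives in QuestDB's server configuration file, conf/server.conf. A sketch of the change, assuming the value is given in milliseconds:)
# conf/server.conf
# check for, and release, idle table readers more often than the default
# (assumption: the interval is expressed in milliseconds, so 10000 = 10s)
cairo.idle.check.interval=10000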

Imre

02/24/2023, 4:46 PM
ok, that will release inactive readers

Matthew Halko

02/24/2023, 4:46 PM
yep, it then releases the memory-mapped files

Imre

02/24/2023, 4:47 PM
are you running many queries in parallel?
those readers should be reused when they are inactive instead of getting closed every 10 seconds

Matthew Halko

02/24/2023, 4:48 PM
I am not hitting the database very often

Imre

02/24/2023, 4:48 PM
the server opens a new reader only if the pool is empty or all readers are in use by currently running queries
we will have to test this and fix it if inactive readers are not reused
can i ask which version you are using? is it the latest?

Matthew Halko

02/24/2023, 4:50 PM
Yeah, I am on latest.

Imre

02/24/2023, 4:50 PM
thank you, will check

Matthew Halko

02/24/2023, 4:51 PM
I have a time series of trades, and I was looping through each day and getting all trades for that day
It makes sense that it would open new readers, right? Each query was touching new dates

Imre

02/24/2023, 4:53 PM
not really. when your first query finishes, the reader it used is returned to the pool and becomes inactive (meaning the next query for the same table can reuse it).
the pool will close inactive readers after some time (the setting you changed)
so right now, instead of reusing these readers, you are just closing them and creating new ones. it works, but it is much more efficient to reuse them.
let me take a look…

Matthew Halko

02/24/2023, 4:57 PM
ahh ic. I am opening and closing the pgsql connection on every query
maybe that is what is going on
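(A minimal sketch of that per-day loop with a single, reused PG wire connection instead of reconnecting for every query; assumptions as before: psycopg2, default QuestDB connection settings, a 'trades' table whose designated timestamp column is named 'timestamp', and a made-up date range.)
# sketch: keep one connection open and reuse it across the per-day queries,
# rather than opening/closing a pgsql connection for each one
from datetime import datetime, timedelta
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=8812,
                        user="admin", password="quest", dbname="qdb")
conn.autocommit = True

day = datetime(2023, 1, 1)                     # made-up date range
end = datetime(2023, 2, 1)
with conn.cursor() as cur:
    while day < end:
        # all trades for one day (designated timestamp column: 'timestamp')
        cur.execute(
            "SELECT * FROM trades WHERE timestamp >= %s AND timestamp < %s",
            (day, day + timedelta(days=1)),
        )
        trades = cur.fetchall()
        # ... process the day's trades here ...
        day += timedelta(days=1)
conn.close()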

Imre

02/24/2023, 4:57 PM
it is more efficient to keep the connection open and just run the queries, but that has nothing to do with the pooling of readers
the pool is global, not per connection
will do the same in my test: connect, run the query, then close the connection, and repeat
seems to be working:
2023-02-24T17:43:37.609078Z A server-main enjoy
2023-02-24T17:43:44.345190Z I pg-server connected [ip=127.0.0.1, fd=128]
2023-02-24T17:43:44.346485Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2023-02-24T17:43:44.366186Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET extra_float_digits = 3]
2023-02-24T17:43:44.366595Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET application_name = 'PostgreSQL JDBC Driver']
2023-02-24T17:43:44.374469Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SELECT * FROM (SELECT symbol, sum(amount) as volume, min(price) as minimum, max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, timestamp as ts FROM trades sample by 1m ALIGN to CALENDAR) SPARK_GEN_SUBQ_0 WHERE 1=0]
2023-02-24T17:43:44.375049Z I i.q.c.p.ReaderPool open 'trades~10' [at=0:0]
2023-02-24T17:43:44.375613Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:44.375991Z I i.q.g.SqlCompiler plan [q=`select-choose symbol, volume, minimum, maximum, mid, ts from (select [symbol, volume, minimum, maximum, mid, ts] from (select-virtual [symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts] symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts from (select-group-by [symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts] symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts from (select-choose [symbol, amount, price, timestamp ts] symbol, amount, price, timestamp ts from (select [symbol, amount, price, timestamp] from trades timestamp (timestamp))) sample by 1m align to calendar with offset '00:00')) SPARK_GEN_SUBQ_0 where 1 = 0) SPARK_GEN_SUBQ_0`, fd=128]
2023-02-24T17:43:44.376084Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:44.376140Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:44.389639Z I pg-server scheduling disconnect [fd=128, reason=15]
2023-02-24T17:43:44.389722Z I pg-server disconnected [ip=127.0.0.1, fd=128, src=queue]
2023-02-24T17:43:45.128877Z I pg-server connected [ip=127.0.0.1, fd=128]
2023-02-24T17:43:45.129224Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2023-02-24T17:43:45.129861Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET extra_float_digits = 3]
2023-02-24T17:43:45.130051Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET application_name = 'PostgreSQL JDBC Driver']
2023-02-24T17:43:45.130524Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SELECT * FROM (SELECT symbol, sum(amount) as volume, min(price) as minimum, max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, timestamp as ts FROM trades sample by 1m ALIGN to CALENDAR) SPARK_GEN_SUBQ_8 WHERE 1=0]
2023-02-24T17:43:45.130937Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.130991Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.131287Z I i.q.g.SqlCompiler plan [q=`select-choose symbol, volume, minimum, maximum, mid, ts from (select [symbol, volume, minimum, maximum, mid, ts] from (select-virtual [symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts] symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts from (select-group-by [symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts] symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts from (select-choose [symbol, amount, price, timestamp ts] symbol, amount, price, timestamp ts from (select [symbol, amount, price, timestamp] from trades timestamp (timestamp))) sample by 1m align to calendar with offset '00:00')) SPARK_GEN_SUBQ_8 where 1 = 0) SPARK_GEN_SUBQ_8`, fd=128]
2023-02-24T17:43:45.131354Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.131408Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.132277Z I pg-server scheduling disconnect [fd=128, reason=15]
2023-02-24T17:43:45.132323Z I pg-server disconnected [ip=127.0.0.1, fd=128, src=queue]
2023-02-24T17:43:45.142150Z I pg-server connected [ip=127.0.0.1, fd=128]
2023-02-24T17:43:45.142363Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2023-02-24T17:43:45.142720Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET extra_float_digits = 3]
2023-02-24T17:43:45.142905Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET application_name = 'PostgreSQL JDBC Driver']
2023-02-24T17:43:45.143261Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SELECT * FROM (SELECT symbol, sum(amount) as volume, min(price) as minimum, max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, timestamp as ts FROM trades sample by 1m ALIGN to CALENDAR) SPARK_GEN_SUBQ_16 WHERE 1=0]
2023-02-24T17:43:45.143604Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.143640Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.143897Z I i.q.g.SqlCompiler plan [q=`select-choose symbol, volume, minimum, maximum, mid, ts from (select [symbol, volume, minimum, maximum, mid, ts] from (select-virtual [symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts] symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts from (select-group-by [symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts] symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts from (select-choose [symbol, amount, price, timestamp ts] symbol, amount, price, timestamp ts from (select [symbol, amount, price, timestamp] from trades timestamp (timestamp))) sample by 1m align to calendar with offset '00:00')) SPARK_GEN_SUBQ_16 where 1 = 0) SPARK_GEN_SUBQ_16`, fd=128]
2023-02-24T17:43:45.143976Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.144024Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.144785Z I pg-server scheduling disconnect [fd=128, reason=15]
2023-02-24T17:43:45.144834Z I pg-server disconnected [ip=127.0.0.1, fd=128, src=queue]
2023-02-24T17:43:45.152545Z I pg-server connected [ip=127.0.0.1, fd=128]
2023-02-24T17:43:45.152802Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2023-02-24T17:43:45.153271Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET extra_float_digits = 3]
2023-02-24T17:43:45.153489Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SET application_name = 'PostgreSQL JDBC Driver']
2023-02-24T17:43:45.153921Z I i.q.c.p.PGConnectionContext parse [fd=128, q=SELECT * FROM (SELECT symbol, sum(amount) as volume, min(price) as minimum, max(price) as maximum, round((max(price)+min(price))/2, 2) as mid, timestamp as ts FROM trades sample by 1m ALIGN to CALENDAR) SPARK_GEN_SUBQ_24 WHERE 1=0]
2023-02-24T17:43:45.154203Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.154234Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.154460Z I i.q.g.SqlCompiler plan [q=`select-choose symbol, volume, minimum, maximum, mid, ts from (select [symbol, volume, minimum, maximum, mid, ts] from (select-virtual [symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts] symbol, volume, minimum, maximum, round(maximum + minimum / 2,2) mid, ts from (select-group-by [symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts] symbol, sum(amount) volume, min(price) minimum, max(price) maximum, ts from (select-choose [symbol, amount, price, timestamp ts] symbol, amount, price, timestamp ts from (select [symbol, amount, price, timestamp] from trades timestamp (timestamp))) sample by 1m align to calendar with offset '00:00')) SPARK_GEN_SUBQ_24 where 1 = 0) SPARK_GEN_SUBQ_24`, fd=128]
2023-02-24T17:43:45.154517Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.154563Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
2023-02-24T17:43:45.155332Z I pg-server scheduling disconnect [fd=128, reason=15]
2023-02-24T17:43:45.155373Z I pg-server disconnected [ip=127.0.0.1, fd=128, src=queue]
notice that first it does:
2023-02-24T17:43:44.375049Z I i.q.c.p.ReaderPool open 'trades~10' [at=0:0]
2023-02-24T17:43:44.375613Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
then always:
2023-02-24T17:43:45.130937Z I i.q.c.p.ReaderPool assign 'trades~10' [at=0:0]
2023-02-24T17:43:45.130991Z I i.q.c.p.ReaderPool 'trades~10' is back [at=0:0, thread=18]
and it is always
at=0:0
so the same reader at the same index in the pool is reused.
any chance you can create a github issue with the details, maybe with the log attached? the log would help a lot. we will try to reproduce it by creating more partitions and accessing different partitions in the queries (as you said above)
hi @Matthew Halko, how often did the below line appear in the logs before you changed the config?
AbstractMultiTenantPool open 'trades~10' [at=0:0]
would you be able to share logs? obviously without any sensitive information.