Andy Beyer
03/09/2023, 12:13 AM
SELECT COUNT(*) FROM <table>
will be frozen for 30 seconds at a particular number, so I suppose the WAL commits are bursty.
I've tried calling flush()
on the sender API, but that often leads to an exception and the ILP client being closed, so for our high-throughput topic I've removed those calls and am letting the ILP client send the lines as it sees fit.
For reference, QDB is receiving about 30 MB/sec of raw Influx Line Protocol lines (measured as 2 bytes/char * charCount), and about 3500 lines/sec overall across all topics/tables.
Is there a way to have the backend flush periodically so we can control the latency we see in things like Grafana graphs?
We're running on a c6i.24xl
with the rest of our stack colocated (lots of docker processes)
and a gp3 drive with 16k IOPS, 1000 MB/sec throughput
steady-state disk I/O reported by atop
shows ~300 MB/sec of writes, some (but not much) of which is Kafka, etc.,
and bursts up to 850 MB/sec sometimes, I assume when QDB is flushing to disk.
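For reference, a minimal sketch of the periodic-flush pattern being discussed, assuming the io.questdb.client.Sender builder API and the io.questdb.cutlass.line.LineSenderException class from the 7.x Java client; the host/port, table, column names, and the one-second flush interval are placeholders, not recommendations. It shows flushing on a timer rather than per row, and rebuilding the sender when a flush fails and the connection gets closed.

import io.questdb.client.Sender;
import io.questdb.cutlass.line.LineSenderException;

public class PeriodicFlushWriter {
    public static void main(String[] args) throws InterruptedException {
        Sender sender = Sender.builder().address("localhost:9009").build();
        long lastFlush = System.nanoTime();
        try {
            for (int i = 0; i < 10_000; i++) {
                sender.table("device-status")
                        .symbol("host", "host-" + (i % 8))
                        .doubleColumn("load", Math.random())
                        .atNow();
                // flush on a timer rather than on every row
                if (System.nanoTime() - lastFlush > 1_000_000_000L) {
                    try {
                        sender.flush();
                    } catch (LineSenderException e) {
                        // a failed flush leaves the connection unusable, so rebuild the sender
                        try { sender.close(); } catch (LineSenderException ignore) { }
                        sender = Sender.builder().address("localhost:9009").build();
                    }
                    lastFlush = System.nanoTime();
                }
                Thread.sleep(1);
            }
        } finally {
            sender.close(); // close() also flushes whatever is still buffered
        }
    }
}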
Nicolas Hourcard
03/09/2023, 9:21 AM
Alex Pelagenko
03/09/2023, 9:35 AM
select * from sys.telemetry_wal limit -20
Andy Beyer
03/09/2023, 1:54 PM
Nicolas Hourcard
03/09/2023, 1:55 PM
Andy Beyer
03/09/2023, 1:56 PM
Should I be setting bufferCapacity on the Sender, or is it better to just let Quest pick that value internally?
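A short sketch of setting bufferCapacity explicitly, assuming the Sender.builder() API of the 7.x Java client; the 1 MiB value and the table/column names are purely illustrative, not a tuning recommendation.

import io.questdb.client.Sender;

public class BufferCapacityExample {
    public static void main(String[] args) {
        // rows are sent when the buffer fills, or on flush()/close()
        try (Sender sender = Sender.builder()
                .address("localhost:9009")
                .bufferCapacity(1024 * 1024) // bytes
                .build()) {
            sender.table("command-request-status")
                    .symbol("status", "OK")
                    .longColumn("requestId", 42L)
                    .atNow();
        } // close() at the end of try-with-resources flushes the remaining buffer
    }
}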
Alex Pelagenko
03/09/2023, 2:21 PM
Andy Beyer
03/09/2023, 2:22 PM
Alex Pelagenko
03/09/2023, 2:22 PM
Sender: can you please be more specific, are you talking about the ILP sender by QuestDB, and what programming language?
Andy Beyer
03/09/2023, 2:23 PM
Alex Pelagenko
03/09/2023, 2:26 PM
Andy Beyer
03/09/2023, 3:12 PM
In select * from sys.telemetry_wal limit -25, what are the units of the latency column?
Andrey Pechkurov
03/09/2023, 4:52 PM
latency is in milliseconds
Andy Beyer
03/09/2023, 4:54 PM
In wal_tables(), the transaction number seems to stay frozen on some of the tables for up to ~1 minute at a time.
I guess that's the latency I was hoping to control a little more tightly and get it down to ~3-5 seconds at most.
For example, for some of my tables, I see something like
select timestamp FROM '<table>' ORDER BY timestamp DESC LIMIT 5
showing that the most recent data available for query can be up to 1 minute stale.
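A minimal staleness probe along those lines, run over QuestDB's PGWire port (8812) with the standard PostgreSQL JDBC driver; the credentials are the QuestDB defaults (admin/quest) and the table name is a placeholder.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Duration;
import java.time.Instant;

public class StalenessCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:8812/qdb", "admin", "quest");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT timestamp FROM 'device-status' ORDER BY timestamp DESC LIMIT 1")) {
            if (rs.next()) {
                Instant latest = rs.getTimestamp(1).toInstant();
                // how far behind wall-clock time the latest queryable row is
                System.out.println("staleness: " + Duration.between(latest, Instant.now()));
            }
        }
    }
}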
Alex Pelagenko
03/09/2023, 4:58 PM
WorkerPoolManager new DEDICATED pool [name=wal-apply
Andy Beyer
03/09/2023, 5:03 PM
Alex Pelagenko
03/09/2023, 5:04 PM
select * from tables() and then see what is in the sys.telemetry_wal for that table
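A sketch of that diagnostic as a small program, again over PGWire (port 8812) with the standard PostgreSQL JDBC driver; the credentials are the QuestDB defaults, the table name is a placeholder, and the column names (id, name, created, latency) assume the 7.0.x schema of tables() and sys.telemetry_wal.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WalTelemetryCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:8812/qdb", "admin", "quest");
             Statement stmt = conn.createStatement()) {

            // 1. find the numeric id of the problem table
            int tableId = -1;
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, name FROM tables() WHERE name = 'command-request-status'")) {
                if (rs.next()) {
                    tableId = rs.getInt("id");
                }
            }

            // 2. pull the most recent WAL telemetry events for that table
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM sys.telemetry_wal WHERE tableId = " + tableId + " LIMIT -25")) {
                while (rs.next()) {
                    System.out.println(rs.getTimestamp("created") + "  latency=" + rs.getDouble("latency"));
                }
            }
        }
    }
}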
Andy Beyer
03/09/2023, 5:05 PM
---
version: '3'
services:
  ###############
  ##  QuestDB  ##
  ###############
  questdb:
    image: questdb/questdb:7.0.1
    container_name: questdb
    restart: unless-stopped
    environment:
      QDB_SHARED_WORKER_COUNT: ${QUESTDB_SHARED_WORKER_COUNT:-44}
      QDB_LINE_TCP_WRITER_WORKER_COUNT: ${QUESTDB_SHARED_WORKER_COUNT:-44}
      QDB_LINE_TCP_CONNECTION_POOL_CAPACITY: 128
      QDB_LINE_TCP_MSG_BUFFER_SIZE: 1048576
      QDB_LINE_TCP_DEFAULT_PARTITION_BY: 'HOUR'
      QDB_LINE_DEFAULT_PARTITION_BY: 'HOUR'
    volumes:
      - questdb-data:/var/lib/questdb
    ports:
      - "9000:9000"
      - "9009:9009"
      - "8812:8812"
volumes:
  questdb-data:
Alex Pelagenko
03/09/2023, 5:05 PMAndy Beyer
03/09/2023, 5:19 PM
writerTxn and sequencerTxn are always increasing for tables that are receiving inserts, as opposed to freezing for any noticeable period of time.
I'm going to run some higher-scale tests as well, but they take some time to start up, so I'll check back in if I run into problems with latency again.
Alex Pelagenko
03/09/2023, 5:19 PM
Andy Beyer
03/09/2023, 5:24 PM
Alex Pelagenko
03/09/2023, 5:27 PM
Andy Beyer
03/09/2023, 6:03 PM
select * from tables() -- the problem table has id=8 (command-request-status)
In wal_tables(), writerTxn and sequencerTxn are very low, and don't seem to have budged over time.
select * from sys.telemetry_wal WHERE tableId=8 LIMIT -25 shows no recent timestamp data in the created column.
docker logs questdb |& grep " E \|Except" shows no errors or exceptions.
Alex Pelagenko
03/09/2023, 6:12 PM
Andy Beyer
03/09/2023, 6:12 PM
18:01, so 4 minutes stale
Alex Pelagenko
03/09/2023, 6:13 PM
Andy Beyer
03/09/2023, 6:14 PM
Alex Pelagenko
03/09/2023, 6:14 PM
Andy Beyer
03/09/2023, 6:15 PM
wal_tables() still showing the same txn numbers:
Alex Pelagenko
03/09/2023, 6:19 PM
Andy Beyer
03/09/2023, 6:19 PM
Alex Pelagenko
03/09/2023, 6:19 PM
Andy Beyer
03/09/2023, 6:20 PM
root@5d77b8a8e09a:/var/lib/questdb/db# date -u ; ls -lhrt command-request-status~8*
Thu Mar 9 18:36:51 UTC 2023
-rw-r--r-- 1 questdb questdb 0 Mar 9 18:29 command-request-status~8.lock
command-request-status~8:
total 100K
-rw-r--r-- 1 questdb questdb 49 Mar 9 17:57 _name
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 17:57 _meta.prev
-rw-r--r-- 1 questdb questdb 0 Mar 9 17:57 wal1.lock
drwxrwxr-x 2 questdb questdb 98 Mar 9 17:57 txn_seq
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 17:57 _txn
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 17:57 _meta
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 17:57 _cv
drwxrwxr-x 2 questdb questdb 153 Mar 9 17:57 2023-03-09T17
drwxrwxr-x 3 questdb questdb 29 Mar 9 17:57 wal1
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn1.v
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn1.o
-rw-r--r-- 1 questdb questdb 8.0K Mar 9 18:09 symbolColumn1.k
-rw-r--r-- 1 questdb questdb 8.0K Mar 9 18:09 symbolColumn1.c
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn2.v
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn2.o
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn2.k
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:09 symbolColumn2.c
-rw-r--r-- 1 questdb questdb 33K Mar 9 18:29 _txn_scoreboard
-rw-r--r-- 1 questdb questdb 4.0K Mar 9 18:29 _todo_
When the app calls close() on the LineTcpSender during a shutdown hook and then terminates and restarts the Java process, I immediately saw the select count(*) for that table jump from ~241 rows to ~32000 rows.
So perhaps I need to periodically call close() on the LineTcpSender instances and recreate them, instead of just periodically calling flush()?
Calling close() periodically on the Java ILP LineTcpSender does seem to be a more forceful way of flushing and reducing latency.
Is that recommended as the usage pattern, or just a workaround for now?
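A minimal sketch of the workaround being described here: periodically closing and recreating the sender instead of relying on flush(). It assumes the 7.x io.questdb.client.Sender API rather than the lower-level LineTcpSender; the 5-second interval, address, table, and column names are placeholders, and all access to the sender is assumed to be single-threaded.

import io.questdb.client.Sender;

public class RecreatingSenderWriter implements AutoCloseable {
    private static final long RECREATE_INTERVAL_NANOS = 5_000_000_000L; // ~5s, illustrative only

    private final String address;
    private Sender sender;
    private long lastRecreate;

    public RecreatingSenderWriter(String address) {
        this.address = address;
        this.sender = Sender.builder().address(address).build();
        this.lastRecreate = System.nanoTime();
    }

    public void write(String table, String host, double load) {
        sender.table(table)
                .symbol("host", host)
                .doubleColumn("load", load)
                .atNow();
        if (System.nanoTime() - lastRecreate > RECREATE_INTERVAL_NANOS) {
            sender.close(); // close() forces the buffered rows out
            sender = Sender.builder().address(address).build();
            lastRecreate = System.nanoTime();
        }
    }

    @Override
    public void close() {
        sender.close();
    }
}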
Alex Pelagenko
03/09/2023, 10:06 PM
2023-03-09T18:04:02.715310Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal6/1, roLo=5310, seqTxn2794, roHi=5334, tsMin=2023-03-09T18:03:57.057000Z, tsMax=2023-03-09T18:04:01.907000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.716667Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal4/3, roLo=5813, seqTxn2795, roHi=5839, tsMin=2023-03-09T18:03:56.735000Z, tsMax=2023-03-09T18:04:01.872000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.717925Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal5/3, roLo=5594, seqTxn2796, roHi=5618, tsMin=2023-03-09T18:03:56.875000Z, tsMax=2023-03-09T18:04:01.872000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.719238Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal2/2, roLo=6087, seqTxn2797, roHi=6113, tsMin=2023-03-09T18:03:56.736000Z, tsMax=2023-03-09T18:04:01.873000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.720473Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal3/2, roLo=6149, seqTxn2798, roHi=6174, tsMin=2023-03-09T18:03:56.855000Z, tsMax=2023-03-09T18:04:01.871000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.722016Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal7/1, roLo=5101, seqTxn2799, roHi=5106, tsMin=2023-03-09T18:04:00.907000Z, tsMax=2023-03-09T18:04:01.872000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.723456Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal1/2, roLo=6799, seqTxn2800, roHi=6822, tsMin=2023-03-09T18:03:57.057000Z, tsMax=2023-03-09T18:04:01.871000Z, commitToTimestamp=1969-12-31T23:59:59.999999Z]
2023-03-09T18:04:02.724925Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/device-status~6/wal8/1, roLo=4348, seqTxn2801, roHi=4373, tsMin=2023-03-09T18:03:56.872000Z, tsMax=2023-03-09T18:04:01.873000Z, commitToTimestamp=294247-01-10T04:00:54.775807Z]
2023-03-09T18:04:02.725389Z I i.q.c.TableWriter sorting WAL [table=device-status, ordered=true, lagRowCount=153, rowLo=4348, rowHi=4373]
2023-03-09T18:04:02.725525Z I i.q.c.TableWriter o3 partition task [table=device-status, partitionIsReadOnly=false, srcOooBatchRowSize=178, srcOooLo=0, srcOooHi=177, srcOooMax=178, o3RowCount=178, o3LagRowCount=0, srcDataMax=34051, o3TimestampMin=2023-03-09T18:03:56.735000Z, o3Timestamp=2023-03-09T18:03:56.735000Z, o3TimestampMax=2023-03-09T18:04:01.907000Z, partitionTimestamp=2023-03-09T18:00:00.000000Z, partitionIndex=4, partitionSize=34229, maxTimestamp=2023-03-09T18:04:00.871000Z, last=true, append=false, pCount=1, flattenTimestamp=true, memUsed=61788072304]
2023-03-09T18:04:02.739893Z I i.q.c.TableWriter closing last partition [table=device-status]
2023-03-09T18:04:02.743384Z I i.q.c.TableWriter merged partition [table=`device-status`, ts=2023-03-09T18:00:00.000000Z, txn=1019]
2023-03-09T18:04:02.746332Z I i.q.c.TableWriter switched partition [path='/var/lib/questdb/db/device-status~6/2023-03-09T18.1019']
2023-03-09T18:04:02.756696Z I i.q.c.TableWriter removed [path=/var/lib/questdb/db/device-status~6/2023-03-09T18.1018]
2023-03-09T18:04:02.756757Z I i.q.c.w.ApplyWal2TableJob WAL apply job finished [table=device-status, transactions=8, rows=178, time=41ms, rate=4294rows/s, physicalWrittenRowsMultiplier=593032.88]
2023-03-09T18:04:02.756796Z I i.q.c.p.WriterPool << [table=`device-status~6`, thread=71]
The data with tsMax=2023-03-09T18:04:01.873000Z was written: at 2023-03-09T18:04:02.724925Z it says processing WAL [path=... seqTxn2801, tsMin=2023-03-09T18:03:56.872000Z, tsMax=2023-03-09T18:04:01.873000Z, commitToTimestamp=294247-01-10T04:00:54.775807Z]
Andy Beyer
03/09/2023, 10:10 PM
The fact that close() on the Java ILP client causes the data to show up in the server probably means that the client isn't actually sending the data to the server, and that's why the data is stale on the server side?
03/09/2023, 10:14 PM
select timestamp from 'device-status' ORDER BY timestamp DESC LIMIT 5 was 17:58
Andy Beyer
03/09/2023, 10:14 PM
command-request-status in that case
Alex Pelagenko
03/09/2023, 10:15 PM
Andy Beyer
03/09/2023, 10:25 PM
flush()
Alex Pelagenko
03/09/2023, 10:25 PM
Andy Beyer
03/09/2023, 10:26 PM
flush(), and all code has exclusive access to the sender instance when making any calls to it
Alex Pelagenko
03/09/2023, 10:28 PM
Andy Beyer
03/09/2023, 10:28 PM
Alex Pelagenko
03/09/2023, 10:31 PM
Is the measurementName sent incorrectly? Would you be able to see that the data from this table landed in another?
Andy Beyer
03/09/2023, 10:31 PM
Alex Pelagenko
03/09/2023, 10:38 PM
flush() sends all, and data is also sent on buffer overflow, but it looks like nothing hits QuestDB
Andy Beyer
03/09/2023, 10:39 PM
close() periodically and recreating the sender instance (instead of flush()) seems to fix the issue I was seeing
Alex Pelagenko
03/09/2023, 10:39 PM
Andy Beyer
03/09/2023, 10:39 PM
Alex Pelagenko
03/09/2023, 10:40 PM