https://questdb.io logo
Title
a

Alex Lourie

05/23/2023, 1:09 AM
We're ingesting a lot of data, some of which comes out of order. The DB ingests it fairly quickly, but then takes ages to play the WAL entries. Here's the question: how can we improve WAL processing? Especially the issue is with 7.1.2, it's extremely slow. We haven't seen this issue with 7.0.1
a

Andrey Pechkurov

05/23/2023, 7:12 AM
Hi Alex, Are you ingesting into single table or multiple ones? How slow is the ingestion? You can check that in the log under
ApplyWal2TableJob job finished
log messages. There should be rate in rows/s.
a

Alex Lourie

05/23/2023, 7:16 AM
@Andrey Pechkurov we ingest into a single table. The ingestion was at a fair speed, however the wal progress is extremely slow - it started at about 15k rows/sec, and now has fallen to about 20. It runs for more than 24 hours now. This is with 7.1.2. With 7.0.1 the ingestion time is about the same, but the WAL jobs were done in about an hour and a half. Same dataset.
a

Andrey Pechkurov

05/23/2023, 7:18 AM
now has fallen to about 20
20 rows/s, not 20K, right?
BTW have you tried 7.1.3?
There were some bugfixes around deadlocks in the release
a

Alex Lourie

05/23/2023, 7:20 AM
Yes, that's 20, not 20000
And haven't tried 7.1.3 yet
It wasn't around when we started the tests :-)
However, I'm definitely and absolutely sure that 7.0.1 performs almost by order of magnitudes better that 7.1.2 in our environment
a

Andrey Pechkurov

05/23/2023, 7:22 AM
20 rows sounds awful. BTW what's your ingestion pattern? Do you use ILP or PGWire?
a

Alex Lourie

05/23/2023, 7:23 AM
ILP with copy statements, with batches
a

Andrey Pechkurov

05/23/2023, 7:24 AM
So, you mix ILP with COPY?
a

Alex Lourie

05/23/2023, 7:25 AM
Eh no, sorry. Just ILP
Was looking in the wrong spot
a

Andrey Pechkurov

05/23/2023, 7:28 AM
How large are your partitions?
Also could you share a portion of your server logs with
ApplyWal2TableJob
messages?
a

Alex Lourie

05/23/2023, 7:30 AM
Partitions are daily....about 50m rows
I'll post the logs a bit later
a

Andrey Pechkurov

05/23/2023, 7:31 AM
Thanks
a

Alex Lourie

05/23/2023, 8:22 AM
@Andrey Pechkurov
2023-05-23T08:21:53.611300Z I i.q.c.TableWriter sorting WAL [table=interval_data, ordered=false, lagRowCount=200000000, walRowLo=177800, walRowHi=198500]
2023-05-23T08:22:01.184424Z I i.q.c.TableWriter o3 partition task [table=interval_data, partitionIsReadOnly=false, srcOooBatchRowSize=20700, srcOooLo=0, srcOooHi=20699, srcOooMax=20700, o3RowCount=200020700, o3LagRowCount=200000000, srcDataMax=1670556, o3TimestampMin=2020-08-27T21:30:00.000000Z, o3Timestamp=2020-08-27T21:30:00.000000Z, o3TimestampMax=2020-08-27T21:30:00.000000Z, partitionTimestamp=2020-08-27T00:00:00.000000Z, partitionIndex=956, partitionSize=1691256, maxTimestamp=2020-12-31T23:00:00.000000Z, last=false, append=false, pCount=1, flattenTimestamp=true, memUsed=19511207324]
2023-05-23T08:22:01.833824Z I i.q.c.TableWriter merged partition [table=`interval_data`, ts=2020-08-27T00:00:00.000000Z, txn=14650]
2023-05-23T08:22:01.833936Z I i.q.c.TableWriter moving rows to LAG [table=interval_data, lagRowCount=200000000, partitionTimestampHi=2020-12-31T23:59:59.999999Z]
2023-05-23T08:22:02.795680Z I i.q.c.TableWriter removed [path=/var/lib/questdb/db/interval_data~5/2020-08-27.14648]
2023-05-23T08:22:03.186575Z I i.q.c.w.ApplyWal2TableJob job ejected [table=interval_data~5, seqTxn=12230, transactions=1, rows=20700, time=9575ms, rate=2161rows/s, physicalWrittenRowsMultiplier=81.7]
2023-05-23T08:22:03.186631Z I i.q.c.p.WriterPool << [table=`interval_data~5`, thread=62]
2023-05-23T08:22:03.186681Z I i.q.c.p.WriterPool >> [table=`interval_data~5`, thread=62]
2023-05-23T08:22:03.186877Z I i.q.c.TableWriter processing WAL [path=/var/lib/questdb/db/interval_data~5/wal49/28, roLo=169000, roHi=193800, seqTxn=12231, tsMin=2020-01-07T00:00:00.000000Z, tsMax=2020-01-07T23:00:00.000000Z, commitToTimestamp=]
2023-05-23T08:22:03.186965Z I i.q.c.TableWriter sorting WAL [table=interval_data, ordered=false, lagRowCount=200000000, walRowLo=169000, walRowHi=193800]
it picked up pace a bit again
but still lots to process
just restarted with 7.1.3, no change of pace
a

Andrey Pechkurov

05/23/2023, 8:35 AM
Thanks for sharing the log. I think we got some clues on what goes wrong. Let me discuss this with other core team members
a

Alex Lourie

05/23/2023, 8:38 AM
thanks, @Andrey Pechkurov!
j

javier ramirez

05/23/2023, 8:50 AM
Also, in version 7.2 we hope to release partition splits, which will not need to overwrite the whole partition in the event of out-of-order data, but only chunks of the partition. Still finishing that, but it is quite advanced already
a

Alex Lourie

05/23/2023, 8:51 AM
sounds good too. Would that mean O3 operations will be more efficient?
a

Alex Pelagenko

05/23/2023, 8:53 AM
Seems you write some historical data in the range of 2020-01 .. 2020-12. If you write it chronologically this will help
a

Alex Lourie

05/23/2023, 9:04 AM
yea, but that's not currently possible. Our dataset comes as ~60G, to sort that chronologically before ingesting feels like doing DB's job 😉
a

Alex Pelagenko

05/23/2023, 9:04 AM
It can do it more easily if you import as csv
a

Alex Lourie

05/23/2023, 9:05 AM
I do import it almost as csv - we've mostly copied the code from the go tool
we just reconstruct it a little before pushing to the DB
a

Alex Pelagenko

05/23/2023, 9:06 AM
also you can try this
ALTER TABBLE table SET PARAM maxUncommittedRows = 1000
this may speed up a bit, but not much.
you use ILP to write atm, don’t you? Or is it csv import?
a

Alex Lourie

05/23/2023, 9:07 AM
it's ILP
we read in the CSV file, change the format a little and push to the DB with ILP
a

Alex Pelagenko

05/23/2023, 9:07 AM
if you send whole 60GB as csv it will import it in few mins
a

Alex Lourie

05/23/2023, 9:07 AM
even if it's out-of-order?
a

Alex Pelagenko

05/23/2023, 9:08 AM
yes, it’s optimised way to import unordered data. It’s a copy command https://questdb.io/docs/reference/sql/copy/
j

javier ramirez

05/23/2023, 9:08 AM
A blog post with some data and a trick if you need an extra speed burst https://questdb.io/blog/2022/09/12/importing-300k-rows-with-io-uring/
a

Alex Lourie

05/23/2023, 9:09 AM
why is that better than an ILP ingestion?
a

Alex Pelagenko

05/23/2023, 9:10 AM
it’s better for unordered initnial import, it’s optimised to sort efficiently. ILP is better for realtime
a

Alex Lourie

05/23/2023, 9:11 AM
oh that's interesting
j

javier ramirez

05/23/2023, 9:11 AM
There is a section “optimizing the import” on the blog post I shared that gives you some details about how the COPY command works
a

Alex Lourie

05/23/2023, 9:11 AM
would ingesting with COPY be better than ILP in this case?
j

javier ramirez

05/23/2023, 9:11 AM
You also have some numbers on a 75Gb unordered CSV so you can benchmark
as Alex said, ILP is designed for streaming, COPY is designed for batch. Ingesting 60Gb with old-ish data sounds more batch than streaming to me, so COPY will be more efficient in this case
a

Alex Lourie

05/23/2023, 9:15 AM
Just trying to make sure I understand. We have ~2000 csv files, each with data for about 1000 devices. Each file has the data for a year, that is each file will include the data from Jan 1 to Dec 31. Batch-loading with COPY would be more appropriate than ILP in this case?
j

javier ramirez

05/23/2023, 9:18 AM
I was thinking of a single large file. It would be more efficient probably if you can aggregate multiple CSV together as COPY will need to do a lot of back and forth to rearrange the data, so probably better if you issue just one (or few) copy commands than 2000 of those
a

Alex Lourie

05/23/2023, 9:20 AM
by aggregating you mean also time-sorted?
j

javier ramirez

05/23/2023, 9:20 AM
nope. Just concatenate
a

Alex Lourie

05/23/2023, 9:21 AM
so bigger files, but less of them?
each of our files is about 300Mb
j

javier ramirez

05/23/2023, 9:22 AM
and each of those files is unordered, right?
a

Alex Lourie

05/23/2023, 9:22 AM
right
well, maybe ordered, but only within that file
having that ordered is not hard, loading it into memory and sorting takes only a few seconds
maybe there's 200 files, not 2000
but still, lots
j

javier ramirez

05/23/2023, 9:26 AM
Ok, if the data inside each file is sorted, I don’t know which would be faster, if the overhead of calling COPY multiple times, or the overhead of rearranging the data inside, but luckily Andrei knows much more than me about COPY and import 🙂
a

Andrey Pechkurov

05/23/2023, 9:26 AM
If you'll be using COPY, then you need to concatenate all of your CSVs into a single file. The table must be empty before the import, that's why the concatenation is needed
Once you have the CSV file, COPY should work much-much faster then ingesting heavily unsorted data via ILP: the throughput is usually in the order or 300K-500K rows/s
a

Alex Lourie

05/23/2023, 9:42 AM
And how quickly will it be ready?
a

Andrey Pechkurov

05/23/2023, 9:48 AM
Do you mean how quickly the COPY will finish?
a

Alex Lourie

05/23/2023, 9:49 AM
And how quickly the db will settle down with sorting
a

Andrey Pechkurov

05/23/2023, 9:50 AM
That depends on the number of rows you need to import. Say, importing 1B rows with COPY should take not more than 30-45 minutes
a

Alex Lourie

05/23/2023, 9:50 AM
Interesting. On which sort of a machine
a

Andrey Pechkurov

05/23/2023, 9:53 AM
Oh, sorry - forgot to mention HW. That's on a machine with a local SSD. Machine with a network attached volume may be around 5x slower.
a

Alex Lourie

05/23/2023, 9:56 AM
Ah got it
a

Andrey Pechkurov

05/23/2023, 9:58 AM
Even with a network attached volume, it should be significantly faster than 20 rows/s 🙂
a

Alex Lourie

05/23/2023, 10:00 AM
Will test this
a

Andrey Pechkurov

05/23/2023, 10:03 AM
Please let us know if you face any issues. We did our best to document COPY statement in the best way, but there is a chance that the docs still lack some important details
a

Alex Lourie

05/23/2023, 10:06 AM
Will do. Also, it would probably be useful to have more clarity around scenarios of ingestion, and which methods are better for which usecases
Like i this case, nowhere I could see that ILP is best for streaming
Quite a lot of mentions it being the best ingestion method
a

Andrey Pechkurov

05/23/2023, 10:08 AM
Yup, it makes sense. We'll improve the docs
g

Gabriel Mocan

05/23/2023, 11:14 AM
Great discussion!