John M.

03/24/2023, 6:04 PM
It might be helpful to write a blog post, or add a little more documentation to the https://questdb.io/docs/concept/write-ahead-log/ page, about when WAL tables are advisable or not advisable. The comparison on that page is helpful, but explicitly calling out "using a WAL table is advisable in these scenarios: X, Y, Z, and not advisable in these scenarios: A, B, C" could be helpful too. I have been using WAL tables so far because they seem to provide a more complete solution than non-WAL tables. However, seeing Andy's previous comment about disk thrashing made me realize that there are probably some costs to doing so (physical disk wear and tear, etc.), and perhaps I should switch to non-WAL for my use case.

Alex Pelagenko

03/24/2023, 7:30 PM
Hey, thanks for the feedback. I think the disk thrashing concern you mentioned largely depends on the data pattern and write volume. Non-WAL tables currently have a few more measures to optimise for it, but WAL will be brought up to the same level; we probably just need more feedback from users.
WAL is a relatively new feature, only added a few weeks ago.

John M.

03/24/2023, 7:34 PM
Yeah, indeed - not trying to complain about it or anything. Just saying some guidance could be added on suggested use cases. I could be wrong on this, but it seems like WAL would be good to have for something with out-of-sequence inserts, whereas with purely sequential inserts you'd see little benefit, correct? One other area surrounding WAL that isn't clear to me: say you have multiple applications pushing inserts into Quest, but for different partitions (for example, I have a table partitioned by date and have 3 different connections pushing 3 different dates at the same time in a historical data load). Would WAL benefit a scenario like that? Would those inserts count as "out of sequence", given that they're sequential per partition but out of sequence in aggregate once you count all three partitions ingesting at the same time?

Imre

03/24/2023, 9:59 PM
“I have a table partitioned by date and have 3 different connections pushing 3 different dates at the same time” -> I would expect a WAL table to perform better in this scenario. In the non-WAL case the table writer would write the 3 partitions row by row at the same time, jumping from one to another possibly on every row. In the WAL case you have 3 WALs, one for each connection, and data is persisted as it comes. Then there is a job which copies (applies) the data from the WALs to the table. This job can copy the data in bigger chunks, rather than row by row across all 3 partitions.
There would not be any out-of-order writes; by that I mean none of the partitions would be rewritten.
“if you have purely sequential inserts you’d see little benefit, correct?” -> You should still see faster ingestion. It is much less work for QuestDB to get the data into the WAL than directly into the table. There are fewer checks, no queues… it should be faster.
Of course, the data still has to move from the WAL into the table, so it will not become visible to queries any sooner, but it gets into the database faster.
I would say WAL should always be better. We might need to make it work better for certain scenarios, like Andy’s, but eventually WAL should be your default.
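
The three-connection scenario described above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not QuestDB's actual implementation: the function names, the round-robin arrival order, and the chunk size of 250 rows are all hypothetical and chosen purely for illustration. It counts how often a writer has to switch between partitions when rows are written as they arrive, compared with when a separate apply job copies them from per-connection logs in larger chunks.

```python
from itertools import zip_longest


def count_partition_switches(rows):
    """Count how often two consecutive rows target different partitions."""
    switches = 0
    previous = None
    for partition, _value in rows:
        if previous is not None and partition != previous:
            switches += 1
        previous = partition
    return switches


def interleave(streams):
    """Toy non-WAL model (assumption): rows from all connections arrive
    round-robin and are written to the table in arrival order."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(row for row in group if row is not None)
    return merged


def apply_in_chunks(streams, chunk_size):
    """Toy WAL model (assumption): each connection has its own log, and an
    apply job copies up to chunk_size rows from each log in turn."""
    applied = []
    offsets = [0] * len(streams)
    while any(off < len(s) for off, s in zip(offsets, streams)):
        for i, stream in enumerate(streams):
            chunk = stream[offsets[i]:offsets[i] + chunk_size]
            applied.extend(chunk)
            offsets[i] += len(chunk)
    return applied


# Three connections, each loading one date partition with 1000 rows.
dates = ["2023-03-21", "2023-03-22", "2023-03-23"]
streams = [[(d, n) for n in range(1000)] for d in dates]

direct = interleave(streams)
wal = apply_in_chunks(streams, chunk_size=250)

direct_switches = count_partition_switches(direct)
wal_switches = count_partition_switches(wal)

print("row-by-row writer, partition switches:", direct_switches)
print("chunked apply job, partition switches:", wal_switches)
```

In this model both paths write the same 3000 rows, and each partition still receives its rows in order, but the row-by-row writer changes partition on nearly every row while the chunked apply job changes partition only at chunk boundaries. How large the chunks are in practice, and how much this matters for disk activity, depends on QuestDB internals that this sketch does not attempt to model.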