https://questdb.io logo
Title
r

Roj Codeur

01/19/2023, 4:01 PM
Hi there, @Andrey Pechkurov : so, I think the performance is better than what it was. I am looking to insert a few million rows - is there anything I can do to tune insert performance
Forgot to mention, I use the official lib wrapped using PyCall in Julia code
a

Andrey Pechkurov

01/19/2023, 4:16 PM
Hi Roj, What's the performance so far? BTW apart from ILP we also support CSV import and it's also pretty fast: https://questdb.io/blog/2022/09/12/importing-300k-rows-with-io-uring
r

Roj Codeur

01/19/2023, 4:26 PM
So far, 300k rows in 7 secs. We were discussing this here https://github.com/questdb/questdb/issues/2831
I have to be able to insert a few million rows
I don’t have data in csv, it comes in JSON, or other formats
n

Nicolas Hourcard

01/19/2023, 5:42 PM
thx Roj, we will look into this for you. Can i ask, is this a personal project or production use case at a company?
a

Andrey Pechkurov

01/19/2023, 5:52 PM
Try setting
cairo.o3.min.lag=60000
config property in
<db_root_dir>/conf/server.conf
. I've checked your server logs and seems that you have heavily O3 data. The above setting will help the database to commit less frequently leading to lower write amplification
r

Roj Codeur

01/19/2023, 6:09 PM
Got it, thanks! I tried sorting data by date before I started inserting, didn’t help
a

Andrey Pechkurov

01/19/2023, 6:20 PM
Yes, it won't help as the database does it anyway
To be more precise: eliminating out-of-order writes as a class will certainly help. But sorting rows within batches before sending them over the wire won't.
r

Roj Codeur

01/19/2023, 6:32 PM
I sort the whole data frame by date and then divide it in batches, that didn’t seem to help
@Andrey Pechkurov: tried updating the
cairo.o3.min.lag=60000
property. I am afraid the results are still the same
a

Andrey Pechkurov

01/21/2023, 11:59 AM
Could you share server logs?
r

Roj Codeur

01/21/2023, 12:00 PM
sure
stdout-2023-01-21T11-55-40.txt
server.conf
server.conf as well, hope this helps
ta!
a

Andrey Pechkurov

01/21/2023, 12:25 PM
cairo.o3.min.lag=60
is not correct
should be
cairo.o3.min.lag=60000
r

Roj Codeur

01/21/2023, 12:25 PM
ahh, my apologs
a

Andrey Pechkurov

01/21/2023, 12:26 PM
it's in millis
r

Roj Codeur

01/21/2023, 12:26 PM
apologies
sorry, fixing now
a

Andrey Pechkurov

01/21/2023, 12:26 PM
no prob
r

Roj Codeur

01/21/2023, 12:28 PM
1-2 secs faster
now takes 5 secs, used to take close to 7
do you think its possible to achieve 100k per second on a laptop. currently, I am at 64k.
a

Andrey Pechkurov

01/21/2023, 12:33 PM
my laptop does around 700-900K/s in TSBS, so it's certainly feasible
can you run ingestion on multiple threads or application instances?
r

Roj Codeur

01/21/2023, 12:35 PM
multiple threads possibly, not on application instances. I am new to Julia and programming in general, but, multiple threads is a possibility. in the meantime, do you think I can do better on a single thread?
a

Andrey Pechkurov

01/21/2023, 12:41 PM
with TSBS loader (which is a Golang application) I can get around 400K on a single thread, but many things depend on how badly O3 your rows are
r

Roj Codeur

01/21/2023, 12:42 PM
got it. to account for O3, I download all my data and then I sort it by date. table:
DROP TABLE IF EXISTS ohlcv;
CREATE TABLE ohlcv(
Ticker SYMBOL CAPACITY 10000,
Exchange SYMBOL CAPACITY 50,
DataSource SYMBOL CAPACITY 50,
Date TIMESTAMP,
Open double,
High double,
Low double,
Close double,
AdjustedClose double,
Volume double,
timestamp TIMESTAMP)
TIMESTAMP(date)
PARTITION BY MONTH;
I then proceed to iterate over the dataframe and call sender.row()
I was hoping this would eliminate the o3 issues - do you think this would help?
a

Andrey Pechkurov

01/21/2023, 12:51 PM
Is it a pandas dataframe? Recently we've added direct pandas ingestion support to the Python client.
r

Roj Codeur

01/21/2023, 12:51 PM
nope, I use Julia, I am afraid. its Julia dataframe
a

Andrey Pechkurov

01/21/2023, 12:52 PM
OK, no problem with that. Sorting on the client side may take quite some time, but if that's the full data set, ingesting in-order rows is the optimal way to insert data into QDB.
r

Roj Codeur

01/21/2023, 12:53 PM
yeah, got it, Julia dataframes are very fast at sorting.
well, I guess I will stick to the current speed. in the meantime, if you can think of anything else that would help or if I can help with this, please do let me know. I will keep browsing the docs as well.
thanks a lot for your help @Andrey Pechkurov. you have been awesome. I can see that you and the team take a lot of pride in your work, I am sure this database will be immensely successful!
a

Andrey Pechkurov

01/21/2023, 12:57 PM
My pleasure
BTW if you want to stick to O3 rows, try increasing min lag value even further, say, to 5 minutes:
cairo.o3.min.lag=300000
r

Roj Codeur

01/21/2023, 1:07 PM
sure, will give that a shot, ta!
another ques: it looks like the data is not available as soon as I commit it, I am still going through the documentation to see why. - can you please point me to a direction pls?
is this related to commit lag?
a

Andrey Pechkurov

01/21/2023, 1:12 PM
Yes, the data is accumulated in an in-memory buffer and committed periodically. Once it's committed, it's available for reads.
By increasing the min lag, you're increasing the invisiblity time window, but improve performance as commits happens more rarely and include more data
r

Roj Codeur

01/21/2023, 1:14 PM
got it! I presume you are talking about the below setting
cairo.o3.min.lag=300000
I will play around and see which setting works best. any other settings I should be aware of pls? ta!
a

Andrey Pechkurov

01/21/2023, 1:17 PM
Yes, that's the setting. Just note that there is also max lag which is 600000 (10 mins) by default and you should keep min value less than the maximum one
r

Roj Codeur

01/21/2023, 1:18 PM
got it, thanks mate, you are a Star!