#users-public
Petr Postulka

10/10/2022, 2:11 PM
hi guys, very quick question regarding csv copy ... we've used this feature now and it is really great! I just want to double check re the performance/timings and whether we are doing something wrong (some misconfig) or it matches the expected perf ... we import 505M rows with 168 columns (mainly floats/doubles and approx 10 string symbols) and it takes 17 minutes. Should it be faster or not? I have seen your blog post where you ingest 3B rows per sec and that's the reason why I'm asking. Thx
javier ramirez

10/13/2022, 9:21 AM
CSV import performance depends A LOT on how fast your disk is
9:21 AM
In the post you are referring to, we are using two volumes for the import
9:21 AM
One is your root volume where QuestDB stores its data
9:22 AM
The other is a faster disk that we use only to store the CSV while executing COPY and you can then detach/delete
Petr Postulka

10/13/2022, 9:23 AM
we have the same setup ... 2 high speed nvme disks, one for quest one for csv import
javier ramirez

10/13/2022, 9:30 AM
I rechecked the blog post, and the claim is 300K rows per second, for a total of ~100 million rows on a 76GB csv
9:37 AM
at 300K rows per second, if I am not mistaken, importing 505 million rows should take about 28 minutes. In your case it is taking 17 minutes
9:38 AM
It seems you are actually getting faster ingestion than in the post we wrote, which could be explained by your instance having much more memory and your CSV apparently being already sorted
9:39 AM
To the best of my knowledge, we have only claimed speeds of billions of rows per second for querying data, not for ingesting
9:40 AM
Not sure if @Andrey Pechkurov might have some advice, but your numbers actually sound quite fast to me after doing the maths
Andrey Pechkurov

10/13/2022, 9:46 AM
The machine is quite good, so it shouldn't be the bottleneck
9:47 AM
@Petr Postulka are you sure that you're using the new parallel import? It's used for partitioned tables only
9:48 AM
If your table is a non-partitioned one, the import would be serial
Petr Postulka

10/13/2022, 9:48 AM
yes, it is partitioned
Andrey Pechkurov

10/13/2022, 9:49 AM
> at 300K rows per second, if I am not mistaken, importing 505 million rows should take about 28 minutes. In your case it is taking 17 minutes

Yes, that's the right calculation. 17 minutes sounds like a good result to me
9:49 AM
What are your expectations for the import time?
9:50 AM
Your current time yields around 500K rows/s
Petr Postulka

10/13/2022, 9:51 AM
ok, in that case all good ... but I think the blog post really did say 3B rows/sec before ... that's why I was asking 🙂
9:51 AM
whether we are missing something or not ... so clearly all good and the perf is expected
Andrey Pechkurov

10/13/2022, 9:54 AM
No, it was 3M before due to a typo in the calculations (I'm the one to blame for that), but we fixed the bug 🙂
Petr Postulka

10/13/2022, 10:12 AM
no worries, thx for explanation 🙂