#users-public
John M.

09/14/2022, 4:06 PM
I've been reading some stuff about the "active partition," and out-of-sequence inserts, and one thing is unclear to me -- am I able to run multiple feed handlers in parallel inserting into different partitions at the same time, or will this cause performance degradation? For example, if I want to push through large historical tape files each with ~2-3 billion records/day, can I run multiple in parallel, as long as they're each inserting data from different days (so one from 2022-09-13 and the other from 2022-09-12)?
Amy Wang

09/14/2022, 4:42 PM
Hi John, parallel import is possible for CSV files loaded into an empty table with the SQL COPY command. Here is a guide. The work is parallelized by table partition. I am not 100% sure this suits your use case, but if anything is unclear please let us know.
John M.

09/14/2022, 4:44 PM
I had seen some of that - however I've been inserting using the C# API, which I believe uses the ILP protocol (https://github.com/questdb/net-questdb-client). Would it be ill-advised to do parallel inserts using that API? I'm trying to avoid having to go C# > .csv > QuestDB, and just go C# > QuestDB, while staying parallel if possible.
4:45 PM
These files are very large so making a CSV would be several hundred GB/day
Amy Wang

09/14/2022, 4:45 PM
Ah okay then it is a different method. ILP is more suitable for regular inserts of smaller files.
5:20 PM
[Updated] I am not sure about this but @Alex Pelagenko or @Andrey Pechkurov will be able to provide more details!
Alex Pelagenko

09/14/2022, 5:43 PM
It will cause performance degradation
5:44 PM
If your source data is sorted by time, the best parallelization is to send it line by line, round robin, through multiple connections
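A minimal sketch of that round-robin idea, in Python rather than C# and not using the official client. It assumes the rows are already formatted as ILP text lines and sorted by time, and that each connection is any object with a `sendall(bytes)` method, such as a TCP socket opened against the server's ILP port (9009 by default). The names `send_round_robin` and `open_connections` are made up for illustration.

```python
import socket
from typing import Iterable, List, Protocol, Sequence


class LineSink(Protocol):
    """Anything that accepts raw bytes, e.g. a connected TCP socket."""

    def sendall(self, data: bytes) -> None: ...


def send_round_robin(lines: Iterable[str], sinks: Sequence[LineSink]) -> List[int]:
    """Send line i to connection i % len(sinks); return lines sent per connection.

    The input order is preserved within each connection, so time-sorted
    input stays time-sorted on every connection.
    """
    if not sinks:
        raise ValueError("at least one connection is required")
    counts = [0] * len(sinks)
    for i, line in enumerate(lines):
        idx = i % len(sinks)
        # ILP is newline-delimited; make sure each line ends with exactly one "\n".
        sinks[idx].sendall(line.rstrip("\n").encode("utf-8") + b"\n")
        counts[idx] += 1
    return counts


def open_connections(host: str, port: int, n: int) -> List[socket.socket]:
    """Open n plain TCP connections (hypothetical helper, no auth or TLS)."""
    return [socket.create_connection((host, port)) for _ in range(n)]
```

Usage would look something like `send_round_robin(lines, open_connections("localhost", 9009, 4))`, closing the sockets afterwards. One `sendall` per line keeps the sketch short; a real loader would probably buffer several lines per connection before sending.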
5:47 PM
Sounds like you have terabytes, don't you?
John M.

09/14/2022, 5:47 PM
Multiple connections sending line by line into the same date partition? Interesting that this would perform better than single-threaded bulk inserts on one connection.
5:48 PM
@Alex Pelagenko yeah definitely terabytes
Alex Pelagenko

09/14/2022, 5:48 PM
Yes, multiple connections give you the ability to saturate the network bandwidth
John M.

09/14/2022, 5:48 PM
Would that be an issue even on local loopback connections?
Alex Pelagenko

09/14/2022, 5:49 PM
Yes, on localhost multiple connections are faster than a single one. Writing in order gives the best performance
John M.

09/14/2022, 5:50 PM
Thanks, appreciate the feedback! Will try that out.
Alex Pelagenko

09/14/2022, 5:51 PM
Please check that the resulting data will fit on your hard drive once it's in QuestDB
5:51 PM
If you're doing terabytes
5:52 PM
In theory, writing multiple days could be done as parallel, independent operations, but we don't do it that way yet
John M.

09/14/2022, 5:53 PM
Yeah that'd definitely be a nice feature in the future for users that have a lot of data like this
Andrey Pechkurov

09/14/2022, 6:22 PM
It's already possible to write to multiple partitions in parallel, but it isn't very convenient and requires some server-side scripting: each partition can be written into its own table, then detached (https://questdb.io/docs/reference/sql/alter-table-detach-partition/) and attached (https://questdb.io/docs/reference/sql/alter-table-attach-partition/) to the main table. Multiple such single-partition temporary tables can be written to simultaneously.
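A rough sketch of the scripting this implies, as a Python helper that only builds the statements and folder names for one day and executes nothing. The table names and the `plan_partition_move` helper are hypothetical. The `.detached` and `.attachable` folder suffixes, and the paths relative to the database root, are assumptions based on my reading of the linked pages, so check them against the docs for your version. Those pages also list restrictions on which partitions can be detached or attached, which this sketch does not check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMove:
    detach_sql: str      # run against the per-day staging table
    detached_dir: str    # folder assumed to appear after the detach
    attachable_dir: str  # where that folder is assumed to go before the attach
    attach_sql: str      # run against the main table


def plan_partition_move(staging_table: str, main_table: str, day: str) -> PartitionMove:
    """Build the detach/move/attach steps for one daily partition.

    `day` is the partition name for a table partitioned by day,
    e.g. "2022-09-13". Paths are relative to the database root (assumed layout).
    """
    return PartitionMove(
        detach_sql=f"ALTER TABLE {staging_table} DETACH PARTITION LIST '{day}';",
        detached_dir=f"{staging_table}/{day}.detached",
        attachable_dir=f"{main_table}/{day}.attachable",
        attach_sql=f"ALTER TABLE {main_table} ATTACH PARTITION LIST '{day}';",
    )
```

The idea would be to load each day into its own staging table in parallel, then, for each day, run `detach_sql`, move `detached_dir` to `attachable_dir` on the server, and run `attach_sql`.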