I've been reading some stuff about the "active partition," and out-of-sequence inserts, and one thing is unclear to me -- am I able to run multiple feed handlers in parallel inserting into different partitions at the same time, or will this cause performance degradation? For example, if I want to push through large historical tape files each with ~2-3 billion records/day, can I run multiple in parallel, as long as they're each inserting data from different days (so one from 2022-09-13 and the other from 2022-09-12)?
09/14/2022, 4:42 PM
Hi John, parallel import is possible when importing CSV files into an empty table using the SQL COPY command. Here is a guide. The parallelism is based on the table partitions. I am not 100% sure whether this suits your use case, but if anything is unclear please let us know.
09/14/2022, 4:44 PM
I had seen some of that. However, I've been inserting using the C# API, which I believe uses ILP (https://github.com/questdb/net-questdb-client). Would it be ill-advised to do parallel inserts using that API? I'm trying to avoid having to go C# > .csv > QuestDB, and just go C# > QuestDB, while staying parallel if possible.
These files are very large, so making a CSV would be several hundred GB/day.
09/14/2022, 4:45 PM
Ah okay, then it is a different method. ILP is more suitable for regular inserts of smaller files.
[Updated] I am not sure about this but @Alex Pelagenko or @Andrey Pechkurov will be able to provide more details!
09/14/2022, 5:43 PM
Inserting into different partitions in parallel will cause performance degradation.
If your source data is sorted by time, the best parallelization is to send it line by line, round robin, through multiple connections.
Sounds like you have terabytes, don't you?
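For anyone reading later, here is a minimal sketch of the round-robin idea in Python. It is not the C# client's API and not an official recommendation: it writes raw ILP lines over plain TCP sockets, and the host, the default ILP port 9009, the connection count, and the `trades` table with its columns are all assumptions for illustration.

```python
import itertools
import socket
from typing import Iterable, List, Protocol


class Sink(Protocol):
    """Anything with a socket-like sendall(); lets the sketch run without a server."""

    def sendall(self, data: bytes) -> None: ...


def round_robin_send(lines: Iterable[str], sinks: List[Sink]) -> None:
    """Send each ILP line to the next connection in turn.

    `lines` is assumed to be already sorted by timestamp, so every
    connection receives its own rows in increasing time order.
    """
    ring = itertools.cycle(sinks)
    for line in lines:
        # ILP is newline-delimited text: one row per line.
        next(ring).sendall(line.encode("utf-8") + b"\n")


def open_connections(host: str, port: int, count: int) -> List[socket.socket]:
    """Open `count` TCP connections to the ILP endpoint (host/port are assumptions)."""
    return [socket.create_connection((host, port)) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical rows: table `trades`, one tag, one field, nanosecond timestamp.
    rows = [
        "trades,symbol=ABC price=10.5 1663027200000000000",
        "trades,symbol=ABC price=10.6 1663027200000001000",
        "trades,symbol=XYZ price=99.1 1663027200000002000",
    ]
    conns = open_connections("localhost", 9009, 2)
    try:
        round_robin_send(rows, conns)
    finally:
        for conn in conns:
            conn.close()
```

Calling `sendall` once per line keeps the sketch short; a real loader would likely buffer several lines per connection before flushing.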
09/14/2022, 5:47 PM
Multiple connections sending line by line on the same date partition? Interesting that this would give better performance than single-threaded bulk inserts on one connection.
@Alex Pelagenko yeah, definitely terabytes
09/14/2022, 5:48 PM
Yes, multiple connections will let you saturate the network bandwidth
09/14/2022, 5:48 PM
Would that be an issue even on local loopback connections?
09/14/2022, 5:49 PM
Yes, multiple connections are faster than a single one on localhost too. Writing in timestamp order gives the best performance
09/14/2022, 5:50 PM
Thanks, appreciate the feedback! Will try that out.
09/14/2022, 5:51 PM
Please check that the resulting data in QuestDB will fit on your hard drive
If you are writing terabytes
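One rough way to do that check, sketched in Python: load a sample of a few days, measure how much space the table takes on disk, and extrapolate linearly to the full backfill. The function names, the 20% headroom, and the sample numbers are all assumptions for illustration; the actual size per day has to be measured on your own data.

```python
import shutil


def estimated_total_bytes(sample_bytes: int, sample_days: int, total_days: int) -> int:
    """Extrapolate linearly from a measured sample to the full date range."""
    return sample_bytes * total_days // sample_days


def fits_on_disk(estimated_bytes: int, path: str, headroom: float = 0.2) -> bool:
    """Compare the estimate with free space on the volume holding `path`.

    `headroom` is the fraction of free space to leave unused (an assumed
    safety margin, not a QuestDB requirement).
    """
    free = shutil.disk_usage(path).free
    return estimated_bytes <= free * (1 - headroom)


if __name__ == "__main__":
    # Hypothetical measurement: 2 loaded days took 300 GB on disk.
    sample = 300 * 10**9
    total = estimated_total_bytes(sample, sample_days=2, total_days=250)
    print(f"estimated size: {total / 10**12:.1f} TB")
    print("fits:", fits_on_disk(total, "."))
```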
In theory, writing multiple days could be done as parallel, independent operations, but we don't do it that way yet
09/14/2022, 5:53 PM
Yeah, that'd definitely be a nice feature in the future for users who have a lot of data like this