
Roj Codeur

01/19/2023, 4:01 PM
I have noticed articles saying QuestDB can reach millions of rows per second. Maybe if I tune it?

Brandon E.

01/20/2023, 10:09 AM
Hi @Roj Codeur, I’m not part of the QuestDB team, but in my experience the server hosting QuestDB plays a huge role in those benchmarks. If you are using your own computer, you can’t expect the same performance.

Holger

01/20/2023, 11:17 AM
@Brandon E. True. I timed it at 600k rows/sec on a PC and ~150k rows/sec on a laptop (single disk 😢). These were all historical backfills, symbol by symbol, so all out of order. Considering the hardware, this is brutally fast. I benchmarked others like TimescaleDB and TDengine and it’s not even close.

Roj Codeur

01/20/2023, 11:21 AM
Agreed! I don’t think I will ever achieve the same performance on my laptop. I am taking this as a learning exercise, to make sure I get the best performance I can and that I am using QuestDB as the creators intended. If I can do anything better, I want to. For instance, you are clocking 150k rows per second, while I am seeing close to 50k per second.
I think it’s an impressive product and I am really impressed with the support as well.

javier ramirez

01/20/2023, 12:44 PM
Also, the numbers we advertise usually come from parallel ingestion, that is, multiple simultaneous ILP connections. With a single connection and a fast drive you can probably reach hundreds of thousands of rows per second for sorted data.

Roj Codeur

01/20/2023, 12:59 PM
Got it! That’s the thing, I have a single connection and I am only seeing 45k per second. I am wondering if there is something I am missing, for instance code-wise or config-wise. Ta!

javier ramirez

01/20/2023, 1:06 PM
I believe you mentioned you are using Python from Julia. The serialisation across layers might be introducing some performance issues. Of course, it also depends on where you are running this: on a local drive, or on some hosting/cloud where disks are slower by default.

Roj Codeur

01/20/2023, 1:08 PM
I run it on my laptop, an M1 MacBook Pro.
Interesting point about Python and Julia. In the past it hasn’t been an issue, but I will try it directly from Python. Ta!

Adam Cimarosti

01/20/2023, 1:09 PM
I suspect the bottleneck is in the client then.
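One way to check that suspicion is to time the client-side work on its own, with no server involved. The sketch below builds ILP text for a batch of rows in plain Python and reports rows per second for the serialization step alone. If that figure is close to the end-to-end rate being observed, the client is the limiting factor. The table name, the columns and the helper names are hypothetical.

```python
import time


def build_ilp_payload(rows):
    """Serialize (symbol, price, ts_ns) tuples into one ILP byte payload."""
    parts = []
    for symbol, price, ts_ns in rows:
        parts.append(f"trades,symbol={symbol} price={price!r} {ts_ns}\n")
    return "".join(parts).encode("utf-8")


def rows_per_second(fn, rows):
    """Run fn(rows) once and return the measured throughput in rows/sec."""
    start = time.perf_counter()
    fn(rows)
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    base_ns = 1_674_216_000_000_000_000
    rows = [("AAPL", 100.0 + i * 0.01, base_ns + i) for i in range(200_000)]
    rate = rows_per_second(build_ilp_payload, rows)
    print(f"serialization only: {rate:,.0f} rows/sec")
```

The same `rows_per_second` helper can wrap whatever function actually sends the data, so the two numbers can be compared side by side.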

Brandon E.

01/20/2023, 1:09 PM
I am in the same boat: I am using Julia and sending ILP by calling the Python library. On a 16-inch MacBook Pro with an Intel CPU I was around 50k rows per second too, but once on a server, I think I am around 200-300k rows per second.

Adam Cimarosti

01/20/2023, 1:09 PM
Constructing ILP from Python isn’t very fast, which is why we’ve added `.dataframe()` support to speed things up.

Roj Codeur

01/20/2023, 1:09 PM
Yeh, I think so too. Or I am not using it correctly, as intended

Adam Cimarosti

01/20/2023, 1:10 PM
Julia is a pretty fast language. We don’t provide bindings for it at the moment. Depending on your familiarity with Julia, you could decide to call the C ILP library (written in Rust), though I may be sending you down a rabbit hole if you’re not familiar with C. Resources:
• The C API: https://github.com/questdb/c-questdb-client/blob/main/include/questdb/ilp/line_sender.h
• Calling C from Julia: https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/
• Generating ILP in C: https://github.com/questdb/c-questdb-client/blob/main/examples/line_sender_c_example.c
Note that this is the same C API that the Python client is built upon, so it’s pretty stable.

Brandon E.

01/20/2023, 1:16 PM
@Adam Cimarosti do you think using the C ILP library will be faster even than Python with the new DataFrame approach?

Adam Cimarosti

01/20/2023, 1:16 PM
No, it will not be.
Well, sort of 🙂
It depends on your use case. If you need lots of data conversion and complicated steps to get the data from Julia into Pandas dataframes, then going the Pandas route would not help you.
The Python dataframe API is only going to be a smidge slower (my finger-in-the-air guess: 2 to 3% slower) than serializing the same data in C.
In other words, via Python `.dataframe()` the bottleneck becomes the QuestDB server, not the client.

Brandon E.

01/20/2023, 1:20 PM
Yeah, that’s what I was thinking. Thank you for the explanation @Adam Cimarosti. By the way, since we are already on this topic, are there any plans to support a Julia client? 🙏
And one last question from my side: when you say the benchmarks use multiple ILP connections, do you mean a setting in server.conf, or simply parallelizing the client?

Adam Cimarosti

01/20/2023, 1:23 PM
Parallelizing the client.
There is some config the benchmarks use to limit the number of QuestDB threads. It gives another 20% or so speedup over the default configuration. I’ve only done this in the benchmarks because the client and the server are on the same box, which is a pretty atypical setup for a production environment.
RE Julia library: I personally lack the expertise for this. If this is something you want to start I’m happy to help out though.
I’m afraid we don’t have a resident Julia expert in the house 🙂

Brandon E.

01/20/2023, 1:29 PM
Thank you for the information. I am not a Julia expert, but I think we can manage to start a client for Julia. What are the steps to get started?

Adam Cimarosti

01/20/2023, 1:31 PM
You can take a look at the `py-questdb-client` repo.
You’d need to:
• Add `c-questdb-client` as a sub-repo.
• `cd c-questdb-client/questdb-rs-ffi && cargo build --release`: This will build the dynamic lib for that platform.
• Then figure out where to copy that lib to so Julia can load it: https://docs.julialang.org/en/v1/stdlib/Libdl/
• Write the Julia wrappers.
• Figure out the CI to build on all common platforms.
• Figure out how to package.
• Write docs.
• Ship.

Brandon E.

01/20/2023, 1:35 PM
So all the clients are based on the C one?

Adam Cimarosti

01/20/2023, 1:36 PM
No. The Python one is. Go, Java, .NET etc. are very slow when invoking C, so we decided to write full re-implementations for those.
Given that Julia can call C very fast (https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/) it would be the obvious approach for Julia. This way you get authentication, TLS and ongoing bugfixes and features for free.

Brandon E.

01/20/2023, 1:43 PM
Yeah, it seems like a straightforward way to create the client for Julia. I will get started and let you know how it goes.

Adam Cimarosti

01/20/2023, 1:44 PM
Neat! Reach out if you need any assistance.

Brandon E.

01/20/2023, 1:44 PM
Perfect, thank you @Adam Cimarosti 🙂

Roj Codeur

01/20/2023, 1:46 PM
A Julia client would be super awesome!
Sorry, I am not an expert programmer, tbh. I would only slow you down.

Adam Cimarosti

01/20/2023, 1:54 PM
Cool. I’ll set up a `julia-questdb-client` repo and give you access to it.
What’s your GH id?

Brandon E.

01/20/2023, 1:55 PM
Ok thanks, it’s brandonescamilla

Roj Codeur

01/20/2023, 1:59 PM
Could I get everyone’s thoughts on this as well, please? https://github.com/questdb/roadmap/issues/10
I gather there are workarounds, but is there a way to prioritise this, please?

Adam Cimarosti

01/20/2023, 2:21 PM
Not sure about that one. @javier ramirez, would you know?

javier ramirez

01/20/2023, 3:00 PM
Hey @Roj Codeur. I know it is on the radar of the core team, and just a few days ago we discussed DELETE internally in the company chat. As of today I don’t think it has a target date, as the core team is working on WAL and replication as priorities. A good idea is to like or comment on the issue on GitHub so it gets more attention.

Roj Codeur

01/20/2023, 3:09 PM
Got it, thanks mate! I did comment on the issue as well. Ta!

Adam Cimarosti

01/20/2023, 3:25 PM
https://github.com/questdb/julia-questdb-client, @Brandon E., you should have a GitHub invite pending.