Tony Wang

01/04/2023, 7:59 PM
The dream would be to use it in-process like DuckDB 🙂

Alex Pelagenko

01/04/2023, 8:02 PM
Interesting, why don't you use DuckDB for that?

Tony Wang

01/04/2023, 8:02 PM
asof join
duckdb recently added support for range join, but still no proper asof join support
and perhaps questdb's range join is faster

Alex Pelagenko

01/04/2023, 8:04 PM
Asof join speed comes largely from the data being sorted by time, so whichever road you take, you pay either in slow query speed or in a rewrite penalty
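A minimal sketch of why sort order matters here: when both sides are already sorted by timestamp, an asof join is a single forward merge with two cursors, O(len(left) + len(right)), with no searching or re-sorting. This is a plain-Python illustration of the general technique, not QuestDB's implementation; the function name and the sample `trades`/`quotes` data are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, float]  # (timestamp, value); both inputs assumed sorted by timestamp


def asof_join_sorted(left: List[Row], right: List[Row]) -> List[Tuple[int, float, Optional[float]]]:
    """For each left row, attach the value of the latest right row with ts <= the left ts."""
    out = []
    j = 0
    last: Optional[float] = None  # most recent right value seen so far
    for ts, val in left:
        # advance the right cursor past every row that is not later than ts;
        # because both sides are sorted, the cursor never moves backwards
        while j < len(right) and right[j][0] <= ts:
            last = right[j][1]
            j += 1
        out.append((ts, val, last))
    return out


trades = [(1, 10.0), (5, 11.0), (9, 12.0)]
quotes = [(0, 100.0), (4, 101.0), (6, 102.0)]
print(asof_join_sorted(trades, quotes))
# → [(1, 10.0, 100.0), (5, 11.0, 101.0), (9, 12.0, 102.0)]
```

If either side were unsorted, this merge would silently produce wrong matches, so the data would first have to be sorted (the rewrite penalty) or the join would need a slower search-based strategy.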

Tony Wang

01/04/2023, 8:04 PM
doesn't questdb assume the data is sorted?

Alex Pelagenko

01/04/2023, 8:04 PM
Heavily

Tony Wang

01/04/2023, 8:05 PM
afaik duckdb has no such assumption
which is why I believe questdb's asof + range joins will be faster

Alex Pelagenko

01/04/2023, 8:05 PM
QuestDB writes data sorted by designated timestamp
So you have to import your Arrow data into QuestDB and sort it at that point, or use another solution that perhaps won't rewrite the data but will be slower at asof joins
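The trade-off described above (sort once on the way in, or pay for it at join time) can be sketched as a cheap O(n) sortedness check that lets already-ordered data pass through untouched and only sorts, and therefore copies, when the input is out of order. This is a hypothetical helper over plain Python tuples standing in for an Arrow table; it is not a QuestDB or Arrow API.

```python
from typing import List, Tuple

# (timestamp, value) pairs; a hypothetical stand-in for rows of an Arrow table
Row = Tuple[int, float]


def is_sorted_by_ts(rows: List[Row]) -> bool:
    """O(n) check that timestamps never decrease."""
    return all(prev[0] <= cur[0] for prev, cur in zip(rows, rows[1:]))


def prepare_for_asof(rows: List[Row]) -> List[Row]:
    """Return rows ordered by timestamp, copying them only when they are out of order."""
    if is_sorted_by_ts(rows):
        return rows  # already ordered: pass through without a rewrite
    # out of order: pay for the sort (and the copy) once, up front
    return sorted(rows, key=lambda r: r[0])


ordered = [(1, 1.0), (2, 2.0), (2, 2.5), (7, 3.0)]
shuffled = [(7, 3.0), (1, 1.0), (2, 2.0)]
print(prepare_for_asof(ordered) is ordered)  # True
print(prepare_for_asof(shuffled))            # [(1, 1.0), (2, 2.0), (7, 3.0)]
```

Python's `sorted` is stable, so rows that share a timestamp keep their original relative order after sorting.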

Nicolas Hourcard

01/04/2023, 8:08 PM
Hi Tony, nice to meet you!

Tony Wang

01/04/2023, 8:08 PM
can I tell questdb my arrow data is already sorted, though?

Nicolas Hourcard

01/04/2023, 8:08 PM
would be super cool to understand your use case a bit more, what kind of data are you dealing with?

Tony Wang

01/04/2023, 8:08 PM
because it is 🙂
Basically I am trying to use QuestDB's open source single node asof join to achieve a distributed asof join
currently I use the polars asof join; I want to see if QuestDB's is faster
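One common way to assemble a distributed asof join from single-node joins (an assumption here; the thread does not say how it would be done) is to split both tables into time ranges and give each range's right-hand slice one extra "carry-in" row: the last right row before the range starts. Each range can then be joined independently, and concatenating the local results reproduces the global join. The function names, split points, and data below are hypothetical, and `local_asof` is a simple binary-search join standing in for whatever single-node engine (QuestDB, polars) each worker would actually call.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Row = Tuple[int, float]                      # (timestamp, value), sorted by timestamp
Joined = Tuple[int, float, Optional[float]]  # left row plus matched right value (or None)


def local_asof(left: List[Row], right: List[Row]) -> List[Joined]:
    """Single-node asof join: match each left row to the last right row with ts <= its ts."""
    right_ts = [ts for ts, _ in right]
    out: List[Joined] = []
    for ts, val in left:
        k = bisect_right(right_ts, ts)  # number of right rows at or before ts
        out.append((ts, val, right[k - 1][1] if k else None))
    return out


def partition(left: List[Row], right: List[Row], bounds: List[int]):
    """Split both sides into time ranges [lo, hi) defined by the sorted split points in bounds."""
    left_ts = [ts for ts, _ in left]
    right_ts = [ts for ts, _ in right]
    edges: List[Optional[int]] = [None, *bounds, None]
    parts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        l_lo = 0 if lo is None else bisect_left(left_ts, lo)
        l_hi = len(left) if hi is None else bisect_left(left_ts, hi)
        r_lo = 0 if lo is None else bisect_left(right_ts, lo)
        r_hi = len(right) if hi is None else bisect_left(right_ts, hi)
        # carry in the last right row before lo, so left rows near the start of this
        # range can still match a right row that lives in the previous range
        r_start = max(r_lo - 1, 0)
        parts.append((left[l_lo:l_hi], right[r_start:r_hi]))
    return parts


def distributed_asof(left: List[Row], right: List[Row], bounds: List[int]) -> List[Joined]:
    result: List[Joined] = []
    for l_part, r_part in partition(left, right, bounds):
        # in a real deployment each (l_part, r_part) pair would be joined on a different node
        result.extend(local_asof(l_part, r_part))
    return result


trades = [(1, 10.0), (5, 11.0), (9, 12.0)]
quotes = [(0, 100.0), (4, 101.0), (6, 102.0)]
print(distributed_asof(trades, quotes, [5]))
# → [(1, 10.0, 100.0), (5, 11.0, 101.0), (9, 12.0, 102.0)]
```

The carry-in row is what keeps the partitions independent: without it, the first trade in a range would find no quote whenever the most recent quote happened to fall in the previous range.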

Alex Pelagenko

01/04/2023, 8:09 PM
Where do you store your arrow blobs?

Tony Wang

01/04/2023, 8:09 PM
they start from memory

Alex Pelagenko

01/04/2023, 8:11 PM
I thought by distributed you meant distributed over the network

Tony Wang

01/04/2023, 8:12 PM
yes -- the data starts from S3. After a series of transformations, each single node is going to have some arrow tables in memory that it has to do asof joins on

Alex Pelagenko

01/04/2023, 8:14 PM
Understood. Why not stream directly to QuestDB instead of building an Arrow frame in memory?
t

Tony Wang

01/04/2023, 8:14 PM
say I have to run some Python UDFs on the data beforehand
like xgboost lol

Alex Pelagenko

01/04/2023, 8:16 PM
Fair enough. We are releasing a library to send pandas frames to Quest soon. No plans to plug in shared memory

Tony Wang

01/04/2023, 8:17 PM
that would be amazing
I think you should also just support the Arrow in-memory format. It's becoming more mainstream, and there is a conversion penalty to/from pandas
copies are fine

Alex Pelagenko

01/04/2023, 8:17 PM
Polars should have decent performance, as long as they didn't mess things up and they use the fact that the data is sorted

Tony Wang

01/04/2023, 8:18 PM
polars impl is not simd
I'd imagine you guys can do a simd asof, right?

Alex Pelagenko

01/04/2023, 8:18 PM
No, I don't think so

Tony Wang

01/04/2023, 8:19 PM
hmmm ok. I'll check back in after you guys support the pandas -> Quest library then

Alex Pelagenko

01/04/2023, 8:20 PM

Nicolas Hourcard

01/05/2023, 9:21 AM
@Adam Cimarosti just released pandas ingestion over ILP 😃

Adam Cimarosti

01/05/2023, 2:34 PM
Yes, I'm preparing a few things before a proper announcement.
It doesn't support Arrow.Table or Polars, but if this is something you'd need, it should be pretty easy for me to add.
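ILP here is the InfluxDB Line Protocol that QuestDB accepts for ingestion. As a rough illustration of what a DataFrame-to-ILP sender ultimately has to put on the wire for each row, here is a hand-rolled formatter for a simplified subset of the protocol: table name, symbol (tag) columns, integer/float fields, and a nanosecond timestamp. This is a hypothetical sketch, not the API of QuestDB's Python client; it skips the protocol's escaping rules and other field types.

```python
from typing import Dict, Union

Field = Union[int, float]


def to_ilp_line(table: str, symbols: Dict[str, str], fields: Dict[str, Field], ts_ns: int) -> str:
    """Format one row as a simplified ILP line: table,sym=v field=v,... timestamp_ns.

    Simplifications: no escaping of spaces, commas or equals signs; int/float fields only.
    """
    # symbols (tags) follow the table name, comma-separated
    head = ",".join([table] + [f"{k}={v}" for k, v in symbols.items()])
    cols = []
    for name, value in fields.items():
        if isinstance(value, bool):  # bool is an int subclass in Python; not handled here
            raise TypeError(f"unsupported field type for {name!r}: bool")
        if isinstance(value, int):
            cols.append(f"{name}={value}i")   # integer fields carry an 'i' suffix
        elif isinstance(value, float):
            cols.append(f"{name}={value!r}")  # floats use Python's repr, e.g. 1250.5
        else:
            raise TypeError(f"unsupported field type for {name!r}: {type(value).__name__}")
    # one line per row: head, a space, the fields, a space, the timestamp, a newline
    return f"{head} {','.join(cols)} {ts_ns}\n"


line = to_ilp_line("trades", {"sym": "ETH-USD"}, {"price": 1250.5, "qty": 3}, 1672862400000000000)
print(line, end="")  # trades,sym=ETH-USD price=1250.5,qty=3i 1672862400000000000
```

A real client would also handle escaping, strings, booleans and batching; the point of the sketch is only the row-to-line shape that any pandas, Arrow or Polars frame has to be converted into.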

Tony Wang

01/09/2023, 4:42 PM
that would be great, but it's not too hard to convert pandas to polars, though there is a performance penalty