https://questdb.io logo
Title
s

Super Richman

04/13/2023, 9:07 AM
on lt join, how can I instruct qdb to retreive data only if its on the same day? (lt join that will result in data from prev day will have nulls instead)
j

Jaromir Hamala

04/13/2023, 9:11 AM
hi, you mean the same day as when you are running the query?
s

Super Richman

04/13/2023, 9:19 AM
no, in timestamp column. for example it the first minute in the date is 08:00:00 then the LT JOIN generated columns will have null in them, but when the timestamp is 08:01:00 then it will start populating the LT JOIN cols
so the first minute will not have the data from yesterday for example 23:59:00
j

Jaromir Hamala

04/13/2023, 9:28 AM
ok, so like the LT JOIN, but with further restriction that 2 records will be joined if and only if the timestamps are on the same day (in UTC).
s

Super Richman

04/13/2023, 9:29 AM
something like that yes
or it can be lt join restricted by
PARTITION BY
since my
PARTITION BY
is DAY
j

Jaromir Hamala

04/13/2023, 9:53 AM
I cannot think of anything particularly elegant. you can join and then use case/if/else to filter out the results you do not want. but that’s not great. but I am not best with SQL, perhaps someone else will have a better idea.
j

javier ramirez

04/13/2023, 10:45 AM
Maybe this?
with minitrades AS (select date_trunc('day', timestamp) as theday, * from trades 
where timestamp between '2023-04-12T23:59:59' AND '2023-04-13T00:00:01'
)
select * from minitrades a LT join minitrades b ON (theday)
you can test on the demo site https://demo.questdb.io/
As documented, you can do an LT or ASOF Join using the timestamp and an optional second column https://questdb.io/docs/reference/sql/join/#using-on-for-matching-column-value
In this case what I am doing is truncating the timestamp at day level as a new column (theday) and then using ON to match only if that column matches
The query above is filtering just a few seconds before/after last midnight to illustrate how it works
To make it a bit more fun/illustrative, here is the same query analyzing what happens from 23:59:29 and two seconds afterwards in the last seven days, so you can see what happens every day around midnight
with minitrades AS (select date_trunc('day', timestamp) as theday, * from trades 
where timestamp IN '2023-04-12T23:59:59;2s;-1d;7'
)
select * from minitrades a LT join minitrades b ON (theday)
s

Super Richman

04/13/2023, 2:23 PM
Is there a way to tell QDB to not return lines that have null in them?
j

javier ramirez

04/13/2023, 2:24 PM
do you mean for the query above?
s

Super Richman

04/13/2023, 2:24 PM
oh sorry I sent this message before I saw your messages
checking...
hmm I am already using the ON word, is it possible to use it twice?
what about
coalesce
can it help?
j

javier ramirez

04/13/2023, 2:30 PM
you cannot use ON twice, but you could concat the current day plus the column you are using as the ON at the moment and that should work
let me rewrite the query above
with minitrades AS (select concat(symbol, date_trunc('day', timestamp)) as symbol_plus_day, * from trades 
where timestamp IN '2023-04-12T23:59:59;2s;-1d;7'
)
select * from minitrades a LT join minitrades b ON (symbol_plus_day)
In the query now I want to do an LT for the trades, but in the context of the symbol column, so I would need to use LT JOIN ON (symbol). But I also want to add your daily constraint, so I am concatenating them, and using that one as the ON column
would that work for your use case?
s

Super Richman

04/13/2023, 2:37 PM
wow this is crazy... you are sql 1337 ;]
it might work but I think it overcomplicates my usecase, I have found a workaround for now:
SELECT * FROM (with minitrades AS (select date_trunc('day', timestamp) as theday, * from trades 
where timestamp IN '2023-04-12T23:59:59;2s;-1d;7'
)
select * from minitrades a LT join minitrades b ON symbol)
WHERE timestamp1 IS NOT NULL
what do you think?
I will save the queries you sent here and learn them a bit and probably will use them for something else
j

javier ramirez

04/13/2023, 2:39 PM
That works for sure. It’d be interesting to see the EXPLAIN for both and maybe test what performs better, adding the extra concat for every row, or avoiding the second filter
s

Super Richman

04/13/2023, 2:39 PM
oh no my select is not enough 😕 since it does not seperates the days
j

javier ramirez

04/13/2023, 2:40 PM
ahhh.. yes.. I can see that. I was thinking on my initial one
if you need to separate on both, you need to concat
s

Super Richman

04/13/2023, 2:42 PM
hmmm, ok I will start investigating
Thanks :]
my main issue is that this query is built dynamically and I am trying to keep it as simple as possible since I am doing numerous LT JOINs together..
ok I think I got it
SELECT * FROM 
((SELECT * from trades where timestamp between '2023-04-12T23:59:59' AND '2023-04-13T00:00:01') LT join trades ON symbol)
WHERE date_trunc('day', timestamp)=date_trunc('day', timestamp1)
thanks Javier for your help!
j

javier ramirez

04/13/2023, 2:57 PM
yes, but in that case you are not getting the first rows of every day. Not sure if that’s what you are after
s

Super Richman

04/13/2023, 2:58 PM
ye I am skipping the rows in the amount of LT JOINs I am doing
for minute based market data wouldn't you do it?
j

javier ramirez

04/13/2023, 3:08 PM
Every use case is different. Whatever makes sense for you. Just wanted to make sure as your version returns 25 rows and mine returns 32 for that date range. Every row (or several if they share the same timestamp) with the earliest timestamp every day and for each value on the ON column will be skipped
s

Super Richman

04/13/2023, 3:21 PM
but the query you wrote with the 32 lines is the query that you join on the day only, I join on symbol
isn't that matters?
j

javier ramirez

04/13/2023, 3:24 PM
I am joining on both. This is the query I am running
with minitrades AS (select concat(symbol, date_trunc('day', timestamp)) as symbol_plus_day, * from trades 
where timestamp between '2023-04-12T23:59:59' AND '2023-04-13T00:00:01'
)
select * from minitrades a LT join minitrades b ON (symbol_plus_day)
Now if I run your own query commenting out the final where, we also get 32 rows, as in
SELECT * FROM 
((SELECT * from trades where timestamp between '2023-04-12T23:59:59' AND '2023-04-13T00:00:01') LT join trades ON symbol)
 -- WHERE date_trunc('day', timestamp)=date_trunc('day', timestamp1)
this is giving you a row for each row in the trades table between both timestamps
then your where is deleting all the rows for every day where we are not matching with a previous row. So for every day and every symbol you will be deleting at least one row
if that’s what you need, then that’s great 🙂
otherwise, not so much
s

Super Richman

04/13/2023, 3:26 PM
from a quick check - 8 rows that have nulls in them are removed = 25 rows returned
but I didnt look into the data...
j

javier ramirez

04/13/2023, 3:29 PM
exactly. That might or might not be the behaviour you are after. Both queries work. They return different things because they are doing slightly different things. My version behaves as an outer join and will return a row for every row present in the first table, even when there is not a matching condition. Your version will return a row only as long as there is a previous row for that same day and the same symbol
s

Super Richman

04/13/2023, 4:17 PM
ok yes so this is good :]