https://questdb.io logo
Title
l

Lars Riis Olsen

05/22/2023, 11:42 AM
Hi, I would like to calculate the spread between 2 instruments based on their trades. I would like to do it by calculating the 5min VWAP for each instrument and the joining the 2 series. Intuitively, I guess it should be something like the following but when I run it I get an 'QuestDB is not reachable [502]' error:
select firstleg_vwap - secondleg_vwap spread from (
select executedat firstleg_executedat, SUM(price*qty) / SUM(qty) firstleg_vwap from trade where instrument='[FIRST_LEG]'
SAMPLE by 5m
ALIGN TO CALENDAR
) splice join (
select executedat secondleg_executedat, SUM(price*qty) / SUM(qty) secondleg_vwap from trade where instrument='[SECOND_LEG]'
SAMPLE by 5m
ALIGN TO CALENDAR
)
What is the best way to do this?
n

Nicolas Hourcard

05/22/2023, 1:13 PM
hey Lars, let us look into it
b

Bolek Ziobrowski

05/22/2023, 2:13 PM
Hi Lars . Is splice join the thing you need ? Have you tried asof ?
Another approach could be :
select executedat, 
       coalesce(SUM( case when instrument = '[FIRST_LEG]' then price*qty else 0 end) / SUM( case when instrument = '[FIRST_LEG]' then qty else 0 end),0) -
       coalesce(SUM( case when instrument = '[SECOND_LEG]' then price*qty else 0 end) / SUM( case when instrument = '[SECOND_LEG]' then qty else 0 end),0) as spread
from trade 
where instrument in ('[FIRST_LEG]', '[SECOND_LEG]')
SAMPLE by 5m
ALIGN TO CALENDAR
or
select executedat, coalesce(max(case when instrument = '[FIRST_LEG]' then vwap else 0 end),0) - coalesce(max(case when instrument = '[SECOND_LEG]' then vwap else 0 end),0) as spread
from 
(
  select executedat, instrument, SUM( price*qty ) / SUM( qty) as vwap
  from trade 
  where instrument in ('[FIRST_LEG]', '[SECOND_LEG]')
  SAMPLE by 5m
  ALIGN TO CALENDAR
)
group by executedat
j

javier ramirez

05/22/2023, 2:56 PM
I was working on a similar query, and just tried Bolek’s approach compared to doing ASOF join. For one day worth of the public trades dataset the coalesce version performs over 3 times faster than the join one
These are both queries for comparison working on the demo.questdb.io dataset
with s1 as (
SELECT
_timestamp_,
sum(price * amount) / sum(amount) AS vwap_price
FROM trades
WHERE _timestamp_ IN '2023-05-21' and _symbol_ = 'ETH-USD' SAMPLE BY 15m ALIGN TO CALENDAR ),
s2 as (
SELECT
_timestamp_,
_symbol_,
sum(price * amount) / sum(amount) AS vwap_price
FROM trades
WHERE _timestamp_ IN '2023-05-21' and _symbol_ = 'BTC-USD' SAMPLE BY 15m ALIGN TO CALENDAR )
select s1._timestamp_, s1.vwap_price, s2.vwap_price from s1 asof join s2;
------------------------------------
select
_timestamp_,
coalesce(
SUM(
CASE
WHEN _SYMBOL_ = 'ETH-USD' then price * amount
ELSE 0
END
) / SUM(
CASE
WHEN _SYMBOL_ = 'ETH-USD' then amount
ELSE 0
END
),
0
),
coalesce(
SUM(
CASE
WHEN _SYMBOL_ = 'BTC-USD' then price * amount
ELSE 0
END
) / SUM(
CASE
WHEN _SYMBOL_ = 'BTC-USD' then amount
ELSE 0
END
),
0
)
FROM
trades
WHERE
_timestamp_ IN '2023-05-21'
SAMPLE BY 15m ALIGN TO CALENDAR
n

Nicolas Hourcard

05/22/2023, 3:03 PM
@Lars Riis Olsen to make sure you see this! thanks
l

Lars Riis Olsen

05/22/2023, 6:37 PM
Thank you for quick response. Both versions execute in roughly the same time on my dataset. The reason why I would like a splice join is that the instruments are sometimes rather illiquid which results in no trades being made on one of the instruments for the duration of the bucket. In those cases an asof join can cause me to miss a row. Ideally, I would like a full outer join on the bucket timestamp but as far as I can see full outer joins are not supported
j

javier ramirez

05/22/2023, 7:06 PM
The coalesce version should probably give you the right answer. If there's no instrument in one bucket but there's on the other, you would get a null value on the corresponding vwap, but the calculation for the interval in the other
b

Bolek Ziobrowski

05/22/2023, 8:27 PM
@Lars Riis Olsen The version with coalesce should should work fine . I'm not sure what kind of logic you wanted to use with splice join but it could produce surprising results because it can also match on nonequal timestamps .