https://questdb.io logo
Title
l

Lars Riis Olsen

01/28/2023, 11:16 AM
Hi everyone, We are currently evaluating QuestDb and I have a use case that I am unsure of whether can be handled efficiently in QuestDb. The setup is that we receive a new temperature forecast every 15 minutes. The forecast provides a value for the temperature at 10 minute intervals from the time the forecast was generated until 7 days after time of generation. So, the data being received is the first 3 columns:
ForecastGeneratedAt (timestamp)
TimestampForForecastedTemperature (timestamp)
ForecastedTemperature (float)
ForecastReceivedAt (timestamp)
And the ForecastReceivedAt is the at which we received it from the forecast provider. Case 1 is that at 15:00 every day we need to pick our best estimate for the next day. This has to be done by taking the most recently generated forecast we have available at that time (max(ForecastGeneratedAt) WHERE ForecastReceivedAt < 15:00) and taking a 24 hour slice from that forecast. Case 2 is that we also need to be able to do this operation efficiently for multiple days at one. For a year this would functionally be equivalent to doing a loop over 365 queries but that would obviously negatively impact performnce. The forecasts are being delivered for a large number of different geographical areas so while 1 forecast every 15 minutes does not sound like a lot the amount of rows is received every day is about 50 million. Especially for case 2 that adds up to a fairly significant rows that need to be processed. Currently we store this data in SqlServer but are looking to potentially replace it with something else. Any suggestions?
h

Holger

01/28/2023, 11:55 AM
I think they would fit nicely into one table when you add the geographic area as an indexed column. So is you question more about how to translate your business logic into questdb sql queries?
l

Lars Riis Olsen

01/28/2023, 12:56 PM
Yes. My question is about whether queries can be made that can handle the 2 cases efficiently given that is is essentially a subset of a series of timeseries (one forecast is essentially a short timesies) that need to be merged together
n

Nicolas Hourcard

01/30/2023, 9:32 AM
@Bolek Ziobrowski @Jaromir Hamala will be able to help!
b

Bolek Ziobrowski

01/30/2023, 9:38 AM
Hi @Lars Riis Olsen. It seems that you could do 1 with :
create table forecast 
(
 generatedAt timestamp,
 timestamp timestamp,
 temperature float,
 receivedAt timestamp
) timestamp(generatedAt)


explain 
select * 
from forecast 
where timestamp in '2023-01-30'
order by generatedAt desc 
limit 144
It seems to me that forecastGenerateAt is most important and it'd be enough to get the latest set of values and ignore forecastReceivedAt .
For next/last 7 days it should look the same and only differ in
where timestamp
predicate . Could you elaborate on Case 2 ?
It could be something like :
create table forecast 
(
 generatedAt timestamp,
 timestamp timestamp,
 temperature float,
 receivedAt timestamp,
 location symbol index
) timestamp(generatedAt) partition by day;

select * 
from forecast 
where timestamp in '2023-01-30' and location = 'X'
and generatedAt in '2023-01-29'
order by location, generatedAt desc
limit 144
Hey @Lars Riis Olsen @Lars Riis Olsen . Did you check out what I suggested ?
(seems like there's two slack account with the same name )