Lars Riis Olsen
01/28/2023, 11:16 AMForecastGeneratedAt (timestamp)
TimestampForForecastedTemperature (timestamp)
ForecastedTemperature (float)
ForecastReceivedAt (timestamp)
And the ForecastReceivedAt is the at which we received it from the forecast provider.
Case 1 is that at 15:00 every day we need to pick our best estimate for the next day. This has to be done by taking the most recently generated forecast we have available at that time (max(ForecastGeneratedAt) WHERE ForecastReceivedAt < 15:00) and taking a 24 hour slice from that forecast.
Case 2 is that we also need to be able to do this operation efficiently for multiple days at one. For a year this would functionally be equivalent to doing a loop over 365 queries but that would obviously negatively impact performnce.
The forecasts are being delivered for a large number of different geographical areas so while 1 forecast every 15 minutes does not sound like a lot the amount of rows is received every day is about 50 million. Especially for case 2 that adds up to a fairly significant rows that need to be processed.
Currently we store this data in SqlServer but are looking to potentially replace it with something else.
Any suggestions?Holger
01/28/2023, 11:55 AMLars Riis Olsen
01/28/2023, 12:56 PMNicolas Hourcard
01/30/2023, 9:32 AMBolek Ziobrowski
01/30/2023, 9:38 AMcreate table forecast
(
generatedAt timestamp,
timestamp timestamp,
temperature float,
receivedAt timestamp
) timestamp(generatedAt)
explain
select *
from forecast
where timestamp in '2023-01-30'
order by generatedAt desc
limit 144
It seems to me that forecastGenerateAt is most important and it'd be enough to get the latest set of values and ignore forecastReceivedAt .where timestamp
predicate .
Could you elaborate on Case 2 ?create table forecast
(
generatedAt timestamp,
timestamp timestamp,
temperature float,
receivedAt timestamp,
location symbol index
) timestamp(generatedAt) partition by day;
select *
from forecast
where timestamp in '2023-01-30' and location = 'X'
and generatedAt in '2023-01-29'
order by location, generatedAt desc
limit 144