#users-public
Jack

11/11/2022, 6:13 PM
Evening all - Happy Friday. In what situation would you recommend hourly partitions?
Nicolas Hourcard

11/11/2022, 7:02 PM
hey Jack
7:02 PM
do you currently face bottlenecks with out-of-order data on the ingestion side?
Jack Warnes

11/11/2022, 9:29 PM
Yeah, pretty much. O3 (out-of-order ingestion) is a bit of a killer. We are currently partitioned by day, and seeding any hourly data hits hard.
Nicolas Hourcard

11/12/2022, 10:15 AM
yes - I would recommend switching to hourly partitions
10:15 AM
many of our customers have done this, especially in market data use cases 👍
10:15 AM
but we should also look at potentially tweaking the commit lag
10:16 AM
do you guys ingest via ILP?
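For reference, the partition unit is chosen when the table is created. The snippet below is a minimal sketch of building such a statement and the URL for QuestDB's REST `/exec` endpoint; the table name `trades`, its columns, and the `localhost:9000` address are assumptions for illustration, not details from this thread.

```python
from urllib.parse import urlencode

# Partition units accepted in this sketch (QuestDB also supports others, e.g. NONE).
PARTITION_UNITS = {"HOUR", "DAY", "MONTH", "YEAR"}


def create_table_sql(table: str, unit: str = "HOUR") -> str:
    """Build a CREATE TABLE statement with a designated timestamp and partition unit.

    The column layout is hypothetical; the table name is not sanitized.
    """
    unit = unit.upper()
    if unit not in PARTITION_UNITS:
        raise ValueError(f"unsupported partition unit: {unit}")
    return (
        f"CREATE TABLE {table} ("
        "ts TIMESTAMP, symbol SYMBOL, price DOUBLE"
        f") TIMESTAMP(ts) PARTITION BY {unit}"
    )


def exec_url(sql: str, base: str = "http://localhost:9000") -> str:
    """Return the REST URL that would run the statement (host and port assumed)."""
    return f"{base}/exec?{urlencode({'query': sql})}"


sql = create_table_sql("trades", "HOUR")
print(sql)
print(exec_url(sql))
```

The URL can be opened with any HTTP client once the server address matches the actual deployment.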
Jack Warnes

11/13/2022, 2:53 PM
Yeah, it's over ILP. I've tried a few variations on the commit lag but have gone back to the defaults. To be honest, I don't think the use case is really ideal. A single publication during this seed could have 20k records, made up of 24 per day for 4 years. The write amplification was huge, so I suspect it is rewriting the same daily partition file multiple times.
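A rough way to see how many partitions one such publication touches is to bucket its timestamps by day and by hour. This is only a counting sketch: it assumes the 20k records are consecutive hourly points for a single series starting at an arbitrary date, and it says nothing about how QuestDB merges out-of-order data internally.

```python
from collections import Counter
from datetime import datetime, timedelta


def partition_key(ts: datetime, unit: str) -> str:
    """Return the name of the partition a timestamp falls into."""
    if unit == "DAY":
        return ts.strftime("%Y-%m-%d")
    if unit == "HOUR":
        return ts.strftime("%Y-%m-%dT%H")
    raise ValueError(f"unsupported unit: {unit}")


def rows_per_partition(timestamps, unit: str) -> Counter:
    """Count how many rows of a batch land in each partition."""
    return Counter(partition_key(ts, unit) for ts in timestamps)


# Assumed shape of one publication: 20,000 consecutive hourly points.
start = datetime(2019, 1, 1)  # arbitrary start date for illustration
batch = [start + timedelta(hours=h) for h in range(20_000)]

daily = rows_per_partition(batch, "DAY")
hourly = rows_per_partition(batch, "HOUR")

print(len(daily), max(daily.values()))    # 834 24
print(len(hourly), max(hourly.values()))  # 20000 1
```

Under these assumptions a single publication lands in 834 daily partitions with at most 24 rows each. If later publications cover the same date range, they would land in those same partitions again, which is consistent with the suspicion that the same daily files are rewritten repeatedly.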
2:54 PM
I will give hourly partitions a go; I just want to see whether read performance is impacted. Alternatively, I will look at ingesting the O3 data using the CSV import or COPY.
Jack

11/14/2022, 1:44 PM
One other slight thing I've changed is the batching - I will now try to send only 5k records at a time. Previously I would send, say, 10-20k and it would go O3, causing the publisher to back up. The publisher would then try to send even more, and it would get progressively worse. Batching in 5k seems to be "better".
1:44 PM
Hm - seems I am on here with 2 different accounts!
Nicolas Hourcard

11/14/2022, 1:45 PM
thanks - let us know how it goes