Evening all - Happy Friday. In what situation would you recommend hourly partitions?
11/11/2022, 7:02 PM
do you currently face bottlenecks with out-of-order data on the ingestion side?
11/11/2022, 9:29 PM
Yeh, pretty much. O3 is a bit of a killer. We are currently daily partitioned, and any sort of seeding of hourly data hits hard.
11/12/2022, 10:15 AM
yes - I would recommend switching to hourly partitions
many of our customers have done this especially in market data use cases 👍
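for reference, a table partitioned by hour can be declared like this - table and column names here are made up for illustration, not your schema:

```sql
-- hypothetical market-data table; PARTITION BY HOUR means an
-- out-of-order write only rewrites the affected one-hour partition,
-- not a whole day's data
CREATE TABLE trades (
    symbol SYMBOL,
    price DOUBLE,
    amount DOUBLE,
    ts TIMESTAMP
) TIMESTAMP(ts) PARTITION BY HOUR;
```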
but we should also look at potentially tweaking the commit lag
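the per-table knobs look roughly like this - exact syntax can vary between QuestDB versions, so treat it as a sketch and check the docs for your release:

```sql
-- widen the commit lag so more out-of-order rows are buffered
-- and sorted in memory before each commit
ALTER TABLE trades SET PARAM commitLag = 20s;

-- cap how many uncommitted rows accumulate before a forced commit
ALTER TABLE trades SET PARAM maxUncommittedRows = 10000;
```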
do you guys ingest via ILP?
11/13/2022, 2:53 PM
Yeh, it's over ILP. I've tried a few variations on the commit lag but have gone back to defaults. To be honest, I think the use case isn't really ideal: a single publication during this seed could have 20k records, made up of 24 per day for 4 years. The write amplification was huge, so I suspect it is rewriting the same daily partition file multiple times.
I will give the hourly partition a go; I just want to see whether read performance is impacted. Alternatively, I will look at ingesting the O3 data using CSV import or COPY.
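(For the COPY route, a minimal sketch would be something like the statement below - the file name is hypothetical, and availability/behaviour of COPY depends on the QuestDB version and on the server's configured import root directory:)

```sql
-- bulk-load a historical seed file in one pass instead of
-- streaming it over ILP; the file must be visible to the server
COPY trades FROM 'seed_2018_2022.csv';
```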
11/14/2022, 1:44 PM
One other slight thing I've changed is the batching - I will now try to send only 5k records at a time. Previously I would send, say, 10-20k and it would trigger O3 writes, causing the publisher to back up. The publisher would then try to send even more and progressively get worse. Batching in 5k seems to be "better"
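FWIW, the chunking side of that is simple to sketch in Python - the send call in the comment is a hypothetical stand-in for the actual ILP publisher, not real client code:

```python
def batches(records, size=5000):
    """Split a record list into fixed-size chunks so each ILP
    flush stays small enough to avoid a huge O3 rewrite."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# e.g. a 20k-row publication would go out as four 5k batches:
# for batch in batches(rows, 5000):
#     send_over_ilp(batch)  # hypothetical publisher call
```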
Hm - seems I am on here with 2 different accounts!