#users-public

Orrin

11/24/2022, 4:04 AM
Hey gang, been a while. Curious how people solve one-way sync from a main QuestDB instance to satellite instances in the most reliable, efficient way. Any thoughts?

javier ramirez

11/24/2022, 9:39 AM
First I would like to say that built-in replication is coming to QuestDB and should be released within the next few months. The primary instance will use WAL replication to send chunks to the satellites, which will in turn apply the changes. This will be reliable and scalable. In the meantime, I would love to see some answers here from users doing this right now, but I can give you some alternatives that should work depending on your use case:
• If you don't need frequent sync and can afford to wait until partitions are finished, you can have an hourly/daily (depending on partition resolution) script to export from the main instance and insert into the satellite using the REST API.
• If you need more frequent sync, and depending on the size of the dataset, you could do the same as above at the time resolution you need, but deleting and then overwriting the replica partition every time. Not ideal, as there would be a window in the replica where the whole partition has been deleted and no data would show. Depending on the use case it might be good enough.
• For frequent synchronisation without the above issues, but only if your data is in order, you could run a query on the main server as frequently as needed, filtering for rows since the last query. To do so, you could either use the latest timestamp (if the timestamp resolution is fine enough that you never get two events with the same absolute timestamp), or you could use row_number to get an incremental counter. You would get as a result only the new rows since the last run, and you could just insert those into your replica. You could also keep a table in the replica database with the latest seen timestamp/row_number so you know the filter for the next iteration. Again, this works only for in-order, append-only data; there is a sketch of this approach right after this list.
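A minimal sketch of that last option in Python, assuming QuestDB's documented HTTP endpoints (/exec for SQL, /exp for CSV export, /imp for CSV import) on both instances. The readings table, its designated timestamp column named timestamp, and the host names are all hypothetical:
```python
import requests

PRIMARY = "http://primary.example.com:9000"   # hypothetical hosts
REPLICA = "http://replica.example.com:9000"
TABLE = "readings"                            # hypothetical table name

def last_replica_ts():
    # Ask the replica for the newest timestamp it already has.
    r = requests.get(f"{REPLICA}/exec",
                     params={"query": f"SELECT max(timestamp) FROM {TABLE}"})
    r.raise_for_status()
    rows = r.json()["dataset"]
    if rows and rows[0][0] is not None:
        return rows[0][0]
    return "1970-01-01T00:00:00.000000Z"      # empty replica: take everything

def sync_once():
    since = last_replica_ts()
    # Works only for in-order, append-only data, and assumes no two
    # events ever share the same absolute timestamp.
    query = f"SELECT * FROM {TABLE} WHERE timestamp > '{since}'"
    exp = requests.get(f"{PRIMARY}/exp", params={"query": query})
    exp.raise_for_status()
    if exp.text.count("\n") <= 1:             # header only: nothing new
        return
    # Append the exported CSV into the replica table via /imp.
    imp = requests.post(f"{REPLICA}/imp",
                        params={"name": TABLE},
                        files={"data": ("chunk.csv", exp.text)})
    imp.raise_for_status()

if __name__ == "__main__":
    sync_once()                               # run from cron or a loop
```
You would schedule sync_once at whatever frequency you need; swapping the timestamp filter for a row_number counter follows the same shape.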
9:41 AM
If none of those are suitable, we see many users using Kafka in front of QuestDB, so they can send their data to multiple destinations. In that case, you could take advantage of our Kafka Connect connector to send the data to multiple QuestDB instances. This is reliable and scales very well, but you need to manage a Kafka cluster, so unless you are already using one it can be a bit of a hassle
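For completeness, a minimal sketch of registering two sink instances through Kafka Connect's REST API, one per QuestDB instance consuming the same topic. This assumes the connector's QuestDBSinkConnector class and its host/table settings; the topic, table, worker URL, and host names are all hypothetical:
```python
import requests

# Hypothetical Kafka Connect worker.
CONNECT_URL = "http://connect.example.com:8083/connectors"

# One sink per QuestDB instance, both consuming the same topic.
for name, qdb_host in [
    ("questdb-primary-sink", "primary.example.com:9009"),
    ("questdb-replica-sink", "replica.example.com:9009"),
]:
    payload = {
        "name": name,
        "config": {
            "connector.class": "io.questdb.kafka.QuestDBSinkConnector",
            "topics": "readings",             # hypothetical topic
            "host": qdb_host,                 # QuestDB ILP endpoint
            "table": "readings",              # hypothetical target table
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }
    resp = requests.post(CONNECT_URL, json=payload)
    resp.raise_for_status()
```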
9:44 AM
And depending on how you are ingesting your data, you could also just send to multiple destinations from your current ingestion pipeline. Let's say you are using one of the official clients and sending data through the Sender object the client provides. You could open connections to both instances and send your data twice, once to each instance.
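A minimal sketch of that dual-write with the Python client, assuming the current 1.x Sender(host, port) TCP constructor (newer versions may construct the Sender differently). The hosts, table, and column names are hypothetical:
```python
from datetime import datetime, timezone
from questdb.ingress import Sender

# Hypothetical instances; both receive every row.
PRIMARY = ("primary.example.com", 9009)
REPLICA = ("replica.example.com", 9009)

def send_reading(sender, ts, sensor_id, value):
    # One ILP row; table and column names are made up for the example.
    sender.row(
        "readings",
        symbols={"sensor_id": sensor_id},
        columns={"value": value},
        at=ts,
    )

with Sender(*PRIMARY) as primary, Sender(*REPLICA) as replica:
    ts = datetime.now(timezone.utc)
    for sender in (primary, replica):
        send_reading(sender, ts, "sensor-1", 23.5)
        sender.flush()   # flush explicitly so both instances stay in step
```
The trade-off is that your pipeline now owns the delivery guarantees: if one instance is down, you have to decide whether to buffer, retry, or drop.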
9:44 AM
As I said, official built-in support for replication is coming, but in the meantime maybe one of those ideas will help you