
cosmin

01/14/2023, 11:02 PM
I need some scaling advice. My current setup: an AWS c5.xlarge (4 vCPU, 8 GB RAM) built from the QuestDB AMI. The Cryptofeed connector runs on the same EC2 machine as QuestDB, as a Docker container, and sends data to QuestDB. On the first day, the latency between the data timestamp and the receipt timestamp was around 200 ms; now, after 100M+ ingests over 2 days, it is around 4 seconds. (I created a fresh QuestDB EC2 instance with the exact same config and the latency was fine, around 200 ms. I did this to verify that ingestion does get slower as the database grows.) Do I need to change any setting to restore performance, or upgrade the hardware to something like 8 vCPUs? Also, would running Cryptofeed on the same machine substantially affect performance?

Imre

01/16/2023, 1:27 AM
Hi @cosmin, both receipt and data timestamps are calculated in cryptofeed before they are sent to QuestDB.
async def write(self, data):
    d = self.format(data)
    timestamp = data["timestamp"]
    # receipt timestamp: seconds -> microseconds
    received_timestamp_int = int(data["receipt_timestamp"] * 1_000_000)
    # data timestamp: seconds -> nanoseconds; falls back to the receipt timestamp when missing
    timestamp_int = int(timestamp * 1_000_000_000) if timestamp is not None else received_timestamp_int * 1000
    update = f'{self.key}-{data["exchange"]},symbol={data["symbol"]} {d},receipt_timestamp={received_timestamp_int}t {timestamp_int}'
    await self.queue.put(update)
It looks like cryptofeed or the connection to the exchange gets slower over time.
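To make the point above concrete, here is a minimal sketch of how the two timestamps relate. It assumes, based on the multipliers in the excerpt, that both values arrive as epoch seconds (floats). The function names `feed_latency_ms` and `to_ilp_times` are hypothetical and not part of cryptofeed.

```python
def feed_latency_ms(timestamp, receipt_timestamp):
    """Latency between the exchange timestamp and cryptofeed's receipt, in ms.

    Both inputs are assumed to be epoch seconds. QuestDB is not involved
    in either value, so this number cannot reflect ingestion speed.
    """
    if timestamp is None:
        return None
    return (receipt_timestamp - timestamp) * 1000.0


def to_ilp_times(timestamp, receipt_timestamp):
    """Mirror the unit conversion in the excerpt above.

    Returns (receipt timestamp in microseconds, data timestamp in nanoseconds).
    """
    received_us = int(receipt_timestamp * 1_000_000)
    if timestamp is not None:
        ts_ns = int(timestamp * 1_000_000_000)
    else:
        # No exchange timestamp: fall back to the receipt time (us -> ns)
        ts_ns = received_us * 1000
    return received_us, ts_ns


latency = feed_latency_ms(100.0, 100.25)
times = to_ilp_times(None, 1.5)
```

Since both inputs are produced before the row is sent, a growing gap between them would point at the feed side (exchange connection or cryptofeed itself) rather than at the database.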

Adam Cimarosti

01/16/2023, 7:46 AM
Hi @cosmin, something I've noticed: whilst this is a correctness problem and not a performance one, I highly suggest that you consider using the Python client. At the moment it doesn't look like you are escaping your strings. If you prefer to do your own networking for some reason, you can just use the questdb.ingress.Buffer object on its own.
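As a rough illustration of the escaping concern, here is a stdlib-only sketch. It is not the client's implementation: `escape_tag` is a hypothetical helper that covers only a simplified subset of the line protocol rules (comma, equals and space in tag values), and the table and column names are made up. The real client handles more cases.

```python
def escape_tag(value):
    """Backslash-escape the characters that delimit tags in a line protocol row.

    Simplified subset for illustration: comma, equals sign and space.
    """
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


symbol = "BTC USD"  # hypothetical symbol containing a space

# Unescaped: the space inside the symbol ends the tag set early,
# so the rest of the line is no longer parsed as intended.
unsafe = f"trades,symbol={symbol} price=1.0"

# Escaped: the space stays part of the tag value.
safe = f"trades,symbol={escape_tag(symbol)} price=1.0"
```

Building lines with f-strings works only as long as no value ever contains a delimiter, which is why delegating the formatting to a client buffer is the safer choice.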

Imre

01/16/2023, 9:57 AM
I think @cosmin is a user of cryptofeed (https://github.com/bmoscon/cryptofeed), but I agree, the project should switch to the client.

cosmin

01/16/2023, 10:55 AM
Oh I see, the receipt timestamp is calculated by cryptofeed. Something interesting: the latency now seems to be quite fine, under 100 ms. It seems the latency is high only when there is a lot of websocket data coming in. I guess this comes down to my deployment of cryptofeed? Should I try deploying it on a separate EC2 instance?

Imre

01/16/2023, 12:06 PM
Hi @cosmin, I had a quick look at the cryptofeed code. You could try switching multiprocessing on. It might help, assuming you are subscribed to multiple market data feeds. multiprocessing is off by default (backend.py):
class BackendQueue:
    def start(self, loop: asyncio.AbstractEventLoop, multiprocess=False):
        if hasattr(self, 'started') and self.started:
            # prevent a backend callback from starting more than 1 writer and creating more than 1 queue
            return
        self.multiprocess = multiprocess
        if self.multiprocess:
            self.queue = Pipe(duplex=False)
            self.worker = Process(target=BackendQueue.worker, args=(self.writer,), daemon=True)
            self.worker.start()
        else:
            self.queue = Queue()
            self.worker = loop.create_task(self.writer())
        self.started = True
Moving cryptofeed to another instance might be a good idea too, especially if there are no free cores for the exchange connections. It depends on the number of processes cryptofeed will start, in other words the number of data feeds.
If you move to another instance, remember to change the host for the QuestDB connection. It is localhost by default (quest.py):
class QuestCallback(SocketCallback):
    def __init__(self, host='127.0.0.1', port=9009, key=None, **kwargs):
        super().__init__(f"tcp://{host}", port=port, **kwargs)
I did not dig deep into this, but there must be a way to pass these parameters (QuestDB hostname, multiprocessing flag) to cryptofeed.
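For context on why the multiprocessing flag can matter, here is a stdlib-only sketch of the default (non-multiprocess) branch from the backend.py excerpt above: the writer runs as a task on the same event loop that produces the data. `InLoopWriter` and `demo` are hypothetical names for illustration, not cryptofeed's API, and the sketch appends to a list instead of writing to a socket.

```python
import asyncio


class InLoopWriter:
    """Hypothetical stand-in for the default branch: queue plus writer task."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self.written = []

    async def writer(self):
        # Drain the queue until a None sentinel arrives.
        while True:
            line = await self.queue.get()
            if line is None:
                break
            self.written.append(line)  # a real writer would send to the socket


async def demo(lines):
    backend = InLoopWriter()
    # Same pattern as loop.create_task(self.writer()) in the excerpt
    task = asyncio.get_running_loop().create_task(backend.writer())
    for line in lines:
        await backend.queue.put(line)
    await backend.queue.put(None)  # signal the writer to stop
    await task
    return backend.written


result = asyncio.run(demo(["a 1", "b 2"]))
```

In this mode the feed handling and the writer take turns on one event loop, so a burst of websocket messages competes with the writer for the same core. With the multiprocess branch the writer runs in a separate process instead, which is presumably why it may help when free cores are available.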

cosmin

01/16/2023, 10:26 PM
Thank you so much for the feedback guys. This has been very very helpful.