• e

    Erfan Hemmati

    1 month ago
    Hello everyone! How can I improve my data inserts via line protocol? I'm using Node.js as the client and I have more than 300 writes per 30 seconds. I see an insert lag with line protocol. Also, all these writes are handled by one TCP connection.
  • j

    javier ramirez

    1 month ago
    Hi Erfan. What do you mean by insert lag? Are you measuring from when you insert until you can see the data on your queries or is it something else?
  • Also, are your events ordered by timestamp or can they be out of order?
  • e

    Erfan Hemmati

    1 month ago
    Hi Javier, we made a p2p cryptocurrency exchange and in the past we used TimescaleDB to store historical prices and OHLC data. Now I have decided to migrate to QuestDB, and I wrote our structure and statements. I also used line protocol to insert my trades into a QuestDB table. My table contains market, amount, price and timestamp, and I wrote a query to get OHLC data for any pair. The problem is that my inserts are not done instantly.
  • j

    javier ramirez

    1 month ago
    This might be because of the commit lag configuration
  • There are some config params you can tune here, but specifically for your use case there is this one
    cairo.commit.lag
  • by default the commit lag is set to 300,000 milliseconds (5 minutes)
  • That would mean, in general, your data will be buffered for about 5 minutes before being committed and before you can see it in your queries
  • If within those 5 minutes you accumulate more rows than the param at
    cairo.max.uncommitted.rows
    (defaults to 500K rows), the data will also be committed
  • and also if you stop sending data and the table is idle for longer than the param at
    line.tcp.min.idle.ms.before.writer.release
    (defaults to 30 seconds)
  • So basically you could just set your
    cairo.commit.lag
    to just a few seconds and that is the time it would take to see fresh data on your queries
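  • For example, a minimal server.conf sketch (the values here are only illustrative, tune them to your needs):
    cairo.commit.lag=2000
    cairo.max.uncommitted.rows=100000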
  • e

    Erfan Hemmati

    1 month ago
    Great! I will check these params immediately.
  • j

    javier ramirez

    1 month ago
    If your data is ordered, you can go low with that param
  • If your data is unsorted, then it is better to be conservative: unsorted data means we need to rewrite data already committed to disk, which is costly. That is why we allow this commit lag param, to trade off between performance and data freshness. You have more info at https://questdb.io/docs/guides/out-of-order-commit-lag
  • e

    Erfan Hemmati

    1 month ago
    Javier, is there any recommended configuration to increase the operational capacity of inserts and selects in QuestDB (in production)?
  • j

    javier ramirez

    1 month ago
    300 inserts in 30 seconds sounds “low” for QuestDB
  • so maybe just changing the commit lag value to 1 or 2 seconds works for you
  • e

    Erfan Hemmati

    1 month ago
    Aha
  • j

    javier ramirez

    1 month ago
    other than that, the partitioning strategy is important and you should choose a partition range that works for you (the default is DAY, which is probably fine)
  • e

    Erfan Hemmati

    1 month ago
    yes, I chose that
  • j

    javier ramirez

    1 month ago
    and in more demanding systems, there are config values to configure how many cores on your box you want to dedicate only for parsing data you are ingesting, how many cores to dedicate only for writers, and how many shared cores for everything else
  • QuestDB will automatically choose sensible values depending on the number of available cores, but you could always fine tune
  • e

    Erfan Hemmati

    1 month ago
    What do you mean by ordered and unsorted data?
  • j

    javier ramirez

    1 month ago
    Unlike other databases (you mentioned timescale) QuestDB has been written from scratch to work with time series
  • To optimize for time series scenarios, we physically store the data on disk sorted by timestamp
  • So queries can be quite fast, as we combine partitioning, column-based storage, and physically sorted rows (you can also add indexes and play with Symbols to get extra performance on other columns)
  • Your data might have an origin timestamp, or maybe you are ingesting without a timestamp and we are automatically assigning the ingestion timestamp to that row
  • e

    Erfan Hemmati

    1 month ago
    Nice
  • j

    javier ramirez

    1 month ago
    In the case of automatic timestamp, there is no concept of out of order
  • e

    Erfan Hemmati

    1 month ago
    I insert trades with their own timestamp via line protocol
  • j

    javier ramirez

    1 month ago
    But in the case of your original timestamp, we might have out of order data. Imagine you are ingesting data from different sensors. Your QuestDB instance is based in, for example, London, and you have sensors streaming data both from Ireland and from Sydney
  • Because of network latency, events from Sydney will typically take longer to reach the server
  • e

    Erfan Hemmati

    1 month ago
    So is data that comes with its own timestamp called ordered?
  • j

    javier ramirez

    1 month ago
    so maybe you have an event that was generated in Sydney, then another event 2 ms later from Ireland, but the one from Ireland reaches the database first
  • if we were committing every event as we get it, we would be inserting out of order data, but as I said we always store data ordered, so we would need to rearrange data on disk
  • the commit lag parameter specifies how often you want to commit, so we can “buffer” events in memory
  • in this case, even if events arrive unordered, when we commit from memory to disk we sort them and that is efficient
  • When you create a table, you define the designated timestamp for your table
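  • For example, a table creation sketch for your case (just an illustration; adapt the columns and types to your real schema):
    CREATE TABLE trades (
        market SYMBOL,
        market_id INT,
        price DOUBLE,
        amount DOUBLE,
        timestamp TIMESTAMP
    ) TIMESTAMP(timestamp) PARTITION BY DAY;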
  • e

    Erfan Hemmati

    1 month ago
    yes
  • j

    javier ramirez

    1 month ago
  • That’s the column that decides what is in order and what is not
  • so if your trading data can arrive at QuestDB out of order, it is a good idea to allow for a commit lag that takes that into account
  • e

    Erfan Hemmati

    1 month ago
    I think my trades should be ordered data
  • j

    javier ramirez

    1 month ago
    Then you can just set the commit lag param to 1 second and be happy 🙂
  • we will always be appending data to the end of the table, which is ideal
  • e

    Erfan Hemmati

    1 month ago
    Thank you Javier
  • I will apply what we discussed to the project and report back with the results
  • j

    javier ramirez

    1 month ago
    Looking forward to that!
  • e

    Erfan Hemmati

    1 month ago
    hi Javier @javier ramirez
  • QDB_SHARED_WORKER_COUNT=32
    QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1
    QDB_LINE_TCP_MAINTENANCE_JOB_INTERVAL=1
    QDB_LINE_TCP_IO_WORKER_COUNT=32
    QDB_TCP_WRITER_WRITER_COUNT=32
    Do these variables affect our insert performance?
  • Also, I have another problem with another process inserting via PG wire protocol. I insert data into the trades table with line protocol. On the other hand, I have a process which inserts data from PHP into the trades table, but the insert via PG does not succeed.
  • Thanks for your reply
  • j

    javier ramirez

    1 month ago
    Your env variables look a bit weird to me
  • Regarding the CPU config, the defaults should be OK for many use cases, but if you want to fine-tune, take a look at https://questdb.io/docs/operations/capacity-planning
  • Also note one of your env variables has the wrong name
    QDB_TCP_WRITER_WRITER_COUNT
    doesn’t exist in QuestDB and I think you mean
    QDB_TCP_WRITER_WORKER_COUNT
  • The main thing about CPU config is that, unless you say otherwise, by default QuestDB will try to use one dedicated CPU only for writing, then the rest of the CPUs for everything else. On larger machines, it will also dedicate some cores to parse input data before handing it to the writer
  • You probably don’t need more than a single CPU for writing. Chances are that if you find your writes are slow, either you need a faster disk or you are experiencing out-of-order writes (which you said you typically will not have)
  • I would also not use uncommitted rows the way you are doing it. I prefer to set my commit lag to a value that makes sense for the business, then allow a large number of uncommitted rows
  • imagine you are streaming 300 rows in one second, and you only need to select data once every 5 seconds
  • If you tell questdb to commit for every row, those are 1500 commits every five seconds
  • when you truly need only one for your business needs
  • In this scenario, I would configure a commit lag of five seconds, and a max uncommitted rows of a few thousand rows (see the example below)
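  • For example (illustrative values only, in the same env-variable style you are already using):
    QDB_CAIRO_COMMIT_LAG=5000
    QDB_CAIRO_MAX_UNCOMMITTED_ROWS=10000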
  • The maintenance job interval you set would definitely cause a performance issue
  • Basically, the maintenance job interval says how often QuestDB will check whether there are any idle tables that need to be flushed/committed
  • setting it to 1 millisecond doesn’t make a lot of sense here, as we only need to check for idle tables every now and then
  • Regarding the issue that you are not having success with PG, not sure what that means. Are you getting any errors?
  • e

    Erfan Hemmati

    1 month ago
    This is my scenario: I will insert 500 records every 30 seconds via Node.js (line protocol), and my data should be visible in QuestDB immediately. After each successful insert I will select data for another process. On the other hand, I will insert some data from my PHP application via a PG connection (the interval and amount of data depend on users' trades). This is my server configuration: CPU: 16 cores, RAM: 32 GB, disk: SSD, server: Hetzner Cloud. I want my inserts via line protocol to be visible immediately, so I will set the QDB_CAIRO_COMMIT_LAG=1000 variable. I also need QuestDB to respond to my selects after inserts, and to the insert process from my PHP application. With this scenario and my resources, what other variables do I need to set to improve performance?
  • j

    javier ramirez

    1 month ago
    I think just changing the commit lag should work 🙂
  • the rest of the defaults should be enough for your particular scenario
  • when inserting via pg protocol, be aware you might receive locking errors that you wouldn’t get on ILP. Your client application should be aware of that and retry on error
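  • a rough Node.js sketch of that retry idea, using the pg library (connection details and the retry policy are only illustrative):
    const { Pool } = require("pg");
    
    // QuestDB speaks PG wire on port 8812 by default
    const pool = new Pool({ host: "localhost", port: 8812, user: "admin", password: "quest", database: "qdb" });
    
    // retry a statement a few times, e.g. when the table writer is busy/locked
    async function queryWithRetry(sql, params, attempts = 3) {
        for (let i = 0; i < attempts; i++) {
            try {
                return await pool.query(sql, params);
            } catch (err) {
                if (i === attempts - 1) throw err;            // give up after the last attempt
                await new Promise((r) => setTimeout(r, 200)); // short back-off before retrying
            }
        }
    }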
  • e

    Erfan Hemmati

    1 month ago
    This matter had occupied my mind
  • j

    javier ramirez

    1 month ago
    In the future the PG wire implementation we have should not have those issues, but as of today, you need to handle them
  • In any case, if that was a big problem for you, you can always send data from PHP using the ILP protocol
  • all you need is a socket connection and to send a UTF-8 string with the right format
  • e

    Erfan Hemmati

    1 month ago
    So I should retry the insert via PG from my client application until the insert is successful?
  • j

    javier ramirez

    1 month ago
    Exactly
  • or switch to ingesting via ILP if that’s too problematic
  • this is an example of a valid string
    readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n
  • the first part is the table name, then all the symbols, then the rest of the columns, the last bit is the timestamp and you finish with \n
  • it is always a bit tricky to escape things properly, but if pg inserts are not working for you, you can always send over ILP from PHP
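  • for reference, a small Node.js helper sketch that composes such a line (names are made up, and it does not handle escaping of spaces/commas in values):
    // build one ILP line: table name, symbols, then the other columns, then a nanosecond timestamp
    function toIlpLine(table, symbols, fields, tsNanos) {
        const sym = Object.entries(symbols).map(([k, v]) => `${k}=${v}`).join(",");
        const col = Object.entries(fields).map(([k, v]) => `${k}=${v}`).join(",");
        return `${table},${sym} ${col} ${tsNanos}\n`;
    }
    
    // toIlpLine("readings", { city: "London", make: "Omron" }, { temperature: 23.5, humidity: 0.343 }, "1465839830100400000")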
  • e

    Erfan Hemmati

    1 month ago
    Can I define the timestamp in line protocol like this: '2022-07-19T08:56:08.372000Z'?
  • j

    javier ramirez

    1 month ago
    no, you need to convert it to an epoch in nanoseconds
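  • for example, from Node.js (a quick sketch; note that Date only has millisecond precision):
    // '2022-07-19T08:56:08.372Z' -> nanoseconds since the Unix epoch
    const nanos = BigInt(Date.parse("2022-07-19T08:56:08.372Z")) * 1000000n;
    console.log(nanos.toString()); // 1658220968372000000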
  • e

    Erfan Hemmati

    1 month ago
    I think there is no way to convert a date-time string into nanoseconds via PHP or MySQL 🤔
  • j

    javier ramirez

    1 month ago
  • just take the microseconds value, multiply by a large number and avoid scientific notation by using bcmul
  • e

    Erfan Hemmati

    1 month ago
    Ooh yes, I forgot 🥲
  • j

    javier ramirez

    1 month ago
    the time is not really measured down to the nanosecond
  • so the rightmost digits will always be 0
  • still, it is a valid nanoseconds value and will work 🙂
  • e

    Erfan Hemmati

    1 month ago
    Javier, my inserts do not affect my selects?
  • Very nice, I will try it.
  • j

    javier ramirez

    1 month ago
    with 16 cores, you get a dedicated core for writing, and the readers will use the shared pool
  • selects will be independent of writes
  • e

    Erfan Hemmati

    1 month ago
    Imagine my scenario: my client has an API which gets OHLC data for TradingView
  • Do I need to set or increase the shared pool variable, or is it set by default?
  • I am sorry about my questions 😃
  • Does QDB_SHARED_WORKER_COUNT=5 relate to the shared pool?
  • j

    javier ramirez

    1 month ago
  • it is automatically set by questdb
  • as seen on that page
  • I am not good at PHP, but managed to ingest via ILP with this snippet
    #!/opt/homebrew/bin/php -q
    <?php
    error_reporting(E_ALL);
    /* Allow the script to hang around waiting for connections. */
    set_time_limit(0);
    /* Turn on implicit output flushing so we see what we're getting
     * as it comes in. */
    ob_implicit_flush();

    $address = 'localhost';
    $port = 9009;

    /* Create a TCP/IP socket. */
    $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    if ($socket === false) {
        echo "socket_create() failed: reason: " . socket_strerror(socket_last_error()) . "\n";
    } else {
        echo "OK.\n";
    }

    echo "Attempting to connect to '$address' on port '$port'...";
    $result = socket_connect($socket, $address, $port);
    if ($result === false) {
        echo "socket_connect() failed.\nReason: ($result) " . socket_strerror(socket_last_error($socket)) . "\n";
    } else {
        echo "OK.\n";
    }

    $row = utf8_encode("test_readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n");
    echo "$row";
    socket_write($socket, $row);
    echo "\n";
    socket_close($socket);
    ?>
  • e

    Erfan Hemmati

    1 month ago
    *javier* I am not closing/destroying the TCP socket connection in Node.js. Is that OK?
  • const net    = require("net");
    const config = require('../config/app');

    let ilpInstance;

    class QuestDbService {
        constructor() {
            this.ilp_host = config.questdb.ilp.host;
            this.ilp_port = parseInt(config.questdb.ilp.port);
        }

        // Open the ILP TCP socket once and keep it for reuse.
        async initIlp() {
            console.log('ILP line connection!');
            const client = new net.Socket();

            // net.Socket#connect does not return a promise, so wait for the 'connect' event.
            await new Promise((resolve, reject) => {
                client.once('error', reject);
                client.connect(this.ilp_port, this.ilp_host, resolve);
            });

            this.ilpConnection = client;

            return this;
        }

        async insertToTradesTableWithIlp(attributes) {
            if (! this.ilpConnection) {
                await this.initIlp();
            }

            // One ILP line: table, symbols, the other columns, then the nanosecond timestamp.
            this.ilpConnection.write(`trades,market=${attributes.market},market_id=${attributes.marketId} price=${attributes.price},amount=${attributes.amount},buy_fake=${attributes.buy_fake},sell_fake=${attributes.sell_fake} ${attributes.timestamp}` + "\n");
        }
    }

    QuestDbService.getIlpInstance = async () => {
        if (! ilpInstance) {
            ilpInstance = new QuestDbService().initIlp();
        }

        return ilpInstance;
    };

    module.exports = QuestDbService;
  • usage
    let [ilpInstance] = await Promise.all([
        QuestDbService.getIlpInstance(),
    ]);

    await ilpInstance.insertToTradesTableWithIlp({
        market    : data.baseCoinSymbol + data.quoteCoinSymbol,
        marketId  : data.marketId,
        price     : data.price,
        amount    : data.amount,
        buy_fake  : data.buy_fake,
        sell_fake : data.sell_fake,
        timestamp : data.timestamp
    });
  • j

    javier ramirez

    1 month ago
    I am not great at node, but in general you always want to close your sockets, as seen at https://questdb.io/docs/develop/insert-data#examples
  • We will be releasing an official nodejs client soon, by the way
  • e

    Erfan Hemmati

    1 month ago
    In your example, first you open the connection, then write data, and after that you close the connection.
  • In this case, every insert must open a new connection
  • j

    javier ramirez

    1 month ago
    You can close at the end of the stream before exiting the script
  • in my case I am writing only a line 🙂
  • But you can stream as much as you want before closing
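  • a minimal Node.js sketch of that pattern (connect, stream your lines, then end the socket; the values are just examples):
    const net = require("net");
    
    // ILP listens on port 9009 by default
    const socket = net.createConnection({ host: "localhost", port: 9009 }, () => {
        socket.write("trades,market=BTCUSDT price=29000.5,amount=0.02 1658226845000000000\n");
        socket.end(); // flush what is pending and close the connection once the stream is done
    });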
  • e

    Erfan Hemmati

    1 month ago
    I mean in the example 😃
  • I use RabbitMQ, so I can dispatch data from PHP to the same queue. On the other hand, Node.js will consume my messages and insert them via line protocol
  • Thank you for your PHP example
  • Javier, I have another question about timestamps
  • Does QuestDB automatically convert my nanosecond timestamp to the UTC timezone?
  • @javier ramirez
  • I inserted data with my timezone and in the QuestDB console it is shown exactly as UTC
  • Date-time string in my timezone: 2022-07-19 15:04:05; nanosecond timestamp in my timezone: 1658226845000000000
  • j

    javier ramirez

    1 month ago
    the timestamp you send using ILP is nanoseconds since the Unix epoch, which starts at 1970-01-01T00:00:00Z
  • it is then up to you to represent it in whichever TZ makes sense for your business
  • e

    Erfan Hemmati

    1 month ago
    Yes, I got it. In our business, trades data should be stored in the UTC timezone in QuestDB.
  • But the date-time in my MySQL trades table is based on my timezone.
  • I converted my date-time string to nanoseconds (without changing the timezone) and sent it to QuestDB via line protocol, and it was stored in UTC format. The question is: does QuestDB store timestamps based on the UTC timezone?
  • j

    javier ramirez

    1 month ago
    if you have your data on mysql, and if you don’t need resolution smaller than one second, the easiest way is to just apply the UNIX_TIMESTAMP function directly on your mysql query https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_unix-timestamp
  • that will give you the right epoch in seconds converting from the timezone on your data
  • then you just multiply that by 1000000000 to get nanoseconds
  • and that’s the timestamp you need to pass via ILP
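  • something like this, for example (the column and table names are just illustrative):
    SELECT UNIX_TIMESTAMP(created_at) * 1000000000 AS ts_nanos FROM trades;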
  • e

    Erfan Hemmati

    1 month ago
    @javier ramirez thank you, I got everything I wanted.
  • j

    javier ramirez

    1 month ago
    glad to hear 🙂
  • e

    Erfan Hemmati

    4 weeks ago
    @javier ramirez does QuestDB not have SAMPLE BY week?
  • or can we use SAMPLE BY 7d ALIGN TO CALENDAR; ?
  • j

    javier ramirez

    4 weeks ago
    No week sampling right now. I believe when you sample by 7d align to calendar you will get aggregated data from 7 days on each row, but the day will start actually from the 1st day in your query. So if your dataset starts on a wednesday, you would get samples of 7 days from wednesday to tuesday, which is not what you want (I think). But I am not 100% sure. Maybe give it a try and see what you get 🙂
  • This was requested recently and there is an issue to track interest https://stackoverflow.com/questions/72874934/questdb-partition-by-week
  • e

    Erfan Hemmati

    4 weeks ago
    thank you very much
  • with volume_real as (select sum(amount * price) as volume from 'trades'
      where market_id = 73 and (buy_fake = false or sell_fake = false) and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z')
    
    
      select
        min(price) as min_price,
        max(price) as max_price,
        first(price) as first_price,
        last(price) as last_price
        from 'trades'
      where market_id = 73 and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z';
  • How can I execute the subquery and get my real_volume as a column in the main result? @javier ramirez, thanks for responding
  • In MySQL and others it is very easy 🙂
  • I want to get the data within one query
  • j

    javier ramirez

    4 weeks ago
    You can just do a join between those two queries
  • since you have only one record on each you can do a cross join limit 1
  • or else you could add market_id as a select column on both queries and do a join on market_id
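  • something along these lines (an untested sketch based on your query):
    with volume_real as (
      select sum(amount * price) as volume from 'trades'
      where market_id = 73 and (buy_fake = false or sell_fake = false)
        and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z'
    ), ohlc as (
      select min(price) as min_price, max(price) as max_price,
             first(price) as first_price, last(price) as last_price
      from 'trades'
      where market_id = 73
        and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z'
    )
    select * from ohlc cross join volume_real;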
  • e

    Erfan Hemmati

    4 weeks ago
    great
  • I solved this problem with a cross join
  • I have another problem @javier ramirez
  • I have multiple selects via PG after my data is inserted into the same table (trades).
  • I use the Node.js pg library and create a connection pool with a limit of 5
  • but sometimes I see that the server has closed my connection
  • Is there any way to resolve this? For example, by increasing the pool connection limit, the server configuration, or the PG wire configuration on the main server?
  • Can using the REST API improve my performance? @javier ramirez
  • j

    javier ramirez

    4 weeks ago
    I am not an expert on low level connections. I know there are a bunch of parameters at https://questdb.io/docs/reference/configuration#postgres-wire-protocol, but I haven’t played with those much
  • e

    Erfan Hemmati

    4 weeks ago
    great!
  • @javier ramirez does ILP keep data in memory and then insert it into the table?
  • j

    javier ramirez

    4 weeks ago
    It keeps data in memory for as long as the commitLag parameter dictates
  • then writes to disk
  • This will change in a few months, when we release write ahead log functionality as a first step to implement replication
  • e

    Erfan Hemmati

    4 weeks ago
    Imagine my commit lag is 5 seconds. During this period I write data with ILP, then suddenly my container is restarted immediately, before the commit lag is reached. So I will lose my data? @javier ramirez
  • j

    javier ramirez

    4 weeks ago
    Yes. A workaround I can think of is simulating a snapshot
  • When you issue a SNAPSHOT PREPARE statement, the db will flush data. You can then stop sending data, and issue a SNAPSHOT COMPLETE immediately afterwards
  • not sure if this is workable for you
  • And I am not even sure if that would work at all
  • In any case, with the WAL functionality I mentioned, this won’t be an issue anymore
  • e

    Erfan Hemmati

    4 weeks ago
    🤔🤔
  • @javier ramirez I think QuestDB locks the table on select. I have a lot of inserts into the trades table, and then the system does not respond to my selects in parallel
  • parallel inserts and selects on the same table with different data
  • Do selects in QuestDB consume memory resources?
  • Thank you for your help
  • j

    javier ramirez

    4 weeks ago
    Select shouldn’t lock a table. Also, unless you are using a very small machine, writes and reads should be independently handled by different CPU cores. For a large enough instance, you should have a dedicated writer and the reads would come from the shared pool
  • If you don’t have enough cores to have a dedicated writer, then writes and reads will compete
  • e

    Erfan Hemmati

    4 weeks ago
    I have 32 cores from Hetzner Cloud
  • You said by default reads/selects use the shared pool. Am I correct?
  • j

    javier ramirez

    4 weeks ago
    yes, reads always use the shared pool. It is the writes you can configure
  • e

    Erfan Hemmati

    4 weeks ago
    with io_shared_worker?
  • j

    javier ramirez

    4 weeks ago
  • e

    Erfan Hemmati

    4 weeks ago
    Unlike MySQL etc., does QuestDB consume memory for selects?
  • j

    javier ramirez

    4 weeks ago
    Maybe ask about your performance issue on the main channel (out of this thread) so someone else can help you. I am good with high level details, but not great with low level implementation 🙂
  • sorry!
  • e

    Erfan Hemmati

    4 weeks ago
    @javier ramirez I got it.
  • @javier ramirez if I open a single connection from my application to the server, QuestDB uses the shared pool. Does that mean I have not actually opened just 1 single connection? Am I correct?
  • j

    javier ramirez

    4 weeks ago
    shared workers refers to how many worker threads will be shared across multiple functions (including running SQL queries). It is not a connection pool, but a pool that decides which thread is doing what