#users-public

Erfan Hemmati

07/18/2022, 8:25 AM
Hello everyone! How can I improve my data inserts via line protocol? I am using Node.js as the client and I have more than 300 writes per 30 seconds. I see an insert lag with line protocol. Also, all these writes are handled by one TCP connection

javier ramirez

07/18/2022, 8:29 AM
Hi Erfan. What do you mean by insert lag? Are you measuring from when you insert until you can see the data on your queries or is it something else?
8:30 AM
Also, are your events ordered by timestamp or can they be out of order?

Erfan Hemmati

07/18/2022, 8:47 AM
Hi Javier. We built a P2P cryptocurrency exchange, and in the past we used TimescaleDB to store historical prices and OHLC data. Now I have decided to migrate to QuestDB, so I wrote out our structure and statements, and I use line protocol to insert our trades into a QuestDB table. My table contains market, amount, price, and timestamp, and I wrote a query for getting OHLC data for any pair. The problem is that my inserts are not visible instantly

javier ramirez

07/18/2022, 8:48 AM
This might be because of the commit lag configuration
8:50 AM
There are some config params you can tune here, but specifically for your use case there is this one
cairo.commit.lag
8:50 AM
by default the commit lag is set to 300,000 milliseconds / 5 minutes
8:51 AM
That would mean your data will generally buffer for about 5 minutes before being committed, and before you can see it in your queries
8:52 AM
If within those 5 minutes you accumulate more than the param at
cairo.max.uncommitted.rows
(defaults to 500K rows), the data will also be committed
8:54 AM
and also if you stop sending data and the table is idle for longer than the param at
line.tcp.min.idle.ms.before.writer.release
(defaults to 30 seconds)
8:55 AM
So basically you could just set your
cairo.commit.lag
to just a few seconds and that is the time it would take to see fresh data on your queries
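For example, in server.conf that could look like this (a sketch; the values are only illustrative):
# commit buffered data roughly every 2 seconds
cairo.commit.lag=2000
# or earlier, once this many rows have buffered (default 500000)
cairo.max.uncommitted.rows=100000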

Erfan Hemmati

07/18/2022, 8:55 AM
great! I will check these params immediately.

javier ramirez

07/18/2022, 8:55 AM
If your data is ordered, you can go low with that param
8:56 AM
if your data is unsorted, then it is better to be conservative, as unsorted data means we need to rewrite data already committed to disk and it is costly, which is why we allow for this commit lag param to trade off between performance and freshness. You have more info at https://questdb.io/docs/guides/out-of-order-commit-lag

Erfan Hemmati

07/18/2022, 8:57 AM
Javier, is there any recommended configuration to increase the operational capacity of inserts and selects in questdb (in production)?

javier ramirez

07/18/2022, 8:57 AM
300 inserts in 30 seconds sounds “low” for QuestDB
8:57 AM
so maybe just changing the commit lag value to 1 or 2 seconds works for you

Erfan Hemmati

07/18/2022, 8:58 AM
Aha

javier ramirez

07/18/2022, 8:58 AM
other than that, the partitioning strategy is important and you should choose a partition range that works for you (the default is DAY, which is probably fine)

Erfan Hemmati

07/18/2022, 8:59 AM
yes, I chose that

javier ramirez

07/18/2022, 8:59 AM
and in more demanding systems, there are config values to configure how many cores on your box you want to dedicate only for parsing data you are ingesting, how many cores to dedicate only for writers, and how many shared cores for everything else
8:59 AM
QuestDB will automatically choose sensible values depending on the number of available cores, but you could always fine tune
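For reference, those pools map to config keys along these lines (a sketch; the values are only an example, not a recommendation):
# shared pool: queries and most other work
shared.worker.count=6
# threads parsing incoming ILP data
line.tcp.io.worker.count=2
# threads writing parsed data to tables
line.tcp.writer.worker.count=1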

Erfan Hemmati

07/18/2022, 9:00 AM
What do you mean by ordered and unsorted data?

javier ramirez

07/18/2022, 9:01 AM
Unlike other databases (you mentioned timescale) QuestDB has been written from scratch to work with time series
9:01 AM
To optimize for time series scenarios, we physically store the data on disk sorted by timestamp
9:02 AM
So queries can be quite fast, as we combine partitioning, column-based storage, and physically sorted rows (you can also add indexes and play with Symbols to get extra performance on other columns)
9:02 AM
Your data might have an origin timestamp, or maybe you are ingesting without a timestamp and we are automatically assigning the ingestion timestamp to that row
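For example, a trades table like the one described earlier might declare its designated timestamp like this (a sketch using the column names from this thread):
CREATE TABLE trades (
    market SYMBOL,
    market_id SYMBOL,   -- sent as an ILP tag later in this thread, hence SYMBOL
    price DOUBLE,
    amount DOUBLE,
    buy_fake BOOLEAN,
    sell_fake BOOLEAN,
    timestamp TIMESTAMP
) TIMESTAMP(timestamp) PARTITION BY DAY;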

Erfan Hemmati

07/18/2022, 9:03 AM
Nice

javier ramirez

07/18/2022, 9:03 AM
In the case of automatic timestamp, there is no concept of out of order

Erfan Hemmati

07/18/2022, 9:04 AM
I insert trades with their own timestamp via line protocol

javier ramirez

07/18/2022, 9:04 AM
But in the case of your original timestamp, we might have out of order data. Imagine you are ingesting data from different sensors. Your QuestDB instance is based in, for example, London, and you have sensors streaming data both from Ireland and from Sydney
9:05 AM
Because of network latency, events from Sydney will typically take longer to reach the server

Erfan Hemmati

07/18/2022, 9:05 AM
So data that comes with its own timestamp is called ordered?

javier ramirez

07/18/2022, 9:05 AM
so maybe you have an event that was generated in Sydney, then another event 2 ms later from Ireland, but the one from Ireland reaches the database first
9:06 AM
if we were committing every event as we get it, we would be inserting out of order data, but as I said we always store data ordered, so we would need to rearrange data on disk
9:06 AM
the commit lag parameter specifies how often you want to commit, so we can “buffer” events in memory
9:06 AM
in this case, even if events arrive unordered, when we commit from memory to disk we sort them and that is efficient
9:07 AM
When you create a table, you define the designated timestamp for your table

Erfan Hemmati

07/18/2022, 9:07 AM
yes
javier ramirez

07/18/2022, 9:07 AM
That's the column that decides what is in order and what is not
9:08 AM
so if your trading data can arrive at questdb out of order, it is worth allowing for a commit lag that takes that into account

Erfan Hemmati

07/18/2022, 9:08 AM
I think my trades should be ordered data

javier ramirez

07/18/2022, 9:08 AM
Then you can just set the commit lag param to 1 second and be happy 🙂
9:08 AM
we will always be appending data to the end of the table, which is ideal

Erfan Hemmati

07/18/2022, 9:11 AM
Thank you Javier
9:11 AM
I will apply what we discussed to the project and report back with the results

javier ramirez

07/18/2022, 9:14 AM
Looking forward to that!

Erfan Hemmati

07/19/2022, 7:04 AM
hi Javier @javier ramirez
7:06 AM
QDB_SHARED_WORKER_COUNT=32 QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1 QDB_LINE_TCP_MAINTENANCE_JOB_INTERVAL=1 QDB_LINE_TCP_IO_WORKER_COUNT=32 QDB_TCP_WRITER_WRITER_COUNT=32
do these variables affect our insert performance?
7:09 AM
also, I have another problem with a second process that inserts via the PG wire protocol. I insert data into the trades table with line protocol; on the other hand, I have a process which inserts data from PHP into the same trades table, but the insert via PG does not succeed.
7:32 AM
Thanks for your reply

javier ramirez

07/19/2022, 8:07 AM
Your env variables look a bit weird to me
8:08 AM
Regarding the CPU config, the defaults should be OK for many use cases, but if you want to fine-tune, take a look at https://questdb.io/docs/operations/capacity-planning
8:10 AM
Also note one of your env variables has the wrong name
QDB_TCP_WRITER_WRITER_COUNT
doesn’t exist in QuestDB and I think you mean
QDB_TCP_WRITER_WORKER_COUNT
8:12 AM
The main thing about CPU config is that, unless you say otherwise, by default QuestDB will try to use one dedicated CPU only for writing, then the rest of the CPUs for everything else. On larger machines, it will also dedicate some cores to parsing input data before handing it to the writer
8:13 AM
You probably don’t need more than a single CPU for writing. Chances are if you find your writes are slow, either you need a faster disk or you are experiencing out of order writes (which you said you will not have typically)
8:14 AM
I would also not use uncommitted rows the way you are doing it. I prefer to set my commit lag to a value that makes sense for the business, then have a large number for uncommitted rows
8:15 AM
imagine you are streaming 300 rows in one second, and you only need to select data once every 5 seconds
8:15 AM
If you tell questdb to commit for every row, that is 1,500 commits every five seconds
8:15 AM
when you truly need only one for your business needs
8:16 AM
In this scenario, I would configure a commit lag of five seconds, and a max uncommitted rows of a few thousand rows
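In the env-var form used earlier in this thread, that would be something like (a sketch, values illustrative):
QDB_CAIRO_COMMIT_LAG=5000
QDB_CAIRO_MAX_UNCOMMITTED_ROWS=10000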
8:17 AM
The maintenance job interval you set would definitely cause a performance issue
8:18 AM
Basically, the maintenance job interval says how often QuestDB will check whether there are any idle tables that need to be flushed/committed
8:18 AM
setting it to 1 millisecond doesn't make a lot of sense here, as we only need to check for idle tables occasionally
8:19 AM
Regarding the issue that you are not having success with PG, not sure what that means. Are you getting any errors?

Erfan Hemmati

07/19/2022, 9:17 AM
this is my scenario: I will insert 500 records every 30 seconds via nodejs (line protocol). My data should be inserted into questdb immediately, and after each successful insert I will select the data for another process. On the other hand, I will insert some data from my PHP application via a PG connection (the interval and amount of data depend on users' trades). This is my server configuration: CPU: 16 cores, RAM: 32 GB, disk: SSD, server: Hetzner Cloud. I want my inserts via line protocol to happen immediately, so I will set the QDB_CAIRO_COMMIT_LAG=1000 variable. I also need questdb to answer my selects after inserts, alongside the insert process from my PHP application. With this scenario and my resources, what other variables should I set to improve performance?

javier ramirez

07/19/2022, 9:18 AM
I think just changing the commit lag should work 🙂
9:18 AM
the rest of the defaults should be enough for your particular scenario
9:20 AM
when inserting via pg protocol, be aware you might receive locking errors that you wouldn’t get on ILP. Your client application should be aware of that and retry on error
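A minimal retry wrapper for node (a sketch; it assumes the `pg` pool Erfan mentions later in the thread, and the backoff numbers are arbitrary):
async function insertWithRetry(pool, text, values, attempts = 5) {
    for (let i = 0; i < attempts; i++) {
        try {
            return await pool.query(text, values);   // success: return the result
        } catch (err) {
            if (i === attempts - 1) throw err;       // out of attempts: give up
            await new Promise(r => setTimeout(r, 100 * (i + 1)));  // brief backoff
        }
    }
}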

Erfan Hemmati

07/19/2022, 9:20 AM
That issue has been on my mind

javier ramirez

07/19/2022, 9:21 AM
In the future the PG wire implementation we have should not have those issues, but as of today, you need to handle them
9:22 AM
In any case, if that was a big problem for you, you can always send data from PHP using the ILP protocol
9:22 AM
all you need is a socket connection and to send a UTF-8 string in the right format

Erfan Hemmati

07/19/2022, 9:23 AM
so I should retry the insert via PG from my client application until the insert succeeds

javier ramirez

07/19/2022, 9:23 AM
Exactly
9:23 AM
or switch to ingesting via ILP if that’s too problematic
9:24 AM
this is an example of a valid string
readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n
9:25 AM
the first part is the table name, then all the symbols, then the rest of the columns, the last bit is the timestamp and you finish with \n
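Broken down, the same example reads (annotation added for clarity):
table:     readings
symbols:   city=London,make=Omron
columns:   temperature=23.5,humidity=0.343
timestamp: 1465839830100400000 (nanoseconds since the Unix epoch), terminated by \n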
9:25 AM
it is always a bit tricky to escape things properly, but if pg inserts are not working for you, you can always send over ILP from PHP

Erfan Hemmati

07/19/2022, 9:26 AM
can I define the timestamp in line protocol like this: '2022-07-19T08:56:08.372000Z'?

javier ramirez

07/19/2022, 9:27 AM
no, you need to convert it to an epoch in nanoseconds

Erfan Hemmati

07/19/2022, 9:34 AM
I think there is no way to convert a datetime string into nanoseconds via PHP or MySQL 🤔

javier ramirez

07/19/2022, 9:35 AM
just take the microseconds value, multiply by a large number and avoid scientific notation by using bcmul
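In PHP that could look like this (a sketch; it assumes the bcmath extension is available):
<?php
// sketch: ISO-8601 string -> nanosecond epoch string for ILP
$dt     = new DateTime('2022-07-19T08:56:08.372000Z');  // the Z marks it as UTC
$micros = $dt->format('Uu');        // epoch seconds + 6-digit fraction = microsecond epoch
$nanos  = bcmul($micros, '1000');   // *1000 to nanoseconds, no scientific notation
echo $nanos;                        // 1658220968372000000
?>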

Erfan Hemmati

07/19/2022, 9:36 AM
ooo yes, I forgot 🥲

javier ramirez

07/19/2022, 9:36 AM
the time is not really measured down to the nanosecond
9:37 AM
so the rightmost digits will always be 0
9:37 AM
still, it is a valid nanosecond value and will work 🙂

Erfan Hemmati

07/19/2022, 9:37 AM
javier, my inserts do not affect my selects?
9:37 AM
very nice. i will try it.

javier ramirez

07/19/2022, 9:38 AM
with 16 cores, you get a dedicated core for writing, and the readers will use the shared pool
9:38 AM
selects will be independent of writes

Erfan Hemmati

07/19/2022, 9:39 AM
imagine my scenario: my client has an API which is getting OHLC data for TradingView
9:40 AM
do I need to set or increase the shared pool variable? or is it set by default?
9:43 AM
I am sorry about my questions 😃
9:45 AM
QDB_SHARED_WORKER_COUNT=5, does this variable relate to the shared pool?
9:54 AM
it is automatically set by questdb
9:54 AM
as seen on that page
javier ramirez

07/19/2022, 10:06 AM
I am not good at PHP, but managed to ingest via ILP with this snippet
#!/opt/homebrew/bin/php -q
<?php
error_reporting(E_ALL);

/* Allow the script to hang around waiting for connections. */
set_time_limit(0);

/* Turn on implicit output flushing so we see what we're getting
 * as it comes in. */
ob_implicit_flush();

$address = 'localhost';
$port = 9009;

/* Create a TCP/IP socket. */
$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($socket === false) {
    echo "socket_create() failed: reason: " . socket_strerror(socket_last_error()) . "\n";
} else {
    echo "OK.\n";
}

echo "Attempting to connect to '$address' on port '$port'...";
$result = socket_connect($socket, $address, $port);
if ($result === false) {
    echo "socket_connect() failed.\nReason: ($result) " . socket_strerror(socket_last_error($socket)) . "\n";
} else {
    echo "OK.\n";
}

$row = utf8_encode("test_readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n");
echo "$row";
socket_write($socket, $row);
echo "\n";
socket_close($socket);
?>

Erfan Hemmati

07/19/2022, 10:09 AM
javier, I am not closing/destroying the TCP socket connection in nodejs. Is that ok?
10:10 AM
const net               = require("net");
const config            = require('../config/app');

let ilpInstance;

class QuestDbService {
    constructor() {
        this.ilp_host    = config.questdb.ilp.host;
        this.ilp_port    = parseInt(config.questdb.ilp.port);
    }

    async initIlp() {
        console.log('ILP line connection!');
        const client       = new net.Socket();

        // use the configured host/port instead of hardcoding them;
        // writes issued before the connection completes are buffered by node
        this.ilpConnection = client.connect(this.ilp_port, this.ilp_host);

        return this;
    }

    async insertToTradesTableWithIlp(attributes) {
        if (! this.ilpConnection) {
            await this.initIlp();   // was this.init(), which doesn't exist
        }

        this.ilpConnection.write(`trades,market=${attributes.market},market_id=${attributes.marketId} price=${attributes.price},amount=${attributes.amount},buy_fake=${attributes.buy_fake},sell_fake=${attributes.sell_fake} ${attributes.timestamp}` + "\n");
    }
}

// memoize the connection promise so concurrent callers share one socket
QuestDbService.getIlpInstance     = async () => {
    if (! ilpInstance) {
        let classInstance         = new QuestDbService()
        ilpInstance               = classInstance.initIlp();
    }

    return ilpInstance;
};

module.exports = QuestDbService;
10:11 AM
usage
let [ilpInstance,] = await Promise.all([
                QuestDbService.getIlpInstance(),
            ]);

            await ilpInstance.insertToTradesTableWithIlp({
                market              : data.baseCoinSymbol + data.quoteCoinSymbol,
                marketId            : data.marketId,
                price               : data.price,
                amount              : data.amount,
                buy_fake            : data.buy_fake,
                sell_fake           : data.sell_fake,
                timestamp           : data.timestamp
            });

javier ramirez

07/19/2022, 10:11 AM
I am not great at node, but in general you always want to close your sockets, as seen at https://questdb.io/docs/develop/insert-data#examples
10:12 AM
We will be releasing an official nodejs client soon, by the way

Erfan Hemmati

07/19/2022, 10:12 AM
in your example, first you open the connection, then write data, and after that close the connection.
10:13 AM
in this case, every insert must open a new connection

javier ramirez

07/19/2022, 10:13 AM
You can close at the end of the stream before exiting the script
10:13 AM
in my case I am writing only a line 🙂
10:13 AM
But you can stream as much as you want before closing
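For the long-lived socket in the node service above, one option is to close it when the process shuts down (a sketch; the SIGTERM hook is an assumption about how the app exits):
// close the shared ILP socket on shutdown
process.on('SIGTERM', async () => {
    const ilp = await QuestDbService.getIlpInstance();
    ilp.ilpConnection.end();   // flush buffered writes, then close the socket
    process.exit(0);
});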

Erfan Hemmati

07/19/2022, 10:14 AM
I mean in the example 😃
10:16 AM
I use RabbitMQ, so I can dispatch data from PHP to the same queue; on the other end, nodejs will consume my messages and insert via line protocol
10:19 AM
thank you for your PHP example
10:43 AM
javier, I have another question about timestamps
10:44 AM
does questdb automatically convert my nanosecond timestamps to the UTC timezone?
10:44 AM
@javier ramirez
10:45 AM
I inserted data with my timezone and the questdb console shows it exactly as UTC
10:47 AM
date time string in my timezone: 2022-07-19 15:04:05
nanosecond timestamp in my timezone: 1658226845000000000

javier ramirez

07/19/2022, 10:51 AM
the timestamp you send using ILP is nanoseconds using unix epoch, which starts at 1/1/1970T00:00:00Z
10:52 AM
it is then up to you to represent it in whichever TZ makes sense for your business

Erfan Hemmati

07/19/2022, 11:58 AM
yes, I got it. In our business, trades data should be stored in the UTC timezone in questdb.
11:58 AM
but the datetimes in my mysql trades table are based on my timezone.
12:00 PM
I converted my datetime string to nanoseconds (without changing the timezone) and sent it to questdb via line protocol, and it was stored in UTC format. The question is: does questdb store timestamps based on the UTC timezone?

javier ramirez

07/19/2022, 1:45 PM
if you have your data on mysql, and if you don’t need resolution smaller than one second, the easiest way is to just apply the UNIX_TIMESTAMP function directly on your mysql query https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_unix-timestamp
1:46 PM
that will give you the right epoch in seconds converting from the timezone on your data
1:46 PM
then you just multiply that by 1000000000 to get nanoseconds
1:46 PM
and that’s the timestamp you need to pass via ILP
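On the MySQL side that could be done in the query itself (a sketch; created_at is a hypothetical column name):
-- seconds since the epoch, converted from the session timezone, scaled to nanoseconds
SELECT UNIX_TIMESTAMP(created_at) * 1000000000 AS ts_nanos
FROM trades;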

Erfan Hemmati

07/19/2022, 2:03 PM
@javier ramirez thank you, I got everything I needed.

javier ramirez

07/19/2022, 2:03 PM
glad to hear 🙂

Erfan Hemmati

07/19/2022, 2:11 PM
@javier ramirez questdb does not have sample by week?
2:14 PM
or can we use SAMPLE BY 7d ALIGN TO CALENDAR; ?

javier ramirez

07/19/2022, 2:44 PM
No week sampling right now. I believe when you sample by 7d align to calendar you will get aggregated data from 7 days on each row, but the buckets will actually start from the first day in your query. So if your dataset starts on a Wednesday, you would get samples of 7 days from Wednesday to Tuesday, which is not what you want (I think). But I am not 100% sure. Maybe give it a try and see what you get 🙂
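The query to try would be something along these lines (a sketch reusing columns from this thread):
SELECT timestamp, min(price) AS low, max(price) AS high,
       first(price) AS open, last(price) AS close
FROM trades
SAMPLE BY 7d ALIGN TO CALENDAR;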
2:46 PM
This was requested recently and there is an issue to track interest https://stackoverflow.com/questions/72874934/questdb-partition-by-week

Erfan Hemmati

07/19/2022, 3:31 PM
thank you very much
3:31 PM
with volume_real as (select sum(amount * price) as volume from 'trades'
  where market_id = 73 and (buy_fake = false or sell_fake = false) and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z')


  select
    min(price) as min_price,
    max(price) as max_price,
    first(price) as first_price,
    last(price) as last_price
    from 'trades'
  where market_id = 73 and timestamp between '2022-07-18T14:55:12.135Z' and '2022-07-19T14:55:12.135Z';
3:32 PM
how can I execute the subquery and get my real_volume as a column in the main result? @javier ramirez thank you for responding
3:34 PM
in mysql and others it is very easy 🙂
3:37 PM
I want to get the data within one query

javier ramirez

07/19/2022, 6:03 PM
You can just do a join between those two queries
6:04 PM
since you have only one record on each you can do a cross join limit 1
6:04 PM
or else you could add market_id as a select column on both queries and do a join on market_id
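Applied to the two queries above, the cross-join version would look something like this (a sketch):
WITH volume_real AS (
  SELECT sum(amount * price) AS volume FROM 'trades'
  WHERE market_id = 73 AND (buy_fake = false OR sell_fake = false)
    AND timestamp BETWEEN '2022-07-18T14:55:12.135Z' AND '2022-07-19T14:55:12.135Z'
)
SELECT
    min(price) AS min_price,
    max(price) AS max_price,
    first(price) AS first_price,
    last(price) AS last_price,
    volume AS real_volume
FROM 'trades' CROSS JOIN volume_real
WHERE market_id = 73 AND timestamp BETWEEN '2022-07-18T14:55:12.135Z' AND '2022-07-19T14:55:12.135Z';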

Erfan Hemmati

07/20/2022, 6:20 AM
great
6:20 AM
I solved this problem with a cross join
6:20 AM
i have another problem @javier ramirez
6:22 AM
I have multiple selects via PG after my data is inserted into the same table (trades).
6:22 AM
I use the node js pg library and create a connection pool with a limit of 5
6:23 AM
but sometimes I see that the server has closed my connection
6:23 AM
such as [screenshot of the error, not preserved in this log]
6:25 AM
is there any way to resolve this? for example, increasing the connection pool limit, or the server configuration, or the pg wire configuration on the main server?
7:39 AM
can using the REST API improve my performance? @javier ramirez

javier ramirez

07/20/2022, 8:14 AM
I am not an expert on low level connections. I know there are a bunch of parameters at https://questdb.io/docs/reference/configuration#postgres-wire-protocol, but I haven’t played with those much

Erfan Hemmati

07/20/2022, 3:54 PM
great!
3:54 PM
@javier ramirez does ILP keep data in memory and then insert it into the table?
j

javier ramirez

07/20/2022, 4:30 PM
It keeps data in memory for as long as the commitLag parameter dictates
4:30 PM
then writes to disk
4:31 PM
This will change in a few months, when we release write ahead log functionality as a first step to implement replication

Erfan Hemmati

07/20/2022, 4:45 PM
Imagine my commit lag is 5 seconds. During this period I write data with ILP, and then suddenly my container is restarted, immediately before the commit lag is reached. So will I lose my data? @javier ramirez
j

javier ramirez

07/20/2022, 4:50 PM
Yes. A workaround I can think of is simulating a snapshot
4:51 PM
When you issue a SNAPSHOT PREPARE statement, the db will flush data. You can then stop sending data, and issue a SNAPSHOT COMPLETE immediately afterwards
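As SQL, that sequence is (with the caveat below that it may not actually help here):
SNAPSHOT PREPARE;
-- ...pause ingestion...
SNAPSHOT COMPLETE;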
4:52 PM
not sure if this is workable for you
4:52 PM
And I am not even sure if that would work at all
4:52 PM
In any case, with the WAL functionality I mentioned, this won’t be an issue anymore
e

Erfan Hemmati

07/20/2022, 4:57 PM
🤔🤔
4:59 PM
@javier ramirez I think questdb locks the table during selects. I have a lot of inserts into the trades table, and then the system does not respond to my selects in parallel
4:59 PM
parallel inserts and selects on the same table with different data
5:00 PM
Do selects in questdb consume memory resources?
5:00 PM
Thank you for your help

javier ramirez

07/20/2022, 5:02 PM
Select shouldn’t lock a table. Also, unless you are using a very small machine, writes and reads should be independently handled by different CPU cores. For a large enough instance, you should have a dedicated writer and the reads would come from the shared pool
5:03 PM
If you don’t have enough cores to have a dedicated writer, then writes and reads will compete

Erfan Hemmati

07/20/2022, 5:06 PM
I have 32 cores from Hetzner Cloud
5:07 PM
you said that by default reads/selects use the shared pool. Am I correct?

javier ramirez

07/20/2022, 5:07 PM
yes, reads always use the shared pool. It is the writes you can configure

Erfan Hemmati

07/20/2022, 5:08 PM
with io_shared_worker?

Erfan Hemmati

07/20/2022, 5:09 PM
Unlike MySQL etc., does questdb consume memory for selects?
j

javier ramirez

07/20/2022, 5:10 PM
Maybe ask about your performance issue on the main channel (out of this thread) so someone else can help you. I am good with high level details, but not great with low level implementation 🙂
5:10 PM
sorry!

Erfan Hemmati

07/21/2022, 5:30 AM
@javier ramirez I got it.
5:33 AM
@javier ramirez if I open a single connection from my application to the server, questdb uses the shared pool. Does that mean I do not need to open more than 1 connection? Am I correct?

javier ramirez

07/21/2022, 6:43 AM
shared workers refers to how many worker threads will be shared across multiple functions (including running SQL queries). It is not a connection pool, but a pool that decides which thread is doing what