#users-public
Weibo Lei

11/09/2022, 2:43 AM
Hi QuestDB team, there are 37,317,008 rows in my table. When I run the following query against it, the speed looks a bit slow:
SELECT * FROM kline_item WHERE stock_id = '603501' and market_type = '1' and k_type = '6' and candle_mode = '2' order by market_date desc limit 500;
The first query can take more than 1 second:
500 rows in 4.35s
Execute: 4.28s Network: 68.74ms Total: 4.35s
It normally takes about 600 ms on the next few runs of the same SQL. The schema of the table is:
CREATE TABLE kline_item(
    market_date TIMESTAMP, 
    update_date TIMESTAMP,
    market_date_int int,
    update_date_int int,
    market_type SYMBOL CAPACITY 4 NOCACHE INDEX,
    stock_id SYMBOL CAPACITY 28000 NOCACHE INDEX,
    k_type SYMBOL CAPACITY 5 NOCACHE INDEX,
    highest_price double,
    lowest_price double,
    open_price double,
    close_price double,
    trade_val double,
    trade_amount long,
    change_amount double,
    change_pct double,
    exchange_ratio double,
    candle_mode SYMBOL CAPACITY 2 NOCACHE INDEX,
    up_or_down_limt byte,
    ma60 double,
    ma120 double,
    ma250 double,
    state byte,
    create_time TIMESTAMP,
    update_time TIMESTAMP
), INDEX(stock_id) TIMESTAMP(market_date) 
PARTITION BY DAY;
PS: I indexed the fields 'market_type', 'stock_id', 'k_type' and 'candle_mode' because I want the query to be faster. How can I speed up my query?
Jaromir Hamala

11/09/2022, 8:31 AM
Hello, what version of QuestDB are you on? Are you on an Intel or ARM CPU? Did you try a schema without indexes? That can often be faster.
8:33 AM
Judging from your symbol capacities, I assume your symbols have just a few unique values. In that case indexes won't help you, as each value matches a good chunk of the overall dataset.
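One way to check this is to count the distinct values in each indexed column. A minimal Go sketch, assuming QuestDB's default Postgres wire settings (port 8812, user admin, password quest, dbname qdb) and the lib/pq driver:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Connection settings assume QuestDB defaults; adjust for your deployment.
	db, err := sql.Open("postgres",
		"host=localhost port=8812 user=admin password=quest dbname=qdb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// An index only pays off when one value selects a small fraction of rows;
	// a handful of distinct values over 37M rows means the index buys little.
	for _, col := range []string{"market_type", "k_type", "candle_mode", "stock_id"} {
		var n int64
		q := fmt.Sprintf("select count() from (select distinct %s from kline_item)", col)
		if err := db.QueryRow(q).Scan(&n); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %d distinct values\n", col, n)
	}
}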
Bolek Ziobrowski

11/09/2022, 8:35 AM
Hello @Weibo Lei. Indexing low-cardinality fields generally doesn't make sense, so I'd try two things: 1. Change most symbol columns to byte and leave stock_id as a symbol:
CREATE TABLE kline_item( 
                    market_date TIMESTAMP,  
                    update_date TIMESTAMP, 
                    market_date_int int, 
                    update_date_int int, 
                    market_type byte, 
                    stock_id SYMBOL CAPACITY 28000 INDEX, 
                    k_type byte, 
                    highest_price double, 
                    lowest_price double, 
                    open_price double, 
                    close_price double, 
                    trade_val double, 
                    trade_amount long, 
                    change_amount double, 
                    change_pct double, 
                    exchange_ratio double, 
                    candle_mode byte, 
                    up_or_down_limt byte, 
                    ma60 double, 
                    ma120 double, 
                    ma250 double, 
                    state byte, 
                    create_time TIMESTAMP, 
                    update_time TIMESTAMP 
                ), INDEX(stock_id) TIMESTAMP(market_date)  
                 PARTITION BY DAY;
This would make QuestDB use the index on stock_id rather than candle_mode (which it is currently using for your query). 2. Change all symbol columns to numeric types and remove the indexes:
CREATE TABLE kline_item( 
                    market_date TIMESTAMP,  
                    update_date TIMESTAMP, 
                    market_date_int int, 
                    update_date_int int, 
                    market_type byte, 
                    stock_id int, 
                    k_type byte, 
                    highest_price double, 
                    lowest_price double, 
                    open_price double, 
                    close_price double, 
                    trade_val double, 
                    trade_amount long, 
                    change_amount double, 
                    change_pct double, 
                    exchange_ratio double, 
                    candle_mode byte, 
                    up_or_down_limt byte, 
                    ma60 double, 
                    ma120 double, 
                    ma250 double, 
                    state byte, 
                    create_time TIMESTAMP, 
                    update_time TIMESTAMP 
                )  TIMESTAMP(market_date)  
                 PARTITION BY DAY;

Query (watch out not to use string literals, because the type mismatch would disable the JIT):
SELECT * FROM kline_item 
WHERE stock_id = 603501 
and market_type = 1 
and k_type = 6 
and candle_mode = 2 
order by market_date desc 
limit 500;
This should get rid of the sorting and traverse the table backward with JIT-compiled filters. The speedup you're observing on repeated runs is most likely due to initially cold data entering the OS cache.
Jaromir Hamala

11/09/2022, 8:35 AM
Another trick to speed up queries could be to use the Postgres protocol and prepared statements. If you run the same kind of query again, just with different parameters, then QuestDB does not have to compile it again. And now I can see Bolek is replying too. He is certainly a better expert on SQL, so I'll wait for his ideas 🙂
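A minimal sketch of this in Go, assuming the lib/pq driver and QuestDB's default Postgres wire settings: the statement is prepared once and then reused with different bind values, so the server can skip recompiling the SQL each time.

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"host=localhost port=8812 user=admin password=quest dbname=qdb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Prepared once; only the bind parameters change between calls.
	stmt, err := db.Prepare(`SELECT * FROM kline_item
		WHERE stock_id = $1 AND market_type = $2 AND k_type = $3 AND candle_mode = $4
		ORDER BY market_date DESC LIMIT 500`)
	if err != nil {
		log.Fatal(err)
	}
	defer stmt.Close()

	for _, id := range []string{"603501", "300519"} {
		rows, err := stmt.Query(id, "1", "6", "2")
		if err != nil {
			log.Fatal(err)
		}
		// Iterate with rows.Next()/rows.Scan() as needed, then close.
		rows.Close()
	}
}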
Bolek Ziobrowski

11/09/2022, 8:47 AM
Sorry for the delay, Jaromir, I was checking how things run under the hood for this query. Additional learnings:
• So far I haven't seen plans that combine multiple indexes on a table, so having more than one index on the table won't speed things up, except perhaps when the table is used in subqueries or joined again.
• There are no good general criteria for choosing the main index to use for a query (cardinality stats!), so having more than one index may result in QuestDB choosing the wrong one and slowing down the query.
• Even though QuestDB uses an index scan for option 1, it unnecessarily does a forward scan + sort instead of a backward scan without sorting. This should be improved.
Weibo Lei

11/09/2022, 3:10 PM
@Jaromir Hamala Jaromir, thanks for the reply. I am using QuestDB 6.5.4 on Linux with an Intel CPU.
3:13 PM
@Bolek Ziobrowski Thanks for your suggestions and for sharing your knowledge about speeding up QuestDB queries. Let me make some changes to my table, and I will get back to you soon if it makes my query faster.
5:51 PM
Hello Bolek, I changed all the symbol columns to int except the stock_id column (I have to keep stock_id as a string type). The SQL I run in the web console:
select * from 'kline_item' where stock_id = '300519' and k_type = 6 and candle_mode = 1 limit 500;
It executes in about 10-20 ms, which is cool. But the first time (with a different query parameter), it sometimes takes 2-5 seconds. Can I speed up my first-time query?
Bolek Ziobrowski

11/10/2022, 8:55 AM
Hello Weibo Lei. Can you share the current table DDL?
Weibo Lei

11/10/2022, 8:57 AM
Bolek, please check out the table DDL:
CREATE TABLE kline_item1(
    market_date TIMESTAMP, 
    update_date TIMESTAMP,
    market_date_int int,
    update_date_int int,
    market_type int,
    stock_id SYMBOL CAPACITY 28000 NOCACHE INDEX,
    k_type int,
    highest_price double,
    lowest_price double,
    open_price double,
    close_price double,
    trade_val double,
    trade_amount long,
    change_amount double,
    change_pct double,
    exchange_ratio double,
    candle_mode int,
    up_or_down_limt byte,
    ma60 double,
    ma120 double,
    ma250 double,
    state byte,
    create_time TIMESTAMP,
    update_time TIMESTAMP
), INDEX(stock_id) TIMESTAMP(market_date) 
PARTITION BY DAY;
Bolek Ziobrowski

11/10/2022, 8:57 AM
Are these timings for different stock_id values?
Weibo Lei

11/10/2022, 8:58 AM
#1 and #2 have the same stock_id, a. #3 and #4 have the same stock_id, b. a is different from b.
Bolek Ziobrowski

11/10/2022, 8:59 AM
Isn't stock_id just a number?
8:59 AM
Just wanted to make sure.
Weibo Lei

11/10/2022, 9:00 AM
Nope, stock_id must be a string type in my case.
9:02 AM
Please check out my Golang code (mostly following the docs, actually):
func OpenPostgreSQL() *sql.DB {
	connStr := fmt.Sprintf(
		"host=%s port=%d user=%s password=%s dbname=%s sslmode=disable",
		host, port, user, password, dbname)
	db, err := sql.Open("postgres", connStr)
	if err != nil {
		panic(err)
	}
	return db
}
func GetKlineFromQuestDB(stockId string, marketType uint32, klineType, candleMode int) {
	qdb := datasource.OpenPostgreSQL()
	defer qdb.Close()

	sql := fmt.Sprintf("SELECT stock_id, market_type, open_price, close_price, lowest_price, highest_price FROM kline_item1 WHERE market_date in '2022' and stock_id = '%v' and market_type = %v and k_type = %v and candle_mode = %v order by market_date desc limit 500", stockId, marketType, klineType, candleMode)
	fmt.Println(sql)
	startTime := time.Now()
	rows, err := qdb.Query(sql)
	checkErr(err)
	defer rows.Close()
	logger.GetDefaultLogger().Infof("[QuestDB] Query kline total Spend[before scan]: %v", time.Since(startTime))
	for rows.Next() {
		var kline = db.KLineItem{}
		err = rows.Scan(&kline.StockId, &kline.MarketType, &kline.OpenPrice, &kline.ClosePrice, &kline.LowestPrice, &kline.HighestPrice)
		checkErr(err)
		// logger.GetDefaultLogger().Infof("[QuestDB] The kline value is:%v", kline)
	}

	err = rows.Err()
	checkErr(err)
	logger.GetDefaultLogger().Infof("[QuestDB] Query kline total Spend: %v", time.Since(startTime))
}
Bolek Ziobrowski

11/10/2022, 9:02 AM
Can you run the following?
select count() from kline_item1;
select stock_id, count() from kline_item1 group by stock_id order by 2 desc limit 100;
Weibo Lei

11/10/2022, 9:04 AM
select count(*) from kline_item1;
count: 37321999
Execute: 781.69μs Network: 259.22ms Total: 260ms

select stock_id, count(*) from kline_item1 group by stock_id order by 2 desc limit 100;
100 rows in 184ms
Execute: 81.21ms Network: 102.79ms Total: 184ms
9:04 AM
I just ran it in my web console.
9:05 AM
Do I need to run it in my golang code?
Bolek Ziobrowski

11/10/2022, 9:07 AM
Can you share the output of the second query?
9:09 AM
And could you also check the size of the table on disk? It's the db/table_name directory.
Weibo Lei

11/10/2022, 9:09 AM
Please check the output of the second query.
9:18 AM
size of the table:
Bolek Ziobrowski

11/10/2022, 9:23 AM
Hmm, does the 2022-01-01 partition hold those 37 million records?
9:23 AM
Can you also share your disk type/specs?
Weibo Lei

11/10/2022, 9:24 AM
Hmm, does 2022-01-01 partition hold those 37 mil records ?
Yes. The real kline date range spans roughly 1990 to 2022, which is very long. I thought the time range being so long might impact query speed, so I did some data conversion to put the dates in the same day.
9:28 AM
disk type/specs: 100GiB (2600 IOPS)
Bolek Ziobrowski

11/10/2022, 9:29 AM
What about disk read throughput?
9:31 AM
Is this data set static, or will it grow significantly?
Weibo Lei

11/10/2022, 9:32 AM
the data will grow every day.
9:33 AM
What about disk read throughput?
Max: 140 MB/s
Bolek Ziobrowski

11/10/2022, 9:34 AM
Will the number of records per stock_id used by the query also grow?
9:36 AM
At the moment there seem to be around 2k stock ids, with ~17k records per stock_id.
Weibo Lei

11/10/2022, 9:38 AM
Normally the set of stock ids will not grow, but sometimes we need to add some new stock ids. Currently there are actually about 4k stock_ids.
Bolek Ziobrowski

11/10/2022, 9:40 AM
If there are 4k stock ids, that means some stock_ids will have far fewer records than 17k.
Weibo Lei

11/10/2022, 9:40 AM
yes
Bolek Ziobrowski

11/10/2022, 9:41 AM
Normally the stock id will not grow, but sometimes, we need to add some new stock id.
So adding new stock ids is rare, but adding records for existing stock_ids is not, right?
Weibo Lei

11/10/2022, 9:42 AM
yes
Bolek Ziobrowski

11/10/2022, 9:42 AM
Can you predict how much the dataset could grow in a few months?
Weibo Lei

11/10/2022, 9:47 AM
About 400,000 records in one month.
Bolek Ziobrowski

11/10/2022, 10:05 AM
At the moment the data seems to fit in RAM, so you could:
• if stock_id only contains digits, change the stock_id column to int, remove the index and transform stock_id to a number when querying, or
• try to pre-warm the data with: select touch(select * from kline_item1)
By the way, the query you shared first didn't have an order by clause, while the Golang code has order by market_date desc (with a limit). Are that order by and limit 'static', or are there variations of the query that won't have them?
10:08 AM
If there's an order by market_date desc with limit 500, then the ideal plan for worst-case performance would indeed be a backward index scan on stock_id (no sorting necessary). For some reason the optimizer doesn't do that, though, so we should improve it.
Weibo Lei

11/10/2022, 11:24 AM
Thanks for replying. In my case stock_id is not digits only; some have prefixes like C, A, etc. (based on our business). order by market_date desc with limit 500 is required in our business; sometimes we need to fetch the latest 500 records at one time.
Bolek Ziobrowski

11/10/2022, 11:42 AM
Does that order by with limit always apply? I'm asking because if there are e.g. 16k records to read, spread randomly across a larger data set, then the worst-case query time will be on the order of 16,000 reads / 2,600 IOPS ≈ 6 s (assuming disk read latency is good enough to allow 2.6k reads from a single thread; index access is not multithreaded). If you only need 500 records ordered by timestamp asc or desc, then at most we need 500 random reads, so the worst case should be below 1 second.
Weibo Lei

11/10/2022, 11:44 AM
Does that order by with limit always apply?
yes.
Bolek Ziobrowski

11/10/2022, 11:48 AM
All right. Have you tried pre-warming the data with the touch() function and recording the execution times?
Weibo Lei

11/10/2022, 11:51 AM
I just tried it in the web console, and it looks faster after the pre-warm. But I found that if there is a different query, the pre-warm does not work, right?
11:52 AM
I am trying a disk with higher IO throughput to see if the query gets faster.
Bolek Ziobrowski

11/10/2022, 11:52 AM
The pre-warm touches the data used by a query, so it should pull the data into the OS cache (as much as fits), but nothing guarantees the data stays there; that's up to the OS.
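A minimal sketch of running this pre-warm from Go, e.g. at service startup, assuming the same lib/pq connection settings as in the earlier sketches; as noted above, it is best-effort, since the OS may evict the pages again.

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"host=localhost port=8812 user=admin password=quest dbname=qdb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// touch() walks the pages the inner query would read, pulling them into
	// the OS page cache so that a later query finds them warm.
	rows, err := db.Query("select touch(select * from kline_item1)")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	log.Println("kline_item1 pre-warmed")
}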
Weibo Lei

11/10/2022, 11:52 AM
ok
Bolek Ziobrowski

11/10/2022, 11:53 AM
What do you mean by a different query? A completely different query against this table, a different table, or the same query but with a different stock_id parameter value?
Weibo Lei

11/10/2022, 11:53 AM
different query against this table.
Bolek Ziobrowski

11/10/2022, 11:54 AM
A different query means a different access pattern; e.g. if you access a lot of data and then sort by an arbitrary field, it's going to take time.
12:08 PM
If you really require 100 ms max query times for potentially arbitrary queries (and a growing data set), then I'd suggest getting a good NVMe disk and/or enough RAM to hold the data. I think cheap EBS gives ~1000 random IOPS or 120 MB/s sequential read throughput per thread, and that's not a big IO budget with a 100 ms response-time constraint.
Weibo Lei

11/10/2022, 12:57 PM
OK, Bolek. Thanks for your help and suggestions. I will share everything I learned from you with my team. Once again, thank you! 🙏