https://questdb.io logo
Title
u

李华

12/13/2022, 1:46 AM
hi everybody, Can QuestDB use ShardingSphere for sharding table?
p

Pei

12/13/2022, 8:10 AM
Hi @李华 afaik no one tested it before, perhaps the best way is to give it a try and see what works and what not. Could you share more about the scenarios when you’d need sharding?
u

李华

12/13/2022, 8:51 AM
I have a table, and the data volume is about 30 million per day. If the table query is conducted by week, the expected effect will not be reached, so I want to table by day.
p

Pei

12/13/2022, 8:58 AM
Thank you! Do you perhaps have a query example?
u

李华

12/13/2022, 9:53 AM
This is an SQL query for a week's data.There are currently 200 million pieces of data in this table, and the query time is as follows.
select ts, id, first(value) as value from 'historyData_AM' where ts BETWEEN '2021-11-01' AND '2021-11-14' AND id = 7003 SAMPLE BY 30m;
The table structure is as follows.
p

Pei

12/13/2022, 10:09 AM
Thank you so much! I’ll check with the engineering team on this see how we could help.
n

Nicolas Hourcard

12/13/2022, 10:48 AM
Hello, so essentially you have a week worth of data and you want to make 7 tables with daily data instead ?
p

Pei

12/13/2022, 12:22 PM
From the query it might be more about sharding the table based on the ID (but I could be wrong). @Bolek Ziobrowski I think we’ve seen similar situation before, do you remember whether using symbol would improve the query performance potentially?
b

Bolek Ziobrowski

12/13/2022, 12:34 PM
@李华 If id is selective you could improve response time by making id a symbol and adding an index.
By the way - is that response data for cold or hot data ?
u

李华

12/13/2022, 12:39 PM
Which data is cold data.
b

Bolek Ziobrowski

12/13/2022, 12:39 PM
Cold data is data on disk that can't be found in os cache .
If you run the query a few times you'll get different response times as data gets 'warmer' .
So first run is usually 'cold' while 3rd or 4th should give you fastest/hot response time .
u

李华

12/13/2022, 12:45 PM
Yes, the query result of cold data above, is there any way to shorten the query time of cold data
b

Bolek Ziobrowski

12/13/2022, 12:49 PM
Have you tried making id a symbol and adding index?
Also - what's the response time after you run the query a few times ?
u

李华

12/13/2022, 1:02 PM
The query efficiency is the same after the id index is added.
But the speed details got faster after several inquiries.
b

Bolek Ziobrowski

12/13/2022, 1:07 PM
Is it the same with and without index for cold and hot data ? Can you run the following ?
select count_distinct(id) from historyData_AM

select count(*) as value from 'historyData_AM' where ts BETWEEN '2021-11-01' AND '2021-11-14' AND id = 7003 SAMPLE BY 30m;