https://questdb.io logo
Title
h

Hristo Katsarski

03/30/2023, 12:37 PM
Hello, I have a question related to this comment: https://questdb.io/docs/reference/clients/java_ilp/ QuestDB works best when rows are ingested in chronological order. This means rows with older timestamps are ingested before rows with newer timestamps. Receiving out-of-order data can have a performance impact:For QuestDB 6.6 and later versions, the out-of-order data ingestion has been optimized and automated. I'm planning to import data from multiple tables into one QuestDB table using CSV via COPY SQL method. The source tables will be exported to CSV files, then those will be merged into one file, so basically the timestamps will be repeated, i.e. timestamps from each source table will be mostly the same into the target QuestDB table. There's a distinct column of type SYMBOL that will be used when querying the data. Would it affect the DB performance if imported data via COPY SQL is with timestamps not in chronological order? Or it is best to order the CSV file first by SYMBOL+timestamp, then import.
j

Jaromir Hamala

03/30/2023, 1:45 PM
Hi Hristo, It’s better to pre-sort your CSVs if you can. Sorted CSV will be imported faster. On the other hand: SQL COPY deals with unsorted data better than other ingestion interfaces. It has the advantage of seeing all data at once so it can totally order them before starting the physical import into DB tables. Also: It affects only the ingestion performance - basically how long does it take to import the CSV file. Once the CSV is imported then query performance will be the same.
h

Hristo Katsarski

03/30/2023, 2:02 PM
Hi Jaromir, Thank you clarifying. Will follow your advice. Does the size of the input CSV file matter to the import process? I.e. one large file vs. multiple files split by reasonable sizes, and then imported one by one? Thanks
j

Jaromir Hamala

03/30/2023, 2:08 PM
It’s usually better to have everything in a single file - it allows QuestDB to fully sort everything before importing. However there are practical limits - for example it’s not very practical to have a single CSV file with many TBs in size. I’d also recommend to create a smaller CSV with a subset of rows (you can use the
head
utility on MacOS/Linux) and test the import procedure with that first. When your are happy with inferred types, column names, etc then you drop the test table and do a full import. and last but not least - see this guide: https://questdb.io/docs/guides/importing-data/
h

Hristo Katsarski

03/30/2023, 2:12 PM
Noted, many thanks!
j

Jaromir Hamala

03/30/2023, 2:13 PM
may I ask you what are you using QuestDB for?
h

Hristo Katsarski

03/30/2023, 2:27 PM
I'm experimenting replacing an SQL database that stores historical market data. I'd like to export and merge all trading instruments' data into a single QuestDB table.
j

Jaromir Hamala

03/30/2023, 2:32 PM
That sounds very cool! May I ask you what is the total dataset size? It sounds like it could be quite a few rows 🙂 I don’t know how familiar with QuestDB you are so please forgive me if you already know these things: Make sure you familiar yourself with the concepts of designated timestamps and the symbol types: https://questdb.io/docs/concept/designated-timestamp/ https://questdb.io/docs/concept/symbol/ Well chosen designated timestamps are important for good performance and the symbol type is related to both performance and storage efficiency.
n

Nicolas Hourcard

03/30/2023, 2:37 PM
welcome to our community @Hristo Katsarski
h

Hristo Katsarski

03/30/2023, 2:42 PM
Total DB size is ~100GB. I'm learning Quest DB now, my initial table design is like, e.g.:
CREATE TABLE minute_1 (code symbol CAPACITY 9182, ddate timestamp, ...other fields...), INDEX (code) timestamp(ddate) PARTITION BY MONTH;
Data will be queried by "code" and "ddate" >= certain time or "ddate" from/to range.
j

Jaromir Hamala

03/30/2023, 3:54 PM
That looks like a sensible schema! I would recommend to test query performance with and without indexes. indexes are not always faster, especially when you restrict the time interval. Also if you use M1 MacBook for development then please bear in mind than certain queries might be much faster when running on Intel and AMD (the CPUs frequently used in a production) due to optimizations which are not implemented on ARM. One more question if you don’t mind: How did you learn about QuestDB?
h

Hristo Katsarski

03/30/2023, 4:12 PM
I've been interested in Java timeseries database for quite some time, and don't really recall the first time I came across information or article about QuestDB. Nope, not using MacBook for development/testing, CPU should not be an issue, but good to know. Also, thank you for the indexes on/off advice, will try it out.