Akshay Hartalkar

02/21/2023, 12:09 PM
Hi everyone, I have around 146 million rows in a SQL Server database that I want to migrate to a QuestDB instance. I was thinking of writing a script to migrate the data. Is there a more direct way to do this?

Jaromir Hamala

02/21/2023, 12:18 PM
hi Akshay, can you export the data from SQL Server as CSV files? 146M rows is not a small table, but it's not terribly large either. Let's assume each row takes 1KB when exported as CSV. You would end up with 146GB of data, that's not too bad and QuestDB imports it via SQL COPY: https://questdb.io/docs/guides/importing-data/ Some tips:
• Make sure your CSV files are ordered by time
• Export a small subset of your table first, practice the import procedure, and only then do it with the full data set
• Use fast disks if you can - this makes a big difference in CSV import performance
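For reference, a minimal sketch of what the COPY import could look like, assuming a table called trades with a ts timestamp column (the names are placeholders, the guide above has the exact options):

-- the CSV file must be in the server's configured import directory
COPY trades FROM 'trades.csv' WITH HEADER true TIMESTAMP 'ts' ON ERROR SKIP_ROW;
-- COPY returns an import id; progress can be followed in the import log
SELECT * FROM sys.text_import_log LIMIT -1;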

Nicolas Hourcard

02/21/2023, 1:00 PM
hey @Akshay Hartalkar, what hardware will you use for the import?

Akshay Hartalkar

02/21/2023, 1:05 PM
QuestDB is installed on a 2 vCPU / 4GB RAM AWS instance

Jaromir Hamala

02/21/2023, 1:27 PM
oh, that’s a rather small box. is the disk instance-local or remote/EBS?

Imre

02/21/2023, 1:40 PM
another option is to migrate the data with a bit of code. In Java it would look something like this:
import io.questdb.client.Sender;
import java.sql.*;

public class DataMigration {
    public static void main(String[] args) {
        String dbURL = "jdbc:sqlserver://...";
        String user = "sa";
        String pass = "pwd";

        // try-with-resources closes the sender and all JDBC resources
        // automatically, even when an exception is thrown
        try (Sender sender = Sender.builder().address("questdbhost:9009").build();
             Connection conn = DriverManager.getConnection(dbURL, user, pass);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM table_to_migrate")) {

            while (rs.next()) {
                // read record from SQL Server
                String strField = rs.getString("NAME");
                String symbolField = rs.getString("GENDER"); // returns "M" or "F"
                int intField = rs.getInt("AGE");

                // write to QuestDB via the ILP sender
                sender.table("table_to_migrate")
                        .symbol("gender", symbolField)
                        .stringColumn("name", strField)
                        .longColumn("age", intField)
                        .atNow();
            }
        } catch (SQLException ex) {
            ex.printStackTrace();
        }
    }
}
using JDBC and the QuestDB client. Please note, I have never run this code, but I think it is a good start.
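one extra thought for a table this size (my assumption, not something I have measured): the sender buffers rows internally, but for a long-running migration you may want to call sender.flush() every so often, e.g. every 100k rows, so rows are pushed out at a predictable pace:

long rows = 0;
while (rs.next()) {
    // ... build the row exactly as above ...
    if (++rows % 100_000 == 0) {
        sender.flush(); // push buffered rows to QuestDB; 100k is an arbitrary batch size
    }
}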

Jaromir Hamala

02/21/2023, 1:43 PM
that’s a good option too. Assuming the target QuestDB table uses a designated timestamp, it would be good to use ORDER BY in the SQL query to get results sorted by time. It will make the ingestion much faster:
SELECT * FROM table_to_migrate ORDER BY ts ASC;

Alex Pelagenko

02/21/2023, 1:55 PM
For MSSQL the JDBC connection string will be different, and the driver has to be added to the dependencies. What programming language are you comfortable with?
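for illustration, the Maven coordinate and a typical connection string could look something like this (host, port, database name and driver version are placeholders):

// Maven dependency for Microsoft's JDBC driver, e.g.:
// <dependency>
//     <groupId>com.microsoft.sqlserver</groupId>
//     <artifactId>mssql-jdbc</artifactId>
//     <version>12.2.0.jre11</version>
// </dependency>

// a typical SQL Server JDBC URL; adjust host, database and TLS options to your setup
String dbURL = "jdbc:sqlserver://myhost:1433;databaseName=mydb;encrypt=true;trustServerCertificate=true";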

javier ramirez

02/21/2023, 2:41 PM
Also note the code above uses .atNow(), which would assign the ingestion time as the designated timestamp. If that's not what you want, you can use .at(long) to pass the nanosecond value representing your timestamp
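a quick sketch of what that could look like, assuming the source table has a TS datetime column (the column name is a placeholder):

// read the original timestamp from SQL Server
java.sql.Timestamp ts = rs.getTimestamp("TS");
// .at(long) expects epoch nanoseconds; this simple conversion
// keeps millisecond precision only
long epochNanos = ts.getTime() * 1_000_000L;

sender.table("table_to_migrate")
        .symbol("gender", symbolField)
        .stringColumn("name", strField)
        .longColumn("age", intField)
        .at(epochNanos); // instead of atNow()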

Akshay Hartalkar

02/22/2023, 5:53 AM
Thanks for all the replies. @Jaromir Hamala the instance might be small, but it should be workable for the use case. As for the storage, it's an EBS volume (gp2, 300 IOPS) attached to the instance. I assume the CSV import might put a heavy load on the server

javier ramirez

02/22/2023, 9:18 AM
you should consider moving to gp3. It is always cheaper than gp2 and more performant by default. https://aws.amazon.com/es/blogs/storage/migrate-your-amazon-ebs-volumes-from-gp2-to-gp3-and-save-up-to-20-on-costs/
Here is some independent analysis of performance/price, with charts: https://cloudwiry.com/ebs-gp3-vs-gp2-pricing-comparison/
For reference, you can check the time it took us to ingest a heavily unsorted 76GB CSV in our benchmarks using gp2: https://questdb.io/blog/2022/09/12/importing-300k-rows-with-io-uring/
The file is doctored to be especially difficult for us, because we need to rewrite partitions frequently as the data is out of order. If your data is in order, you should expect faster times.
In our case, it took ~28 minutes to load. If this is a one-off migration, it is probably safe to assume you can afford that load time. If you need a faster switchover, you could always import all the partitions up to yesterday in one go, then import only the fresh data on top, which should be very fast.
As you can see in the blog post, using a temporary local volume for the import sped up the process quite a bit, so you could always create a beefy instance with the main volume using gp3 maxed out, and put the CSV on a local NVMe drive only for the import. Once the import is done, you can downsize both the instance and the gp3 specs, and delete the NVMe altogether. If your original exported CSV is in order, though, I'm not sure engineering the migration this way would be worth your while.