WCKD 10/01/2022, 8:07 PM
Is there a way to limit how many rows of a query result are held in memory at a time — similar to what you can do with `cursor.itersize = value` on a PostgreSQL server-side cursor? Is there a setting for it in the configuration file? For example, if I run a big query returning 300M rows, when looping through it with a `for row in cursor` loop, I would like to be able to set mini-batches of 100M rows, so that I can loop through the first 100M in RAM, then through the next batch of 100M, and so on until the entire query is done. At the moment it seems like all 300M rows are stored in RAM, which my machine can't handle, so I was wondering if there's a way to limit the maximum number of rows per batch, just like you can by setting the `itersize` value on a PostgreSQL server-side cursor.
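The mini-batch looping the question describes can be sketched independently of any driver: a generator that slices an iterator into fixed-size chunks, so only one chunk is materialized at a time. The `batched` helper below is a hypothetical illustration, not part of psycopg or QuestDB:

```python
from itertools import islice

def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterator,
    so only one batch is held in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Simulate a 10-row result set consumed in batches of 4.
sizes = [len(b) for b in batched(range(10), 4)]
print(sizes)  # → [4, 4, 2]
```

The catch, as the rest of the thread shows, is that this only bounds memory if the underlying iterator streams rows lazily; if the driver has already materialized the whole result set client-side, chunking afterwards does not help.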
Jaromir Hamala 10/03/2022, 10:24 PM
Hello, psycopg has `fetchmany()`, which does exactly what you want: it splits the overall result into mini-batches and fetches these mini-batches one at a time. In fact, this is part of the Postgres wire protocol. What client/driver do you use?
WCKD 10/02/2022, 7:30 AM
I know about the `fetchone()` and `fetchmany()` methods, but I just want to loop through the cursor without using any of those. It turns out that psycopg2 fetches the entire result set to the client by default (when using `execute()`), unless you use a named cursor (server-side cursor). When you use a named cursor, the result set is maintained on the server side, allowing you to fetch rows as necessary. You can also control the number of records fetched over the network in each round trip by setting the `itersize` property to a reasonable value (default 2000). However, QuestDB SQL does not support named cursors — I have already tried them — which is why I asked if there is a possibility of setting this itersize-like value somewhere for the client-side cursor without using an unsupported server-side cursor. Or, perhaps, you could add named cursors as a feature in QuestDB?
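The itersize behaviour described above can be sketched without a live server. When psycopg2 iterates a named (server-side) cursor, it effectively drains it with repeated `fetchmany(itersize)` calls, so at most `itersize` rows cross the network per round trip. `FakeCursor` below is a stub standing in for the real server-side cursor; only the draining loop mirrors the driver's behaviour:

```python
class FakeCursor:
    """In-memory stand-in for a server-side cursor; counts round trips."""
    def __init__(self, rows):
        self._rows = list(rows)
        self.round_trips = 0

    def fetchmany(self, size):
        self.round_trips += 1
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

def iterate(cursor, itersize=2000):
    """Drain a cursor batch by batch, the way iterating a psycopg2
    named cursor does behind the scenes."""
    while True:
        batch = cursor.fetchmany(itersize)
        if not batch:
            return
        for row in batch:
            yield row

cur = FakeCursor(range(5000))
total = sum(1 for _ in iterate(cur, itersize=2000))
print(total, cur.round_trips)  # → 5000 4  (three data batches + one empty call)
```

Against a vanilla PostgreSQL server this pattern keeps client memory bounded; against QuestDB it does not help, because, as the thread notes, the named cursor that makes the server hold the result set is not supported.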
Jaromir Hamala 10/03/2022, 7:20 AM
WCKD 10/03/2022, 7:28 AM
Jaromir Hamala 10/03/2022, 7:32 AM
a good enough workaround before implicit fetch limit / server-side cursors are implemented?
WCKD 10/03/2022, 7:38 AM
No matter what, the entire query result ends up in RAM, at least that's the case with the psycopg driver. It doesn't even matter if you fetch the query with `fetchone()`, one row at a time; it would still fill up the RAM, as the query execution itself loads the result into RAM. I have compared three scenarios — `fetchone()`, `fetchmany(30_000_000)` (which fits in the RAM perfectly), and just looping through the cursor with a for loop — and in every scenario the RAM would get filled up.
```python
import psycopg as pg  # psycopg 3

# CONNECTION (the connection string) and the query are defined elsewhere.

def query_database(query, fetchmany, fetchone):
    with pg.connect(CONNECTION) as conn:
        with conn.cursor(row_factory=pg.rows.dict_row) as cursor:
            cursor.execute(query)
            nr = 0
            if fetchone:
                while cursor.fetchone() is not None:
                    nr += 1
            elif fetchmany:
                while True:
                    result = cursor.fetchmany(30_000_000)
                    if not result:
                        break
                    for row in result:
                        nr += 1
            else:
                for row in cursor:
                    nr += 1
            print(nr)
```
javier ramirez 10/03/2022, 8:24 AM
WCKD 10/03/2022, 8:43 AM
javier ramirez 10/03/2022, 8:47 AM
WCKD 10/03/2022, 10:58 AM
settings I could improve over the defaults so that the JSONs might be calculated and retrieved even faster? Thank you again!