WCKD 10/01/2022, 8:07 PM
Is there a way to limit how many rows of a query result are held in memory at a time — similar to what you can do with `cursor.itersize = value` on a PostgreSQL server-side cursor? Is there a setting for it in the configuration file? For example, if I run a big query returning 300M rows, when looping through it with a `for row in cursor` loop, I would like to be able to set mini-batches of 100M rows, so that I can loop through the first 100M in RAM, then through the next batch of 100M, and so on until the entire query is done. At the moment it seems like all 300M rows are stored in RAM, which my machine can't handle, so I was wondering if there's a way to limit the maximum number of rows per batch, just like you can by setting the `itersize` value on a PostgreSQL server-side cursor.
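The mini-batch looping the question describes can be sketched independently of any driver: a generator that slices an iterator into fixed-size chunks, so only one chunk is materialized at a time. The `batched` helper below is a hypothetical illustration, not part of psycopg or QuestDB:

```python
from itertools import islice

def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterator,
    so only one batch is held in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Simulate a 10-row result set consumed in batches of 4.
sizes = [len(b) for b in batched(range(10), 4)]
print(sizes)  # → [4, 4, 2]
```

The catch, as the rest of the thread shows, is that this only bounds memory if the underlying iterator streams rows lazily; if the driver has already materialized the whole result set client-side, chunking afterwards does not help.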
Jaromir Hamala 10/03/2022, 10:24 PM
Hello, psycopg has `fetchmany()`, which does exactly what you want: it splits the overall result into mini-batches and fetches these mini-batches one at a time. In fact, this is part of the Postgres wire protocol. What client/driver do you use?
WCKD 10/02/2022, 7:30 AM
I know about the `fetchone()` and `fetchmany()` methods, but I just want to loop through the cursor without using any of those. It turns out that psycopg2 fetches the entire result set to the client by default (when using `execute()`), unless you use a named cursor (server-side cursor). When you use a named cursor, the result set is maintained on the server side, allowing you to fetch rows as necessary. You can also control the number of records fetched over the network in each round trip by setting the `itersize` property to a reasonable value (default 2000). However, QuestDB SQL does not support named cursors — I have already tried them — which is why I asked if there is a possibility of setting this itersize-like value somewhere for the client-side cursor without using an unsupported server-side cursor. Or, perhaps, you could add named cursors as a feature in QuestDB?
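The itersize behaviour described above can be sketched without a live server. When psycopg2 iterates a named (server-side) cursor, it effectively drains it with repeated `fetchmany(itersize)` calls, so at most `itersize` rows cross the network per round trip. `FakeCursor` below is a stub standing in for the real server-side cursor; only the draining loop mirrors the driver's behaviour:

```python
class FakeCursor:
    """In-memory stand-in for a server-side cursor; counts round trips."""
    def __init__(self, rows):
        self._rows = list(rows)
        self.round_trips = 0

    def fetchmany(self, size):
        self.round_trips += 1
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

def iterate(cursor, itersize=2000):
    """Drain a cursor batch by batch, the way iterating a psycopg2
    named cursor does behind the scenes."""
    while True:
        batch = cursor.fetchmany(itersize)
        if not batch:
            return
        for row in batch:
            yield row

cur = FakeCursor(range(5000))
total = sum(1 for _ in iterate(cur, itersize=2000))
print(total, cur.round_trips)  # → 5000 4  (three data batches + one empty call)
```

Against a vanilla PostgreSQL server this pattern keeps client memory bounded; against QuestDB it does not help, because, as the thread notes, the named cursor that makes the server hold the result set is not supported.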
Jaromir Hamala 10/03/2022, 7:20 AM
WCKD 10/03/2022, 7:28 AM
Jaromir Hamala 10/03/2022, 7:32 AM
a good enough workaround before implicit fetch limit / server-side cursors are implemented?
WCKD 10/03/2022, 7:38 AM
No matter what, the entire query result ends up in RAM, at least that's the case with the psycopg driver. It doesn't even matter if you fetch the query with `fetchone()`, one row at a time; it would still fill up the RAM, as the query execution itself loads the result into RAM. I have compared three scenarios — `fetchone()`, `fetchmany(30_000_000)` (which fits in the RAM perfectly), and just looping through the cursor with a for loop — and in every scenario the RAM would get filled up.
```python
import psycopg as pg  # psycopg 3

# CONNECTION (the connection string) and the query are defined elsewhere.

def query_database(query, fetchmany, fetchone):
    with pg.connect(CONNECTION) as conn:
        with conn.cursor(row_factory=pg.rows.dict_row) as cursor:
            cursor.execute(query)
            nr = 0
            if fetchone:
                while cursor.fetchone() is not None:
                    nr += 1
            elif fetchmany:
                while True:
                    result = cursor.fetchmany(30_000_000)
                    if not result:
                        break
                    for row in result:
                        nr += 1
            else:
                for row in cursor:
                    nr += 1
            print(nr)
```
javier ramirez 10/03/2022, 8:24 AM
WCKD 10/03/2022, 8:43 AM
javier ramirez 10/03/2022, 8:47 AM
WCKD 10/03/2022, 10:58 AM
settings I could improve over the defaults so that the JSONs might be calculated and retrieved even faster? Thank you again!