Title
#users-public
j

Jakub Kubík

08/23/2022, 12:04 PM
Hello guys! I am currently investigating a problem where a query executed using a
psycopg2
cursor
takes minutes to execute and eventually times out, whereas the same query executes in the matter of hundreds of milliseconds in the web GUI. Has anyone come across similar behaviour?
Alex Pelagenko

Alex Pelagenko

08/23/2022, 12:10 PM
When you execute in GUI it only fetches top X rows, so network transfer is much lower.
12:11 PM
When you do postgres driver select it will probably fetch the whole data set and if it’s big it can take much longer
j

Jakub Kubík

08/23/2022, 12:27 PM
I understand what you're saying, but I don't think it's the issue I am having. The code never reaches the line where I would fetch the results, it times out at
cursor.execute(query)
Bolek Ziobrowski

Bolek Ziobrowski

08/23/2022, 12:59 PM
"limit 1000" added to queries issued via web console can affect query execution time dramatically . Some execution plans (sort, duplicate removal, ...) must prepare whole result set before returning first result .
1:00 PM
Bind variables can also be a factor here .
j

Jakub Kubík

08/23/2022, 2:47 PM
The query in question is a simple SELECT with a WHERE statement with 3 conditions. Pardon me if I am wrong, but doesn't the number of fetched rows in the log mean that the query finished execution on all the rows, not just the first 1000?
3:19 PM
Interesting. if we execute the same query with the timestamp range a couple of months back, it seems to work normally.
Alex Pelagenko

Alex Pelagenko

08/23/2022, 3:42 PM
I guess it’s time to show us the query 😉
j

Jakub Kubík

08/23/2022, 3:56 PM
SELECT * FROM l2
WHERE ("platform" = 'epex')
AND ("timestamp" BETWEEN '2022-06-30 12:00:00' AND '2022-07-01 02:00:00')
AND ("delivery_start_utc_symbol" LIKE '2022-07-01T00%')
AND ("tso" = '10YDE-RWENET---I')
3:57 PM
I've also tried removing all conditions but the timestamp one and it did not help, at least not noticeably (still timeouted)
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:08 PM
nothing obvious, only have initial explanation. What’s row count on fast vs slow execution?
j

Jakub Kubík

08/23/2022, 4:12 PM
By slow you mean the psycopg2 connection?
4:13 PM
If so, I can't really tell you, because it times out before
cursor.execute()
can even finish.
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:13 PM
you can see total row count in the console
4:13 PM
at the status bar
j

Jakub Kubík

08/23/2022, 4:14 PM
Okay, so what do you mean by fast vs slow, then?
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:15 PM
you said different periods work differently
j

Jakub Kubík

08/23/2022, 4:16 PM
Interestingly so, in the console, the query that actually finishes in Python is slower.
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:18 PM
can you curl slow query to file without lmit?
j

Jakub Kubík

08/23/2022, 4:34 PM
Okay, another piece of the puzzle.
4:35 PM
jkubik@main-desktop /tmp 
➜ curl -G --data-urlencode \ 
    "query=SELECT * FROM l2 WHERE (platform = 'epex') AND (timestamp BETWEEN '2022-05-31 12:00:00' AND '2022-06-01 02:00:00') AND (delivery_start_utc_symbol LIKE '2022-06-01T00%') AND (tso = '10YDE-RWENET---I')" \
    http://:9000/exp > data.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.7M    0 13.7M    0     0   836k      0 --:--:--  0:00:16 --:--:-- 1091k
jkubik@main-desktop /tmp 
➜ curl -G --data-urlencode \
    "query=SELECT * FROM l2 WHERE (platform = 'epex') AND (timestamp BETWEEN '2022-06-30 12:00:00' AND '2022-07-01 02:00:00') AND (delivery_start_utc_symbol LIKE '2022-07-01T00%') AND (tso = '10YDE-RWENET---I')" \                                                       
    http://:9000/exp > data.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35.9M    0 35.9M    0     0   118k      0 --:--:--  0:05:09 --:--:--     0
4:35 PM
The slow query (the 2nd one) does not even finish
4:35 PM
It just sits at 0 B/s until timeout.
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:37 PM
Maybe data is corrupt in one of the columns. Are there any errors in the logs?
j

Jakub Kubík

08/23/2022, 4:38 PM
I'd have to check that with our DevOps team, will post an update as soon as they respond.
4:38 PM
Thanks for the help so far!
Alex Pelagenko

Alex Pelagenko

08/23/2022, 4:38 PM
it will be one of the columns which are in select but not in where
j

Jakub Kubík

08/23/2022, 4:58 PM
Yes, we have found some rows with corrupted data in the time range specified by the slow query.
4:58 PM
Thanks a lot for the help @Alex Pelagenko!
9:19 AM
Okay, I've got another question regarding this issue. Is there a way to remove/update the corrupted rows so the query finishes? Other than the corrupted rows, the data in the table is rather useful for us.
Alex Pelagenko

Alex Pelagenko

08/24/2022, 9:20 AM
you should be able to update
j

Jakub Kubík

08/24/2022, 9:21 AM
Okay, but how do we determine which rows are corrupted?
Alex Pelagenko

Alex Pelagenko

08/24/2022, 9:21 AM
do you know what columns?
9:21 AM
what column types
9:22 AM
most basic way would be binary search
9:22 AM
but it’s likely to be end of partition
j

Jakub Kubík

08/24/2022, 9:22 AM
It's most likely multiple columns, the corruption was caused by service crashes.
Alex Pelagenko

Alex Pelagenko

08/24/2022, 9:23 AM
and string columns
9:24 AM
or symbol
9:24 AM
what’s the error in the logs when you try to query it?
j

Jakub Kubík

08/24/2022, 9:24 AM
Is there a way to find the corruption without needing the query to timeout, as that would be quite a lengthy process even using binary search
Bolek Ziobrowski

Bolek Ziobrowski

08/24/2022, 12:31 PM
If table is partitioned then it shouldn't take a long time to binary search with something like
select broken_column
from table 
where ts > X1 and ts < X2;
Even better - instead of relying only on response time you could simply check logs for exception(s) .