https://questdb.io logo
Title
s

Steven

02/02/2023, 9:55 AM
Hi! can i remove rows with duplicate timestamps when fetching data in SQL?
b

Bolek Ziobrowski

02/02/2023, 10:15 AM
Hi Steven. It is possible with group by or analytic functions but depends on how you want to choose which rows to remove and which to keep .
s

Steven

02/02/2023, 10:16 AM
Thank you for the prompt reply. I want to retain the last one! Could you let me know the way?
b

Bolek Ziobrowski

02/02/2023, 10:20 AM
You could do something like :
select ts, last(field1), last(field), ...
from
(
   your_query
) group by ts
but it might be more performant to alter your query not to produce duplicates .
s

Steven

02/02/2023, 10:22 AM
Thank you. how can i prevent duplicate in the first place when i have already duplicate values in the table? Could you shed some light on this?
p

Productboard

02/02/2023, 10:44 AM
🎉 New note created.
b

Bolek Ziobrowski

02/02/2023, 10:46 AM
It depends on the query, e.g. latest on shouldn't produce duplicates if the have the same values for ts and partition by columns .
m

Matt

02/02/2023, 6:19 PM
@Steven, your answer of "I want to retain the last one!" makes me think you might be looking for the functionality of analytic functions as Bolek mentioned.
Here's an example using the analytic function
row_number()
.
@Steven, by the way, I just came across the SQL extension LATEST ON. I think it might be exactly what you're looking for. It's very nice. • It's simpler syntax than my
row_number()
example. • The results are identical. • The performance is identical for this very small query (≈42ms). • But the explain plan is distinctly different, and I suspect it will be generally better.
select
  cab_type, vendor_id, pickup_datetime, trip_distance
from trips
where pickup_datetime between '2009-01-01T00:02:56.000000Z' and '2009-01-01T00:03:00.000000Z'
latest on pickup_datetime PARTITION BY cab_type, vendor_id;