Title
#users-public
o

Orrin

02/07/2022, 8:33 PM
I made a small test case for the EXCEPT keyword so that I can maintain unique property of a table by only inserting new rows. however, when I EXCEPT all fields & rows from two almost identical tables, I find that more rows are excepted than appears true or expected. in a 20k rows, i have a handful of unexpected and inexplicably excepted rows. I have narrowed it down to a single field in one such case as per the following query
SELECT Volume FROM smallTable WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U' EXCEPT SELECT Volume FROM HJGIHL WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U';
weirdly though, when I run each side of the EXCEPT independently, I see that the single value returns to be identical. is this a bug or am i missing some behavioral quirk?
9:25 PM
NB. for the purposes of emulating my workflow, I have the HJGIHL table created from the upload of a CSV. my suspicion is that this is float approximation issue
9:29 PM
after inserting the EXCEPT rows into the smallTable, when I run this,
select * from smallTable WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U';
i get a result that has apparently two identical rows.
Bolek Ziobrowski

Bolek Ziobrowski

02/08/2022, 7:49 AM
Hello @Orrin. Can you share table definitions ?
8:01 AM
I tried the following and got correct (empty) result :
create table t1 (
    TradeDate timestamp,
    Open double,
    High double,
    Low double,
    Last double,
    Volume double,
    Bid double,
    Ask double,
    RIC string,
    SettlementPrice double
);

insert into t1 values
(
    '2022-02-04T00:00:00.000000Z',11.6800,14.0000,0.0500,0.0500,56905.9968, null,0.0500,'SPXWb042245500.U', null
);

select * from t1;

create table t2 (
    TradeDate timestamp,
    Open double,
    High double,
    Low double,
    Last double,
    Volume double,
    Bid double,
    Ask double,
    RIC string,
    SettlementPrice double
);

insert into t2 values
(
    '2022-02-04T00:00:00.000000Z',11.6800,14.0000,0.0500,0.0500,56905.9968, null,0.0500,'SPXWb042245500.U', null
);

select * from t2;

select * from t1 except select * from t2 ;
Could you try subtracting these seemingly identical floating point values ? Something like :
select t1.Open - t2.Open, 
t1.High-t2.High, ... 
from t1 join t2 on 1=1 ;
o

Orrin

02/08/2022, 11:59 PM
so to be clear, the two tables I’m EXCEPTing are very similar. >99% of the expected outcome occurs--it’s only a handful of weird ones. can you explain the purpose of that test?
Bolek Ziobrowski

Bolek Ziobrowski

02/09/2022, 10:11 AM
Hi @Orrin It's to make sure it's not about floating point quirks as you suspected before .
o

Orrin

02/14/2022, 7:54 PM
so the except and the subtractions both returned as expected.
7:56 PM
my further suspicion is that the data is parsed in exactly the same way so is perhaps encoded/represented as different.
7:56 PM
the precision of which is perhaps evaluated by the EXCEPT but not by the rendering process. is there a way of revealing the actual encoded data rather than decoding it?
Bolek Ziobrowski

Bolek Ziobrowski

02/14/2022, 8:36 PM
@Orrin Does it mean that subtractions returned 0.0 ? I think you can try two things :1. Adjust http.json.query.float.scale=4 http.json.query.double.scale=12 in server.conf and run queries again or 2. Fetch data through pg protocol and then inspect in java or python code .
o

Orrin

02/18/2022, 3:13 AM
both good ideas. thanks.