• o

    Orrin

    6 months ago
    I made a small test case for the EXCEPT keyword so that I can maintain unique property of a table by only inserting new rows. however, when I EXCEPT all fields & rows from two almost identical tables, I find that more rows are excepted than appears true or expected. in a 20k rows, i have a handful of unexpected and inexplicably excepted rows. I have narrowed it down to a single field in one such case as per the following query
    SELECT Volume FROM smallTable WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U' EXCEPT SELECT Volume FROM HJGIHL WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U';
    weirdly though, when I run each side of the EXCEPT independently, I see that the single value returns to be identical. is this a bug or am i missing some behavioral quirk?
  • NB. for the purposes of emulating my workflow, I have the HJGIHL table created from the upload of a CSV. my suspicion is that this is float approximation issue
  • after inserting the EXCEPT rows into the smallTable, when I run this,
    select * from smallTable WHERE TradeDate = '2022-02-04T00:00:00.000000Z' AND RIC = 'SPXWb042245500.U';
    i get a result that has apparently two identical rows.
  • message has been deleted
  • Bolek Ziobrowski

    Bolek Ziobrowski

    5 months ago
    Hello @Orrin. Can you share table definitions ?
  • I tried the following and got correct (empty) result :
    create table t1 (
        TradeDate timestamp,
        Open double,
        High double,
        Low double,
        Last double,
        Volume double,
        Bid double,
        Ask double,
        RIC string,
        SettlementPrice double
    );
    
    insert into t1 values
    (
        '2022-02-04T00:00:00.000000Z',11.6800,14.0000,0.0500,0.0500,56905.9968, null,0.0500,'SPXWb042245500.U', null
    );
    
    select * from t1;
    
    create table t2 (
        TradeDate timestamp,
        Open double,
        High double,
        Low double,
        Last double,
        Volume double,
        Bid double,
        Ask double,
        RIC string,
        SettlementPrice double
    );
    
    insert into t2 values
    (
        '2022-02-04T00:00:00.000000Z',11.6800,14.0000,0.0500,0.0500,56905.9968, null,0.0500,'SPXWb042245500.U', null
    );
    
    select * from t2;
    
    select * from t1 except select * from t2 ;
    Could you try subtracting these seemingly identical floating point values ? Something like :
    select t1.Open - t2.Open, 
    t1.High-t2.High, ... 
    from t1 join t2 on 1=1 ;
  • o

    Orrin

    5 months ago
    so to be clear, the two tables I’m EXCEPTing are very similar. >99% of the expected outcome occurs--it’s only a handful of weird ones. can you explain the purpose of that test?
  • Bolek Ziobrowski

    Bolek Ziobrowski

    5 months ago
    Hi @Orrin It's to make sure it's not about floating point quirks as you suspected before .
  • o

    Orrin

    5 months ago
    so the except and the subtractions both returned as expected.
  • my further suspicion is that the data is parsed in exactly the same way so is perhaps encoded/represented as different.
  • the precision of which is perhaps evaluated by the EXCEPT but not by the rendering process. is there a way of revealing the actual encoded data rather than decoding it?
  • Bolek Ziobrowski

    Bolek Ziobrowski

    5 months ago
    @Orrin Does it mean that subtractions returned 0.0 ? I think you can try two things :1. Adjust http.json.query.float.scale=4 http.json.query.double.scale=12 in server.conf and run queries again or 2. Fetch data through pg protocol and then inspect in java or python code .
  • o

    Orrin

    5 months ago
    both good ideas. thanks.