    gmaurice

    1 month ago
    Hello, since I upgraded from 6.4.3 to 6.5 I sometimes get this error:
    max txn-inflight limit reached
    which I didn’t get before. Any idea about the cause? Another thing: memory usage grows steadily over time, and it did so even before 6.5. The screenshot shows that the growth pattern has changed and memory now grows faster since the upgrade.
    Looking at the memory consumption, it oscillates before the error
    max txn-inflight limit reached
    occurs; after that, memory stops growing. Even though I can’t query data, data is still being written.

    Andrey Pechkurov

    1 month ago
    Hello, are there any errors in the server logs?

    gmaurice

    1 month ago
    Hello, the only error I get regularly is this one:
    2022-08-17T09:29:17.089462Z E i.q.c.p.PGConnectionContext error [pos=57, msg=`[0]: max txn-inflight limit reached [txn=2621753, min=2605361, size=16384]`, errno=0]
    However, my logs don’t go back far enough to find the context of the first occurrence. I only use docker logs for now; I will switch to writing logs to disk and get back to you on the next occurrence.
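
The numbers in the logged message can be checked with a little arithmetic. The sketch below pulls the three fields out of the log line and compares the gap between `txn` and `min` with `size`. Reading `txn` as the requested transaction, `min` as the oldest transaction still held, and `size` as the capacity of the tracking structure is an assumption based only on the field names, not on QuestDB documentation.

```python
import re

# The log line is copied from the message above. The interpretation of the
# fields (txn, min, size) is an assumption based on their names.
LOG_LINE = (
    "2022-08-17T09:29:17.089462Z E i.q.c.p.PGConnectionContext error "
    "[pos=57, msg=`[0]: max txn-inflight limit reached "
    "[txn=2621753, min=2605361, size=16384]`, errno=0]"
)


def parse_inflight_error(line):
    """Return (txn, min_txn, size) from a 'max txn-inflight' line, or None."""
    m = re.search(r"\[txn=(\d+), min=(\d+), size=(\d+)\]", line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


txn, min_txn, size = parse_inflight_error(LOG_LINE)
gap = txn - min_txn  # distance between the newest and the oldest held txn
print(f"gap={gap}, size={size}, exceeded={gap >= size}")
```

Here the gap is 16392, slightly above the reported size of 16384, which would be consistent with something holding on to a very old transaction while new ones keep arriving.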

    Alex Pelagenko

    1 month ago
    max txn-inflight limit reached
    is usually caused by one of the queries holding a table for too long, possibly because of a resource leak.
    To diagnose, you can run
    select * from reader_pool()
    and find the table with a positive owner and the time when it started.
    If that time is very long ago, please find the log section for that time period to identify the query, and let us know what it is.
    Do you restart QuestDB, or why is the uptime getting shorter?
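
The check described above (a positive owner combined with an old timestamp) could be sketched as a small filter over the rows returned by `reader_pool()`. The `(table, owner, timestamp, txn)` row layout, the use of a non-positive owner for idle readers, the sample rows, and the ten-minute threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def long_held_readers(rows, now, max_age=timedelta(minutes=10)):
    """Return rows whose reader is owned (owner > 0) and older than max_age.

    rows: iterable of (table, owner, timestamp, txn) tuples. This layout is
    an assumption about what `select * from reader_pool()` returns.
    """
    suspects = []
    for table, owner, timestamp, txn in rows:
        if owner is None or owner <= 0 or timestamp is None:
            continue  # reader is idle, nothing is holding it
        if now - timestamp > max_age:
            suspects.append((table, owner, timestamp, txn))
    return suspects


# Hypothetical rows, not taken from the thread.
now = datetime(2022, 8, 17, 19, 30, 0)
sample_rows = [
    ("trades", 42, datetime(2022, 8, 17, 19, 5, 43), 2605361),
    ("trades", -1, datetime(2022, 8, 17, 19, 29, 50), 2621700),
    ("quotes", 7, datetime(2022, 8, 17, 19, 29, 55), 120),
]
for row in long_held_readers(sample_rows, now):
    print(row)
```

The timestamp of any row this returns is the starting point for searching the server log.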

    gmaurice

    1 month ago
    Yes, the uptime goes back to zero when I restart it.

    Alex Pelagenko

    1 month ago
    What’s the reason for the restarts?

    gmaurice

    1 month ago
    Because of the error, QuestDB stopped answering read queries.
    However, I don’t remember why I restarted QuestDB before the upgrade (the time of the annotation).
    With reader_pool, how can I find the actual reader from the integer given as the owner?

    Alex Pelagenko

    1 month ago
    There is no such query yet; the useful part is
    timestamp
    With it, you may be able to find the query in the logs.
    I’d expect it to be a failing query,
    failing in the sense that it returned an error rather than data,
    but not necessarily; it can also be a successful query.
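
Locating the log section around a given timestamp can be sketched as follows, assuming each log entry starts with a UTC timestamp in the format seen elsewhere in this thread and that lines without one (such as stack-trace lines) belong to the preceding entry. The function name, the window sizes, and the two "hypothetical" sample entries are illustrative only.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2022-08-17T19:05:49.805401Z


def log_window(lines, around, before=timedelta(seconds=30),
               after=timedelta(seconds=5)):
    """Return the log lines whose entry falls inside [around-before, around+after]."""
    lo, hi = around - before, around + after
    selected = []
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)
            keep = lo <= ts <= hi
        except ValueError:
            pass  # continuation line: inherits the decision of its entry
        if keep:
            selected.append(line)
    return selected


LOG = [
    "2022-08-17T19:04:00.000000Z I hypothetical earlier entry",
    "2022-08-17T19:05:49.805401Z E i.q.c.p.PGWireServer",
    "java.lang.StringIndexOutOfBoundsException: String index out of range: 8",
    "        at java.lang.StringLatin1.charAt(StringLatin1.java:48)",
    "2022-08-17T19:07:00.000000Z I hypothetical later entry",
]
window = log_window(LOG, datetime(2022, 8, 17, 19, 5, 50))
print("\n".join(window))
```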

    gmaurice

    1 month ago
    Yes, I found a very long-running one. Does the owner correspond to some kind of connection?
    OK, with the
    timestamp
    I found an error:
    2022-08-17T19:05:49.805401Z E i.q.c.p.PGWireServer
    java.lang.StringIndexOutOfBoundsException: String index out of range: 8
            at java.lang.StringLatin1.charAt(StringLatin1.java:48)
            at java.lang.String.charAt(String.java:1515)
            at java.lang.Character.codePointAt(Character.java:8910)
            at java.util.regex.Pattern$CharPropertyGreedy.match(Pattern.java:4273)
            at java.util.regex.Pattern$Start.match(Pattern.java:3608)
            at java.util.regex.Matcher.search(Matcher.java:1728)
            at java.util.regex.Matcher.find(Matcher.java:745)
            at io.questdb.griffin.engine.functions.regex.MatchStrFunctionFactory$MatchConstPatternFunction.getBool(MatchStrFunctionFactory.java:100)
            at io.questdb.griffin.engine.table.AsyncFilteredRecordCursorFactory.filter(AsyncFilteredRecordCursorFactory.java:167)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:175)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:132)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:106)
            at io.questdb.cairo.sql.async.PageFrameSequence.stealWork(PageFrameSequence.java:390)
            at io.questdb.cairo.sql.async.PageFrameSequence.dispatch(PageFrameSequence.java:369)
            at io.questdb.cairo.sql.async.PageFrameSequence.next(PageFrameSequence.java:294)
            at io.questdb.griffin.engine.table.AsyncFilteredRecordCursor.fetchNextFrame(AsyncFilteredRecordCursor.java:184)
            at io.questdb.griffin.engine.table.AsyncFilteredRecordCursor.of(AsyncFilteredRecordCursor.java:226)
            at io.questdb.griffin.engine.table.AsyncFilteredRecordCursorFactory.getCursor(AsyncFilteredRecordCursorFactory.java:129)
            at io.questdb.griffin.engine.table.SelectedRecordCursorFactory.getCursor(SelectedRecordCursorFactory.java:58)
            at io.questdb.griffin.engine.groupby.SampleByFillNoneRecordCursorFactory.getCursor(SampleByFillNoneRecordCursorFactory.java:97)
            at io.questdb.cutlass.pgwire.PGConnectionContext.setupFactoryAndCursor(PGConnectionContext.java:2488)
            at io.questdb.cutlass.pgwire.PGConnectionContext$PGConnectionBatchCallback.postCompile(PGConnectionContext.java:2593)
            at io.questdb.griffin.SqlCompiler.compileBatch(SqlCompiler.java:930)
            at io.questdb.cutlass.pgwire.PGConnectionContext.processQuery(PGConnectionContext.java:2256)
            at io.questdb.cutlass.pgwire.PGConnectionContext.parse(PGConnectionContext.java:1550)
            at io.questdb.cutlass.pgwire.PGConnectionContext.handleClientOperation(PGConnectionContext.java:415)
            at io.questdb.cutlass.pgwire.PGJobContext.handleClientOperation(PGJobContext.java:81)
            at io.questdb.cutlass.pgwire.PGWireServer$1.lambda$$0(PGWireServer.java:81)
            at io.questdb.network.AbstractIODispatcher.processIOQueue(AbstractIODispatcher.java:166)
            at io.questdb.cutlass.pgwire.PGWireServer$1.run(PGWireServer.java:106)
            at io.questdb.mp.Worker.run(Worker.java:116)
    I will fix it. However, why is it still an active query?

    Alex Pelagenko

    1 month ago
    No, not really. The query would be before that, something using a regex.

    gmaurice

    1 month ago
    Yes, it’s a query with a regex 😉

    Alex Pelagenko

    1 month ago
    Can you share the query itself? It would be useful for fixing it.
    It’s probably a leak; that’s why it’s still active.

    gmaurice

    1 month ago
    Sure:
    SELECT
      timestamp as time,
      exchange,
      symbol,
      count() as trades
    FROM
      table
    WHERE
      symbol ~ '.*-?ETH' and
      timestamp BETWEEN '2022-08-17T13:05:43.981Z' AND '2022-08-17T19:05:43.981Z'
    SAMPLE BY 1m
    This query is generated and executed by Grafana.

    Alex Pelagenko

    1 month ago
    Thanks, here is the bug if you want to track it: https://github.com/questdb/questdb/issues/2441

    gmaurice

    1 month ago
    Thank you, subscribed.

    Nicolas Hourcard

    1 month ago
    Hi @gmaurice, would you be able to take us through your use case at a glance? Thanks a lot.

    gmaurice

    1 month ago
    After restarting QuestDB with the right log configuration, I forgot to restart my data ingesters 😄 After a while, I restarted them (about 1.5 million trades) and QuestDB crashed a couple of minutes later with this critical error:
    2022-08-18T20:17:49.500012Z C server-main unhandled error [job=io.questdb.cairo.sql.async.PageFrameReduceJob@4721d212, ex=
    java.lang.NullPointerException: Cannot invoke "io.questdb.cairo.sql.Function.getBool(io.questdb.cairo.sql.Record)" because "filter" is null
            at io.questdb.griffin.engine.table.AsyncFilteredRecordCursorFactory.filter(AsyncFilteredRecordCursorFactory.java:167)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.reduce(PageFrameReduceJob.java:175)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.consumeQueue(PageFrameReduceJob.java:132)
            at io.questdb.cairo.sql.async.PageFrameReduceJob.run(PageFrameReduceJob.java:194)
            at io.questdb.mp.Worker.run(Worker.java:116)
    ]
    The line just before it mentioned the same
    java.lang.StringIndexOutOfBoundsException: String index out of range: 8
    we talked about. I don't know if it's related.

    Andrey Pechkurov

    1 month ago
    @gmaurice could be related. BTW do you have the
    pg.worker.count
    config property set to a non-default value?

    gmaurice

    1 month ago
    No, I didn’t change the default. However, where can I find the default
    server.conf
    file? I am still using the one I used for 6.4.3; maybe some things have changed between versions.

    Andrey Pechkurov

    1 month ago
    There should be no breaking changes in the default config file
    It should be located in
    <qdb_root_dir>/conf
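
Whether `pg.worker.count` is explicitly set can be checked by reading the config file as plain `key=value` text. The sketch below assumes that format with `#` comments; the sample content and the value `4` are hypothetical and are not QuestDB defaults.

```python
def read_property(conf_text, key):
    """Return the value of `key` in key=value text, or None if it is not set.

    Lines starting with '#' are treated as comments. If the key appears more
    than once, the last occurrence wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


# Hypothetical file content for illustration only.
SAMPLE_CONF = """
# commented-out properties count as "not set"
#pg.worker.count=2
http.enabled=true
"""

print(read_property(SAMPLE_CONF, "pg.worker.count"))  # None: property is not set
print(read_property(SAMPLE_CONF + "pg.worker.count=4\n", "pg.worker.count"))
```

In practice the text would come from the `server.conf` file under `<qdb_root_dir>/conf`.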

    gmaurice

    1 month ago
    Thanks

    Andrey Pechkurov

    1 month ago
    @gmaurice we have fixed the leak and it should be shipped in the next patch release. In the meantime, could you share the string column value on which you were getting StringIndexOutOfBoundsException? Knowing your JDK version is also important, since this exception looks a lot like a bug in the standard Java library.

    gmaurice

    1 month ago
    🙏 You mean the JDK version of the client?

    Andrey Pechkurov

    1 month ago
    No, the server one
    Or maybe you're using the one we ship with the rt version of QuestDB?

    gmaurice

    1 month ago
    Yes, exactly, I’m using the Docker image you provide.

    Andrey Pechkurov

    1 month ago
    And do you happen to know which string value led to this exception?

    gmaurice

    1 month ago
    You mean the one that matches the regex
    .*-?ETH
    ?

    Andrey Pechkurov

    1 month ago
    No, the one that led to the exception

    gmaurice

    1 month ago
    So you mean the query string? If not, I’m unable to find the one you want.

    Andrey Pechkurov

    1 month ago
    No, about the string (column value) on which the regex matcher was throwing the exception

    gmaurice

    1 month ago
    OK, understood. Can I find it in the logs? I haven’t seen it so far. Otherwise, I can run a distinct on the column, but I won’t be able to tell you which value led to the exception.

    Andrey Pechkurov

    1 month ago
    All distinct values would be just fine

    gmaurice

    1 month ago
    OK, the next time I get the exception I’ll do that; it should be soon 😉

    Andrey Pechkurov

    1 month ago
    Hopefully, this doesn't happen, but if it does, you know what to do 🙂

    gmaurice

    1 month ago
    Hello @Andrey Pechkurov, I got the exception, and here are the distinct values of the
    symbol
    column:
    "ALT-USD-22U30"
    "BTC-USD"
    "BTC-USD-22U30"
    "BTC-USD-PERP"
    "BTC-USDT"
    "CEL-USD"
    "CEL-USD-22U30"
    "CEL-USD-PERP"
    "ETH-USD"
    "ETH-USD-PERP"
    "ETH-USDT"
    "EXCH-USD-22U30"
    "MID-USD-22U30"
    "PRIV-USD-22U30"
    "SHIT-USD-22U30"
    "SOL-USD"
    "SOL-USD-22U30"
    "SOL-USD-PERP"
    "SOL-USDT"
    "USDT-USD"
    "USDT-USD-22U30"
    "USDT-USD-PERP"
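
For reference, the pattern from the Grafana query can be run against these values outside the database. The stack trace earlier in the thread goes through `Matcher.find`, which is "find anywhere" semantics, so the sketch uses Python's `re.search` as the closest equivalent. Python's regex engine is not the Java one, so this only illustrates which symbols the pattern selects; it does not attempt to reproduce the exception.

```python
import re

# Distinct values of the symbol column, as listed above.
SYMBOLS = [
    "ALT-USD-22U30", "BTC-USD", "BTC-USD-22U30", "BTC-USD-PERP", "BTC-USDT",
    "CEL-USD", "CEL-USD-22U30", "CEL-USD-PERP", "ETH-USD", "ETH-USD-PERP",
    "ETH-USDT", "EXCH-USD-22U30", "MID-USD-22U30", "PRIV-USD-22U30",
    "SHIT-USD-22U30", "SOL-USD", "SOL-USD-22U30", "SOL-USD-PERP", "SOL-USDT",
    "USDT-USD", "USDT-USD-22U30", "USDT-USD-PERP",
]

PATTERN = re.compile(r".*-?ETH")  # the pattern from the query's WHERE clause

# "Find anywhere" semantics: the pattern may match any part of the value.
matched = [s for s in SYMBOLS if PATTERN.search(s)]
print(matched)

# With search semantics the leading `.*-?` can match the empty string, so
# the filter selects the same rows as a plain substring test for "ETH".
plain = [s for s in SYMBOLS if "ETH" in s]
print(matched == plain)
```

Both filters select only the three ETH symbols, so the leading `.*-?` has no effect on the result here and the pattern could be reduced to `ETH`.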

    Andrey Pechkurov

    1 month ago
    Hello, many thanks for the info! I’ll try to reproduce it.
    I tried to reproduce it on both the 6.5 Docker image and questdb-6.5-rt-linux-amd64 and failed to do so. No exception whatsoever 😞
    Awesome to hear that!

    gmaurice

    3 weeks ago
    You’re welcome, you’re doing great stuff, you deserve it 🙂