    Jack

    2 weeks ago
    Morning - just had an interesting one. While publishing a lot of out-of-order data into QuestDB, I ended up with this error: ex=could not mmap [size=32792, offset=0, fd=176117, memUsed=3537617303984, fileLen=32792], errno=12]. I can see memory_mem_used shot up to 3.2 TB. Had to restart QuestDB to resolve; would appreciate any pointers on what to look for here.
    Box has 128 GB of memory and 18 GB of swap
    Output from ulimit -a:
      core file size          (blocks, -c) 0
      data seg size           (kbytes, -d) unlimited
      scheduling priority             (-e) 0
      file size               (blocks, -f) unlimited
      pending signals                 (-i) 513943
      max locked memory       (kbytes, -l) 64
      max memory size         (kbytes, -m) unlimited
      open files                      (-n) 662144
      pipe size            (512 bytes, -p) 8
      POSIX message queues     (bytes, -q) 819200
      real-time priority              (-r) 0
      stack size              (kbytes, -s) 8192
      cpu time               (seconds, -t) unlimited
      max user processes              (-u) 513943
      virtual memory          (kbytes, -v) unlimited
      file locks                      (-x) unlimited
    Looking at the logs, I can see it opening partitions repeatedly: in the 1-second window before the spike, it logged "open partition .." 30k times

    Bolek Ziobrowski

    2 weeks ago

    Jack

    2 weeks ago
    so for OS limits, I had already increased both the max open files and max mmap limits
    max_map_count is now at 624k... which is pretty big

    Bolek Ziobrowski

    2 weeks ago
    Why not increase it further?

    Jack

    2 weeks ago
    I'm more than happy to, but that is already 3x the value suggested in your docs, so I just want to make sure the issue isn't elsewhere
    Interestingly, this is only happening on 1 of 2 instances, and both have the same sort of publication scenarios

    Bolek Ziobrowski

    2 weeks ago
    If you're doing lots of O3 and queries at the same time, then qdb might need to create and maintain many versions of the same partition. If possible, it'd be good to tune commitLag and maxUncommittedRows to limit partition merges (especially if data doesn't only go to the latest partition).
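A rough sketch of the tuning suggested above, written as a small Python helper that only builds the statements. It assumes the `ALTER TABLE ... SET PARAM` syntax from QuestDB releases of this era and the REST `/exec` endpoint on the default HTTP port; newer releases may name the lag setting differently. The table name `my_table`, the helper names, and both values are hypothetical placeholders, not recommendations.

```python
from urllib.parse import urlencode


def set_param_sql(table: str, param: str, value) -> str:
    """Build an ALTER TABLE ... SET PARAM statement (syntax assumed, see note above)."""
    return f"ALTER TABLE {table} SET PARAM {param} = {value}"


def exec_url(base_url: str, sql: str) -> str:
    """Build a URL for QuestDB's REST /exec endpoint with the query URL-encoded."""
    return f"{base_url}/exec?{urlencode({'query': sql})}"


# Illustrative values only: a larger row threshold and commit lag mean
# out-of-order rows are held back longer and merged in fewer, bigger commits.
statements = [
    set_param_sql("my_table", "maxUncommittedRows", 500000),
    set_param_sql("my_table", "commitLag", "60s"),
]

for sql in statements:
    print(exec_url("http://localhost:9000", sql))
```

Nothing is sent to the server here; the printed URLs can be opened with any HTTP client once the values have been checked against the installed version's documentation.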

    Jack

    2 weeks ago
    just watching it now, can see it hit the proc limit
    let me increase

    Bolek Ziobrowski

    2 weeks ago
    On 1 vs 2 instances: they could be getting data in a different order or at different times, or there could be more parallel queries executed against one of them.

    Jack

    2 weeks ago
    understood - I've 10x'd the max_map_count and will continue to monitor - thanks!
    and yes you are correct, it has a lot of both o3 and reads going on right now

    Nicolas Hourcard

    2 weeks ago
    Jack - if you feel that our doc could be clearer, please let us know

    Jack

    2 weeks ago
    I think it's fine - I had already followed this doc to increase them previously. Might be worth just indicating that o3+reads will have an impact on the number of mmap files. Thanks!
    mechanical sympathy! I should have known this before tbh.

    Amy Wang

    2 weeks ago
    "Might be worth just indicating that o3+reads will have an impact on the number of mmap files."
    Thanks for the feedback. Let me see if I can quickly add this 🙂

    Alex Pelagenko

    2 weeks ago
    how many files / partitions are you creating on disk?

    Jack

    2 weeks ago
    daily partitions, there are currently about 1500-1600 (I assume some o3 is in progress, so there are multiple folders per day)

    Alex Pelagenko

    2 weeks ago
    can you count files in questdb root folder?

    Jack

    2 weeks ago
    23 in the db folder, 1664 in this particular folder

    Alex Pelagenko

    2 weeks ago
    and how many columns in each partition?

    Jack

    2 weeks ago
    about 7

    Alex Pelagenko

    2 weeks ago
    not that bad
    are you sure the limits are applied to the process?

    Jack

    2 weeks ago
    yeah, I was watching the count on /proc/<pid>/maps during this
    and could see it hit 624k - and then crash
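For anyone wanting to watch the same thing, here is a minimal sketch of that check in Python. It relies on two Linux facts: `/proc/<pid>/maps` has one line per memory mapping, and `/proc/sys/vm/max_map_count` holds the per-process limit. The function names and the `proc_root` parameter are made up for this example.

```python
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count mappings in the text of /proc/<pid>/maps (one non-empty line each)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mmap_headroom(pid="self", proc_root="/proc"):
    """Return (mappings in use, vm.max_map_count) for the given process.

    Linux only. Reading another user's process may need elevated privileges.
    """
    root = Path(proc_root)
    limit = int((root / "sys" / "vm" / "max_map_count").read_text())
    used = count_mappings((root / str(pid) / "maps").read_text())
    return used, limit
```

Calling `mmap_headroom(<QuestDB pid>)` in a loop and logging the ratio would show the count approaching the limit before mmap starts failing with errno=12 (ENOMEM), which is the error Linux returns when the mapping limit would be exceeded.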

    Alex Pelagenko

    2 weeks ago
    to reach 624k you'd need to run about 62 full-table queries (each would use roughly 10k fds)
    in parallel
    is that what you're doing, or is query volume never that high?
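The estimate above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that assumes one mapped file per column per partition (some column types may use more than one file), using the figures quoted in the thread; the function names are invented for illustration.

```python
def mappings_per_full_scan(partitions: int, columns: int) -> int:
    """Assumption: a full-table query maps one file per column per partition."""
    return partitions * columns


def scans_to_reach_limit(limit: int, per_scan: int) -> int:
    """Number of whole concurrent full scans that fit under the mapping limit."""
    return limit // per_scan


per_scan = mappings_per_full_scan(partitions=1600, columns=7)
print(per_scan)                                  # 11200, rounded to "10k" in the thread
print(scans_to_reach_limit(624_000, 10_000))     # 62
print(scans_to_reach_limit(624_000, per_scan))   # 55
```

Either way, reads alone would need dozens of simultaneous full scans to exhaust 624k mappings, which is why the heavy out-of-order writes are the more likely contributor here.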

    Jack

    2 weeks ago
    there were about 30k entries in the log of "open partition"
    at the time, I had a process publishing a lot of o3 data, i.e. hourly data for over 1 year

    Alex Pelagenko

    2 weeks ago
    you'd be better off using CSV import for historical unordered data

    Jack

    2 weeks ago
    unfortunately it's not that straightforward, as the process that handles live data also handles historical loads when required

    Alex Pelagenko

    2 weeks ago
    would be much much faster
    ok

    Jack

    2 weeks ago
    it's rare that I do this sort of "seed" though