https://questdb.io logo
Title
p

Paweł Wangryn (Vangreen)

12/07/2022, 3:24 PM
Hi, I have annoying issue. I have collection with partitioning by hour. After some time I encounter this error message while try to do literally anything (even simple select * from xxx)
ERROR: [0]: Partition '2022-12-07T13.31425' does not exist in table 'packets' directory. Run [ALTER TABLE packets DROP PARTITION LIST '2022-12-07T13.31425'] to repair the table or restore the partition directory
But I don't have partition of this name
drwxrwxr-x 2 root root    4096 Dec  7 12:00 2022-12-07T11.10976/
drwxrwxr-x 2 root root    4096 Dec  7 14:00 2022-12-07T13.24223/
drwxrwxr-x 2 root root    4096 Dec  7 15:00 2022-12-07T14.31425/
drwxrwxr-x 2 root root    4096 Dec  7 16:00 2022-12-07T15.38536/
there is 13.24223 not 13.31425 but there is also partition called 2022-12-07T14.31425 . Is this some kind of bug?
a

Andrey Pechkurov

12/07/2022, 3:37 PM
Hello, Yes, it's a known issue. You should run
ALTER TABLE packets DROP PARTITION LIST '2022-12-07T13'
The bug report to track is this one: https://github.com/questdb/questdb/issues/2619
p

Paweł Wangryn (Vangreen)

12/07/2022, 3:43 PM
Ok so hint is wrong, but what about issue that cause this error?
a

Andrey Pechkurov

12/07/2022, 3:51 PM
Usually it's something like
kill -9
or sudden OS shutdown
If you think it wasn't the case for you, having any details on the reproducer would help a lot
p

Paweł Wangryn (Vangreen)

12/07/2022, 3:55 PM
I send data with ILP client and every hour scheduler drop partition older than X (x depends from user). Maybe there is issue when I try to drop partiton when ILP client send data?
a

Andrey Pechkurov

12/07/2022, 4:42 PM
This shouldn't be an issue and we have tests around that, but there is always a chance that there is a bug. Thanks for sharing this
Hello, FYI the misleading hint in the error message will be fixed in the next QuestDB version. The GH issue to track is this one: https://github.com/questdb/questdb/pull/2851
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:14 PM
@Andrey Pechkurov I found what cause problem. In every 30 seconds I run this query to group network connections. And after execute this query errors starts to show up. But I don't know why. Query run in web console without problem 😕
SELECT src, dst, min(to_timezone(timestamp,'Europe/Warsaw'))  AS firstPacketDate, max(to_timezone(timestamp,'Europe/Warsaw'))  AS lastPacketDate, count(*) FROM 'packets' WHERE timestamp > to_timezone(1670500500009050,'UTC') GROUP BY src, dst;
Oh and problem shows up round hours (for example 08-12-2022 12:00:00.504, 08-12-2022 13:00:00.494)
a

Andrey Pechkurov

12/08/2022, 12:30 PM
Could you check server logs for any error messages (they contain
E
text)?
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:34 PM
a

Andrey Pechkurov

12/08/2022, 12:41 PM
Are you sending string values without trailing quote over ILP? I've noticed lots of messages like this one:
i.q.c.l.t.LineTcpConnectionContext [84] could not parse measurement, INVALID_FIELD_VALUE_STR_UNDERFLOW at 287
As for the
partition does not exist on the disk
error, could you check the logs before this problem occurs? Once more question: are you able to reproduce the issue reliably?
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:51 PM
I send you log from start of database ( fresh installation). Yes I can reproduce this error on ours test environments (ubuntu server 20.02, quest run via docker, ext4 file system nvme drive, Intel Xeon E5-2680 v4 (56) @ 3.300GHz, 512GB ram) . And its starts to show after about 2-3 hours of saving data (30_000 documents per second via ILP). my docker compose
version: '3.7'

services:
  questdb:
    image: questdb/questdb
    container_name: questdb
    restart: always
    ports:
      - 9000:9000
      - 9009:9009
      - 8812:8812
      - 9003:9003
    volumes:
      - /mnt/ext4/quest:/root/.questdb
    environment:
      - QDB_LOG_W_STDOUT_LEVEL=ERROR
      - QDB_LOG_W_FILE_LEVEL=ERROR
      - QDB_LOG_W_HTTP_MIN_LEVEL=ERROR
      - QDB_SHARED_WORKER_COUNT=20 # Amount of worker threads
      - QDB_TELEMETRY_ENABLED=false # Disable telemetry
a

Andrey Pechkurov

12/08/2022, 12:53 PM
No server restarts in between 2022-12-08T08:11:07 and 2022-12-08T08:11:31?
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:53 PM
nope
all microservices runs localy on this machine
a

Andrey Pechkurov

12/08/2022, 12:54 PM
Did you
grep " E "
the logs?
The server started at 08:00am and the first "partition does not exist on the disk" error is at 11:00am
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:55 PM
yes that correct
I start test environemnt at 8 am
and error occurred on about 11 am
a

Andrey Pechkurov

12/08/2022, 12:56 PM
QDB_LOG_W_STDOUT_LEVEL=ERROR
should be
QDB_LOG_W_STDOUT_LEVEL=ERROR,INFO,WARN
- otherwise you loose all INFO and WARN level logs
That's why I don't see much in the logs
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:56 PM
Ok, I will enable it and try to reproduce error
a

Andrey Pechkurov

12/08/2022, 12:57 PM
Thanks!
Could you also tell me if you're using one of our ILP clients or you're using a DYI client?
Those "i.q.c.l.t.LineTcpConnectionContext [84] could not parse measurement, INVALID_FIELD_VALUE_STR_UNDERFLOW" don't look ok
p

Paweł Wangryn (Vangreen)

12/08/2022, 12:59 PM
I send data like this
sender.table("test")
                    .symbol("a", object.getA())
                    .stringColumn("b", object.getB())
                    .longColumn("c", object.getC())
                    .atNow();
I use official questDB ILP sender java lib
a

Andrey Pechkurov

12/08/2022, 1:01 PM
Thanks
@Jaromir Hamala any ideas on why INVALID_FIELD_VALUE_STR_UNDERFLOW might occur with the official ILP Java client?
j

Jaromir Hamala

12/08/2022, 1:15 PM
@Andrey Pechkurov not by hearts.
a wild guess: an empty string?
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:18 PM
yep, I have a few documents with empty string
a

Andrey Pechkurov

12/08/2022, 1:19 PM
2022-12-08T11:00:14.996725Z E i.q.c.l.t.LineTcpConnectionContext [84] could not parse measurement, INVALID_FIELD_VALUE_STR_UNDERFLOW at 287, line (may be mangled due to partial parsing): 'packets,probeId=638f3fb33130940845fc31e5,src=39,dst=38 id="b6fdbe77-7ebf-4b45-91a3-f4dd6a62a9c1",observationTimestamp=1670497153668593i,payload="short",srcPort="57806",srcIp="192.168.100.112",dstPort="8000",dstIp="192.168.100.108",errorCode="0,0",protocol=7905i,identificationPort="8000"'
position 287 is before the end quote of
identificationPort="8000"
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:22 PM
yes protocol is long column
a

Andrey Pechkurov

12/08/2022, 1:23 PM
protocol=7905i
looks fine
but for some reason the database complains about
identificationPort="8000"
which is odd
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:24 PM
oh ok
a

Andrey Pechkurov

12/08/2022, 1:24 PM
is
identificationPort
a string column?
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:24 PM
yes
a

Andrey Pechkurov

12/08/2022, 1:28 PM
hmm, this error is weird
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:29 PM
I have some records with empty string on identificationPort
a

Andrey Pechkurov

12/08/2022, 1:29 PM
would be great to get a full log once you reproduce the issue
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:29 PM
I'm working on it
j

Jaromir Hamala

12/08/2022, 1:43 PM
do you use only ascii in strings or do you happen to use also some non-unicode characters?
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:44 PM
only asci
j

Jaromir Hamala

12/08/2022, 1:45 PM
any non-printable chars? like a new line or similar? can your strings contain double-quotes?
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:45 PM
nope
j

Jaromir Hamala

12/08/2022, 1:46 PM
ok, so everything is simple. the only slight deviation from the “happy path” could be empty strings.
p

Paweł Wangryn (Vangreen)

12/08/2022, 1:47 PM
Ok, but I should avoid to send empty string or this is problem in Database?
j

Jaromir Hamala

12/08/2022, 1:50 PM
no, no need. it’s tested. we know it works. it’s designed to work. I am just thinking outloud: what is potentially unusual? searching for any kind of a deviation from the absolutely simplest scenario. if there is a bug then it’s usually in some unusual/edge cases. and they tend to add up. that’s all.
p

Paweł Wangryn (Vangreen)

12/08/2022, 2:04 PM
@Andrey Pechkurov once again problem starts on round hour (2PM)
a

Andrey Pechkurov

12/08/2022, 2:08 PM
Thanks. I'll take a look a bit later
I send data with ILP client and every hour scheduler drop partition older than X (x depends from user).
How exactly do you drop the older partitions? I don't see any ALTER TABLE t DROP PARTITION statements before the troublemaker error line ("Partition '2022-12-08T13.38195' does not exist in table 'packets' directory").
p

Paweł Wangryn (Vangreen)

12/08/2022, 3:12 PM
ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -1, now())
a

Andrey Pechkurov

12/08/2022, 3:13 PM
I don't see such queries in the log
Is that a fresh database?
p

Paweł Wangryn (Vangreen)

12/08/2022, 3:13 PM
yes
And I'm sure this query execute because collection is smaller every hour
@Andrey Pechkurov I know why its not on logs, because its execute every hour from application startup and I send you logs till 14:03
/> docker logs questdb | grep dateadd
2022-12-08T14:33:45.209627Z I i.q.c.p.PGConnectionContext parse [fd=22, q=ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -1, now())]
a

Andrey Pechkurov

12/08/2022, 3:19 PM
The logs you've shared end at
2022-12-08T14:03:00.522312Z
p

Paweł Wangryn (Vangreen)

12/08/2022, 3:19 PM
I think there is something
a

Andrey Pechkurov

12/08/2022, 3:19 PM
And they contain the partition missing message
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
that's something new
p

Paweł Wangryn (Vangreen)

12/08/2022, 3:22 PM
That sounds bad 🤔
a

Andrey Pechkurov

12/08/2022, 3:22 PM
This one happens when the database tries to initialize the table reader
Are you sure you had no data before this server was started? Also, it might be a good idea to check the disk for errors
One more thing to double check: no other processes should be deleting database files and directories. Even when QuestDB is shut down
p

Paweł Wangryn (Vangreen)

12/08/2022, 3:25 PM
yest this is my bash history from todays fresh database start Yesterday I also remove image from disk before launch
I have questdb dir on separate disk mounted today
no procces should have acces to this files
Same error just happend to my co worker, maybe there is problem in my configuration or technology used?. Like I said I have one microservice that read data from kafka, analize it and send via ILP to quest. And Two also java apps which read data from quest via postgresql wire protocol using jdbi library (https://jdbi.org/jdbi2/sql_object_api_queries/)
Is not a problem that partition is droped and in the same time is execute query?
2022-12-08T14:33:45.209627Z I i.q.c.p.PGConnectionContext parse [fd=22, q=ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -1, now())]
2022-12-08T14:33:45.209628Z I i.q.c.p.PGConnectionContext parse [fd=24, q=SELECT id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') as timestamp, observationTimestamp FROM 'packets' WHERE timestamp > to_timezone(1670506710087856,'UTC')]
a

Andrey Pechkurov

12/08/2022, 6:12 PM
No, concurrent queries are fine. We don't immediately delete the partition when someone is reading from it. We have a GC-like logic for partitions, so that they're deleted eventually. Concurrent ILP ingestion and queries should be fine.
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
This particular error can't be explained by a deleted partition directory. Symbol files are stored in the table directory, not under partition dirs. That's why I found it really weird
I'll try to reproduce this issue manually on my machine, but it would help a lot if you could come up with a minimal reproducer
p

Paweł Wangryn (Vangreen)

12/08/2022, 7:33 PM
Sure, i will try mock code tommorow and send you source code. Database works perfect for our usecase and we want to use it in production, but this error ruins everything. So I’m ready to help.
a

Andrey Pechkurov

12/08/2022, 7:40 PM
That's awesome. Let's get to the bottom of this issue!
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:12 PM
Hi I've made simple poc apps with same code as in our app, but I can't reproduce error. Only difference is the ilp sender on production is write in reactive java, on my poc not. So for now I will try to find the cause on my side, and If i found solution (or not) I will come back to you.
But once again I have a lot of this errors in logs, and now I dont have any empty strings
questdb    | 2022-12-09T11:54:54.918987Z E i.q.c.l.t.LineTcpConnectionContext [22] could not parse measurement, NONE at 319, line (may be mangled due to partial parsing): 'packets,probeId=d09a20a3-5570-4074-8569-5f09ac7d3cc8,src=src,dst=123 id="422e6ef8-711e-421d-87e0-333b457819c8",observationTimestamp=1670579708524i,payload="/////payooaaaddddddddddddddddddddddddd",srcPort="1234",srcIp="10.0.0.1",dstPort="321",dstIp="129.168.0.1",errorCode="0,0",protocol=12312i,identificationPort="1234"'
j

Jaromir Hamala

12/09/2022, 1:15 PM
you say the production client is written in a reactive Java. are you sure access to Sender instances is synchronized? Senders are not thread safe, calling them from multiple threads can produce invalid ILP messages.
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:17 PM
I will check it, thanks
a

Andrey Pechkurov

12/09/2022, 1:20 PM
But once again I have a lot of this errors in logs, and now I dont have any empty strings
I've checked this particular ILP message and it's a valid one. Indeed, the issue may be related with concurrent access to the ILP sender leading to interleaved ILP messages
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:22 PM
Yes, but this log is made by this non reactive code
public void run(ApplicationArguments args) throws Exception {
        sender = Sender.builder().address(QUEST_DB_URL).build();
        List<Packet> packets = new ArrayList<>();
        for (int i = 0; i < 30_000; i++) {
            packets.add(new Packet(UUID.randomUUID().toString(), "123", "321", "129.168.0.1", "0,0", "1234", Instant.now().toEpochMilli(), "/////payload", 12312L, "src", "1234", "10.0.0.1", "PROBE_ID"));
        }
        while (true) {

            Thread.sleep(1_000);
            packets.forEach(this::createPacketRecord);
            System.out.println(packets.size());
            System.out.println(LocalDateTime.now());
            sender.flush();
        }
    }
a

Andrey Pechkurov

12/09/2022, 1:28 PM
Is it possible to share full code of this sender app?
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:29 PM
package com.example.questilpsenderdemo;

import io.questdb.client.Sender;
import io.questdb.cutlass.line.LineSenderException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.stereotype.Component;

import java.time.Instant;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

@Component
public class AppStartupRunner implements ApplicationRunner {

    private final Logger logger = LoggerFactory.getLogger(AppStartupRunner.class);

    private Sender sender;

    private static final String QUEST_DB_URL = "questdb:9009";


    @Override
    public void run(ApplicationArguments args) throws Exception {
        sender = Sender.builder().address(QUEST_DB_URL).build();
        List<Packet> packets = new ArrayList<>();
        for (int i = 0; i < 30_000; i++) {
            packets.add(new Packet(UUID.randomUUID().toString(), "123", "321", "129.168.0.1", "0,0", "1234", Instant.now().toEpochMilli(), "/////payload", 12312L, "src", "1234", "10.0.0.1", "PROBE_ID"));
        }
        while (true) {

            Thread.sleep(1_000);
            packets.forEach(this::createPacketRecord);
            System.out.println(packets.size());
            System.out.println(LocalDateTime.now());
            sender.flush();
        }
    }

    private void createPacketRecord(Packet packet) {

        try {
            sender.table("packets")
                    .symbol("probeId", packet.getProbeId())
                    .symbol("src", packet.getSrc())
                    .symbol("dst", packet.getDst())
                    .stringColumn("id", packet.getId())
                    .longColumn("observationTimestamp", packet.getObservationTimestamp())
                    .stringColumn("payload", packet.getPayload())
                    .stringColumn("srcPort", packet.getSrcPort())
                    .stringColumn("srcIp", packet.getSrcIp())
                    .stringColumn("dstPort", packet.getDstPort())
                    .stringColumn("dstIp", packet.getDstIp())
                    .stringColumn("errorCode", packet.getErrorCode())
                    .longColumn("protocol", packet.getProtocol())
                    .stringColumn("identificationPort", packet.getIdentificationPort())
                    .atNow();
            // QuestDB ILP client reconnect workaround
        } catch (LineSenderException senderException) {
            logger.error("QuestDB connection lost \n Error message: {}", senderException.getMessage());
            try {
                sender.close();
                this.sender = createQuestDBSender();
            } catch (Exception ex) {
                logger.error("Try reconnect to questDB");
            }
        }
    }

    private Sender createQuestDBSender() {
        return Sender.builder().address(QUEST_DB_URL).build();
    }
}
a

Andrey Pechkurov

12/09/2022, 1:30 PM
where is the
Packet
class?
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:30 PM
Oh sorry
a

Andrey Pechkurov

12/09/2022, 1:30 PM
I guess it's a simple POJO
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:30 PM
I can send you a zip file of project if you dont mind?
a

Andrey Pechkurov

12/09/2022, 1:31 PM
sure
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:32 PM
a

Andrey Pechkurov

12/09/2022, 1:40 PM
Should I keep it running for some time to see the error message?
Or maybe I should run multiple instances of it in parallel?
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:42 PM
No one instance, and first errors show up after one hour
a

Andrey Pechkurov

12/09/2022, 1:43 PM
ok, I hope that my SSD is large enough to fit 1h of data 🙂
p

Paweł Wangryn (Vangreen)

12/09/2022, 1:44 PM
33G ./db/packets/2022-12-09T12.9661 On hour only take 33GB
a

Andrey Pechkurov

12/09/2022, 1:44 PM
ok, I have the space then
1h isn't enough on my machine - no errors on 6.6.1. I'm also on Linux + ext4. I'll keep it running for a longer period of time later today
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:09 PM
@Andrey Pechkurov I think found issue. Like I said every hour partition from last X hours is dropped. And after this drop something wierd happends Look at this two screenshots. Numbers of rows are increasing but newest record is the same
I send packets with ilp
atnow
So every record should show up with newest timestamp
wtf?
j

Jaromir Hamala

12/12/2022, 1:24 PM
that’s super weird. if you manage to come-up with a reproducer then you’ll be my hero! 🙂
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:26 PM
I also notice that when I restart ilp app everything is going back to normal
a

Andrey Pechkurov

12/12/2022, 1:27 PM
Really weird indeed. Do you use PARTITION BY HOUR?
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:27 PM
yes
CREATE TABLE IF NOT EXISTS  packets(timestamp TIMESTAMP, probeId SYMBOL, src SYMBOL, dst SYMBOL, id STRING,  observationTimestamp LONG, payload STRING, srcPort STRING, srcIp STRING, dstPort STRING, dstIp STRING, errorCode STRING, protocol LONG, identificationPort STRING) timestamp(timestamp)
PARTITION BY HOUR;
This is query
a

Andrey Pechkurov

12/12/2022, 1:27 PM
ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -1, now())
The above is the SQL statement you run to drop the old partitions, right?
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:27 PM
right
a

Andrey Pechkurov

12/12/2022, 1:28 PM
How often do you run it? Once per hour?
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:29 PM
It's depends from user, but minimum is one per hour and for tests I run it once per hour
a

Andrey Pechkurov

12/12/2022, 1:31 PM
If you run it at the beginning of an hour, it may drop the active partition. That shouldn't lead to any problems unless there is a bug in the active partition drop code.
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:32 PM
I also have this problem with 2+ hours
a

Andrey Pechkurov

12/12/2022, 1:32 PM
You mean
ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -2, now())
?
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:32 PM
yep
a

Andrey Pechkurov

12/12/2022, 1:33 PM
OK, then it's probably unrelated with the active partition.
@Alex Pelagenko @Miguel Arregui guys, do you have any ideas on what might go wrong here? Please refer to the first message in the thread and the last few messages.
p

Paweł Wangryn (Vangreen)

12/12/2022, 1:58 PM
Another think. Now when I start to send records via ILP (even with non reactive version) I cant execute any query on database with same error.
Partition '2022-12-12T12.21877' does not exist in table 'packets' directory. Run [ALTER TABLE packets DROP PARTITION LIST '2022-12-12T12.21877'] to repair the table or restore the partition directory.
When I stop applicaiton everything back to normal
And same think is happening number of records is growing, but newest record is this one just before drop of partition
a

Alex Pelagenko

12/12/2022, 2:04 PM
can you sum up what’s going on? I see error message and
Alter .. drop
sql. What is questdb version, os?
a

Andrey Pechkurov

12/12/2022, 2:04 PM
6.6.1, Linux, ext4
a

Alex Pelagenko

12/12/2022, 2:04 PM
is there a workaround atm?
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:05 PM
for now, manualy stop ilp clien app, manualy drop partition and start app again
so there is no reasonable workaround
a

Alex Pelagenko

12/12/2022, 2:06 PM
what happens if you drop partition without stopping ilp?
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:06 PM
I cant execute eny query
any*
When I try to execute query (even simle select * from xxx this error show up
Partition '2022-12-12T12.21877' does not exist in table 'packets' directory. Run [ALTER TABLE packets DROP PARTITION LIST '2022-12-12T12.21877'] to repair the table or restore the partition directory.
a

Alex Pelagenko

12/12/2022, 2:07 PM
what happens if you drop partition without stopping ilp?
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:07 PM
I cant drop it becaouse of this error without stopping ilp
when I force clink run button on browser client ,one in about 50 times is succesfull
and when I manage to drop partition everything goes back to normal
a

Alex Pelagenko

12/12/2022, 2:10 PM
Same error as with select? Can you share log file for the hour¿
If you write with atNow, log from 11 UTC today
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:11 PM
Yes, this is log file just before bug shows up
2022-12-12T10:38:39.408405Z I i.q.g.SqlCompiler plan [q=`select-virtual id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') timestamp, observationTimestamp from (select [id, src, dst, srcPort, dstPort, probeId, protocol, timestamp, observationTimestamp] from packets timestamp (timestamp) where timestamp > to_timezone(1670841513994010,'UTC'))`, fd=220]
2022-12-12T10:38:39.427558Z I i.q.c.p.PGConnectionContext parse [fd=246, q=ALTER TABLE 'packets' DROP PARTITION WHERE timestamp < dateadd('h', -1, now())]
2022-12-12T10:38:39.428221Z I i.q.c.p.AbstractMultiTenantPool open 'packets' [at=0:6]
2022-12-12T10:38:39.429089Z I i.q.c.p.AbstractMultiTenantPool open 'packets' [at=0:1]
2022-12-12T10:38:39.432472Z I i.q.c.p.WriterPool busy [table=`packets`, owner=53, thread=36, reason=tcpIlp]
2022-12-12T10:38:39.440766Z I i.q.g.e.o.OperationFutureImpl published ASYNC writer command [name=ALTER TABLE,tableName=packets,instance=1]
2022-12-12T10:38:39.441047Z I i.q.g.e.o.OperationFutureImpl writer command QUERY_STARTED response received [instance=1]
2022-12-12T10:38:39.441087Z I i.q.c.TableWriter received async cmd [type=2, tableName=packets, tableId=3, correlationId=1, cursor=0]
2022-12-12T10:38:39.443256Z I i.q.c.TableWriter scheduled to purge partitions [table=packets]
2022-12-12T10:38:39.443291Z I i.q.g.e.o.OperationFutureImpl writer command response received [instance=1]
2022-12-12T10:38:39.443318Z I i.q.c.TableWriter published async command complete event [type=2,tableName=packets,tableId=3,correlationId=1]
2022-12-12T10:38:39.443525Z I i.q.c.O3PartitionPurgeJob processing [table=packets]
2022-12-12T10:38:39.443956Z I pg-server scheduling disconnect [fd=246, reason=15]
2022-12-12T10:38:39.443973Z I pg-server disconnected [ip=172.21.0.35, fd=246, src=queue]
2022-12-12T10:38:39.444382Z I i.q.c.O3PartitionPurgeJob cannot purge partition directory, locked for reading [path=/root/.questdb/db/packets/2022-12-12T09]
2022-12-12T10:38:39.444457Z I i.q.c.O3PartitionPurgeJob processed [table=packets]
2022-12-12T10:38:39.598159Z I i.q.c.O3PartitionPurgeJob processing [table=packets]
2022-12-12T10:38:39.598619Z I i.q.c.O3PartitionPurgeJob purging dropped partition directory [path=/root/.questdb/db/packets/2022-12-12T09]
2022-12-12T10:38:39.599114Z I i.q.c.O3PartitionPurgeJob purged [path=/root/.questdb/db/packets/2022-12-12T09]
2022-12-12T10:38:39.599180Z I i.q.c.O3PartitionPurgeJob processed [table=packets]
2022-12-12T10:38:39.836999Z I pg-server scheduling disconnect [fd=220, reason=15]
2022-12-12T10:38:39.837034Z I pg-server disconnected [ip=172.21.0.35, fd=220, src=queue]
2022-12-12T10:38:44.401972Z I pg-server connected [ip=172.21.0.35, fd=220]
2022-12-12T10:38:44.402717Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2022-12-12T10:38:44.402771Z I i.q.c.p.PGConnectionContext property [name=user, value=admin]
2022-12-12T10:38:44.402781Z I i.q.c.p.PGConnectionContext property [name=database, value=qdb]
2022-12-12T10:38:44.402788Z I i.q.c.p.PGConnectionContext property [name=client_encoding, value=UTF8]
2022-12-12T10:38:44.402795Z I i.q.c.p.PGConnectionContext property [name=DateStyle, value=ISO]
2022-12-12T10:38:44.402802Z I i.q.c.p.PGConnectionContext property [name=TimeZone, value=Europe/Warsaw]
2022-12-12T10:38:44.402807Z I i.q.c.p.PGConnectionContext property [name=extra_float_digits, value=2]
2022-12-12T10:38:44.403540Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET extra_float_digits = 3]
2022-12-12T10:38:44.404142Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET application_name = 'PostgreSQL JDBC Driver']
2022-12-12T10:38:44.406831Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SELECT id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') as timestamp, observationTimestamp FROM 'packets' WHERE timestamp > to_timezone(1670841518994052,'UTC')]
2022-12-12T10:38:44.410359Z I i.q.g.SqlCompiler plan [q=`select-virtual id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') timestamp, observationTimestamp from (select [id, src, dst, srcPort, dstPort, probeId, protocol, timestamp, observationTimestamp] from packets timestamp (timestamp) where timestamp > to_timezone(1670841518994052,'UTC'))`, fd=220]
2022-12-12T10:38:44.411711Z I pg-server scheduling disconnect [fd=220, reason=15]
2022-12-12T10:38:44.411731Z I pg-server disconnected [ip=172.21.0.35, fd=220, src=queue]
2022-12-12T10:38:49.402548Z I pg-server connected [ip=172.21.0.35, fd=220]
2022-12-12T10:38:49.402814Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2022-12-12T10:38:49.402830Z I i.q.c.p.PGConnectionContext property [name=user, value=admin]
2022-12-12T10:38:49.402837Z I i.q.c.p.PGConnectionContext property [name=database, value=qdb]
2022-12-12T10:38:49.402842Z I i.q.c.p.PGConnectionContext property [name=client_encoding, value=UTF8]
2022-12-12T10:38:49.402847Z I i.q.c.p.PGConnectionContext property [name=DateStyle, value=ISO]
2022-12-12T10:38:49.402851Z I i.q.c.p.PGConnectionContext property [name=TimeZone, value=Europe/Warsaw]
2022-12-12T10:38:49.402858Z I i.q.c.p.PGConnectionContext property [name=extra_float_digits, value=2]
2022-12-12T10:38:49.403386Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET extra_float_digits = 3]
2022-12-12T10:38:49.403886Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET application_name = 'PostgreSQL JDBC Driver']
2022-12-12T10:38:49.405125Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SELECT id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') as timestamp, observationTimestamp FROM 'packets' WHERE timestamp > to_timezone(1670841518994052,'UTC')]
2022-12-12T10:38:49.406765Z I i.q.g.SqlCompiler plan [q=`select-virtual id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') timestamp, observationTimestamp from (select [id, src, dst, srcPort, dstPort, probeId, protocol, timestamp, observationTimestamp] from packets timestamp (timestamp) where timestamp > to_timezone(1670841518994052,'UTC'))`, fd=220]
2022-12-12T10:38:49.407962Z I pg-server scheduling disconnect [fd=220, reason=15]
2022-12-12T10:38:49.407985Z I pg-server disconnected [ip=172.21.0.35, fd=220, src=queue]
2022-12-12T10:38:54.400515Z I pg-server connected [ip=172.21.0.35, fd=220]
2022-12-12T10:38:54.400857Z I i.q.c.p.PGConnectionContext protocol [major=3, minor=0]
2022-12-12T10:38:54.400877Z I i.q.c.p.PGConnectionContext property [name=user, value=admin]
2022-12-12T10:38:54.400883Z I i.q.c.p.PGConnectionContext property [name=database, value=qdb]
2022-12-12T10:38:54.400889Z I i.q.c.p.PGConnectionContext property [name=client_encoding, value=UTF8]
2022-12-12T10:38:54.400893Z I i.q.c.p.PGConnectionContext property [name=DateStyle, value=ISO]
2022-12-12T10:38:54.400898Z I i.q.c.p.PGConnectionContext property [name=TimeZone, value=Europe/Warsaw]
2022-12-12T10:38:54.400903Z I i.q.c.p.PGConnectionContext property [name=extra_float_digits, value=2]
2022-12-12T10:38:54.401263Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET extra_float_digits = 3]
2022-12-12T10:38:54.401484Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SET application_name = 'PostgreSQL JDBC Driver']
2022-12-12T10:38:54.402408Z I i.q.c.p.PGConnectionContext parse [fd=220, q=SELECT id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') as timestamp, observationTimestamp FROM 'packets' WHERE timestamp > to_timezone(1670841518994052,'UTC')]
2022-12-12T10:38:54.403551Z I i.q.g.SqlCompiler plan [q=`select-virtual id, src, dst, srcPort, dstPort, probeId, protocol, to_timezone(timestamp,'Europe/Warsaw') timestamp, observationTimestamp from (select [id, src, dst, srcPort, dstPort, probeId, protocol, timestamp, observationTimestamp] from packets timestamp (timestamp) where timestamp > to_timezone(1670841518994052,'UTC'))`, fd=220]
a

Alex Pelagenko

12/12/2022, 2:15 PM
Can you attach log file from 11utc today please?
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:20 PM
I erase database and logs from 11 utc but same thing happend in my second test at 13 utc
a

Alex Pelagenko

12/12/2022, 2:30 PM
😞 it’s a bit late in the log, doesn’t say what it created
2022-12-12T12.21877
folder. Do you have the log file from earlier?
p

Paweł Wangryn (Vangreen)

12/12/2022, 2:35 PM
a

Alex Pelagenko

12/12/2022, 3:42 PM
Something weird is going on, but I cannot reproduce it
can you send
_txn
file from the table folder, should be
/root/.questdb/db/packets/_txn
Can be that the file contains something because of unrelated error and now drop partition is just a symptom
p

Paweł Wangryn (Vangreen)

12/13/2022, 10:05 AM
@Alex Pelagenko In this file I attached three _txn files • just after partition drop (when records starts to save with wrong time) • just after first error • after many errors
m

Miguel Arregui

12/13/2022, 11:24 AM
hi @Paweł Wangryn (Vangreen) would you also send the_meta file please, ta!!
p

Paweł Wangryn (Vangreen)

12/13/2022, 11:27 AM
a

Alex Pelagenko

12/13/2022, 1:57 PM
good news, I found the bug
bad news, I cannot suggest a simple workaround, so you’ll need to wait for a fix
The data is written correctly, it’s the reading part reload doesn’t handle the drop in some casese
The workaround would be to calm down reading so that every existing reader is evicted from the pool after timeout, then reading will be back to normal before the fix released
p

Paweł Wangryn (Vangreen)

12/13/2022, 2:02 PM
I know it's hard question but I need to ask as we plan to implement questDB in production in early Q1 2023. How much time it will take to fix this bug?
a

Alex Pelagenko

12/13/2022, 2:02 PM
I have a fix, not a big deal
I can merge it to master this week and you can try nightly docker
p

Paweł Wangryn (Vangreen)

12/13/2022, 2:03 PM
Great!
a

Alex Pelagenko

12/13/2022, 2:03 PM
release may be this month too but it can slip to beginning of Jan
p

Paweł Wangryn (Vangreen)

12/13/2022, 2:06 PM
Ok, thanks! So I'm looking forward to merge this fix.
@Alex Pelagenko I try todays nightly version with your fix. Everything works perfect. Thanks for quick reaction!
a

Alex Pelagenko

12/15/2022, 11:11 AM
you already done it, impressive
thanks
m

Manoj DeSilva

02/24/2023, 2:24 PM
@Paweł Wangryn (Vangreen), @Alex Pelagenko, Can I please know the QuestDB version that contains this fix. Also bug ticket ID if something like that exists? Thanks.
p

Paweł Wangryn (Vangreen)

02/24/2023, 2:26 PM
@Manoj DeSilva since version 6.7 bug was fixed.
m

Manoj DeSilva

02/24/2023, 2:27 PM
Thanks @Paweł Wangryn (Vangreen)!
Hi @Alex Pelagenko, Is there any bug Id or ticket ID for this fix?
a

Alex Pelagenko

02/24/2023, 2:31 PM
I can find PR for you, yes. Do you want to see the code change?
m

Manoj DeSilva

02/24/2023, 2:32 PM
Actually, having the bug Id is sufficient. Just for reference.
a

Alex Pelagenko

02/24/2023, 2:32 PM
there is no bug id
m

Manoj DeSilva

02/24/2023, 2:33 PM
a

Alex Pelagenko

02/24/2023, 2:33 PM
that’s it, yes
m

Manoj DeSilva

02/24/2023, 2:33 PM
great! Thanks