Miloš Kozák

12/14/2022, 2:41 PM
Hello everyone, I am writing a collector of data that is persisted in QuestDB. The average size of an inserted row is about 2 kB. My QuestDB instance runs as a standalone installation with a fairly default configuration on a dedicated Azure instance. The incoming traffic is about 100 msg/s, all of which should be inserted, and every 5 minutes I get an error like this:
io.questdb.cutlass.line.LineSenderException: [104] send error
        at io.questdb.cutlass.line.tcp.PlainTcpLineChannel.send(PlainTcpLineChannel.java:100)
        at io.questdb.cutlass.line.AbstractLineSender.sendAll(AbstractLineSender.java:380)
        at io.questdb.cutlass.line.LineTcpSender.send00(LineTcpSender.java:86)
        at io.questdb.cutlass.line.AbstractLineSender.put(AbstractLineSender.java:199)
        at io.questdb.cutlass.line.AbstractLineSender.putUtf8Special(AbstractLineSender.java:319)
        at io.questdb.std.str.CharSink.encodeUtf8(CharSink.java:45)
        at io.questdb.std.str.CharSink.encodeUtf8(CharSink.java:37)
        at io.questdb.cutlass.line.AbstractLineSender.field(AbstractLineSender.java:133)
        at xxxx.core.timeseries.DbWriter.write(DbWriter.kt:33)
        at xxxx.core.timeseries.handlers.AddDataPointRequestHandler.handleRequestTyped(AddDataPointRequestHandler.kt:19)
        at xxxx.core.timeseries.handlers.AddDataPointRequestHandler.handleRequestTyped(AddDataPointRequestHandler.kt:8)
        at xxxx.core.timeseries.handlers.TypedTimeSeriesRequestHandler.handleRequest(TimeSeriesRequestHandler.kt:28)
        at xxxx.core.timeseries.TimeSeriesService.handleRequest(TimeSeriesService.kt:131)
        at xxxx.core.timeseries.TimeSeriesService.access$handleRequest(TimeSeriesService.kt:43)
        at xxxx.core.timeseries.TimeSeriesService$serviceLoop$1$2$1$5.emit(TimeSeriesService.kt:107)
        at xxxx.core.timeseries.TimeSeriesService$serviceLoop$1$2$1$5.emit(TimeSeriesService.kt:104)
        at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:62)
        at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)
        at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Channels.kt)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
        at xxxx.common.Service.runBlocking(Service.kt:109)
        at xxxx.core.timeseries.TimeSeriesService.serviceLoop(TimeSeriesService.kt:71)
        at xxxx.core.timeseries.TimeSeriesServiceMain.main(TimeSeriesServiceMain.kt:14)
Interestingly, there are no errors in QuestDB's log file.
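For context, the write path in the stack trace goes through QuestDB's ILP-over-TCP line sender. Below is a minimal Kotlin sketch of that kind of writer; the class, table, and column names are hypothetical, and it uses the higher-level io.questdb.client.Sender facade rather than the LineTcpSender visible in the trace, so builder details may differ between client versions.

    import io.questdb.client.Sender

    // Hypothetical sketch of a writer like the DbWriter in the stack trace:
    // each call sends one row (~2 kB in this setup) over ILP/TCP to QuestDB
    // (default ILP port 9009). Table and column names are made up.
    class QuestDbWriter(host: String, port: Int = 9009) : AutoCloseable {
        private val sender: Sender = Sender.builder()
            .address("$host:$port")
            .build()

        fun write(table: String, source: String, payload: String) {
            sender.table(table)
                .symbol("source", source)          // SYMBOL column: low-cardinality, indexed
                .stringColumn("payload", payload)  // STRING column: free-form text
                .atNow()                           // let the server assign the designated timestamp
        }

        override fun close() = sender.close()      // flushes any buffered rows
    }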
Jaromir Hamala

12/14/2022, 4:21 PM
hi Milos, the “every 5 minutes” aspect is suspicious. I’d check a firewall/NAT if there is any.
Miloš Kozák

12/15/2022, 8:26 AM
Thanks for the reply. I have just checked, and there is no firewall in our setup.
Jaromir Hamala

12/15/2022, 11:12 AM
is there anything in the questdb server log? edit: I just saw the last sentence of your first message
well, that's interesting, as questdb - the server - should log disconnected clients, even when they disconnect gracefully.
Miloš Kozák

12/15/2022, 12:12 PM
We have just set up another QuestDB instance in the same subnet as the "problematic" one and forwarded the traffic there. I will keep you posted on whether the problem occurs there as well.
Jaromir Hamala

12/16/2022, 8:50 AM
Hi Milos, what’s the outcome of the experiment?
Miloš Kozák

12/16/2022, 8:52 AM
Well, it is the same setup, just on a fresh Azure instance, and there has not been a single occurrence of that exception in the course of 24 hours.
I am thinking this could be related to the VM or to the state of the data on that VM. We are planning to replicate the "old" database to the new server with rsync and carry on with the test to verify whether we have an integrity error. Any other suggestions?
Jaromir Hamala

12/16/2022, 12:16 PM
not really. if you can reliably reproduce it every 5 minutes then maybe I would try wireshark/tcpdump, running on both the server and the client boxes. this could help to identify why the connection is broken. it could help answer questions such as: which party closed the connection first? is there an RST going on? from which end?
Miloš Kozák

12/16/2022, 12:22 PM
We still have to get connected for that, but right now I got a few errors. With an empty container in k8s and a fresh VM, everything was OK. We have just swapped the drive and inserted this morning's snapshot, and here we go again.
Looking into the QuestDB logs, I noticed something strange. The third line, [104], where you can see series=pc at the end (indicating I want to write to a table named pc), corresponds to this log line:
Dec 16 12:10:37 questdb-ana-test-0 questdb[824061]: 2022-12-16T12:10:37.058467Z E i.q.c.l.t.LineTcpConnectionContext [103] could not process line data [table=anotherthable_name, msg=cast error for line protocol 
Dec 16 12:10:37 questdb-ana-test-0 questdb[824061]: 2022-12-16T12:10:37.058500Z E i.q.c.l.t.LineTcpConnectionContext [103] could not parse measurement, NONE at 987, line (may be mangled due to partial parsing
The second line is very long, but when I scroll right I can see that the line is open-ended. It looks like some buffer has overflowed.
Jaromir Hamala

12/16/2022, 12:28 PM
"cast error for line protocol" indicates a type problem.
I am not sure if questdb always logs full lines, even when they are super long. maybe @Imre knows it by heart?
Imre

12/16/2022, 12:53 PM
hi, yes, it looks like a type cast error. is this the log directly copied into the chat from the log file?
seems like the end of the error message is missing
could not process line data [table=anotherthable_name, msg=cast error for line protocol
that should continue with more information about the error, like which column it is and the types which we are trying to cast from and to
same for the second error message, the input line should be printed there
line (may be mangled due to partial parsing): 'this is the input here...'
we print the entire input, no matter how long it is
the second line is a result of the cast error
Miloš Kozák

12/16/2022, 12:58 PM
I have just DM'd you that message, but it looks trimmed to me
Imre

12/16/2022, 12:59 PM
no, that is the full message. it shows the designated timestamp at the end
let me check
quick update: we sorted this out in a private chat. it was a symbol vs string column type mismatch.
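For anyone landing on the same error: a cast error like this typically means the column already exists in the table with one type (for example SYMBOL) while the sender writes it with another (for example STRING). The server then drops the connection, which the client surfaces as a send error ([104] in the client exception is most likely the Linux errno for "connection reset by peer"). Below is a hedged Kotlin sketch of the mismatch and the fix, using the pc table from the thread with a hypothetical column name; the exact builder API may vary by client version.

    import io.questdb.client.Sender

    fun main() {
        Sender.builder().address("localhost:9009").build().use { sender ->
            // If the "device" column already exists as SYMBOL, writing the same
            // value as a string column triggers "cast error for line protocol"
            // on the server side, and the connection is closed:
            // sender.table("pc").stringColumn("device", "sensor-1").atNow()

            // Writing it with the matching type avoids the cast error:
            sender.table("pc").symbol("device", "sensor-1").atNow()

            sender.flush()   // push any buffered rows before closing
        }
    }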