Ajay Pilaniya
12/26/2022, 7:28 PM
UPDATE isValid = false where JobId = #{currentJobID}
So basically we mark all rows with the current job id as false and re-ingest the data. This way, even though we end up with duplicate rows, the older rows are marked false and will not be considered further in our pipeline. The problem is that once one table reached 2 million rows, I started getting a timeout error on this UPDATE query (PSQLException: ERROR: Timeout expired on waiting for the async command execution result). I am using PGWire.
I am not sure how to handle this error, or whether there is a better approach for this scenario. Any help is appreciated.
Nicolas Hourcard
12/26/2022, 9:15 PM
Andrey Pechkurov
12/26/2022, 9:16 PM
Can you add a timestamp filter in addition to the JobId column?
UPDATE my_table SET isValid = false where JobId = #{currentJobID} and ts in '2022-12-26';
Alex Pelagenko
12/26/2022, 9:30 PM
Imre
12/26/2022, 10:48 PM
Timeout expired on waiting for the async command execution result suggests that the UPDATE statement could not run immediately and was waiting in the async execution queue.
Are the jobs ingesting data via ILP while you run the UPDATE via PGWire? In that case ILP could hold the table and prevent the UPDATE statement from running. It would be put on a queue and executed asynchronously once ILP releases the table. However, if ILP does not release the table within a set timeout, the UPDATE will be dropped and the above error will be seen.
Ajay Pilaniya
12/27/2022, 4:07 AM
org.postgresql.util.PSQLException: ERROR: Timeout expired on waiting for the async command execution result [instance=12]
Timeout expired on waiting for the async command execution result [instance=13]
Timeout expired on waiting for the async command execution result [instance=33]
Timeout expired on waiting for the async command execution result [instance=36]
Timeout expired on waiting for the async command execution result [instance=39]
Timeout expired on waiting for the async command execution result [instance=41]
Timeout expired on waiting for the async command execution result [instance=48]
Timeout expired on waiting for the async command execution result [instance=62]
Timeout expired on waiting for the async command execution result [instance=105]
Timeout expired on waiting for the async command execution result [instance=125]
Timeout expired on waiting for the async command execution result [instance=130]
Timeout expired on waiting for the async command execution result [instance=131]
Timeout expired on waiting for the async command execution result [instance=136]
Timeout expired on waiting for the async command execution result [instance=139]
Timeout expired on waiting for the async command execution result [instance=140]
Timeout expired on waiting for the async command execution result [instance=151]
Andrey Pechkurov
12/27/2022, 7:29 AM
JobId column is of type string currently? Should I update it to SYMBOL?
Yes, it might be a good idea in any case. Symbol is more efficient than repeated string values.
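One way to make that switch, sketched here with a hypothetical my_table and assuming a full table rebuild is acceptable, is to create a copy of the table with JobId declared as SYMBOL and copy the rows over:

-- Sketch only: my_table, its column list and the 20000 capacity are placeholders.
CREATE TABLE my_table_new (
  ts TIMESTAMP,
  JobId SYMBOL CAPACITY 20000 CACHE,
  isValid BOOLEAN
) timestamp(ts) PARTITION BY DAY;

-- STRING values are converted to SYMBOL on insert.
INSERT INTO my_table_new SELECT ts, JobId, isValid FROM my_table;

-- After verifying the copy, swap the tables.
RENAME TABLE my_table TO my_table_old;
RENAME TABLE my_table_new TO my_table;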
Ajay Pilaniya
12/27/2022, 7:53 AMAndrey Pechkurov
12/27/2022, 8:05 AMAjay Pilaniya
12/27/2022, 8:28 AMAndrey Pechkurov
12/27/2022, 8:34 AM
For LONG columns, that means up to 13.5M rows per partition.
Ajay Pilaniya
12/27/2022, 9:47 AM
2022-12-27T09:44:06.106944Z E i.q.g.UpdateOperator could not update
io.questdb.cairo.CairoException: [-1] timeout, query aborted [fd=211]
at io.questdb.cairo.CairoException.critical(CairoException.java:89)
at io.questdb.cairo.CairoException.nonCritical(CairoException.java:83)
at io.questdb.griffin.engine.ops.UpdateOperation.forceTestTimeout(UpdateOperation.java:105)
at io.questdb.griffin.UpdateOperator.executeUpdate(UpdateOperator.java:119)
at io.questdb.griffin.engine.ops.UpdateOperation.apply(UpdateOperation.java:65)
at io.questdb.cairo.TableWriter.processAsyncWriterCommand(TableWriter.java:4460)
at io.questdb.cairo.TableWriter.processCommandQueue(TableWriter.java:1215)
at io.questdb.cairo.TableWriter.processCommandQueue(TableWriter.java:4499)
at io.questdb.cairo.TableWriter.tick(TableWriter.java:1833)
at io.questdb.cairo.pool.WriterPool.returnToPool(WriterPool.java:555)
at io.questdb.cairo.pool.WriterPool$Entry.close(WriterPool.java:608)
at io.questdb.cairo.TableWriter.close(TableWriter.java:756)
at io.questdb.std.Misc.free(Misc.java:45)
at io.questdb.cutlass.pgwire.PGConnectionContext.executeTag0(PGConnectionContext.java:1320)
at io.questdb.cutlass.pgwire.PGConnectionContext.executeTag(PGConnectionContext.java:1309)
at io.questdb.cutlass.pgwire.PGConnectionContext.processExecute(PGConnectionContext.java:2147)
at io.questdb.cutlass.pgwire.PGConnectionContext.processExec(PGConnectionContext.java:2131)
at io.questdb.cutlass.pgwire.PGConnectionContext.parse(PGConnectionContext.java:1568)
at io.questdb.cutlass.pgwire.PGConnectionContext.handleClientOperation(PGConnectionContext.java:413)
at io.questdb.cutlass.pgwire.PGJobContext.handleClientOperation(PGJobContext.java:84)
at io.questdb.cutlass.pgwire.PGWireServer$1.lambda$$0(PGWireServer.java:83)
at io.questdb.network.AbstractIODispatcher.processIOQueue(AbstractIODispatcher.java:164)
at io.questdb.cutlass.pgwire.PGWireServer$1.run(PGWireServer.java:110)
at io.questdb.mp.Worker.run(Worker.java:118)
Andrey Pechkurov
12/27/2022, 9:50 AM
Ajay Pilaniya
12/27/2022, 9:50 AM
2022-12-27T09:46:23.453311Z I i.q.c.TableWriter published async command complete event [type=3,tableName=cdn_logs,tableId=4,correlationId=25,errorCode=-3,errorMsg=timeout, query aborted [fd=652]]
2022-12-27T09:46:23.453319Z I i.q.c.TableWriter received async cmd [type=3, tableName=cdn_logs, tableId=4, correlationId=26, cursor=25]
2022-12-27T09:46:12.805132Z I i.q.g.e.o.OperationFutureImpl published ASYNC writer command [name=UPDATE,tableName=cdn_logs,instance=27]
2022-12-27T09:46:13.306167Z E i.q.c.p.PGConnectionContext error [pos=0, msg=`Timeout expired on waiting for the async command execution result [instance=6]Timeout expired on waiting for the async command execution result [instance=7]Timeout expired on waiting for the async command execution result [instance=8]Timeout expired on waiting for the async command execution result [instance=9]Timeout expired on waiting for the async command execution result [instance=13]Timeout expired on waiting for the async command execution result [instance=16]Timeout expired on waiting for the async command execution result [instance=20]Timeout expired on waiting for the async command execution result [instance=21]Timeout expired on waiting for the async command execution result [instance=26]Timeout expired on waiting for the async command execution result [instance=27]`]
Andrey Pechkurov
12/27/2022, 10:14 AM
Try increasing cairo.writer.alter.busy.wait.timeout=5000 (the default value is 500ms). This might help to get rid of the timeout errors in situations when another connection is holding the table writer.
Ajay Pilaniya
12/27/2022, 10:16 AM
Andrey Pechkurov
12/27/2022, 10:18 AM
Ajay Pilaniya
12/27/2022, 10:19 AM
How do I set the cairo.writer.alter.busy.wait.timeout property, by the way?
Andrey Pechkurov
12/27/2022, 10:21 AM
In <qdb_root>/conf/server.conf
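For reference, a minimal sketch of setting it, assuming the default install layout (the environment-variable form follows QuestDB's usual QDB_ prefix convention):

# <qdb_root>/conf/server.conf
cairo.writer.alter.busy.wait.timeout=5000

# or, equivalently, set before starting the database
export QDB_CAIRO_WRITER_ALTER_BUSY_WAIT_TIMEOUT=5000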
For the cdn_logs_ENTERR10 table:
2022-12-27T10:26:17.406887Z I i.q.c.p.PGConnectionContext parse [fd=117, q=UPDATE 'cdn_logs_ENTERR10' SET isValid = false where ts >= to_timestamp($1, 'dd-MM-yyyy-HH:mm:ss') AND ts < to_timestamp($2, 'dd-MM-yyyy-HH:mm:ss') AND JobId = $3]
2022-12-27T10:26:17.410195Z I i.q.c.p.WriterPool busy [table=`cdn_logs_ENTERR10`, owner=16, thread=20, reason=insert]
Notice busy + reason=insert
Ajay Pilaniya
12/27/2022, 11:02 AM
Andrey Pechkurov
12/27/2022, 11:03 AM
Ajay Pilaniya
12/27/2022, 11:05 AM
Nicolas Hourcard
12/27/2022, 11:06 AM
Ajay Pilaniya
12/27/2022, 11:07 AM
Andrey Pechkurov
12/27/2022, 11:07 AM
Is there any query using which I can find out if the table is busy or not?
I don't think we have such a function. You could try handling the UPDATE timed-out error and re-scheduling your job, or doing something like an exponential backoff.
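A rough sketch of that retry idea over PGWire, assuming plain JDBC, the default qdb database with admin/quest credentials, and a placeholder table name:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class InvalidateJobRows {
    // Sketch only: retries the UPDATE with exponential backoff when the writer is busy.
    static void markJobInvalid(String jobId) throws SQLException, InterruptedException {
        String sql = "UPDATE cdn_logs SET isValid = false WHERE JobId = ?";
        long backoffMs = 500;
        for (int attempt = 0; attempt < 5; attempt++) {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:8812/qdb", "admin", "quest");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, jobId);
                ps.executeUpdate();
                return; // success
            } catch (SQLException e) {
                // The async command timeout surfaces as an SQLException; back off and retry.
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
        throw new SQLException("UPDATE kept timing out for job " + jobId);
    }
}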
Nicolas Hourcard
12/27/2022, 11:17 AMAjay Pilaniya
12/27/2022, 11:22 AMAndrey Pechkurov
12/27/2022, 11:23 AMNicolas Hourcard
12/27/2022, 11:34 AMAjay Pilaniya
12/27/2022, 1:07 PM
Andrey Pechkurov
12/27/2022, 1:09 PM
No issues in the last one hour. 🎉
Awesome to hear that!
Ajay Pilaniya
12/27/2022, 1:13 PMAndrey Pechkurov
12/27/2022, 1:17 PMAjay Pilaniya
12/27/2022, 1:20 PMAndrey Pechkurov
12/27/2022, 1:24 PM
You insert the rows one by one via the Sender instance.
if I have 100k rows to insert then also I need to insert these one by one? Wouldn't that slow down insertions?
The client writes rows to the TCP socket in batches, so the performance is good.
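For illustration, a sketch of pushing a whole batch through one Sender (the table and column names are just the ones from this thread; adjust to your schema):

import io.questdb.client.Sender;
import java.util.List;

// Sketch only: one Sender per batch, rows appended one by one; the client buffers
// them and writes to the TCP socket in chunks, flushing the remainder on close().
void ingestBatch(List<String> jobIds) {
    try (Sender sender = Sender.builder().address("localhost:9009").build()) {
        for (String jobId : jobIds) {
            sender.table("cdn_logs")
                  .symbol("JobId", jobId)
                  .boolColumn("isValid", true)
                  .atNow();
        }
    }
}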
Ajay Pilaniya
12/27/2022, 1:27 PMAndrey Pechkurov
12/27/2022, 1:28 PMNicolas Hourcard
12/27/2022, 1:29 PMAndrey Pechkurov
12/27/2022, 1:30 PMAjay Pilaniya
12/27/2022, 1:32 PMAndrey Pechkurov
12/27/2022, 1:36 PMAjay Pilaniya
12/27/2022, 1:38 PMAndrey Pechkurov
12/27/2022, 1:48 PM
You could use a j.u.c.l.ReadWriteLock. Your ILP threads would acquire it as readers, while the thread that does the UPDATE would act as a writer, periodically trying to acquire the lock and then executing the UPDATE, if necessary.
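A minimal sketch of that coordination (class and method names are hypothetical):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: ILP ingestion threads take the read lock, the UPDATE thread takes the
// write lock, so the UPDATE never runs while an ingestion batch is in flight.
public class TableAccessCoordinator {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void runIngestion(Runnable ingestBatch) {
        lock.readLock().lock();
        try {
            ingestBatch.run();      // send ILP rows
        } finally {
            lock.readLock().unlock();
        }
    }

    public void runUpdate(Runnable updateJob) {
        lock.writeLock().lock();    // blocks until no ingestion batch holds the read lock
        try {
            updateJob.run();        // run the UPDATE over PGWire
        } finally {
            lock.writeLock().unlock();
        }
    }
}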
Ajay Pilaniya
12/27/2022, 1:51 PM
try (Sender sender = Sender.builder().address("10.41.12.239:8812").build()) {
sender.table("cdn_logs_QA_Manual")
.symbol("ContentId", "ABC")
.symbol("RequestId", "ZZZ")
.symbol("DistributionType", "VOD")
.symbol("FileType", "X")
.symbol("Method", "POST")
.symbol("CdnType", "CF")
.symbol("DistributionId", "ABC")
.boolColumn("isValid", false).stringColumn("ts", DateUtils.getQuestDbFormat(System.currentTimeMillis())).atNow();
sender.flush();
}
But this is not inserting any data. Not getting any errors also. Just getting below log -
2022-12-27T14:30:21.743003Z I i.q.c.l.t.PlainTcpLineChannel Send buffer size change from 131614 to 131072
Am I missing something?
Andrey Pechkurov
12/27/2022, 3:02 PM
Is ts a designated column? If so, the snippet should be:
try (Sender sender = Sender.builder().address("10.41.X.X:8812").build()) {
sender.table("cdn_logs_QA_Manual")
.symbol("ContentId", "ABC")
.symbol("RequestId", "ZZZ")
.symbol("DistributionType", "VOD")
.symbol("FileType", "X")
.symbol("Method", "POST")
.symbol("CdnType", "CF")
.symbol("DistributionId", "ABC")
// .timestampColumn("ts", System.currentTimeMillis()) - this value is set via atNow() call
.boolColumn("isValid", false).atNow();
sender.flush(); // additional note this call is optional; sender flushes the data on close
}
Ajay Pilaniya
12/28/2022, 3:56 AM
Andrey Pechkurov
12/28/2022, 7:22 AM
Also I have a password setup on my db
Do you mean the ILP password? If so, you should use the enableAuth() and authToken() calls:
https://questdb.io/docs/reference/clients/java_ilp/#example-with-tls-and-authentication-enabled
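Roughly the shape shown on that docs page (the key id and token below are placeholders, and 9009 is the ILP port rather than the PGWire port 8812):

import io.questdb.client.Sender;

// Sketch only: "admin" and the token are placeholder ILP credentials from the server config.
try (Sender sender = Sender.builder()
        .address("localhost:9009")
        .enableAuth("admin").authToken("<auth token from the server key pair>")
        // .enableTls()  // add if TLS is enabled on the server
        .build()) {
    sender.table("cdn_logs").boolColumn("isValid", true).atNow();
}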
Ajay Pilaniya
12/28/2022, 8:39 AM
As for no data inserted into the table, server logs should contain the reason for it
I checked the server logs and found nothing related to this, so I guess the data is not being sent to the server. But without any error logs, how do I find the exact issue?
Do you mean the ILP password?
In my Java app I am using the default password of QuestDB. Not sure how to use that in ILP @Andrey Pechkurov
Andrey Pechkurov
12/28/2022, 8:59 AM
Do I use the same port during ILP ingestion?
No, the ILP port is 9009. Probably that's the reason. Try changing the port.
Ajay Pilaniya
12/28/2022, 9:02 AM
could not process line data [table=content_details_test, msg=cast error for line protocol tag [columnWriterIndex=3, columnType=STRING], errno=0]
2022-12-28T09:00:53.827517Z E i.q.c.l.t.LineTcpConnectionContext [175] could not parse measurement, NONE at 59, line (may be mangled due to partial parsing): 'content_details_test,ContentId=ABC,PartnerId=ZZZ,Title=Test'
And one potential issue I see here: if my Java app cannot catch such errors and they are only logged on the server side, then how will I know if some insertions failed?
Andrey Pechkurov
12/28/2022, 9:20 AM
Ajay Pilaniya
12/28/2022, 9:45 AM
1970-01-01T00:27:52.231208Z
This seems random. (I think it is taking server time and not the time provided in the input.)
What could be the possible reason here?
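For what it's worth, a timestamp near 1970-01-01T00:27 is what you get when the value is off by a factor of one million, e.g. epoch milliseconds passed where the client expects nanoseconds. A sketch of setting the designated timestamp explicitly, assuming the Java client's at(long epochNanos) variant:

import io.questdb.client.Sender;

// Sketch only: at() takes epoch NANOSECONDS for the designated timestamp, so a
// millisecond value has to be scaled up; passing millis directly lands in January 1970.
long eventMillis = System.currentTimeMillis();
try (Sender sender = Sender.builder().address("localhost:9009").build()) {
    sender.table("cdn_logs")
          .boolColumn("isValid", true)
          .at(eventMillis * 1_000_000L);   // millis -> nanos
}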
Andrey Pechkurov
12/28/2022, 12:54 PM
Ajay Pilaniya
12/28/2022, 1:56 PMAndrey Pechkurov
12/28/2022, 2:03 PMAjay Pilaniya
12/30/2022, 6:47 AMAndrey Pechkurov
12/30/2022, 7:32 AMAjay Pilaniya
12/30/2022, 7:35 AMAndrey Pechkurov
12/30/2022, 7:53 AMAjay Pilaniya
12/30/2022, 7:55 AM
Also, what are your EBS volume settings (IOPS and throughput)? Default values or something above?
I am using IOPS 3000 and throughput 125.
Andrey Pechkurov
12/30/2022, 7:59 AM
I can increase batch sizes to 30k
You should increase the batch size to 30k then.
I am using IOPS 3000 and Throughput 125
That's your main bottleneck. Please try increasing the settings to, say, 8000 IOPS and 500 MB/s throughput. You can increase them even more if the increased volume cost is OK for you. The higher the settings are, the faster writes will be.
Ajay Pilaniya
12/30/2022, 8:00 AM
Andrey Pechkurov
12/30/2022, 9:30 AM
So if my ingestion job has 500k rows to insert, all these rows will be from a single hour. Should I change my partitioning to HOUR so that ingestion speed can be improved during O3 writes?
Yes, this way the partitions to be re-written will get much smaller.
Ajay Pilaniya
12/30/2022, 11:03 AMAndrey Pechkurov
12/30/2022, 11:07 AMAjay Pilaniya
12/30/2022, 11:08 AM2022-12-30T16:05:08.953802Z E i.q.c.l.t.LineTcpConnectionContext [37227] could not parse measurement, NONE at 1282, line (may be mangled due to partial parsing): 'cdn_logs,ContentId=6346ba504da63b20baf6ecb0,RequestId=F77E08D57C5646C798ED3A53FC33C4AB,JobId=E1KUOC0ENJS6P5,DistributionType=VOD,FileType=DASH_VIDEO_SEGMENT,Method=GET,CdnType=CloudFront,DistributionId=JJJJC0ENJS6P5,StreamingPartnerId=ABC-ABC,UserCity=London,UserCountry=United Kingdomm,UserDevice=Mobile,UserBrowser=-,UserDeviceBrand=Dangalplay,UserDeviceName=Dangalplay Android Mobilele,UserId=iWGYSC6ZAOR414EBX20,Agent=Special,AgentName=<http://Com.dangalplay.tv|Com.dangalplay.tv>,UserOs=Android,OsClass=Mobile CdnRequestId="-IxVGYSFfWfi_bnXatuj0CLrQ277PqG5ShOSPfLkwFyt3GPmcVwIsw==",UserAgent="<http://com.dangalplay.tv/1.0.10%20(Linux;Android%2010)%20ExoPlayerLib/2.16.1|com.dangalplay.tv/1.0.10%20(Linux;Android%2010)%20ExoPlayerLib/2.16.1>",Referer="",Domain="<http://streams-msp.test.in|streams-msp.test.in>",Ip="1.2.3.4",Extras="",FileName="chunk_ctvideo_ridp0va0br2500000_cs50400000_w1550299147_mpd.m4s",CacheStat="Miss",Resolution="854x480",EdgeLocation="BOM78-P4",FilePath="/ENTERR10/EPISODE/6346ba504da63b20baf6ecb0/dash/854x480/chunk_ctvideo_ridp0va0br2500000_cs50400000_w1550299147_mpd.m4s",AgentVersion="1.0.10",AgentVersionName="<http://Com.dangalplay.tv|Com.dangalplay.tv> 1.0.10",OsVersionName="Android 10",OsVersion="10",isValid=t,SegmentCount=1i,ViewCount=0i,FileSize=263306i,ResponseTime=0.106,ua=1672436108953000t,ca=1672436108953000t 1670508923000000000'^M
2022-12-30T16:05:08.953813Z I tcp-line-server scheduling disconnect [fd=37227, reason=0]
2022-12-30T16:05:08.953815Z I tcp-line-server disconnected [ip=10.60.13.143, fd=37227, src=queue]
2022-12-30T16:05:08.953969Z I tcp-line-server connected [ip=10.60.13.143, fd=37227]
2022-12-30T16:05:08.954072Z I i.q.c.p.PGConnectionContext query cache used [fd=23691]
2022-12-30T16:05:08.954122Z I tcp-line-server connected [ip=10.60.13.143, fd=37601]
2022-12-30T16:05:08.954343Z I i.q.c.p.PGConnectionContext query cache used [fd=23320]
2022-12-30T16:05:08.954628Z I tcp-line-server scheduling disconnect [fd=24098, reason=0]
2022-12-30T16:05:08.954630Z I tcp-line-server disconnected [ip=10.60.13.143, fd=24098, src=queue]
2022-12-30T16:05:08.954675Z I tcp-line-server connected [ip=10.60.13.143, fd=24098]
2022-12-30T16:05:08.954697Z I i.q.c.p.PGConnectionContext query cache used [fd=63343]
2022-12-30T16:05:08.955128Z I tcp-line-server connected [ip=10.60.13.143, fd=39064]
[thread 10751 also had an error][thread 10750 also had an error]
Compiled method (c1) 209430797 6875 3 io.questdb.cairo.ConcurrentBitmapIndexFwdReader$Cursor::of (368 bytes)
total in heap [0x00007f34122a6810,0x00007f34122a9660] = 11856
relocation [0x00007f34122a6988,0x00007f34122a6c38] = 688
main code [0x00007f34122a6c40,0x00007f34122a87c0] = 7040
stub code [0x00007f34122a87c0,0x00007f34122a8970] = 432
oops [0x00007f34122a8970,0x00007f34122a8990] = 32
metadata [0x00007f34122a8990,0x00007f34122a8a08] = 120
scopes data [0x00007f34122a8a08,0x00007f34122a9030] = 1576
scopes pcs [0x00007f34122a9030,0x00007f34122a9550] = 1312
dependencies [0x00007f34122a9550,0x00007f34122a9560] = 16
nul chk table [0x00007f34122a9560,0x00007f34122a9660] = 256
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# <https://github.com/AdoptOpenJDK/openjdk-support/issues>
Andrey Pechkurov
12/30/2022, 5:42 PM
Ajay Pilaniya
12/30/2022, 5:46 PM
CREATE TABLE IF NOT EXISTS cdn_logs_ (ts TIMESTAMP, ua TIMESTAMP, ca TIMESTAMP,
CdnRequestId string not null,
ContentId SYMBOL CAPACITY 50000 CACHE, RequestId SYMBOL CAPACITY 50000 CACHE,
UserAgent string, JobId SYMBOL CAPACITY 20000 CACHE, Referer string,
Domain string, Ip string, Extras string, FileName string not null, CacheStat string, Resolution string, EdgeLocation string,
FilePath string not null, DistributionType SYMBOL CAPACITY 10 CACHE,
StatusCode int, FileType SYMBOL CAPACITY 10 CACHE, Method SYMBOL CAPACITY 10 CACHE,
CdnType SYMBOL CAPACITY 10 CACHE, DistributionId SYMBOL CAPACITY 100 NOCACHE, StreamingPartnerId SYMBOL CAPACITY 500 CACHE,
UserCity SYMBOL CAPACITY 5000 CACHE, UserCountry SYMBOL CAPACITY 300 CACHE, UserDevice SYMBOL CAPACITY 50 CACHE,
UserBrowser SYMBOL CAPACITY 500 CACHE,
UserDeviceBrand SYMBOL CAPACITY 500 CACHE, UserDeviceName SYMBOL CAPACITY 500 CACHE, UserId SYMBOL CAPACITY 500000 NOCACHE,
Agent SYMBOL CAPACITY 5000 NOCACHE, AgentName SYMBOL CAPACITY 500000 NOCACHE, AgentVersion string, AgentVersionName string,
OsVersionName string, UserOs SYMBOL CAPACITY 30 CACHE, isValid boolean, OsClass SYMBOL CAPACITY 30 CACHE, OsVersion string,
ResponseTime double, SegmentCount long, ViewCount long, FileSize long) timestamp(ts) PARTITION BY HOUR;
Could you also share the JVM crash dump file?
At which path can I find this?
2022-12-30T16:05:08.953802Z E i.q.c.l.t.LineTcpConnectionContext [37227] could not parse measurement, NONE at 1282, line (may be mangled due to partial parsing): 'cdn_logs_ENTERR10,ContentId=6346ba504da63b20baf6ecb0,RequestId=F77E08D57C5646C798ED3A53FC33C4AB,JobId=E1KUOC0ENJS6P5.2022-12-08-14.6f1b4e6f.gz,DistributionType=VOD,FileType=DASH_VIDEO_SEGMENT,Method=GET,CdnType=CloudFront,DistributionId=E1KUOC0ENJS6P5,StreamingPartnerId=ENTERR10-ENTERR10,UserCity=London,UserCountry=United Kingdomm,UserDevice=Mobile,UserBrowser=-,UserDeviceBrand=Dangalplay,UserDeviceName=Dangalplay Android Mobilele,UserId=iWGYSC6ZAOR414EBX20,Agent=Special,AgentName=<http://Com.dangalplay.tv|Com.dangalplay.tv>,UserOs=Android,OsClass=Mobile CdnRequestId="-IxVGYSFfWfi_bnXatuj0CLrQ277PqG5ShOSPfLkwFyt3GPmcVwIsw==",UserAgent="<http://com.dangalplay.tv/1.0.10%20(Linux;Android%2010)%20ExoPlayerLib/2.16.1|com.dangalplay.tv/1.0.10%20(Linux;Android%2010)%20ExoPlayerLib/2.16.1>",Referer="",Domain="<http://enterr10-streams-msp.streamready.in|enterr10-streams-msp.streamready.in>",Ip="45.250.66.25",Extras="",FileName="chunk_ctvideo_ridp0va0br2500000_cs50400000_w1550299147_mpd.m4s",CacheStat="Miss",Resolution="854x480",EdgeLocation="BOM78-P4",FilePath="/ENTERR10/EPISODE/6346ba504da63b20baf6ecb0/dash/854x480/chunk_ctvideo_ridp0va0br2500000_cs50400000_w1550299147_mpd.m4s",AgentVersion="1.0.10",AgentVersionName="<http://Com.dangalplay.tv|Com.dangalplay.tv> 1.0.10",OsVersionName="Android 10",OsVersion="10",isValid=t,SegmentCount=1i,ViewCount=0i,FileSize=263306i,ResponseTime=0.106,ua=1672436108953000t,ca=1672436108953000t 1670508923000000000'^M
Andrey Pechkurov
12/30/2022, 5:56 PM
CREATE TABLE IF NOT EXISTS cdn_logs_
I guess the table name should be cdn_logs, not cdn_logs_
Ajay Pilaniya
12/30/2022, 5:57 PMAndrey Pechkurov
12/30/2022, 6:17 PM
Could not load hsdis-amd64.so doesn't seem OK. Looks like the JVM failed to write a crash dump.
Ajay Pilaniya
12/30/2022, 6:21 PM
Is it the very first message your client tries writing on application start, or are there some prior ILP messages?
No, it is part of a bigger batch. Since this batch failed in the first crash, my code retriggered the same batch. Before this row, 100k+ logs had already been inserted. The logs I provided are the last few lines of the given log file; before that everything seems normal.
Which version of JDK are you using? Could not load hsdis-amd64.so doesn't seem ok. Looks like JVM failed to write a crash dump
Andrey Pechkurov
12/30/2022, 6:25 PM
Normally the JVM writes an hs_err_pidXXXX.log crash dump file. But in this case it failed to prepare the crash dump.
Ajay Pilaniya
12/30/2022, 6:30 PM
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 16384 bytes for committing reserved memory.
[thread 25374 also had an error]
Andrey Pechkurov
12/30/2022, 6:31 PMAjay Pilaniya
12/30/2022, 6:32 PMAndrey Pechkurov
12/30/2022, 6:33 PMAjay Pilaniya
12/30/2022, 6:34 PM
Why did you change it to 30GB?
@Andrey Pechkurov after the first crash I assumed the JVM was running out of memory, so I was experimenting around it.
Andrey Pechkurov
12/30/2022, 6:34 PMAjay Pilaniya
12/30/2022, 6:38 PMAndrey Pechkurov
12/30/2022, 6:38 PM
I have 65535 as the open file limit. Do I need to change it? And 65530 for the virtual memory area limit
I'd increase both of these settings at least 4x to avoid problems later, when your DB grows.
Ajay Pilaniya
12/30/2022, 6:39 PM
Could you try disabling symbol caches on all columns?
How can I disable the symbol cache in an existing schema?
Andrey Pechkurov
12/30/2022, 6:41 PM
If some of your columns hold mostly unique values (ContentId or RequestId or anything else), you should use the STRING type for them instead of SYMBOL. The SYMBOL type doesn't make sense if each row gets a unique value - it only adds disk space, CPU and RAM overhead in such a situation.
Ajay Pilaniya
12/30/2022, 6:52 PM
2022-12-31T03:06:03.379553Z E i.q.c.p.PGConnectionContext error [pos=21, msg=`[24]: could not open read-only [file=/root/.questdb/db/cdn_logs_ENTERR10/ContentId.o]`]
Andrey Pechkurov
12/31/2022, 10:11 AM
2022-12-31T03:06:03.379553Z E i.q.c.p.PGConnectionContext error [pos=21, msg=`[24]: could not open read-only [file=/root/.questdb/db/cdn_logs_ENTERR10/ContentId.o]`]
Error code 24 means "Too many open files.". Please check that the fs.file-max setting is applied to the database process.
12/31/2022, 10:13 AMAndrey Pechkurov
12/31/2022, 10:14 AMAjay Pilaniya
12/31/2022, 10:19 AM
Yes, the database can open additional column files on writes to older partitions
How can I avoid this? Currently I am writing mostly out of order, so I am accessing partitions from the last two months.
Andrey Pechkurov
12/31/2022, 10:26 AMcat /proc/<pid>/limits
where <pid>
is the database process id?Max open files
is 65535
which is the default. Seems that your kernel setting isn't increasedroot
, changing /etc/sysctl.conf
may be not enough. Try adding the following lines to `/etc/security/limits.conf`:
root soft nofile 262144
root hard nofile 262144
Ajay Pilaniya
12/31/2022, 10:42 AM
Could you check cat /proc/<pid>/limits, where <pid> is the database process id?
The screenshot I attached is of the database process itself.
Andrey Pechkurov
12/31/2022, 10:43 AMAjay Pilaniya
12/31/2022, 10:44 AM/etc/security/limits.conf
but still cat /proc/<pid>/limits
showing same number 65535Andrey Pechkurov
12/31/2022, 10:46 AMAjay Pilaniya
12/31/2022, 10:49 AMAndrey Pechkurov
12/31/2022, 10:51 AM
Log lines marked E or C stand for errors.
Ajay Pilaniya
12/31/2022, 10:52 AM
INFO 2022-12-31 19:15:39,940 [analytics-job-thread-6] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 8903 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 21 seconds
INFO 2022-12-31 19:16:17,542 [analytics-job-thread-2] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 13159 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 58 seconds
INFO 2022-12-31 19:16:17,557 [analytics-job-thread-1] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 11216 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 55 seconds
INFO 2022-12-31 19:16:17,588 [analytics-job-thread-9] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 10190 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 59 seconds
INFO 2022-12-31 19:16:17,912 [analytics-job-thread-5] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 13614 in table : cdn_logs_ENTERR10, timeTaken : 2 minutes 37 seconds
INFO 2022-12-31 19:16:18,042 [analytics-job-thread-7] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 10729 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 56 seconds
INFO 2022-12-31 19:17:23,715 [analytics-job-thread-1] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 1228 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 5 seconds
INFO 2022-12-31 19:17:23,731 [analytics-job-thread-8] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 12250 in table : cdn_logs_ENTERR10, timeTaken : 3 minutes 1 second
INFO 2022-12-31 19:17:23,917 [analytics-job-thread-2] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 1657 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 6 seconds
INFO 2022-12-31 19:17:24,031 [analytics-job-thread-7] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 3119 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 5 seconds
INFO 2022-12-31 19:17:24,115 [analytics-job-thread-5] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 3509 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 5 seconds
INFO 2022-12-31 19:17:24,234 [analytics-job-thread-3] [] [AbstractQuestDao.java:84] ILP :: Inserted total records : 11325 in table : cdn_logs_ENTERR10, timeTaken : 3 minutes 2 seconds
Andrey Pechkurov
12/31/2022, 1:55 PM
Ajay Pilaniya
12/31/2022, 1:56 PM
ILP :: Inserted total records : 10190 in table : cdn_logs_ENTERR10, timeTaken : 1 minute 59 seconds (SO THIS WILL HAVE ROWS OF ANY PARTICULAR HOUR, 1PM to 2PM for example)
ILP :: Inserted total records : 10190 in table : cdn_logs_ENTERR10, timeTaken : 2 minute 9 seconds (Same as above, THIS WILL HAVE ROWS OF ANY PARTICULAR HOUR, 2PM to 3PM for example)
Andrey Pechkurov
12/31/2022, 1:58 PMAjay Pilaniya
12/31/2022, 2:03 PM
Did you decrease the xms and xmx settings down from 30GB?
Not yet.
What's the memory consumption of the database process?
It is showing 98%, which doesn't seem right to me.
It might be the case that either the network is slow or there is a bottleneck in your Java application
Could be, but earlier it was giving good performance and out of nowhere the performance started dropping.
JAVA_OPTS="
-D$QDB_PROCESS_LABEL
-ea -Dnoebug
-XX:ErrorFile=${QDB_ROOT}/db/hs_err_pid+%p.log
-XX:+UnlockExperimentalVMOptions
-XX:+AlwaysPreTouch
-XX:+UseParallelGC
-Xms28g
-Xmx28g
"
Andrey Pechkurov
12/31/2022, 2:08 PMAjay Pilaniya
12/31/2022, 2:15 PMAndrey Pechkurov
12/31/2022, 5:18 PMAjay Pilaniya
12/31/2022, 5:20 PM
As for 3TB, you may need to increase the file and mmap limits even higher.
Can you suggest optimal settings for these?
Other than that, if you don't actively query older partitions, they don't consume memory and OS resources.
I will be querying this data only one time to calculate aggregated analytics. After that I can drop this data.
Andrey Pechkurov
12/31/2022, 5:21 PM
fs.nr_open=134217728
fs.file-max=134217728
vm.max_map_count=134217728
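A sketch of applying these, assuming they go into /etc/sysctl.conf (the per-process open-file limit from limits.conf still applies on top):

# /etc/sysctl.conf (or a drop-in file under /etc/sysctl.d/)
fs.nr_open=134217728
fs.file-max=134217728
vm.max_map_count=134217728

# reload the kernel settings without a reboot, then restart QuestDB
sudo sysctl -p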
Ajay Pilaniya
12/31/2022, 5:22 PMAndrey Pechkurov
12/31/2022, 5:24 PM
You can monitor the disk with iotop or iostat and, if you observe high IOPS and throughput values, you should increase the volume settings.
01/02/2023, 1:57 PM
2023-01-02T13:54:24.486790Z I i.q.c.h.p.JsonQueryProcessorState [69] exec [q='SELECT ts, RequestContentId as ContentId, RequestUserId as UserId, count_distinct(RequestId) as TotalRequests FROM 'request_analytics_ENTERR10'']
2023-01-02T13:54:24.486805Z I i.q.c.h.p.QueryCache miss [thread=questdb-shared-7, sql=SELECT ts, RequestContentId as ContentId, RequestUserId as UserId, count_distinct(RequestId) as TotalRequests FROM 'request_analytics_ENTERR10']
2023-01-02T13:54:24.487023Z I i.q.g.SqlCompiler plan [q=`select-group-by ts, ContentId, UserId, count_distinct(RequestId) TotalRequests from (select-choose [ts, RequestContentId ContentId, RequestUserId UserId, RequestId] ts, RequestContentId ContentId, RequestUserId UserId, RequestId from (select [ts, RequestContentId, RequestUserId, RequestId] from request_analytics_ENTERR10 timestamp (ts)))`, fd=69]
2023-01-02T13:54:24.488273Z I i.q.c.h.p.JsonQueryProcessorState [69] execute-new [skip: 0, stop: 1000]
2023-01-02T13:54:25.131864Z I i.q.c.p.PGConnectionContext parse [fd=34, q=BEGIN]
2023-01-02T13:54:25.131890Z I i.q.c.p.PGConnectionContext parse [fd=34, q=SELECT 1]
2023-01-02T13:54:25.131892Z I i.q.c.p.PGConnectionContext query cache used [fd=34]
2023-01-02T13:54:25.132416Z I i.q.c.p.PGConnectionContext parse [fd=34, q=BEGIN]
2023-01-02T13:54:25.132419Z I i.q.c.p.PGConnectionContext parse [fd=34, q=SELECT 1]
2023-01-02T13:54:25.132420Z I i.q.c.p.PGConnectionContext query cache used [fd=34]
2023-01-02T13:54:25.374327Z I i.q.c.p.PGConnectionContext parse [fd=34, q=BEGIN]
2023-01-02T13:54:25.374353Z I i.q.c.p.PGConnectionContext parse [fd=34, q=SELECT 1]
2023-01-02T13:54:25.374354Z I i.q.c.p.PGConnectionContext query cache used [fd=34]
2023-01-02T13:54:25.746249Z C i.q.c.h.p.JsonQueryProcessorState [69] Uh-oh. Error!
java.lang.OutOfMemoryError: Java heap space
Andrey Pechkurov
01/02/2023, 1:59 PM
java.lang.OutOfMemoryError: Java heap space
The query runs out of heap because it has to keep all the distinct RequestId values in memory.
01/02/2023, 2:00 PMAndrey Pechkurov
01/02/2023, 2:01 PM
Why do you need to group by the ts column?
Ajay Pilaniya
01/02/2023, 2:07 PM
Currently all the data has the 2023-01-01T00:00:00.000000Z ts value. Tomorrow there will be more data with the 2023-01-02T00:00:00.000000Z value, so I need to group by ts as I don't want to mix up the data of two dates. Also, since I need at least a full day of data, I am unable to break this query down further.
Andrey Pechkurov
01/02/2023, 2:29 PM
The thing is that the database has to store 181k rows per each of the returned rows, i.e. each unique ts, ContentId, UserId tuple.
You should split your query into multiple ones:
SELECT ts, RequestContentId as ContentId, RequestUserId as UserId, count_distinct(RequestId) as TotalRequests FROM 'request_analytics_ENTERR10' where ts in '2022-01'
SELECT ts, RequestContentId as ContentId, RequestUserId as UserId, count_distinct(RequestId) as TotalRequests FROM 'request_analytics_ENTERR10' where ts in '2022-02'
...
A note on the ts filter: I've used days just as an example here.
Ajay Pilaniya
01/02/2023, 2:34 PM
You should split your query into multiple ones:
Currently I have only one day of data, i.e. 2023-01, so even if I use the filter it will still need to store all rows, as currently all data belongs to 2023-01.
The thing is that the database has to store 181k rows per each of the returned rows, i.e. each unique ts, ContentId, UserId tuple
Even if I use LIMIT 0,10 it still fails. Shouldn't pagination help here?
Andrey Pechkurov
01/02/2023, 2:37 PM
Currently I have only one day of data, i.e. 2023-01
You could use hour filters then.
All requests which landed today are floored to the start of the day
Is it on purpose?
Ajay Pilaniya
01/02/2023, 2:43 PM
Is it on purpose?
Yes, because for each day we aggregate this data. At the end of any day we calculate the data of the past day, and that's why we need to query the whole DAY partition.
Andrey Pechkurov
01/02/2023, 2:48 PM
How many unique RequestUserId, RequestContentId combinations do you have per day?
Ajay Pilaniya
01/02/2023, 2:50 PM
Andrey Pechkurov
01/02/2023, 3:10 PM
Ajay Pilaniya
01/02/2023, 3:30 PM
so, 200K request ids x 50K user ids? that's 10B
No, basically whenever we get a hit on a content it generates a unique request id. A user can generate multiple requests on multiple contents. So let's say UserA generates two requests on ContentA, and UserB generates four requests on ContentB. Then I will have data like this:
Request1, UserA, ContentA
Request2, UserA, ContentA
Request3, UserB, ContentB
Request4, UserB, ContentB
Request5, UserB, ContentB
Request6, UserB, ContentB
As you can see, RequestId will have unique values but UserId and ContentId will have repetitive values.
So my query is only trying to find the distinct request count for each User and Content combination, like below -
UserA, ContentA - Distinct Requests = 2
UserB, ContentB - Distinct Requests = 4
So only two rows are generated. The rows generated by this query will be unique pairs of UserId + ContentId, which will definitely be fewer than the total rows in this table.
So for a 181k-row table this combination count will be less than 100k.
@Andrey Pechkurov
Andrey Pechkurov
01/02/2023, 3:34 PMAjay Pilaniya
01/02/2023, 3:40 PM
But how many combinations do you currently have per day? I don't think it's 181k
It can never go beyond the unique count of request ids, which is 180k.
Andrey Pechkurov
01/03/2023, 7:01 AM
Could you try to count distinct values in your table? Say with a query like this one:
select count() from (SELECT distinct RequestContentId, RequestUserId, RequestId FROM 'request_analytics_ENTERR10')
Ajay Pilaniya
01/03/2023, 8:42 AM
Could you try to count distinct values in your table? Say with a query like this one
This query returns 181k.
Andrey Pechkurov
01/03/2023, 12:08 PMAjay Pilaniya
01/03/2023, 3:27 PMAndrey Pechkurov
01/03/2023, 3:32 PMAjay Pilaniya
01/03/2023, 3:33 PM
2023-01-04T05:18:15.043223Z E i.q.c.l.t.LineTcpConnectionContext [261610] could not process line data [table=cdn_logs_RAJTV, msg=could not mmap [size=91392, offset=0, fd=261886, memUsed=181485689701, fileLen=1048576], errno=12]
Andrey Pechkurov
01/04/2023, 6:56 AMAjay Pilaniya
01/04/2023, 7:00 AMAndrey Pechkurov
01/04/2023, 7:01 AMAjay Pilaniya
01/04/2023, 7:02 AMAndrey Pechkurov
01/04/2023, 7:10 AM
Let's set it to a 512x value: 134217728. Also, please update the max open file limit to the same value.
Do I need to restart the database process?
Yes, you need to restart it.
Ajay Pilaniya
01/04/2023, 7:11 AM
Let's set it to a 512x value: 134217728. Also, please update the max open file limit to the same value.
Updated both. Restarting the process now…
Andrey Pechkurov
01/04/2023, 11:53 AM
The commits to the cdn_logs_ENTERR10 table are pretty consistent - they take around 16-20 seconds:
2023-01-04T12:37:14.619642Z I i.q.c.TableWriter o3 commit [table=cdn_logs_ENTERR10, maxUncommittedRows=500000, o3TimestampMin=2023-01-03T18:37:08.000000Z, o3TimestampMax=2023-01-04T12:28:16.000000Z, o3MaxLagUs=600000000, o3EffectiveLagUs=600000000, lagError=63670000000, o3SpreadUs=64268000000, lagThresholdTimestamp=2023-01-04T12:18:16.000000Z, o3LagRowCount=161304, srcOooMax=45391, o3RowCount=206695]
2023-01-04T12:37:14.754463Z I i.q.c.TableWriter sorted [table=cdn_logs_ENTERR10, o3RowCount=206695]
2023-01-04T12:37:14.754484Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=0, srcOooHi=6562, srcOooMax=45391, o3RowCount=206695, o3LagRowCount=161304, srcDataMax=2062592, o3TimestampMin=2023-01-03T18:37:08.000000Z, o3Timestamp=2023-01-03T18:37:08.000000Z, o3TimestampMax=2023-01-04T07:56:09.000000Z, partitionTimestamp=2023-01-03T18:00:00.000000Z, partitionIndex=264, partitionSize=2069155, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=1, flattenTimestamp=true, memUsed=65644850618]
2023-01-04T12:37:14.754742Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=6563, srcOooHi=28056, srcOooMax=45391, o3RowCount=206695, o3LagRowCount=161304, srcDataMax=1078712, o3TimestampMin=2023-01-03T18:37:08.000000Z, o3Timestamp=2023-01-04T05:12:38.000000Z, o3TimestampMax=2023-01-04T07:56:09.000000Z, partitionTimestamp=2023-01-04T05:00:00.000000Z, partitionIndex=308, partitionSize=1100206, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=2, flattenTimestamp=false, memUsed=65644850618]
2023-01-04T12:37:14.754755Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=28057, srcOooHi=45390, srcOooMax=45391, o3RowCount=206695, o3LagRowCount=161304, srcDataMax=891340, o3TimestampMin=2023-01-03T18:37:08.000000Z, o3Timestamp=2023-01-04T07:54:24.000000Z, o3TimestampMax=2023-01-04T07:56:09.000000Z, partitionTimestamp=2023-01-04T07:00:00.000000Z, partitionIndex=316, partitionSize=908674, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=3, flattenTimestamp=false, memUsed=65644850618]
2023-01-04T12:37:31.526618Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-03T18:00:00.000000Z, txn=3448]
2023-01-04T12:37:31.526629Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T05:00:00.000000Z, txn=3448]
2023-01-04T12:37:31.526631Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T07:00:00.000000Z, txn=3448]
2023-01-04T12:37:31.688634Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-03T18.3447]
2023-01-04T12:37:31.777367Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T05.3447]
2023-01-04T12:37:31.839572Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T07.3447]
2023-01-04T12:37:32.341342Z I i.q.c.TableWriter o3 commit [table=cdn_logs_ENTERR10, maxUncommittedRows=500000, o3TimestampMin=2023-01-03T18:37:44.000000Z, o3TimestampMax=2023-01-04T12:28:16.000000Z, o3MaxLagUs=600000000, o3EffectiveLagUs=600000000, lagError=63634000000, o3SpreadUs=64232000000, lagThresholdTimestamp=2023-01-04T12:18:16.000000Z, o3LagRowCount=161304, srcOooMax=51701, o3RowCount=213005]
2023-01-04T12:37:32.390143Z I i.q.c.TableWriter sorted [table=cdn_logs_ENTERR10, o3RowCount=213005]
2023-01-04T12:37:32.390154Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=0, srcOooHi=1248, srcOooMax=51701, o3RowCount=213005, o3LagRowCount=161304, srcDataMax=2069155, o3TimestampMin=2023-01-03T18:37:44.000000Z, o3Timestamp=2023-01-03T18:37:44.000000Z, o3TimestampMax=2023-01-04T07:56:24.000000Z, partitionTimestamp=2023-01-03T18:00:00.000000Z, partitionIndex=264, partitionSize=2070404, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=1, flattenTimestamp=true, memUsed=65989374589]
2023-01-04T12:37:32.390244Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=1249, srcOooHi=25825, srcOooMax=51701, o3RowCount=213005, o3LagRowCount=161304, srcDataMax=1100206, o3TimestampMin=2023-01-03T18:37:44.000000Z, o3Timestamp=2023-01-04T05:12:41.000000Z, o3TimestampMax=2023-01-04T07:56:24.000000Z, partitionTimestamp=2023-01-04T05:00:00.000000Z, partitionIndex=308, partitionSize=1124783, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=2, flattenTimestamp=false, memUsed=65989374589]
2023-01-04T12:37:32.390248Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=25826, srcOooHi=51700, srcOooMax=51701, o3RowCount=213005, o3LagRowCount=161304, srcDataMax=908674, o3TimestampMin=2023-01-03T18:37:44.000000Z, o3Timestamp=2023-01-04T07:54:36.000000Z, o3TimestampMax=2023-01-04T07:56:24.000000Z, partitionTimestamp=2023-01-04T07:00:00.000000Z, partitionIndex=316, partitionSize=934549, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=3, flattenTimestamp=false, memUsed=65989374589]
2023-01-04T12:37:48.002811Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-03T18:00:00.000000Z, txn=3449]
2023-01-04T12:37:48.002823Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T05:00:00.000000Z, txn=3449]
2023-01-04T12:37:48.002825Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T07:00:00.000000Z, txn=3449]
2023-01-04T12:37:48.024692Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-03T18.3448]
2023-01-04T12:37:48.025186Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T05.3448]
2023-01-04T12:37:48.025720Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T07.3448]
2023-01-04T12:37:48.529259Z I i.q.c.TableWriter o3 commit [table=cdn_logs_ENTERR10, maxUncommittedRows=500000, o3TimestampMin=2023-01-04T05:16:41.000000Z, o3TimestampMax=2023-01-04T12:28:16.000000Z, o3MaxLagUs=600000000, o3EffectiveLagUs=600000000, lagError=25297000000, o3SpreadUs=25895000000, lagThresholdTimestamp=2023-01-04T12:18:16.000000Z, o3LagRowCount=161304, srcOooMax=28759, o3RowCount=190063]
2023-01-04T12:37:48.588145Z I i.q.c.TableWriter sorted [table=cdn_logs_ENTERR10, o3RowCount=190063]
2023-01-04T12:37:48.588165Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=0, srcOooHi=12105, srcOooMax=28759, o3RowCount=190063, o3LagRowCount=161304, srcDataMax=1124783, o3TimestampMin=2023-01-04T05:16:41.000000Z, o3Timestamp=2023-01-04T05:16:41.000000Z, o3TimestampMax=2023-01-04T08:42:53.000000Z, partitionTimestamp=2023-01-04T05:00:00.000000Z, partitionIndex=308, partitionSize=1136889, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=1, flattenTimestamp=true, memUsed=65757715604]
2023-01-04T12:37:48.588259Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=12106, srcOooHi=26088, srcOooMax=28759, o3RowCount=190063, o3LagRowCount=161304, srcDataMax=934549, o3TimestampMin=2023-01-04T05:16:41.000000Z, o3Timestamp=2023-01-04T07:54:54.000000Z, o3TimestampMax=2023-01-04T08:42:53.000000Z, partitionTimestamp=2023-01-04T07:00:00.000000Z, partitionIndex=316, partitionSize=948532, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=2, flattenTimestamp=false, memUsed=65757715604]
2023-01-04T12:37:48.588265Z I i.q.c.TableWriter o3 partition task [table=cdn_logs_ENTERR10, srcOooLo=26089, srcOooHi=28758, srcOooMax=28759, o3RowCount=190063, o3LagRowCount=161304, srcDataMax=351048, o3TimestampMin=2023-01-04T05:16:41.000000Z, o3Timestamp=2023-01-04T08:39:05.000000Z, o3TimestampMax=2023-01-04T08:42:53.000000Z, partitionTimestamp=2023-01-04T08:00:00.000000Z, partitionIndex=320, partitionSize=353718, maxTimestamp=2023-01-04T12:18:18.000000Z, last=false, append=false, pCount=3, flattenTimestamp=false, memUsed=65757715604]
2023-01-04T12:37:57.686194Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T05:00:00.000000Z, txn=3450]
2023-01-04T12:37:57.686206Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T07:00:00.000000Z, txn=3450]
2023-01-04T12:37:57.686207Z I i.q.c.TableWriter merged partition [table=`cdn_logs_ENTERR10`, ts=2023-01-04T08:00:00.000000Z, txn=3450]
2023-01-04T12:37:57.700192Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T05.3449]
2023-01-04T12:37:57.700526Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T07.3449]
2023-01-04T12:37:57.700862Z I i.q.c.TableWriter purged [path=/root/.questdb/db/cdn_logs_ENTERR10/2023-01-04T08.2766]
What does the totalWriteCounts column stand for in your table? The number of sent rows? Also, how do you measure durationPretty?
Could you try the cairo.o3.min.lag=120000 server config property and gather the logs once again? This setting should help with making commits less frequent.
It should be set in <qdb_root>/conf/server.conf or via the QDB_CAIRO_O3_MIN_LAG=120000 environment variable.
Ajay Pilaniya
01/04/2023, 4:15 PM
What does the totalWriteCounts column stand for in your table? The number of sent rows? Also, how do you measure durationPretty?
totalWriteCounts = total rows sent by my Java app; duration is the time elapsed between the first row and the last row of the batch. Sample code below:
long startTime = System.currentTimeMillis();
try (Sender sender = .....)
{
sender.send(1st row), sender.send(2nd row)......
}
long endTime = System.currentTimeMillis()
long totalDuration = endTime - startTime;
Andrey Pechkurov
01/04/2023, 5:12 PM
Ajay Pilaniya
01/04/2023, 5:15 PM
2023-01-06T04:19:24.222354Z E i.q.c.l.t.LineTcpConnectionContext [670] could not process line data [table=cdn_logs_ENTERR10, msg=Partition '2023-01-06T03.4978' does not exist in table 'cdn_logs_ENTERR10' directory. Run [ALTER TABLE cdn_logs_ENTERR10 DROP PARTITION LIST '2023-01-06T03.4978'] to repair the table or restore the partition directory., errno=0]
2023-01-06T04:19:24.222393Z E i.q.c.l.t.LineTcpConnectionContext [670] could not parse measurement, NONE at 1216, line (may be mangled due to partial parsing): 'cdn_logs_ENTERR10,ContentId=6332fb57abe8c9102e3719ee,RequestId=Unknown,JobId=E1KUOC0ENJS6P5.2022-12-22-09.add14fbb.gz,DistributionType=VOD,FileType=DASH_VIDEO_SEGMENT,Method=GET,CdnType=CloudFront,DistributionId=E1KUOC0ENJS6P5,StreamingPartnerId=Unknown,UserCity=Ranchi,UserCountry=India,UserDevice=Mobile,UserBrowser=-,UserDeviceBrand=Unknown,UserDeviceName=Unknown,UserId=Unknown,Agent=Special,AgentName=ENTERR10,UserOs=Unknown,OsClass=Mobile CdnRequestId="SUNA5XLg8VXgNZA07kivMD66-46EHW0EugzNHbDsnGeTiLOwPG2Frw==",UserAgent="enterr10/112(Mobile;Android30;realme;Realme%20RMX3231)ExoPlayerLib/2.16.0",Referer="",Domain="<http://enterr10-streams-msp.streamready.in|enterr10-streams-msp.streamready.in>",Ip="2401:4900:3842:2b12:1:2:f580:4621",Extras="",FileName="chunk_ctvideo_ridp0va0br2500000_cs4680000_w1160222018_mpd.m4s",CacheStat="Hit",Resolution="854x480",EdgeLocation="MAA50-C2",FilePath="/ENTERR10/EPISODE/6332fb57abe8c9102e3719ee/dash/854x480/chunk_ctvideo_ridp0va0br2500000_cs4680000_w1160222018_mpd.m4s",AgentVersion="112",AgentVersionName="ENTERR10 112",OsVersionName="Unknown ??",OsVersion="??",isValid=t,SegmentCount=1i,ViewCount=0i,FileSize=351796i,StatusCode=200i,ResponseTime=0.001,ua=1672998564216000t,ca=1672998564216000t 1671702572000000000'^M
Andrey Pechkurov
01/06/2023, 6:20 AM
Did anything happen to the cdn_logs_ENTERR10 table before this error occurred? If so, it's probably the following bug. We're going to release the fix soon, but before that the workaround is to restart the database:
https://github.com/questdb/questdb/pull/2864
https://github.com/questdb/questdb/pull/2864Ajay Pilaniya
01/06/2023, 6:26 AMAndrey Pechkurov
01/06/2023, 6:48 AMAjay Pilaniya
01/06/2023, 6:54 AMAndrey Pechkurov
01/06/2023, 6:57 AMAjay Pilaniya
01/06/2023, 7:04 AMAlex Pelagenko
01/06/2023, 9:35 AMAjay Pilaniya
01/07/2023, 5:02 AM
Query 1 -
SELECT count_distinct(UserId) FROM (SELECT timestamp_floor('d', ts) as ts, ContentId, UserId, UserDevice, StreamingPartnerId, UserDeviceBrand, RequestId, UserBrowser, UserOs, UserCountry, UserCity, sum(SegmentCount) as SegmentCount, sum(FileSize) as TotalFileSize FROM 'cdn_logs_ENTERR10' where ts IN '2023-01-01');
This query returns distinct user count = 20k
Query 2 -
SELECT timestamp_floor('d', ts) as ts, ContentId, UserId, UserDevice, StreamingPartnerId, UserDeviceBrand, RequestId, UserBrowser, UserOs, UserCountry, UserCity, sum(SegmentCount) as SegmentCount, sum(FileSize) as TotalFileSize FROM 'cdn_logs_ENTERR10' where ts IN '2023-01-01' ORDER BY SegmentCount DESC LIMIT 0,20000
Using Query 2 I insert this data in a second table. I am using pagination to fetch data in batches of 20k rows each time. Query 1 is returning total 321k rows while using pagination I am getting only 317k rows. That’s the first gap. Secondly data inserted using Query 2 only returns 4k distinct user count.
I feel like here pagination is not working properly and might return duplicate rows due to which all user ids are not present when using query 2 pagination.Andrey Pechkurov
01/07/2023, 7:46 AM
You have aggregations (sum(SegmentCount) and sum(FileSize)) in the select. Did you insert all pages or only the first one (LIMIT 0,20000)?
A side note: group by queries are materialized fully and only then LIMIT applies. So, from the performance standpoint, the most efficient way to insert the data is to do something like this:
INSERT INTO my_new_table
SELECT timestamp_floor('d', ts) as ts, ContentId, UserId, UserDevice, StreamingPartnerId, UserDeviceBrand, RequestId, UserBrowser, UserOs, UserCountry, UserCity, sum(SegmentCount) as SegmentCount, sum(FileSize) as TotalFileSize FROM 'cdn_logs_ENTERR10' where ts IN '2023-01-01' ORDER BY SegmentCount DESC; -- note: you may want to sort the data via timestamps if my_new_table has a designated timestamp
Ajay Pilaniya
01/07/2023, 11:23 PM
Did you insert all pages or only the first one (LIMIT 0,20000)?
I inserted all pages BTW, but somehow the results were not correct.
Andrey Pechkurov
01/09/2023, 7:35 AMAjay Pilaniya
01/09/2023, 7:43 AMAndrey Pechkurov
01/09/2023, 4:36 PMAjay Pilaniya
01/10/2023, 7:15 AMAndrey Pechkurov
01/10/2023, 7:18 AM