
Yair

09/01/2022, 9:14 PM
Hey guys, my machine crashed during data ingestion over the line protocol, and now QDB won't even start. I get this error:
Exception in thread "main" io.questdb.cairo.CairoException: [-100] Invalid metadata at fd=1460. Metadata version does not match runtime version [expected=426, actual=0]
... Is there anything I can do? Or did I just lose the whole database..?

Jaromir Hamala

09/02/2022, 9:07 AM
Hello, can you share logs from both the crash and the start attempt?

Yair

09/02/2022, 11:05 AM
Also, not sure if it matters, I currently have 3 .lock files in the 'db' folder, for the 3 tables I was writing to when it crashed. My tables are partitioned by DAY. I was hoping I could just drop those last partitions and continue ingesting from there... Thanks!

Jaromir Hamala

09/02/2022, 12:03 PM
First of all, and this is very important, before doing anything else: back up your `db` directory. This way, no matter what you do, you can always go back to the current state. Then I would try to remove the `telemetry_config` directory and start QuestDB. If it's just this table that is damaged, QuestDB will recreate it, so that's not a big deal. If other tables are damaged, then it gets a bit more complicated. @Miguel Arregui is the expert on detaching partitions, so he might know the best way to detach a potentially damaged partition from a table while QuestDB is not running. But perhaps it won't be needed at all.

Miguel Arregui

09/02/2022, 12:27 PM
Hi! From what I read, the database was in the middle of doing something when the machine crashed, leaving metadata files in a bad state. The error "Metadata version does not match runtime version [expected=426, actual=0]" confirms that _meta is corrupted. Do you have a backup of the table from which we could recover the latest _meta file? As a first attempt, we could try using that file and see how it goes.

Yair

09/02/2022, 1:03 PM
Thanks @Jaromir Hamala and @Miguel Arregui, you are correct, but the logs seem to indicate that `db\telemetry_config\version.k` is corrupt, not a `_meta` file. Also, which `_meta` file are you referring to? The one in the telemetry folder, or the ones in the table folders? The database itself holds around 5TB of data at the moment across about 24 tables, and only 3 were being written to when it happened. I do not have backups of the tables or the database itself, but I can always recreate it. It's just a pain, since it means a lot of time and data. The best solution would be to get rid of only the last bad partitions and continue from there. Worse would be to get rid of those 3 tables and keep the others. The worst would be to start with a fresh db, not only because of the time lost, but also because it would mean this can happen again and there is no solution.

Miguel Arregui

09/02/2022, 3:31 PM
Every table has a folder within the `db` folder. Inside the table folder you will see "global" files (`_meta`, `_cv`, `_txn`, ...) and then a folder per partition. I am referring to the global `_meta` file for the affected table. Indeed, the logs suggest that. I took the message in red that you pasted above and dug into the code. Please try to delete the telemetry table.
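To make that layout concrete, here is a small read-only Python sketch that walks one table folder and reports the global files and the partition folders it finds. The file names `_meta`, `_cv` and `_txn` come from the message above; the helper names `inspect_table` and `suspicious` are hypothetical, and flagging a missing or zero-length file is only a rough heuristic, not the check QuestDB itself performs.

```python
from pathlib import Path

# Global file names mentioned in the thread; a real table folder may hold more.
GLOBAL_FILES = ("_meta", "_cv", "_txn")


def inspect_table(table_dir: Path) -> dict:
    """Report global file sizes (None if missing) and partition folder names."""
    files = {}
    for name in GLOBAL_FILES:
        f = table_dir / name
        files[name] = f.stat().st_size if f.is_file() else None
    # Every subdirectory is treated as a partition folder (an assumption).
    partitions = sorted(p.name for p in table_dir.iterdir() if p.is_dir())
    return {"table": table_dir.name, "files": files, "partitions": partitions}


def suspicious(report: dict) -> list:
    """Names of global files that are missing or zero-length (heuristic only)."""
    return [name for name, size in report["files"].items() if not size]
```

Running `inspect_table` over each folder under `db` gives a quick overview of which tables look incomplete before deciding what to set aside.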

Yair

09/02/2022, 4:25 PM
OK, I was able to start the database by removing the `telemetry`, `telemetry_config` and `sys.column_versions_purge_log` folders. Checking the last records in the tables I was writing to and comparing them with the folders inside the `db` folder, I can see a couple of extra days on disk that I cannot see or query from the database. Should I just manually delete those folders and re-insert all records for those days? My tables are partitioned by DAY, so these are individual daily folders.

Miguel Arregui

09/05/2022, 6:55 AM
Yes, that would fix the issue.
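The comparison described above (daily folders on disk versus the last day the database can query) can be sketched in Python as follows. It assumes that the folder of a DAY partition starts with the date in `YYYY-MM-DD` form, which should be checked against the actual folder names first. The function name `extra_partitions` is hypothetical, and `last_visible_day` is whatever date the newest queryable record falls on. The sketch only lists folders and does not delete anything.

```python
from datetime import date
from pathlib import Path


def extra_partitions(table_dir: Path, last_visible_day: date) -> list:
    """List partition folders dated after the last day visible in queries."""
    extras = []
    for p in table_dir.iterdir():
        if not p.is_dir():
            continue
        try:
            # Assumption: the folder name begins with an ISO date.
            day = date.fromisoformat(p.name[:10])
        except ValueError:
            continue  # not a daily partition folder, leave it alone
        if day > last_visible_day:
            extras.append(p)
    return sorted(extras)
```

The returned folders are the candidates to move or delete while QuestDB is stopped, after which the records for those days can be inserted again.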