S Berder

04/20/2023, 7:29 AM
I'm trying to figure out how best to approach my use case in QuestDB. I'm currently running a time series database of IoT environmental monitoring devices in InfluxDB. The problem is that each monitor can have different measurement fields, so devices A and B might measure x, y, z while device C measures p, q, z. In InfluxDB I create a measurement per device, with the device's unique ID in the measurement name, to prevent cardinality issues. I end up with measurements called 'node:A' and 'node:B', which allows each device to have its own set of "columns". How would you approach this in QuestDB? (also asked here: https://www.reddit.com/r/questdb/comments/12spzbo/iot_data_with_variable_fields/)

Shan Desai

04/20/2023, 7:46 AM
I think measurements are analogous to tables in QuestDB.

Bolek Ziobrowski

04/20/2023, 8:31 AM
@S Berder I think it depends on the number of such devices. If there are tens or hundreds, it should be fine storing them in separate tables, but if it's thousands or more, it would be better to merge them into fewer tables and use a column to differentiate the various record types. QuestDB does use space for NULLs, so to save space it makes sense to make fields 'generic' and share as many as possible between record types.
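For illustration, a merged schema along those lines might look something like this (the table name, type values, and generic-slot mapping are all hypothetical):
-- hypothetical merged table: a type column differentiates record types,
-- and 'generic' slots are shared between them to limit NULLs
create table sensor_readings (
  device_id SYMBOL,
  device_type SYMBOL,   -- e.g. 'xyz' vs 'pqz'
  z int,                -- measured by every device type
  m1 int,               -- generic slot: holds x for 'xyz' devices, p for 'pqz'
  m2 double,            -- generic slot: holds y for 'xyz' devices, q for 'pqz'
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;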

javier ramirez

04/20/2023, 8:48 AM
A note regarding cardinality: we don't have the performance issues you've seen on Influx. You could potentially have a single table like this one:
create table device_metrics (
  device_id SYMBOL,
  x int,
  y double,
  z int,
  p int,
  q double,
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;
That should be performant enough. The only issue is that the unused columns for each entry would take up some space because of how we treat NULLs, so depending on your data shape it might make sense to go with Bolek's recommendation. Since we also support joins, you could even do something like:
create table device_core_metrics (
  device_id SYMBOL,
  z int,
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;

create table device_x_metrics (
  device_id SYMBOL,
  x int,
  y double,
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;

create table device_p_metrics (
  device_id SYMBOL,
  p int,
  q double,
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;
And then, for each device, insert an entry into the core table and one into its specific type table. If you ever need to join, you could do an ASOF JOIN ON (device_id) between the three tables. Performance wouldn't be as good as with a single table, but it should work. I just tried it out:
INSERT INTO device_core_metrics values ('A',1,'2023-04-20T10:45:00');
INSERT INTO device_x_metrics values ('A',1,1,'2023-04-20T10:45:00');

INSERT INTO device_core_metrics values ('B',1,'2023-04-20T10:46:00');
INSERT INTO device_p_metrics values ('B',1,1,'2023-04-20T10:46:00');

select * from device_core_metrics ASOF JOIN device_x_metrics ON (device_id) ASOF JOIN device_p_metrics ON (device_id);
And of course, if I have a device with data in all three tables, the join populates every metric/column, as in:
INSERT INTO device_core_metrics values ('C',3,'2023-04-20T10:47:00');
INSERT INTO device_x_metrics values ('C',3,3,'2023-04-20T10:47:00');
INSERT INTO device_p_metrics values ('C',3,3,'2023-04-20T10:47:00');

select * from device_core_metrics ASOF JOIN device_x_metrics ON (device_id) ASOF JOIN device_p_metrics ON (device_id);
[image.png: screenshot of the query results]

Holger

04/20/2023, 10:51 AM
Or flatten the table and just use: IIoT id, metric sub-type (x, y, z, p or q), value, timestamp.
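A minimal sketch of that flattened (narrow) layout, with hypothetical names:
-- hypothetical narrow table: one row per metric reading
create table device_readings (
  device_id SYMBOL,
  metric SYMBOL,        -- 'x', 'y', 'z', 'p' or 'q'
  metric_value double,
  timestamp timestamp
) timestamp (timestamp) PARTITION BY DAY WAL;
The narrow shape stores no NULLs at all, at the cost of more rows and a pivot at query time.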

javier ramirez

04/20/2023, 11:01 AM
That last one would make it a bit trickier to get all the values for a device and timestamp on a single row, but it's still doable (if needed) using SQL functions.
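A sketch of such a pivot, assuming the hypothetical device_readings table above (QuestDB infers the GROUP BY keys from the non-aggregated columns):
-- pivot the narrow table back into one row per device and timestamp
select device_id, timestamp,
  max(case when metric = 'x' then metric_value end) as x,
  max(case when metric = 'y' then metric_value end) as y,
  max(case when metric = 'z' then metric_value end) as z,
  max(case when metric = 'p' then metric_value end) as p,
  max(case when metric = 'q' then metric_value end) as q
from device_readings;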

S Berder

04/20/2023, 12:53 PM
Hmmm, the biggest issue is that I have quite a bit of variability, but not infinite possibilities. The space for one particular type of monitor is about 10-15 possible fields.

Alex Pelagenko

04/20/2023, 1:48 PM
How would you model it in a relational database? Would you create many tables, one table with all the columns, or something in the middle, e.g. a table per group of similar sensors? Too many tables (thousands) are not efficient in QuestDB; too many empty columns are not efficient either.
QuestDB will be most efficient if you can limit the number of active tables to hundreds, and the number of columns in each of them to hundreds too.

Henri Asseily

04/20/2023, 5:33 PM
That’s where a pure columnar db shines. But it doesn’t get the timeseries advantages.

S Berder

04/21/2023, 3:07 AM
That's what I read about the space efficiency of tables in questdb so I can't apply the same logic I have in influxdb with one table/measurement/series per node. With table per node I don't have to worry about columns. I could create a table per type of environment monitoring (air/water/energy) and in that subspace the space of possible columns is limited (15 max I'd say). the drawback of that one is that some columns are used by only a few nodes so I'd create a lot of null values for most sensors. I'm trying to not be too clever with the joins and other tricks because that will make the code more complex.