Data rate estimation and cache sizing
OIBus sends values to a target application via North connectors (OIConnect, OIAnalytics, etc.). There are two sending modes:
- as files, through a files endpoint
- as JSON payloads, through a values endpoint.
The data volume can be estimated from the data to be sent and the sending mode selected. These estimates can also be used to size the cache storage needed for store and forward to work reliably.
This section gives some hints on how to estimate the cache size.
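As a rough illustration of how a volume estimate turns into a cache size, the sketch below multiplies a daily volume by the longest outage the cache should absorb. The function name, the outage duration and the safety factor are assumptions made for this example, not OIBus settings.

```python
import math


def cache_size_bytes(bytes_per_day: float, outage_days: float,
                     safety_factor: float = 2.0) -> int:
    """Rough cache size needed to buffer data while the target is unreachable.

    All three parameters are illustrative assumptions:
    - bytes_per_day: estimated daily volume produced by the connector
    - outage_days: longest outage the cache should survive
    - safety_factor: margin for bursts and estimation errors (>= 1)
    """
    if bytes_per_day < 0 or outage_days < 0 or safety_factor < 1:
        raise ValueError("volumes and durations must be positive, factor >= 1")
    return math.ceil(bytes_per_day * outage_days * safety_factor)


# Hypothetical figures: 1 MB per day, 3-day outage, 50 % margin.
example_cache = cache_size_bytes(1_000_000, 3, safety_factor=1.5)
print(example_cache)
```

The daily volume itself depends on the sending mode and file layout, which the rest of this section estimates.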
Sending files (CSV)
We will focus on data in the form of CSV files. In this case the volume will depend on several parameters:
- The data sampling frequency
- The file sending frequency
- The timestamp format
- The data format: number of characters used (precision)
- The size of data references
- The file format: in rows or in columns
In the following examples, we calculate how much space a CSV file generated by OIBus takes. We make the following assumptions:
- The sampling frequency: one point per minute.
- The frequency of sending the file: one file every 30 minutes.
- The timestamp format: ISO 8601 format, 24 bytes in size.
- The data format: 3 digits plus a decimal separator (for example 12.0). Each value in the following examples therefore takes 4 bytes.
- The size of the point ID (data reference): DataXXX, where XXX represents three digits. Each reference in the following examples therefore takes 7 bytes.
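The field sizes assumed above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for the example, and the sizes assume one byte per character (ASCII or UTF-8 without special characters).

```python
from datetime import datetime, timezone

# Build one sample of each field, formatted as in the assumptions above.
ts = datetime(2020, 2, 1, 20, 4, 0, tzinfo=timezone.utc)
timestamp = ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
value = f"{12.0:.1f}"        # 3 digits and a decimal separator
reference = f"Data{1:03d}"   # DataXXX with three digits

# Size in bytes of each field once encoded in the CSV file.
field_sizes = {
    "timestamp": len(timestamp.encode("utf-8")),
    "value": len(value.encode("utf-8")),
    "reference": len(reference.encode("utf-8")),
}
print(field_sizes)
```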
Column files
This format is particularly suitable when several data points share the same timestamps. It saves space compared to a row format.
Timestamp Data001 Data002 Data003
2020-02-01T20:04:00.000Z 12.0 10.0 10.0
2020-02-01T20:05:00.000Z 10.0 19.0 10.0
2020-02-01T20:06:00.000Z 10.0 10.0 14.0
...
The size of the header is 10 + 1 + 7 + 1 + 7 + 1 + 7 + 1 = 35 bytes.
The size of one line is 24 + 1 + 4 + 1 + 4 + 1 + 4 + 1 = 40 bytes (column separators and newlines are taken into account).
The number of lines depends on the frequency of the data, here one line every minute. With a file sent every 30 minutes, each file will therefore have a size of 35 + 40 × 30 = 1,235 bytes. Over a day, there will be 48 files, a total of 59,280 bytes or about 58 kB.
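The column-file calculation can be generalised to any number of points and lines per file. The sketch below is a minimal illustration: the function and parameter names are hypothetical, and the default sizes are the ones counted in the text above (including the 10 bytes taken for the timestamp header cell).

```python
def column_file_size(n_points: int, lines_per_file: int,
                     timestamp_header_size: int = 10,
                     timestamp_size: int = 24,
                     value_size: int = 4,
                     reference_size: int = 7,
                     separator_size: int = 1) -> int:
    """Estimated size in bytes of one column-format CSV file.

    Every cell is followed by one separator or newline of separator_size bytes.
    """
    header = (timestamp_header_size + separator_size
              + n_points * (reference_size + separator_size))
    line = (timestamp_size + separator_size
            + n_points * (value_size + separator_size))
    return header + line * lines_per_file


# Example from the text: 3 points, one line per minute, one file every 30 minutes.
column_file = column_file_size(n_points=3, lines_per_file=30)
column_daily = column_file * 48  # 48 files per day
print(column_file, column_daily, round(column_daily / 1024))
```

Changing `n_points` or `lines_per_file` shows how the volume grows when more points are collected or files are sent less often.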
Row files
This format is particularly suitable when the different data transmitted do not have the same sampling frequency. In this example, we assume that all data have the same sampling frequency.
Timestamp Reference Value
2020-02-01T20:04:00.000Z Data001 12.0
2020-02-01T20:04:00.000Z Data002 10.0
2020-02-01T20:04:00.000Z Data003 10.0
2020-02-01T20:05:00.000Z Data001 10.0
2020-02-01T20:05:00.000Z Data002 19.0
2020-02-01T20:05:00.000Z Data003 10.0
2020-02-01T20:06:00.000Z Data001 10.0
2020-02-01T20:06:00.000Z Data002 10.0
2020-02-01T20:06:00.000Z Data003 14.0
...
The size of the header is 10 + 1 + 9 + 1 + 6 + 1 = 28 bytes.
The size of a line is 24 + 1 + 7 + 1 + 4 + 1 = 38 bytes (column separators and newlines are taken into account).
The number of lines depends on the frequency of the data and the number of references, here one line every minute multiplied by 3 references (which makes 3 lines per minute). With one file sent every 30 minutes, each file will therefore have a size of 28 + 38 × 30 × 3 = 3,448 bytes. Over a day, there will be 48 files, a total of 165,504 bytes or about 162 kB.
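The row-file calculation follows the same pattern, except that every sample of every reference takes its own line. The sketch below is a hedged illustration: the names are hypothetical and the default sizes, including the 28-byte header, are taken as counted in the text above.

```python
def row_file_size(n_points: int, samples_per_file: int,
                  header_size: int = 28,
                  timestamp_size: int = 24,
                  reference_size: int = 7,
                  value_size: int = 4,
                  separator_size: int = 1) -> int:
    """Estimated size in bytes of one row-format CSV file.

    One line is written per sample and per reference; each cell is followed
    by one separator or newline of separator_size bytes.
    """
    line = (timestamp_size + separator_size
            + reference_size + separator_size
            + value_size + separator_size)
    return header_size + line * samples_per_file * n_points


# Example from the text: 3 references, 30 samples each per file, 48 files a day.
row_file = row_file_size(n_points=3, samples_per_file=30)
row_daily = row_file * 48
print(row_file, row_daily, round(row_daily / 1024))
```

Because the timestamp and the reference are repeated on every line, the file size grows with the product of samples and references.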
Column row files
This format keeps the advantage of the column file and lets the data identifiers (001, 002, 003) be shared between references when there are several of them. That is not the case here, since only the reference Data is used. Combining the reference with each identifier gives the references Data001, Data002 and Data003.
Timestamp Reference 001 002 003
2020-02-01T20:04:00.000Z Data 12.0 10.0 10.0
2020-02-01T20:05:00.000Z Data 10.0 19.0 10.0
2020-02-01T20:06:00.000Z Data 10.0 10.0 14.0
...