Data rate estimation and cache sizing

OIBus facilitates the transmission of values to a target application via North connectors such as OIBus and OIAnalytics, offering two distinct sending modes:

File Endpoint: Data can be sent by storing it in a file, which is then transferred via the files' endpoint.
JSON Payloads Endpoint: Alternatively, data can be transmitted as JSON payloads using the values' endpoint.

Estimating the appropriate cache size is essential to ensure smooth and reliable store-and-forward operations, and it depends on various factors including the type of data to be sent and the chosen sending mode. Here are some tips for estimating the cache size effectively:

Data Volume: Analyze the volume of data that your OIBus instance needs to transmit. Consider both the size of individual data elements and the frequency of data updates.
Sending Frequency: Evaluate how often data needs to be sent. Frequent transmissions may require larger caches to accommodate temporary disruptions in connectivity.
Transmission Mode: The chosen sending mode (file or JSON payloads) affects cache size requirements. JSON payloads repeat the field names for every value, so each value takes more bytes than the same value in a CSV row.
Network Reliability: If the network connection to the target application is unreliable or experiences frequent interruptions, a larger cache may be needed to store data during downtime.
Latency Tolerance: Consider the acceptable latency between data generation and data delivery to the target application. A larger cache can help mitigate delays caused by network issues.
Retention Policy: Determine how long data should be retained in the cache before attempting to resend it. This retention period will influence cache size requirements.
Resource Availability: Assess the available resources on the OIBus machine, including RAM and storage, as these will impact your ability to allocate a sufficiently sized cache.
Monitoring and Testing: Regularly monitor and test your cache system to ensure it meets the actual needs of your data transmission process. Adjust the cache size as necessary based on real-world performance.

By carefully considering these factors, you can make informed decisions about the cache size needed to support efficient and reliable data transmission from OIBus to your target application.
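
As a rough illustration of how these factors combine, the sketch below sizes a cache from a data rate and the longest outage it should absorb. The function name, the safety factor, and the sample figures are hypothetical. They are not OIBus settings, and real sizing should be validated by monitoring.

```python
import math


def estimate_cache_bytes(bytes_per_hour: float, outage_hours: float,
                         safety_factor: float = 2.0) -> int:
    """Return the bytes needed to buffer data for the whole outage.

    safety_factor is an assumed margin for bursts and retries,
    not an OIBus parameter.
    """
    if bytes_per_hour < 0 or outage_hours < 0 or safety_factor < 1:
        raise ValueError("rate and duration must be >= 0, safety_factor >= 1")
    return math.ceil(bytes_per_hour * outage_hours * safety_factor)


# Hypothetical example: 20 kB of data per hour, 48-hour outage, 2x margin.
cache_bytes = estimate_cache_bytes(20_000, 48)
print(f"{cache_bytes / 1_000_000:.2f} MB")
```

The data rate passed in can come from the per-hour estimates worked out in the sections that follow.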

Sending files (CSV)

Some protocols, such as SQL, let you choose how the data is laid out in the resulting file. The layout has a direct impact on the file size, as the examples below show.

The estimates in this section rely on the following assumptions:

Sampling Frequency: One point per minute, which means there are 60 data points per hour.
File Sending Frequency: One file sent every 30 minutes, resulting in 2 files per hour.
Timestamp Format: ISO 8601 format, 24 bytes in size.
Data Value Format: 3 digits with a separator for decimal places, making the data size 4 bytes.
Size of data References: Data references are in the format "DataXXX," where XXX represents three numeric characters. Therefore, each reference is 7 bytes in size.

Row files

This format is particularly suitable when the different data transmitted do not have the same sampling frequency. In the example we assume that all data has the same sample rate.

Row file CSV
Timestamp	                Reference	    Value
2020-02-01T20:04:00.000Z	Data001	        12.0
2020-02-01T20:04:00.000Z	Data002	        10.0
2020-02-01T20:04:00.000Z	Data003	        10.0
2020-02-01T20:05:00.000Z	Data001	        10.0
2020-02-01T20:05:00.000Z	Data002	        19.0
2020-02-01T20:05:00.000Z	Data003	        10.0
2020-02-01T20:06:00.000Z	Data001	        10.0
2020-02-01T20:06:00.000Z	Data002	        10.0
2020-02-01T20:06:00.000Z	Data003	        14.0
...

The size of one row is: Timestamp (24 bytes) + Data Reference (7 bytes) + Data Value (4 bytes) + 3 separators (3 bytes) = 38 bytes per data point.

For one reference sampled 60 times per hour, the data size per hour is: 38 bytes/data point * 60 data points = 2280 bytes per hour (header excluded). With the three references of the example, this becomes 3 * 2280 = 6840 bytes per hour.

With 2 files sent per hour, each file holds 30 minutes of data: 2280 bytes/hour / 2 files/hour = 1140 bytes per file and per reference (header excluded). The hourly total remains 2280 bytes per reference.

Keep in mind that this calculation provides an estimate based on the specified assumptions, and actual file sizes may vary depending on additional factors such as the Data Value Format.
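
The row size can also be checked by measuring an actual line. This is a minimal sketch that assumes single-byte separators (a tab between fields and a line feed at the end of the row). The helper name `row_line` is illustrative only, and a CRLF line ending would add one byte per row.

```python
def row_line(timestamp: str, reference: str, value: str, sep: str = "\t") -> str:
    """Build one line of a row file: timestamp, reference, value, newline."""
    return sep.join([timestamp, reference, value]) + "\n"


line = row_line("2020-02-01T20:04:00.000Z", "Data001", "12.0")
bytes_per_row = len(line.encode("utf-8"))  # 24 + 7 + 4 + 3 separators

ROWS_PER_HOUR = 60   # one point per minute, single reference
FILES_PER_HOUR = 2   # one file every 30 minutes

bytes_per_hour = bytes_per_row * ROWS_PER_HOUR
bytes_per_file = bytes_per_hour // FILES_PER_HOUR  # each file covers 30 minutes

print(bytes_per_row, bytes_per_hour, bytes_per_file)
```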

Column files

This format is especially well-suited for data that shares the same timestamp repeatedly, offering space savings compared to a format where each data point is placed on a separate line.

Column file CSV
Timestamp	                Data001	    Data002	    Data003
2020-02-01T20:04:00.000Z	12.0	    10.0	    10.0
2020-02-01T20:05:00.000Z	10.0	    19.0	    10.0
2020-02-01T20:06:00.000Z	10.0	    10.0	    14.0
...

The size of one row is: Timestamp (24 bytes) + Data Value (4 bytes) * 3 + 4 separators (4 bytes) = 40 bytes per row, each row carrying three data points.

For 60 rows per hour, the data size per hour is: 40 bytes/row * 60 rows = 2400 bytes per hour for the three references (header excluded).

With 2 files sent per hour, each file holds 30 rows: 2400 bytes/hour / 2 files/hour = 1200 bytes per file (header excluded). The hourly total remains 2400 bytes.
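
To compare the row and column layouts on equal footing, the sketch below computes the hourly size of both for the same number of references. It reuses the byte sizes assumed in this section. The function names are illustrative, and the separator count assumes one byte per field separator and per line ending.

```python
TIMESTAMP_BYTES = 24  # ISO 8601 timestamp
REFERENCE_BYTES = 7   # "DataXXX"
VALUE_BYTES = 4       # e.g. "12.0"
ROWS_PER_HOUR = 60    # one sample per minute


def row_format_bytes(n_refs: int) -> int:
    """Bytes per timestamp in a row file: one 3-field line per reference."""
    return n_refs * (TIMESTAMP_BYTES + REFERENCE_BYTES + VALUE_BYTES + 3)


def column_format_bytes(n_refs: int) -> int:
    """Bytes per timestamp in a column file: one line holding n_refs values."""
    separators = n_refs + 1  # n_refs field separators plus the line ending
    return TIMESTAMP_BYTES + n_refs * VALUE_BYTES + separators


for n in (1, 3, 10):
    print(n, row_format_bytes(n) * ROWS_PER_HOUR,
          column_format_bytes(n) * ROWS_PER_HOUR)
```

The gap widens as more references share a timestamp, because the row layout repeats the timestamp and the reference on every line.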

Column row files

This format keeps the column-based structure but splits each reference into a common prefix, stored in a Reference column, and an identifier (001, 002, 003) used as the column name. In this example the only prefix is Data, which yields the references Data001, Data002, and Data003.

Column row file CSV
Timestamp	                Reference	001	    002	    003
2020-02-01T20:04:00.000Z	Data	    12.0	10.0	10.0
2020-02-01T20:05:00.000Z	Data	    10.0	19.0	10.0
2020-02-01T20:06:00.000Z	Data	    10.0	10.0	14.0
...

The size of one row of a column row file is: Timestamp (24 bytes) + Data Reference (4 bytes) + Data Value (4 bytes) * 3 + 5 separators (5 bytes) = 45 bytes per row, each row carrying three data points.

For 60 rows per hour, the data size per hour is: 45 bytes/row * 60 rows = 2700 bytes per hour for the three references (header excluded).

With 2 files sent per hour, each file holds 30 rows: 2700 bytes/hour / 2 files/hour = 1350 bytes per file (header excluded). The hourly total remains 2700 bytes.
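
The following sketch shows one way row-style records could be pivoted into the column row layout. It is not how OIBus produces the file. The function name and the fixed three-character identifier length are assumptions made for this example.

```python
from collections import defaultdict

# Row-style records from the example: (timestamp, reference, value).
records = [
    ("2020-02-01T20:04:00.000Z", "Data001", "12.0"),
    ("2020-02-01T20:04:00.000Z", "Data002", "10.0"),
    ("2020-02-01T20:04:00.000Z", "Data003", "10.0"),
    ("2020-02-01T20:05:00.000Z", "Data001", "10.0"),
    ("2020-02-01T20:05:00.000Z", "Data002", "19.0"),
    ("2020-02-01T20:05:00.000Z", "Data003", "10.0"),
]


def to_column_row_lines(records, id_length=3, sep="\t"):
    """Group values by (timestamp, prefix); identifiers become columns."""
    grouped = defaultdict(dict)
    identifiers = set()
    for timestamp, reference, value in records:
        # Assumption: the last id_length characters are the identifier.
        prefix, identifier = reference[:-id_length], reference[-id_length:]
        grouped[(timestamp, prefix)][identifier] = value
        identifiers.add(identifier)
    columns = sorted(identifiers)
    lines = [sep.join(["Timestamp", "Reference", *columns])]
    for timestamp, prefix in sorted(grouped):
        values = grouped[(timestamp, prefix)]
        cells = [values.get(column, "") for column in columns]
        lines.append(sep.join([timestamp, prefix, *cells]))
    return lines


lines = to_column_row_lines(records)
print("\n".join(lines))

# Size of one data row, including its line ending.
row_bytes = len((lines[1] + "\n").encode("utf-8"))
print(row_bytes)
```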

Sending values (JSON payload)

Format

When the North connector retrieves values and transmits them to a values' endpoint (OIBus North Connector or OIAnalytics), they are presented in an array format as follows:

JSON payload
[
    {"timestamp": "2020-02-01T20:04:00.000Z", "pointId":"Data001", "data": {"value": "12.0", "quality": "192"}},
    {"timestamp": "2020-02-01T20:04:00.000Z", "pointId":"Data002", "data": {"value": "10.0", "quality": "192"}}, 
    {"timestamp": "2020-02-01T20:04:00.000Z", "pointId":"Data003", "data": {"value": "10.0", "quality": "192"}}
]

Each field conveys the following information:

timestamp: denotes the timestamp of the value in ISO 8601 format.
pointId: serves as a reference for the value.
data: a JSON object that encompasses the recorded value (value) and its quality (quality) or other fields.
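
As an illustration of this structure, the sketch below builds an equivalent payload with Python's standard `json` module. The helper name `to_payload` and the tuple layout of its input are assumptions made for the example. They are not part of OIBus.

```python
import json


def to_payload(values):
    """Serialize (timestamp, point_id, value, quality) tuples as a JSON array."""
    return json.dumps([
        {
            "timestamp": timestamp,
            "pointId": point_id,
            "data": {"value": value, "quality": quality},
        }
        for timestamp, point_id, value, quality in values
    ])


payload = to_payload([
    ("2020-02-01T20:04:00.000Z", "Data001", "12.0", "192"),
    ("2020-02-01T20:04:00.000Z", "Data002", "10.0", "192"),
    ("2020-02-01T20:04:00.000Z", "Data003", "10.0", "192"),
])
print(payload)
```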

The rest of this section focuses on the size of the JSON payload, which depends on several parameters, including:

The data sampling frequency.
The number of points grouped together for transmission (as defined by Group Count).
The frequency of transmission (as defined by Send Interval).
The format of data and quality, specifically the number of characters used for precision.
The size of the data references.

Size estimation

It is possible to estimate the space occupied by a single value based on the following criteria:

The timestamp size is 39 bytes ("timestamp": "2020-02-01T20:00:00.000Z").
The pointId size takes the form of "pointId": "DataXXX", adding 13 bytes to the number of bytes in the reference (in this case, 7 bytes for "DataXXX").
The data field size is 10 bytes ("data": {...}), in addition to its content:
- The value field follows the format "value": "10.0", adding 11 bytes plus the variable number of bytes required to encode the value (in this case, 4 bytes).
- The quality field is of the form "quality": "192", adding 13 bytes plus the variable number of bytes needed to encode the quality (here, 3 bytes).

Therefore, the size of the object representing a value can be broken down as follows:

Constant object size: 39 + 13 + 10 + 11 + 13 + 6 = 92 bytes (6 corresponds to the separators between different elements, such as commas).
Size of the reference: 7 bytes
Size of the value: 4 bytes
Size of the quality: 3 bytes

The total size of the object representing a single value is therefore 92 + 7 + 4 + 3 = 106 bytes.
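
The breakdown above can be written as a small estimator. This sketch only reproduces the arithmetic of this section, and the constant and function names are illustrative. The exact size on the wire depends on the whitespace the serializer emits, so the result should be read as an estimate.

```python
TIMESTAMP_FIELD = 39    # "timestamp": "2020-02-01T20:00:00.000Z"
POINT_ID_OVERHEAD = 13  # "pointId": "" without the reference itself
DATA_OVERHEAD = 10      # "data": {} without its content
VALUE_OVERHEAD = 11     # "value": "" without the value itself
QUALITY_OVERHEAD = 13   # "quality": "" without the quality itself
SEPARATORS = 6          # separators between elements, such as commas

CONSTANT_BYTES = (TIMESTAMP_FIELD + POINT_ID_OVERHEAD + DATA_OVERHEAD
                  + VALUE_OVERHEAD + QUALITY_OVERHEAD + SEPARATORS)


def value_object_bytes(reference: str, value: str, quality: str) -> int:
    """Estimate the size of one value object from its variable parts."""
    variable = sum(len(part.encode("utf-8")) for part in (reference, value, quality))
    return CONSTANT_BYTES + variable


print(CONSTANT_BYTES)
print(value_object_bytes("Data001", "12.0", "192"))
```

Longer references or more precise values increase only the variable part, which makes it easy to test the effect of a naming convention on the data rate.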

With 3 references each sampled once per minute, a Group Count of 1000, and a Send Interval of 60000 ms, OIBus transmits one JSON payload per minute containing 3 values, for a total of 3 * 106 = 318 bytes.

Over the course of one hour, this amounts to 318 * 60 = 19080 bytes, or 19080 * 24 = 457920 bytes over one day.
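
The same volume calculation can be sketched in code. It assumes that the number of values accumulated during one Send Interval stays below the Group Count, so that every payload is triggered by the interval. The names below are illustrative and are not OIBus configuration keys.

```python
BYTES_PER_VALUE = 106      # estimated size of one value object
SEND_INTERVAL_MS = 60_000  # one payload per minute
MS_PER_HOUR = 3_600_000


def payload_bytes(n_references: int, samples_per_interval: int = 1) -> int:
    """Estimated size of one payload (array brackets ignored)."""
    return n_references * samples_per_interval * BYTES_PER_VALUE


sends_per_hour = MS_PER_HOUR // SEND_INTERVAL_MS
per_payload = payload_bytes(3)
per_hour = per_payload * sends_per_hour
per_day = per_hour * 24

print(per_payload, per_hour, per_day)
```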
