Data Rate and Cache Sizing

OIBus uses a store-and-forward mechanism: when a North connector cannot reach its target, data is buffered in a local cache on disk. When the connection is restored, cached data is replayed in order. Sizing the cache correctly — large enough to survive expected outages without filling the disk — requires knowing the sustained data rate for each format.

This page explains how to estimate data volume per format and provides an interactive calculator to size the cache for your specific setup.

Transmission Formats

OIBus sends data in two modes, depending on the North connector:

| Mode | Description | Connectors |
| --- | --- | --- |
| File endpoint | Data written to CSV files and pushed to a remote location | SFTP, Amazon S3, Azure Blob Storage, File Writer |
| JSON payload | Data serialised as a JSON array and POSTed over HTTP | OIAnalytics, REST API |

The choice of format has a large effect on throughput and cache size, particularly for high-frequency or multi-signal workloads.

CSV Format Estimations

Byte sizes below use the following representative field widths:

| Field | Example | Bytes |
| --- | --- | --- |
| Timestamp | 2020-02-01T20:04:00.000Z | 24 |
| Point name | Data001 | 7 |
| Numeric value | 12.0 | 4 |
| Separator (comma or newline) | , or \n | 1 each |

1. Row Format

One data point per row.

Timestamp,Reference,Value
2020-02-01T20:04:00.000Z,Data001,12.0

Size per row (= per data point):

  • Timestamp: 24 bytes
  • Reference: 7 bytes
  • Value: 4 bytes
  • 2 commas + newline: 3 bytes
  • Total: 38 bytes per data point

Use when: Points have different timestamps, come from heterogeneous sources, or arrive at irregular intervals
Pros: Simplest structure; each row is self-contained; easy to stream or append; no schema needed to interpret a row
Cons: Most verbose CSV format, because the timestamp is repeated for every data point
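
The 38-byte figure can be checked by encoding the example row and measuring it. This is a minimal Python sketch rather than OIBus code: the helper name `row_csv_line` is hypothetical, and it assumes UTF-8 encoding with a single `\n` line ending.

```python
def row_csv_line(timestamp: str, reference: str, value: str) -> bytes:
    """Build one Row-format CSV line (hypothetical helper, not an OIBus API)."""
    # Assumes UTF-8 and a single "\n" line ending; "\r\n" would add one byte per row.
    return f"{timestamp},{reference},{value}\n".encode("utf-8")


line = row_csv_line("2020-02-01T20:04:00.000Z", "Data001", "12.0")
print(len(line))  # 38 = 24 (timestamp) + 7 (reference) + 4 (value) + 3 (separators)
```

Longer point names or values with more digits increase the per-point size accordingly, so it is worth measuring a representative row from your own data.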

2. Column Format

K data points share one timestamp row. Point names appear in the header.

Timestamp,Data001,Data002,Data003
2020-02-01T20:04:00.000Z,12.0,10.0,10.0

Size per data row (K points per row):

  • Timestamp: 24 bytes
  • K values: K × 4 bytes
  • K commas + newline: K + 1 bytes
  • Total per row: (25 + 5K) bytes → (25 + 5K) / K bytes per data point

For K = 3: 40 bytes / row → ≈ 13.3 bytes per data point

Use when: Multiple signals are sampled simultaneously at the same scan rate (e.g. a SCADA scan group)
Pros: Timestamp shared across K points, so very space-efficient as K grows; natural fit for synchronous polling
Cons: All points in a row must share the same timestamp; the schema lives in the header, so rows are harder to process without it
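
The Column formula can be sketched as a small function and checked against the example row. The function name `column_bytes_per_point` and its default widths are illustrative assumptions taken from the field-width table on this page. The one-off header row is not counted.

```python
def column_bytes_per_point(k: int, timestamp_bytes: int = 24, value_bytes: int = 4) -> float:
    """Bytes per data point for one Column-format data row holding k values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # timestamp + k values + k commas + 1 newline = 25 + 5k with the default widths
    row_bytes = timestamp_bytes + k * value_bytes + k + 1
    return row_bytes / k


example = "2020-02-01T20:04:00.000Z,12.0,10.0,10.0\n"
print(len(example.encode("utf-8")))         # 40
print(round(column_bytes_per_point(3), 1))  # 13.3
print(column_bytes_per_point(10))           # 7.5
```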

3. Column-Row Format

Each row holds one timestamp, a shared reference name, and K values. Sub-IDs appear in the header.

Timestamp,Reference,001,002,003
2020-02-01T20:04:00.000Z,Data,12.0,10.0,10.0

Size per data row (K points per row, 4-byte reference):

  • Timestamp: 24 bytes
  • Reference: 4 bytes
  • K values: K × 4 bytes
  • (K + 1) commas + newline: K + 2 bytes
  • Total per row: (30 + 5K) bytes → (30 + 5K) / K bytes per data point

For K = 3: 45 bytes / row → 15 bytes per data point

Use when: Data has a natural grouping structure, e.g. the same measurement across multiple equipment instances, as used in historian connectors with group/reference arrays
Pros: Combines timestamp sharing with an extra grouping dimension; readable when the reference name is meaningful
Cons: Same timestamp constraint as Column; slightly less efficient due to the reference column; less widely supported by downstream tools
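
Because the reference column is the only difference from Column format, it helps to see how its width changes the result. The sketch below is illustrative only: `column_row_bytes_per_point` is a hypothetical name, and the 7-byte reference in the last line is an assumed example, not a value from OIBus.

```python
def column_row_bytes_per_point(
    k: int,
    reference_bytes: int = 4,
    timestamp_bytes: int = 24,
    value_bytes: int = 4,
) -> float:
    """Bytes per data point for one Column-Row data row holding k values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # timestamp + reference + k values + (k + 1) commas + 1 newline
    row_bytes = timestamp_bytes + reference_bytes + k * value_bytes + (k + 2)
    return row_bytes / k


example = "2020-02-01T20:04:00.000Z,Data,12.0,10.0,10.0\n"
print(len(example.encode("utf-8")))                      # 45
print(column_row_bytes_per_point(3))                     # 15.0
print(column_row_bytes_per_point(3, reference_bytes=7))  # 16.0
```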

JSON Payload Estimation

[
{
"timestamp": "2020-02-01T20:04:00.000Z",
"pointId": "Data001",
"value": "12.0"
}
]

Size breakdown (compact serialisation, no whitespace):

| Component | Content | Bytes |
| --- | --- | --- |
| timestamp field | "timestamp":"2020-02-01T20:04:00.000Z" | 38 |
| pointId field | "pointId":"Data001" | 19 |
| value field | "value":"12.0" | 14 |
| Structural characters | [{ , two commas , }] | 6 |
| Total | | 77 |

Use when: Sending data to REST APIs or modern platforms that consume JSON natively
Pros: Self-describing; no schema knowledge required by the receiver; easy to consume in most languages and platforms
Cons: Most verbose format, roughly 2× the size of Row CSV and about 5.8× the size of Column CSV at K = 3; higher serialisation overhead
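
The 77-byte total can be reproduced with Python's standard `json` module by serialising the example without whitespace. This is a sketch under the assumption that the payload is a plain array of such objects; `compact_json_bytes` is a hypothetical helper, and how OIBus actually batches points is not shown here.

```python
import json

point = {
    "timestamp": "2020-02-01T20:04:00.000Z",
    "pointId": "Data001",
    "value": "12.0",
}


def compact_json_bytes(points: list) -> int:
    """Size in bytes of a JSON array serialised without whitespace."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return len(json.dumps(points, separators=(",", ":")).encode("utf-8"))


print(compact_json_bytes([point]))              # 77
print(compact_json_bytes([point] * 100) / 100)  # 76.01
```

In a larger array the two brackets are shared by all elements, so the per-point cost drops slightly below 77 bytes. Using 77 is therefore a mildly conservative estimate.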

Format Efficiency Summary

Bytes per data point at various K values (Column-Row assumes a 4-byte reference):

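| Format | Formula | K = 1 | K = 3 | K = 10 |
| --- | --- | --- | --- | --- |
| Row CSV | 38 | 38.0 | 38.0 | 38.0 |
| Column CSV | (25 + 5K) / K | 30.0 | 13.3 | 7.5 |
| Column-Row CSV | (30 + 5K) / K | 35.0 | 15.0 | 8.0 |
| JSON | 77 | 77.0 | 77.0 | 77.0 |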

Column formats amortise the timestamp across K data points, becoming significantly more space-efficient as K grows. At K = 10, Column CSV takes roughly one fifth of the space of Row CSV and roughly one tenth of the space of JSON.
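
The amortisation becomes explicit when each formula is split into a fixed cost per value (4 bytes plus one comma) and a per-row cost spread over K points. The symbols b_col and b_col-row are introduced here for illustration only.

```latex
\begin{align*}
b_{\text{col}}(K) &= \frac{25 + 5K}{K} = 5 + \frac{25}{K} \\
b_{\text{col-row}}(K) &= \frac{30 + 5K}{K} = 5 + \frac{30}{K} \\
\lim_{K \to \infty} b_{\text{col}}(K) &= \lim_{K \to \infty} b_{\text{col-row}}(K) = 5 \\
\frac{38}{b_{\text{col}}(10)} &= \frac{38}{7.5} \approx 5.1 \\
\frac{77}{b_{\text{col}}(10)} &= \frac{77}{7.5} \approx 10.3
\end{align*}
```

Both column formats therefore approach 5 bytes per data point for wide rows, and the gap between them shrinks as K grows.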

Cache Sizing Calculator

Use the calculator below to estimate throughput and minimum cache size for your deployment.

Input guide:

| Input | What it represents |
| --- | --- |
| Data points / second | Your sustained collection rate across all active South connectors feeding this North connector |
| Points per row (K) | Number of signals sharing one timestamp row; only affects Column and Column-Row formats |
| Safety buffer (%) | Extra headroom for CSV header rows, partial records, measurement bursts, and rounding. 20–30% is typical. |
| Network overhead (%) | Additional bytes from the transport layer: TLS amortisation, HTTP chunking, SFTP packet framing. 20–50% is typical for HTTPS or SFTP. |
| Outage duration (hours) | The longest expected disconnection window. The cache must hold at least this much data before the connection is restored. |

The Cache for outage column shows the minimum value to configure for Maximum storage size in the North connector cache settings.

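Example output for 1 data point per second, K = 3, and a 1-hour outage. Combined overhead multiplier: ×1.950, i.e. (1 + 30% safety) × (1 + 50% network overhead). KB here means 1,024 bytes.

| Format | Bytes / point | Base throughput | Buffered throughput | Cache for 1 h outage |
| --- | --- | --- | --- | --- |
| JSON | 77 | 77 B/s | 150 B/s | 527.9 KB |
| CSV Row | 38 | 38 B/s | 74 B/s | 260.5 KB |
| CSV Column (K=3) | 13.3 | 13 B/s | 26 B/s | 91.4 KB |
| CSV Column-Row (K=3) | 15 | 15 B/s | 29 B/s | 102.8 KB |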

K applies to the Column and Column-Row formats only; increasing it shows the efficiency gain from shared timestamps.
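
The arithmetic behind the table can be sketched as follows. The formula is inferred from the example figures on this page rather than taken from the calculator's source, and the function name `cache_size_bytes` and its parameter names are hypothetical.

```python
def cache_size_bytes(
    points_per_second: float,
    bytes_per_point: float,
    outage_hours: float,
    safety_pct: float = 30.0,
    network_pct: float = 50.0,
) -> float:
    """Estimated minimum cache size in bytes for one North connector."""
    # Combined multiplier, e.g. 1.30 * 1.50 = 1.95 for the defaults
    multiplier = (1 + safety_pct / 100) * (1 + network_pct / 100)
    buffered_throughput = points_per_second * bytes_per_point * multiplier  # bytes per second
    return buffered_throughput * outage_hours * 3600


size = cache_size_bytes(points_per_second=1, bytes_per_point=77, outage_hours=1)
print(f"{size / 1024:.1f} KB")  # 527.9 KB
```

For a real deployment, replace the inputs with your own sustained rate and outage window. For example, 500 points per second scales every figure in the table by 500.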

Applying the Result

In each North connector's Cache tab, set Maximum storage size to at least the value shown in the Cache for outage column for your chosen format. Add extra margin if:

  • Outages in your environment tend to last longer than the estimate
  • Multiple North connectors share the same disk volume
  • The host also runs background exports, logging, or other disk-intensive processes

Multiple North connectors

Each North connector maintains its own independent cache. If several connectors forward data simultaneously, size each one independently and verify the host has sufficient total disk capacity.
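
One way to verify total capacity is to sum the per-connector cache sizes and compare the result with the free space on the cache volume. The sketch below uses Python's standard `shutil.disk_usage`. The connector names, the path, and the 20% reserve are assumptions for illustration, not OIBus settings.

```python
import shutil


def total_cache_bytes(per_connector: dict[str, int]) -> int:
    """Sum of the Maximum storage size values planned for each North connector."""
    return sum(per_connector.values())


def fits(total_bytes: int, free_bytes: int, reserve_fraction: float = 0.2) -> bool:
    """True if the caches fit while leaving a fraction of free space untouched."""
    return total_bytes <= free_bytes * (1 - reserve_fraction)


# Hypothetical connector names with sizes in bytes
caches = {"oianalytics": 540_540, "sftp-archive": 266_760}
total = total_cache_bytes(caches)

free = shutil.disk_usage(".").free  # point this at the volume that holds the cache
print(total, fits(total, free))
```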

Effect of Compression

Some transport configurations support payload compression — for example, gzip over HTTPS between OIBus and a reverse proxy, or SFTP with compression enabled on the client. Compression typically reduces payload size by 60–80% for CSV and 70–85% for JSON, since both formats contain highly repetitive content (timestamps, field names, numeric strings).

Note that the OIBus cache stores data before compression. The cache size is therefore determined by the uncompressed rate shown in the calculator. Compression only reduces the bytes transmitted on the wire, not the disk space used by the cache.
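
The difference between cached size and wire size can be illustrated with Python's standard `gzip` module on synthetic payloads. This is only a sketch: the sample data is far more repetitive than real measurements, so the ratios it prints are likely to be optimistic and should not be read as benchmarks.

```python
import gzip
import json

# 60 synthetic points, one per second, all with the same name and value
rows = [(f"2020-02-01T20:04:{s:02d}.000Z", "Data001", "12.0") for s in range(60)]

csv_payload = (
    "Timestamp,Reference,Value\n" + "".join(f"{t},{r},{v}\n" for t, r, v in rows)
).encode("utf-8")
json_payload = json.dumps(
    [{"timestamp": t, "pointId": r, "value": v} for t, r, v in rows],
    separators=(",", ":"),
).encode("utf-8")

for name, raw in (("CSV", csv_payload), ("JSON", json_payload)):
    wire = gzip.compress(raw)
    # The cache must hold len(raw); only len(wire) crosses the network
    print(f"{name}: cached {len(raw)} B, on the wire {len(wire)} B")
```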