Data Rate and Cache Sizing
OIBus uses a store-and-forward mechanism: when a North connector cannot reach its target, data is buffered in a local cache on disk. When the connection is restored, cached data is replayed in order. Sizing the cache correctly — large enough to survive expected outages without filling the disk — requires knowing the sustained data rate for each format.
This page explains how to estimate data volume per format and provides an interactive calculator to size the cache for your specific setup.
Transmission Formats
OIBus sends data in two modes, depending on the North connector:
| Mode | Description | Connectors |
|---|---|---|
| File endpoint | Data written to CSV files and pushed to a remote location | SFTP, Amazon S3, Azure Blob Storage, File Writer |
| JSON payload | Data serialised as a JSON array and POSTed over HTTP | OIAnalytics, REST API |
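The choice of format strongly affects throughput and cache size. In the estimates below, one data point costs 77 bytes as JSON, 38 bytes as Row CSV, and about 13 bytes as Column CSV with three points per row. The gap matters most for high-frequency or multi-signal workloads.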
CSV Format Estimations
Byte sizes below assume one byte per character (true for these ASCII characters in UTF-8) and use the following representative field widths:
| Field | Example | Bytes |
|---|---|---|
| Timestamp | 2020-02-01T20:04:00.000Z | 24 |
| Point name | Data001 | 7 |
| Numeric value | 12.0 | 4 |
| Separator (comma or newline) | , / \n | 1 each |
1. Row Format
One data point per row.
Timestamp,Reference,Value
2020-02-01T20:04:00.000Z,Data001,12.0
Size per row (= per data point):
- Timestamp: 24 bytes
- Reference: 7 bytes
- Value: 4 bytes
- 2 commas + newline: 3 bytes
- Total: 38 bytes per data point
| Aspect | Details |
|---|---|
| Use when | Points have different timestamps, come from heterogeneous sources, or arrive at irregular intervals |
| Pros | Simplest structure; each row is self-contained; easy to stream or append; no schema needed to interpret a row |
| Cons | Most verbose CSV format: the timestamp is repeated for every data point |
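As a quick check of the 38-byte figure, the example row can be built and measured directly. This is a minimal Python sketch: the helper `row_csv_line` is hypothetical (not part of OIBus), and it assumes UTF-8 encoding with a single `\n` line ending.

```python
def row_csv_line(timestamp: str, reference: str, value: str) -> str:
    """Return one Row-format CSV line, including the trailing newline."""
    return f"{timestamp},{reference},{value}\n"


line = row_csv_line("2020-02-01T20:04:00.000Z", "Data001", "12.0")
row_bytes = len(line.encode("utf-8"))
print(row_bytes)  # 38
```

Longer point names or more significant digits raise the figure byte for byte, so it is worth measuring a row taken from your own data.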
2. Column Format
K data points share one timestamp row. Point names appear in the header.
Timestamp,Data001,Data002,Data003
2020-02-01T20:04:00.000Z,12.0,10.0,10.0
Size per data row (K points per row):
- Timestamp: 24 bytes
- K values: K × 4 bytes
- K commas + newline: K + 1 bytes
- Total per row: (25 + 5K) bytes → (25 + 5K) / K bytes per data point
For K = 3: 40 bytes / row → ≈ 13.3 bytes per data point
| Aspect | Details |
|---|---|
| Use when | Multiple signals are sampled simultaneously at the same scan rate (e.g. a SCADA scan group) |
| Pros | Timestamp shared across K points, so very space-efficient as K grows; natural fit for synchronous polling |
| Cons | All points in a row must share the same timestamp; schema lives in the header, so rows are harder to process without it |
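The Column formula can be checked against a real row in the same way. The sketch below is illustrative only: `column_csv_line` and `column_bytes_per_point` are hypothetical helpers, and the default widths (24-byte timestamp, 4-byte values) are the representative widths used on this page.

```python
def column_csv_line(timestamp: str, values: list[str]) -> str:
    """Return one Column-format data line, including the trailing newline."""
    return ",".join([timestamp, *values]) + "\n"


def column_bytes_per_point(k: int, ts_bytes: int = 24, value_bytes: int = 4) -> float:
    """Timestamp + K values + K commas + newline, divided by K points."""
    return (ts_bytes + k * value_bytes + k + 1) / k


line = column_csv_line("2020-02-01T20:04:00.000Z", ["12.0", "10.0", "10.0"])
measured_bytes = len(line.encode("utf-8"))
print(measured_bytes)                        # 40
print(round(column_bytes_per_point(3), 1))   # 13.3
print(column_bytes_per_point(10))            # 7.5
```

The header row is not counted here; it is written once per file, so its cost per data point shrinks as the file grows.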
3. Column-Row Format
Each row holds one timestamp, a shared reference name, and K values. Sub-IDs appear in the header.
Timestamp,Reference,001,002,003
2020-02-01T20:04:00.000Z,Data,12.0,10.0,10.0
Size per data row (K points per row, 4-byte reference):
- Timestamp: 24 bytes
- Reference: 4 bytes
- K values: K × 4 bytes
- (K + 1) commas + newline: K + 2 bytes
- Total per row: (30 + 5K) bytes → (30 + 5K) / K bytes per data point
For K = 3: 45 bytes / row → 15 bytes per data point
| Aspect | Details |
|---|---|
| Use when | Data has a natural grouping structure, e.g. the same measurement across multiple equipment instances, as used in historian connectors with group/reference arrays |
| Pros | Combines timestamp sharing with an extra grouping dimension; readable when the reference name is meaningful |
| Cons | Same timestamp constraint as Column; slightly less efficient due to the reference column; less widely supported by downstream tools |
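Because the reference column is paid once per row, its length matters more at small K. The sketch below generalises the formula above to an arbitrary reference length; the helper names and the 12-byte reference are hypothetical, chosen only to illustrate the sensitivity.

```python
def column_row_csv_line(timestamp: str, reference: str, values: list[str]) -> str:
    """Return one Column-Row data line, including the trailing newline."""
    return ",".join([timestamp, reference, *values]) + "\n"


def column_row_bytes_per_point(k: int, reference_bytes: int = 4) -> float:
    """24-byte timestamp + reference + K 4-byte values + (K + 1) commas + newline."""
    return (24 + reference_bytes + 4 * k + (k + 1) + 1) / k


line = column_row_csv_line("2020-02-01T20:04:00.000Z", "Data", ["12.0", "10.0", "10.0"])
measured_bytes = len(line.encode("utf-8"))
print(measured_bytes)                                                # 45
print(column_row_bytes_per_point(3))                                 # 15.0
print(round(column_row_bytes_per_point(3, reference_bytes=12), 2))   # 17.67
```

With the default 4-byte reference the function reduces to (30 + 5K) / K, matching the total given above.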
JSON Payload Estimation
[
  {
    "timestamp": "2020-02-01T20:04:00.000Z",
    "pointId": "Data001",
    "value": "12.0"
  }
]
Size breakdown (compact serialisation, no whitespace):
| Component | Content | Bytes |
|---|---|---|
| timestamp field | "timestamp":"2020-02-01T20:04:00.000Z" | 38 |
| pointId field | "pointId":"Data001" | 19 |
| value field | "value":"12.0" | 14 |
| Structural characters | Opening [{, two commas, closing }] | 6 |
| Total | | 77 |
| Aspect | Details |
|---|---|
| Use when | Sending data to REST APIs or modern platforms that consume JSON natively |
| Pros | Self-describing; no schema knowledge required by the receiver; easy to consume in most languages and platforms |
| Cons | Most verbose format: roughly 2× the size of Row CSV and about 5.8× the size of Column CSV at K = 3; higher serialisation overhead |
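The 77-byte total can be reproduced with Python's standard `json` module by serialising the example payload without whitespace. This is only a measurement sketch; the serialiser OIBus actually uses may differ in detail.

```python
import json

points = [
    {"timestamp": "2020-02-01T20:04:00.000Z", "pointId": "Data001", "value": "12.0"}
]

# separators=(",", ":") removes the spaces json.dumps inserts by default.
compact = json.dumps(points, separators=(",", ":"))
compact_bytes = len(compact.encode("utf-8"))
print(compact_bytes)  # 77

# Two points in one array: the [ ] pair is shared, one comma is added between objects.
two_points = json.dumps(points * 2, separators=(",", ":"))
two_points_bytes = len(two_points.encode("utf-8"))
print(two_points_bytes)  # 153
```

For larger arrays the cost works out to 76 bytes per point plus 1 byte for the array, so 77 bytes per point is a slightly conservative planning figure.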
Format Efficiency Summary
Bytes per data point at various K values (Column-Row assumes a 4-byte reference):
| Format | Formula | K = 1 | K = 3 | K = 10 |
|---|---|---|---|---|
| Row CSV | 38 | 38 | 38.0 | 38.0 |
| Column CSV | (25 + 5K) / K | 30 | 13.3 | 7.5 |
| Column-Row CSV | (30 + 5K) / K | 35 | 15.0 | 8.0 |
| JSON | 77 | 77 | 77.0 | 77.0 |
Column formats amortise the timestamp across K data points, so they become markedly more space-efficient as K grows. At K = 10, Column CSV needs about one fifth of the bytes of Row CSV (7.5 vs 38) and about one tenth of the bytes of JSON (7.5 vs 77).
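Rewriting the two column formulas shows where the savings level off. This is a short derivation from the formulas in the table, not an additional measurement:

```latex
\[
\frac{25 + 5K}{K} = 5 + \frac{25}{K},
\qquad
\frac{30 + 5K}{K} = 5 + \frac{30}{K},
\qquad
\lim_{K \to \infty} \left( 5 + \frac{25}{K} \right)
  = \lim_{K \to \infty} \left( 5 + \frac{30}{K} \right) = 5
\]
```

Both column formats therefore approach 5 bytes per data point (a 4-byte value plus one separator). Most of the gain is realised early: for Column CSV, going from K = 10 to K = 100 saves only a further 2.25 bytes per point (7.5 down to 5.25).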
Cache Sizing Calculator
Use the calculator below to estimate throughput and minimum cache size for your deployment.
Input guide:
| Input | What it represents |
|---|---|
| Data points / second | Your sustained collection rate across all active South connectors feeding this North connector |
| Points per row (K) | Number of signals sharing one timestamp row — only affects Column and Column-Row formats |
| Safety buffer (%) | Extra headroom for CSV header rows, partial records, measurement bursts, and rounding. 20–30% is typical. |
| Network overhead (%) | Additional bytes from the transport layer — TLS amortisation, HTTP chunking, SFTP packet framing. 20–50% is typical for HTTPS or SFTP. |
| Outage duration (hours) | The longest expected disconnection window. The cache must hold all data collected during this window, until the connection is restored. |
The Cache for outage column shows the minimum value to configure for Maximum storage size in the North connector cache settings.
| Format | Bytes / point | Base throughput | Buffered throughput | Cache for 1 h outage |
|---|---|---|---|---|
| JSON | 77 | 77 B/s | 150 B/s | 527.9 KB |
| CSV Row | 38 | 38 B/s | 74 B/s | 260.5 KB |
| CSV Column (K=3) | 13.3 | 13 B/s | 26 B/s | 91.4 KB |
| CSV Column-Row (K=3) | 15 | 15 B/s | 29 B/s | 102.8 KB |
K applies to the Column and Column-Row formats only; a larger K spreads the timestamp over more points and lowers the bytes per point. The example figures in the table are consistent with 1 data point per second, K = 3, a 30% safety buffer, 50% network overhead, and a 1-hour outage, with 1 KB taken as 1,024 bytes.
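The arithmetic behind the table can be sketched as follows. The function `cache_size_bytes` is hypothetical, not an OIBus API. It assumes the safety buffer and network overhead are applied multiplicatively and that 1 KB means 1,024 bytes; with 1 point per second, a 30% buffer, and 50% overhead, these assumptions reproduce the figures in the table.

```python
def cache_size_bytes(
    points_per_second: float,
    bytes_per_point: float,
    outage_hours: float,
    safety_buffer: float = 0.30,      # assumed default, 30%
    network_overhead: float = 0.50,   # assumed default, 50%
) -> float:
    """Estimate the minimum cache size in bytes for one North connector."""
    base = points_per_second * bytes_per_point                      # base throughput, B/s
    buffered = base * (1 + safety_buffer) * (1 + network_overhead)  # buffered throughput, B/s
    return buffered * outage_hours * 3600


bytes_per_point = {
    "JSON": 77,
    "CSV Row": 38,
    "CSV Column (K=3)": 40 / 3,
    "CSV Column-Row (K=3)": 15,
}
estimates_kb = {
    name: cache_size_bytes(1, size, outage_hours=1) / 1024
    for name, size in bytes_per_point.items()
}
for name, kb in estimates_kb.items():
    print(f"{name}: {kb:.1f} KB")
```

The estimate scales linearly with both the data rate and the outage duration, so 1,000 points per second over a 24-hour outage needs 24,000 times the cache shown for each format.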
Applying the Result
In each North connector's Cache tab, set Maximum storage size to at least the value shown in the Cache for outage column for your chosen format. Add extra margin if:
- Outages in your environment tend to last longer than the estimate
- Multiple North connectors share the same disk volume
- The host also runs background exports, logging, or other disk-intensive processes
Each North connector maintains its own independent cache. If several connectors forward data simultaneously, size each one independently and verify the host has sufficient total disk capacity.
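The total-capacity check can be sketched as below. The connector names, cache sizes, and the 20% reserve are hypothetical placeholders; `shutil.disk_usage` is part of the Python standard library and reports free space for the volume containing the given path.

```python
import shutil


def total_cache_fits(cache_sizes_bytes: dict[str, int], free_bytes: int,
                     reserve_fraction: float = 0.2) -> bool:
    """Return True if all caches fit while leaving a reserve for logs and other processes."""
    required = sum(cache_sizes_bytes.values())
    usable = free_bytes * (1 - reserve_fraction)
    return required <= usable


# Hypothetical per-connector maximum storage sizes, in bytes.
planned = {
    "north-oianalytics": 2 * 1024**3,
    "north-sftp": 1 * 1024**3,
}

# "." is a placeholder; point this at the volume that holds the OIBus cache.
free_bytes = shutil.disk_usage(".").free
print(total_cache_fits(planned, free_bytes))
```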
Effect of Compression
Some transport configurations support payload compression — for example, gzip over HTTPS between OIBus and a reverse proxy, or SFTP with compression enabled on the client. Compression typically reduces payload size by 60–80% for CSV and 70–85% for JSON, since both formats contain highly repetitive content (timestamps, field names, numeric strings).
Note that the OIBus cache stores data before compression. The cache size is therefore determined by the uncompressed rate shown in the calculator. Compression only reduces the bytes transmitted on the wire, not the disk space used by the cache.
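The difference between disk usage and wire usage can be illustrated with the standard `gzip` module. The data below is synthetic (one hour of identical Row CSV readings at 1 point per second), so the compression ratio it yields is more favourable than real measurements would give and should not be used for planning.

```python
import gzip

# Build one hour of Row CSV at 1 point per second (synthetic, highly repetitive).
rows = ["Timestamp,Reference,Value\n"]
for second in range(3600):
    minute, sec = divmod(second, 60)
    rows.append(f"2020-02-01T20:{minute:02d}:{sec:02d}.000Z,Data001,12.0\n")

raw = "".join(rows).encode("utf-8")
compressed = gzip.compress(raw)

cache_bytes = len(raw)         # what the cache has to hold on disk
wire_bytes = len(compressed)   # what would cross the network if gzip is enabled

print(f"cache: {cache_bytes} B, wire: {wire_bytes} B")
```

Only `cache_bytes` is relevant when choosing Maximum storage size; `wire_bytes` matters for bandwidth planning.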