MuninnDB

Clustering

Phase 1. Clustering is production-capable and actively developed. Features like rolling upgrades and automated disaster recovery tooling are on the roadmap.

Overview

MuninnDB clustering lets you run multiple nodes together for high availability and replication. If the primary node goes down, a replica is automatically elected as the new primary in about 3.5 seconds — with no manual intervention required.

A cluster is made up of one Cortex (the leader) and one or more Lobes (replicas). The Cortex owns all writes and runs the cognitive workers. Lobes serve reads, accept writes and forward them to the Cortex, and participate in elections when the Cortex fails.

When to Use Clustering

  • High availability — you need automatic failover if a node crashes
  • Read scalability — distribute read traffic across multiple Lobes
  • Zero-downtime deploys — same-version rolling restarts without dropping requests

For development or single-tenant workloads, a single node is simpler and fully capable. Clustering adds operational overhead — only add it when you need it.

Architecture

MuninnDB uses cognitive naming for cluster roles:

  • Cortex: primary leader. Owns all writes, runs cognitive workers, streams replication to Lobes. Holds data and votes in elections.
  • Lobe: replica. Serves reads, forwards writes to the Cortex, participates in elections. Holds data and votes.
  • Sentinel: quorum voter only. Holds no data and serves no reads or writes; it just keeps the vote count honest. Votes in elections.
  • Observer: read-only replica. Receives replication but cannot vote or accept writes. Holds data but does not vote.

Replication is push-based over persistent TCP using MBP (Muninn Binary Protocol) on port 8474. Batches are sent every 5ms or 100 entries — whichever comes first. Batches larger than 1KB are compressed with Zstd.
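
The batching rules above can be sketched as a small flush policy. This is an illustrative sketch, not MuninnDB's actual internals; `zlib` stands in for Zstd here purely so the example is self-contained.

```python
import zlib  # stand-in for Zstd, which the real protocol uses

BATCH_INTERVAL_MS = 5      # flush every 5 ms...
BATCH_MAX_ENTRIES = 100    # ...or every 100 entries, whichever comes first
COMPRESS_THRESHOLD = 1024  # batches larger than 1 KB are compressed

def should_flush(elapsed_ms: float, pending_entries: int) -> bool:
    """A batch is flushed when either limit is hit."""
    return elapsed_ms >= BATCH_INTERVAL_MS or pending_entries >= BATCH_MAX_ENTRIES

def encode_batch(payload: bytes) -> tuple[bytes, bool]:
    """Compress only batches above the 1 KB threshold."""
    if len(payload) > COMPRESS_THRESHOLD:
        return zlib.compress(payload), True
    return payload, False
```

The time-or-count trigger bounds both latency (at most 5 ms) and batch size (at most 100 entries), while the size threshold avoids paying compression overhead on tiny batches.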

Network Requirements

Before configuring a cluster, make sure your network is set up correctly:

  • Port 8474 TCP must be open between all cluster nodes in both directions. This is the MBP replication port.
  • Each node's bind_addr must be an address that all other nodes can reach. Do not use 0.0.0.0 or localhost for bind_addr in a multi-node setup — use the node's actual LAN IP.
  • cluster_secret provides HMAC-based authentication between nodes. Connections from nodes with a mismatched secret are refused.

The replication port (8474) is not TLS-encrypted by default. Run your cluster inside a trusted private network or VPN. Do not expose port 8474 to the public internet.
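
A quick sanity check for the bind_addr rule above might look like this. The helper is illustrative, not part of the MuninnDB CLI:

```python
def validate_bind_addr(addr: str) -> None:
    """Reject addresses other nodes cannot use to reach this node."""
    host = addr.rsplit(":", 1)[0]
    if host in ("0.0.0.0", "localhost", "127.0.0.1", ""):
        raise ValueError(
            f"bind_addr {addr!r} is not reachable by peers; use the node's LAN IP"
        )

validate_bind_addr("10.0.1.10:8474")  # a routable LAN IP passes
```

Catching this at startup is cheaper than debugging a cluster where peers silently fail to connect.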

Minimum Configurations

  • 3-node (recommended): 1 Cortex + 2 Lobes. Quorum 2/3, tolerates 1 node failure.
  • 2-node + Sentinel (budget): 1 Cortex + 1 Lobe + 1 Sentinel. Quorum 2/3, tolerates 1 node failure.
  • 5-node (production HA): 1 Cortex + 4 Lobes. Quorum 3/5, tolerates 2 node failures.

The Sentinel configuration is useful when you only have two physical machines but still need a quorum voter. Run the Sentinel on a third low-cost host (a small VM is fine) — it holds no data but breaks ties during elections.
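
The quorum and failure-tolerance numbers above follow the usual majority rule, which generalizes to any voter count:

```python
def quorum(n: int) -> int:
    """Majority quorum for an n-voter cluster."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Voters that can fail while the cluster keeps quorum."""
    return n - quorum(n)

# 3 voters (Cortex + 2 Lobes, or Cortex + Lobe + Sentinel): quorum 2, tolerates 1
# 5 voters: quorum 3, tolerates 2
```

Note that an even voter count buys nothing: 4 voters need a quorum of 3 and still tolerate only 1 failure, which is why odd-sized clusters (or a Sentinel tie-breaker) are preferred.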

Configuration Reference

cluster:
  enabled: true
  node_id: "muninn-01"        # unique, stable across restarts
  bind_addr: "10.0.1.10:8474" # replication listen address
  seeds:                       # initial peers to contact at startup
    - "10.0.1.11:8474"
    - "10.0.1.12:8474"
  cluster_secret: "your-shared-secret"  # same on all nodes
  lease_ttl: 10               # seconds before a dead leader is evicted
  heartbeat_ms: 1000          # heartbeat interval

  • cluster.enabled (env MUNINN_CLUSTER_ENABLED, default false): enable cluster mode.
  • cluster.node_id (env MUNINN_CLUSTER_NODE_ID): unique node identifier, stable across restarts.
  • cluster.bind_addr (env MUNINN_CLUSTER_BIND_ADDR, default :8474): replication listen address (use the LAN IP, not localhost).
  • cluster.seeds (env MUNINN_CLUSTER_SEEDS): comma-separated list of peer addresses to contact at startup.
  • cluster.cluster_secret (env MUNINN_CLUSTER_SECRET): shared HMAC secret for node authentication.
  • cluster.lease_ttl (default 10): seconds before a non-responsive leader is evicted.
  • cluster.heartbeat_ms (default 1000): heartbeat interval in milliseconds.

cluster_secret Security

The cluster_secret is a shared password that every node uses to authenticate peer connections. Treat it like a database password:

  • Don't commit it to version control. Store it in an environment variable or secrets manager and inject it at runtime.
  • All nodes must share the same secret. A node with a mismatched secret will be refused by the cluster.
  • Rotate it when decommissioning a node. Update the secret on all remaining nodes, then restart them one at a time.
bash — generate a strong secret
openssl rand -hex 32
# export MUNINN_CLUSTER_SECRET=$(openssl rand -hex 32)
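
Conceptually, peer authentication works like any shared-secret HMAC scheme. The challenge-response framing below is an assumption for illustration; only the HMAC-based mechanism itself is documented:

```python
import hashlib
import hmac
import os

def sign_challenge(secret: str, challenge: bytes) -> bytes:
    """Tag a random challenge with the shared cluster secret (hypothetical framing)."""
    return hmac.new(secret.encode(), challenge, hashlib.sha256).digest()

def verify_peer(local_secret: str, challenge: bytes, peer_tag: bytes) -> bool:
    """Accept the peer only if its tag matches ours for the same challenge."""
    expected = sign_challenge(local_secret, challenge)
    return hmac.compare_digest(expected, peer_tag)  # constant-time comparison

challenge = os.urandom(32)
tag = sign_challenge("shared-secret", challenge)
assert verify_peer("shared-secret", challenge, tag)     # matching secret: accepted
assert not verify_peer("wrong-secret", challenge, tag)  # mismatch: refused
```

Because only a keyed digest crosses the wire, the secret itself is never transmitted — which is also why rotating it requires a coordinated restart of all nodes.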

Starting a Cluster

Here's a complete walkthrough for a 3-node cluster on 10.0.1.10, 10.0.1.11, and 10.0.1.12.

Step 1 — Write a config file on each node

Node 1 — 10.0.1.10
# /etc/muninn/muninn.yaml — Node 1
cluster:
  enabled: true
  node_id: "muninn-01"
  bind_addr: "10.0.1.10:8474"
  seeds:
    - "10.0.1.11:8474"
    - "10.0.1.12:8474"
  cluster_secret: "replace-with-a-strong-secret"
Node 2 — 10.0.1.11
# /etc/muninn/muninn.yaml — Node 2
cluster:
  enabled: true
  node_id: "muninn-02"
  bind_addr: "10.0.1.11:8474"
  seeds:
    - "10.0.1.10:8474"
    - "10.0.1.12:8474"
  cluster_secret: "replace-with-a-strong-secret"
Node 3 — 10.0.1.12
# /etc/muninn/muninn.yaml — Node 3
cluster:
  enabled: true
  node_id: "muninn-03"
  bind_addr: "10.0.1.12:8474"
  seeds:
    - "10.0.1.10:8474"
    - "10.0.1.11:8474"
  cluster_secret: "replace-with-a-strong-secret"

Step 2 — Start MuninnDB on each node

bash — run on each node
muninn start --config /etc/muninn/muninn.yaml

# [INFO] cluster mode enabled, node_id=muninn-01
# [INFO] connecting to seed 10.0.1.11:8474... connected
# [INFO] connecting to seed 10.0.1.12:8474... connected
# [INFO] participating in election, epoch=1
# [INFO] elected as cortex (received 2/3 votes)
# [INFO] MuninnDB ready (role=cortex, cluster_size=3)

Step 3 — Verify the cluster

bash
muninn cluster info --addr http://10.0.1.10:8475

# NODE ID     ROLE     LAST SEQ   LAG
# muninn-01   cortex   10,421     —
# muninn-02   lobe     10,421     0ms
# muninn-03   lobe     10,419     2ms

Cluster CLI Commands

  • muninn cluster info: show cluster state (current Cortex, epoch, fencing token, member list).
  • muninn cluster status: per-node health, replication lag, and last sequence number.
  • muninn cluster failover --yes: trigger a manual election (graceful handoff to a Lobe).
  • muninn cluster add-node: print step-by-step instructions for joining a new node.
  • muninn cluster remove-node: print step-by-step instructions for safely removing a node.
  • muninn cluster enable: enable cluster mode on a running standalone node.
  • muninn cluster disable: disable cluster mode (the node continues as standalone).

All commands accept --addr http://<host>:8475 to target a specific node. Defaults to http://localhost:8475.

Automatic Failover

MuninnDB uses a heartbeat-based protocol (MSP) to detect and recover from Cortex failures automatically. Here's the timeline when a Cortex crashes:

  • t = 0s: Cortex crashes or becomes unreachable.
  • t = 3s: each Lobe detects 3 missed heartbeats and marks the Cortex SDOWN (subjectively down).
  • t = 3.1s: a quorum of Lobes agrees the Cortex is unreachable and declares it ODOWN (objectively down).
  • t = 3.2s: the election begins; the Lobe with the highest sequence number (most up-to-date data) runs.
  • t = 3.5s: the new Cortex is elected and operational; writes resume.

Total failover time is roughly 3.5 seconds. The election epoch prevents split-brain: any write accepted by the old Cortex after the epoch advances is fenced out.
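
The SDOWN/ODOWN progression and the highest-sequence election can be sketched as pure functions. The names are illustrative, not MuninnDB internals:

```python
MISSED_HEARTBEAT_LIMIT = 3  # 3 missed heartbeats at 1 s intervals -> SDOWN

def is_sdown(missed_heartbeats: int) -> bool:
    """Subjectively down: this Lobe alone believes the Cortex is gone."""
    return missed_heartbeats >= MISSED_HEARTBEAT_LIMIT

def is_odown(sdown_votes: int, voters: int) -> bool:
    """Objectively down: a quorum of voters agrees the Cortex is unreachable."""
    return sdown_votes >= voters // 2 + 1

def elect(candidates: dict[str, int]) -> str:
    """The Lobe with the highest replication sequence number wins."""
    return max(candidates, key=candidates.get)

# The candidate with the most up-to-date data becomes the new Cortex
assert elect({"muninn-02": 10421, "muninn-03": 10419}) == "muninn-02"
```

The two-stage check (SDOWN then ODOWN) keeps a single Lobe with a flaky link from triggering an unnecessary election on its own.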

To trigger a graceful manual election (e.g. before planned maintenance on the Cortex node):

# Trigger a manual election (graceful handoff)
muninn cluster failover --yes

Replication & Consistency

Writes always go to the Cortex. Lobes forward any writes they receive to the Cortex asynchronously — you don't need to point every client at the Cortex, though doing so saves a network hop.

Replication is asynchronous. Lobes are typically within a few milliseconds of the Cortex, but reads from a Lobe may return data that is very slightly behind. For use cases that require strict read-after-write consistency, direct reads to the Cortex.
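
One way to apply this guidance in a client is a small routing policy: writes and strict reads go to the Cortex, relaxed reads round-robin over Lobes. This is a sketch under those assumptions, not a MuninnDB client API:

```python
import itertools

def make_router(cortex: str, lobes: list[str]):
    """Route writes and strict reads to the Cortex; spread other reads over Lobes."""
    rr = itertools.cycle(lobes)

    def route(op: str, strict: bool = False) -> str:
        if op == "write" or strict:
            return cortex  # read-after-write consistency requires the Cortex
        return next(rr)    # slightly-stale reads can hit any Lobe
    return route

route = make_router("10.0.1.10", ["10.0.1.11", "10.0.1.12"])
assert route("write") == "10.0.1.10"
assert route("read", strict=True) == "10.0.1.10"
assert route("read") in ("10.0.1.11", "10.0.1.12")
```

A per-request `strict` flag keeps the common read path on the Lobes while letting the few consistency-sensitive reads pay the extra load on the Cortex.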

Quorum write protection: If the Cortex loses contact with a quorum of Lobes for more than 5 seconds, it demotes itself to Lobe and stops accepting writes. This prevents a network-partitioned Cortex from forming a split-brain with the rest of the cluster.
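
The self-demotion rule amounts to a simple predicate on the write path. A sketch (illustrative names; the real check is internal to MuninnDB):

```python
QUORUM_LOSS_TIMEOUT_S = 5.0  # Cortex demotes itself after 5 s without quorum

def cortex_may_accept_writes(reachable_voters: int, total_voters: int,
                             seconds_without_quorum: float) -> bool:
    """A partitioned Cortex stops accepting writes rather than risk split-brain.

    reachable_voters counts the Cortex itself plus the voters it can reach.
    """
    has_quorum = reachable_voters >= total_voters // 2 + 1
    return has_quorum or seconds_without_quorum <= QUORUM_LOSS_TIMEOUT_S
```

The 5-second grace period tolerates brief network blips; anything longer and the minority side goes read-only while the majority side elects a new Cortex.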

Monitoring Cluster Health

muninn cluster info    # cluster state, current Cortex, epoch
muninn cluster status  # per-node health and replication lag

The /v1/cluster/health endpoint returns 200 when the cluster has quorum and all nodes are healthy. It returns 503 when the cluster is degraded (e.g., below quorum). Wire this into your load balancer or health check system.
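
The documented behavior maps cluster state to an HTTP status like this. The exact degraded conditions beyond loss of quorum are an assumption for illustration:

```python
def cluster_health_status(healthy_nodes: int, total_nodes: int) -> int:
    """Status code for /v1/cluster/health: 200 when the cluster has quorum
    and all nodes are healthy, 503 when degraded (e.g. below quorum)."""
    has_quorum = healthy_nodes >= total_nodes // 2 + 1
    if has_quorum and healthy_nodes == total_nodes:
        return 200
    return 503
```

Because the endpoint returns a plain 200/503, any load balancer or orchestrator that understands HTTP health checks can consume it without custom parsing.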

Watch replication lag with muninn cluster status. Normal lag is under 10ms. Sustained lag over 100ms indicates network congestion or a struggling Lobe.

Adding & Removing Nodes

MuninnDB walks you through both operations interactively:

bash
# Get instructions for joining a new node
muninn cluster add-node

# Get instructions for cleanly removing a node
muninn cluster remove-node --node muninn-03

When a new Lobe joins, the Cortex streams its current Pebble snapshot to the new node (rate-limited to 100 MB/s to avoid starving live writes), then the Lobe catches up on the replication log tail. The Lobe is not counted in quorum until it's fully caught up.
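
The 100 MB/s cap gives a rough lower bound on how long a join takes; a small estimator (illustrative, not a MuninnDB tool):

```python
SNAPSHOT_RATE_LIMIT_MBPS = 100  # MB/s cap so live writes are not starved

def min_transfer_seconds(snapshot_bytes: int) -> float:
    """Lower bound on initial state transfer time at the rate limit."""
    return snapshot_bytes / (SNAPSHOT_RATE_LIMIT_MBPS * 1024 * 1024)

# A 60 GB vault needs at least ~10 minutes of streaming before log catch-up
minutes = min_transfer_seconds(60 * 1024**3) / 60
```

Actual join time will be longer: after the snapshot lands, the Lobe still has to replay the replication log tail before it counts toward quorum.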

Tip: For large vaults, the initial state transfer can take several minutes. Monitor progress with muninn cluster status — the new node's lag will drop toward zero as it catches up.

Disaster Recovery

Scenario 1 — Single node failure

This is handled automatically. Remaining nodes elect a new Cortex within ~3.5 seconds. No action required.

Scenario 2 — Full cluster loss

Restore from your most recent Pebble snapshot backup. Start a single node from the snapshot, verify it's healthy, then add the remaining nodes as fresh Lobes (they'll receive an initial state transfer from the recovered Cortex).

Scenario 3 — Corrupted replication state

  1. Stop all nodes.
  2. Identify the healthiest node (the one with the highest sequence number — check logs).
  3. Start that node as a standalone instance and verify data integrity.
  4. Re-add the other nodes as fresh Lobes.

Phase 1 note: Automated DR tooling and point-in-time recovery are on the roadmap. For now, the safest approach is regular Pebble snapshot backups with a tool like muninn backup (coming in Phase 2). See Observability for monitoring sequence numbers and node health.

Upgrading a Cluster

Phase 1 limitation: Rolling upgrades are not yet supported. Nodes in a cluster must run the same MuninnDB version.

Safe upgrade procedure for Phase 1:

  1. Drain new writes to the cluster (or schedule a brief maintenance window).
  2. Stop all Lobes first, then stop the Cortex.
  3. Upgrade the binary on all nodes.
  4. Start the Cortex, wait for it to be ready, then start the Lobes.
  5. Verify with muninn cluster status — confirm all nodes are connected and replication lag is near zero.

Monitor replication lag throughout the restart sequence. If a Lobe shows sustained lag after rejoining, check for disk I/O saturation or network issues before allowing write traffic back.
