
WireGuard as the Backbone of SD-WAN Tunnels

Every Hopbox SD-WAN device establishes encrypted tunnels back to our hub infrastructure. These tunnels carry POS transactions, CCTV backhaul, management traffic, and DNS queries. The tunnel protocol is the foundation of the entire overlay network.

We evaluated IPsec, OpenVPN, and WireGuard. WireGuard won decisively. This post explains why, and how we manage WireGuard at the scale of 900+ sites.

IPsec is the enterprise standard. IKEv2 + ESP provides strong encryption, broad interoperability, and decades of battle-testing. It is also complex.

An IPsec tunnel involves:

  • IKE Phase 1: negotiate a security association (SA) for the control channel
  • IKE Phase 2: negotiate SAs for the data channel
  • Multiple cipher suites to configure and agree upon
  • Certificate or PSK authentication
  • NAT-T (NAT Traversal) encapsulation if either side is behind NAT

On an embedded OpenWrt device with limited CPU, IPsec’s encryption overhead is noticeable. The strongSwan userspace daemon handles IKE negotiation, and ESP processing happens in the kernel — but the handshake latency and rekeying overhead add up when you are managing hundreds of tunnels.
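To make the contrast concrete, here is a sketch of what a single such tunnel looks like in strongSwan's swanctl.conf format. The hostnames, certificate names, and subnets are illustrative, not our production values:

```conf
connections {
    site-tunnel {
        version = 2
        local_addrs = %any
        remote_addrs = hub.example.net
        proposals = aes256gcm16-prfsha384-ecp384
        encap = yes
        local {
            auth = pubkey
            certs = device-cert.pem
            id = device01.example.net
        }
        remote {
            auth = pubkey
            id = hub.example.net
        }
        children {
            site-net {
                local_ts = 10.200.42.0/24
                remote_ts = 10.200.0.0/16
                esp_proposals = aes256gcm16-ecp384
                start_action = trap
                dpd_action = restart
            }
        }
    }
}
```

And this is the minimal case: one connection, one child SA, certificates already provisioned.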

# IPsec connection establishment (typical)
# IKE_SA_INIT: ~50ms RTT
# IKE_AUTH: ~50ms RTT
# CREATE_CHILD_SA: ~50ms RTT
# Total setup: ~150-300ms (varies with network conditions)
#
# Plus: periodic rekeying every 1-8 hours depending on config

OpenVPN is the FOSS community’s default choice. It is well-understood, runs in userspace, and is easy to configure. The problem is performance.

OpenVPN processes every packet in userspace. On a 100 Mbps link, OpenVPN on a typical embedded device maxes out at 30-50 Mbps — the CPU becomes the bottleneck, copying packets between kernel and userspace and encrypting them in OpenSSL.

# OpenVPN throughput on typical embedded hardware
# (ARM Cortex-A53, quad-core, 1.2GHz)
#
# OpenVPN with AES-256-GCM: ~35 Mbps
# OpenVPN with ChaCha20: ~40 Mbps
# Native throughput (no VPN): ~940 Mbps
#
# OpenVPN uses ~80% CPU at these speeds

OpenVPN also has a slow handshake — TLS negotiation over TCP or UDP takes multiple round trips, and reconnection after a link flap is not instant.

WireGuard runs entirely in the kernel. There is no userspace daemon processing packets. The wg interface is a kernel network interface — packets go in, get encrypted by ChaCha20-Poly1305, and come out the other side. No context switches, no copies between userspace and kernel.

# WireGuard throughput on the same embedded hardware
# (ARM Cortex-A53, quad-core, 1.2GHz)
#
# WireGuard with ChaCha20-Poly1305: ~250 Mbps
# Native throughput (no VPN): ~940 Mbps
#
# WireGuard uses ~25% CPU at 250 Mbps

The numbers speak for themselves. WireGuard delivers 6-7x the throughput of OpenVPN on the same hardware, at a fraction of the CPU cost.

On embedded devices where CPU is the constraint, kernel-space processing is not optional — it is necessary. Our devices need to push encrypted POS, CCTV, and management traffic simultaneously. Every CPU cycle spent on VPN overhead is a cycle not available for QoS processing, firewalling, or monitoring.

A WireGuard configuration is a handful of lines:

[Interface]
PrivateKey = <device_private_key>
Address = 10.200.42.2/32
MTU = 1420

[Peer]
PublicKey = <hub_public_key>
Endpoint = hub-west-01.hopbox.in:51820
AllowedIPs = 10.200.0.0/16, 10.100.0.0/16
PersistentKeepalive = 25

Compare that to an IPsec ipsec.conf or an OpenVPN .ovpn file. WireGuard’s configuration surface is small enough that we can generate, validate, and audit it programmatically with confidence.
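Because the surface is this small, a per-device config can be rendered from a handful of site parameters. A minimal sketch of such a generator (the variable names and values here are illustrative, not our actual pipeline):

```shell
# Render a per-device WireGuard config from site parameters.
# site_ip and hub_host are illustrative placeholders.
site_ip="10.200.42.2"
hub_host="hub-west-01.hopbox.in"

config=$(cat <<EOF
[Interface]
PrivateKey = <device_private_key>
Address = ${site_ip}/32
MTU = 1420

[Peer]
PublicKey = <hub_public_key>
Endpoint = ${hub_host}:51820
AllowedIPs = 10.200.0.0/16, 10.100.0.0/16
PersistentKeepalive = 25
EOF
)
printf '%s\n' "$config"
```

Auditing runs the same logic in reverse: parse the deployed file and assert that the address, endpoint, and AllowedIPs match the inventory record.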

WireGuard is stateless from the network perspective. There is no “connection” to establish or maintain. If a device’s WAN IP changes (common with 4G failover), WireGuard simply updates the endpoint when the next authenticated packet arrives. There is no renegotiation, no handshake — just a seamless transition.

This is critical for SD-WAN. When a device fails over from fiber to 4G, its public IP changes. With IPsec or OpenVPN, that means a reconnection delay. With WireGuard, the hub sees packets arriving from a new source IP, verifies the cryptographic authentication, and updates its endpoint record. Traffic continues to flow.

# WireGuard peer status showing roaming
root@hub-west-01:~# wg show wg0 peers
peer: <device_public_key>
endpoint: 49.36.xx.xx:34892 # <-- this updates automatically on failover
allowed ips: 10.200.42.2/32
latest handshake: 12 seconds ago
transfer: 1.24 GiB received, 892.45 MiB sent

WireGuard uses a fixed set of modern primitives: ChaCha20-Poly1305 for symmetric encryption, Curve25519 for key exchange, BLAKE2s for hashing, SipHash for hashtable keys. There is no cipher negotiation, no downgrade attacks, no configuration knob for choosing between AES-128, AES-256, 3DES, or RC4.

This is a feature, not a limitation. Cipher negotiation is a source of bugs and misconfiguration in IPsec and TLS. WireGuard’s approach means every tunnel uses the same strong cryptography — no exceptions.
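One fixed primitive also means one fixed key shape: every WireGuard key, public or private, is a 32-byte Curve25519 value that base64-encodes to 44 characters. A hypothetical sanity check of the kind a validation script might run (assumes GNU coreutils base64, and uses an all-zero stand-in rather than a real key):

```shell
# Return success if the argument is base64 that decodes to exactly
# 32 bytes -- the Curve25519 key size WireGuard uses.
is_wg_key() {
    [ "$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c)" -eq 32 ]
}

key=$(head -c 32 /dev/zero | base64)     # stand-in for a wg genkey output
is_wg_key "$key" && echo "valid"         # prints "valid"
is_wg_key "not-a-key" || echo "invalid"  # prints "invalid"
```

There is no key "type", "curve", or "size" field to validate, because there is nothing to negotiate.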

OpenWrt has first-class WireGuard support via the kmod-wireguard kernel module and the wireguard-tools package.

# Install on OpenWrt
opkg update
opkg install kmod-wireguard wireguard-tools
# OpenWrt UCI configuration (equivalent to the wg config above)
uci set network.wg0=interface
uci set network.wg0.proto='wireguard'
uci set network.wg0.private_key='<device_private_key>'
uci set network.wg0.listen_port='51820'
uci add_list network.wg0.addresses='10.200.42.2/32'
uci set network.wg0.mtu='1420'
uci add network wireguard_wg0
uci set network.@wireguard_wg0[-1].public_key='<hub_public_key>'
uci set network.@wireguard_wg0[-1].endpoint_host='hub-west-01.hopbox.in'
uci set network.@wireguard_wg0[-1].endpoint_port='51820'
uci add_list network.@wireguard_wg0[-1].allowed_ips='10.200.0.0/16'
uci add_list network.@wireguard_wg0[-1].allowed_ips='10.100.0.0/16'
uci set network.@wireguard_wg0[-1].persistent_keepalive='25'
uci commit network
/etc/init.d/network reload

The UCI integration means WireGuard configuration is managed alongside all other network configuration on the device — same syntax, same commit/apply model, same Ansible modules.

WireGuard’s simplicity has one major operational implication: key management is your problem.

Each device has a unique Curve25519 key pair. Each hub has a key pair. Every device-to-hub relationship requires the hub to know the device’s public key, and the device to know the hub’s public key. At 900+ sites with multiple hubs, that is thousands of key pairs to generate, distribute, and rotate.

Keys are generated during device provisioning — never on the device itself (to avoid weak entropy on embedded hardware).

# Key generation in the provisioning pipeline
wg genkey | tee /tmp/device_private.key | wg pubkey > /tmp/device_public.key
# Private key goes into the device config (encrypted at rest)
# Public key goes into the hub config and the inventory database
roles/wireguard/tasks/main.yml
# Simplified Ansible template for key distribution
- name: Generate WireGuard keypair for new device
  command: wg genkey
  register: wg_private_key
  delegate_to: localhost
  no_log: true

- name: Derive public key
  shell: "echo '{{ wg_private_key.stdout }}' | wg pubkey"
  register: wg_public_key
  delegate_to: localhost
  no_log: true

- name: Store keys in vault
  ansible.builtin.include_role:
    name: vault_store
  vars:
    site_id: "{{ inventory_hostname }}"
    private_key: "{{ wg_private_key.stdout }}"
    public_key: "{{ wg_public_key.stdout }}"

WireGuard does not have built-in key rotation. The tunnel uses the configured key pair until you change it. For a security-conscious deployment, we rotate keys on a regular schedule:

  1. Provisioning server generates a new key pair for the device
  2. New public key is pushed to the hub configuration
  3. New private key is pushed to the device
  4. Device applies new config — tunnel re-establishes with new keys
  5. Old key is removed from the hub after confirming the new tunnel is active
# Key rotation: add new key to hub before removing old
# This ensures zero-downtime rotation
# Step 1: Add new peer entry on hub (new public key)
wg set wg0 peer <new_device_pubkey> allowed-ips 10.200.42.2/32
# Step 2: Push new private key to device, reload
# (via Ansible / management plane)
# Step 3: Verify tunnel is active with new key
wg show wg0 | grep -A4 <new_device_pubkey>
# latest handshake should be recent
# Step 4: Remove old peer entry from hub
wg set wg0 peer <old_device_pubkey> remove

Every WireGuard tunnel is monitored via Prometheus. The key metric is latest handshake — WireGuard rekeys with a fresh handshake roughly every 2 minutes as long as traffic is flowing, and our 25-second PersistentKeepalive ensures there is always traffic to keep the session alive.

# prometheus-wireguard-exporter metrics
wireguard_latest_handshake_seconds{interface="wg0",public_key="<key>"} 1711452823
wireguard_received_bytes_total{interface="wg0",public_key="<key>"} 1334567890
wireguard_sent_bytes_total{interface="wg0",public_key="<key>"} 987654321

Alert rule:

- alert: WireGuardTunnelDown
  expr: time() - wireguard_latest_handshake_seconds > 300
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "WireGuard tunnel to {{ $labels.public_key }} has not had a handshake in 5+ minutes"

If the latest handshake is more than 5 minutes old, the tunnel is effectively dead — the device is unreachable, likely due to a WAN outage or a deeper issue.
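The alert expression reduces to simple arithmetic against the exporter's timestamp. Using the sample metric value from above (the "now" value is illustrative):

```shell
# Age of the last handshake = current unix time - exporter timestamp.
now=1711453200                 # illustrative "current" time
latest_handshake=1711452823    # sample value from the metrics above
age=$((now - latest_handshake))
echo "$age"                    # 377 seconds -- past the 300s threshold
[ "$age" -gt 300 ] && echo "alert: tunnel down"
```

The `for: 2m` clause means the condition must hold for two consecutive minutes before the alert fires, which filters out transient blips.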

MTU misconfiguration is the most common cause of “tunnel is up but traffic is broken” issues. WireGuard adds overhead to every packet:

Outer IP header:   20 bytes (IPv4) / 40 bytes (IPv6)
Outer UDP header:   8 bytes
WireGuard header:  16 bytes
Poly1305 auth tag: 16 bytes
Padding:           0-15 bytes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total overhead: 60 bytes over IPv4, 80 bytes over IPv6 (plus padding)

If the path MTU is the standard 1500 bytes, the inner MTU must be set to 1420 (1500 minus 80 bytes, the worst-case IPv6 overhead) or lower. If the WAN link has a lower MTU (common with PPPoE at 1492), the inner MTU drops further.

# Set WireGuard interface MTU
ip link set wg0 mtu 1420
# For PPPoE uplinks (MTU 1492):
ip link set wg0 mtu 1412
# Verify: clamp TCP MSS to match
iptables -t mangle -A FORWARD -o wg0 -p tcp --tcp-flags SYN,RST SYN \
-j TCPMSS --clamp-mss-to-pmtu

We set MSS clamping on every device to avoid TCP blackhole issues where large packets are silently dropped because they exceed the path MTU and DF (Don’t Fragment) is set.
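The MTU values above fall out of a single subtraction, using the worst-case 80-byte overhead so one figure works on any path:

```shell
# Inner MTU = path MTU - worst-case WireGuard overhead (80 bytes:
# 40 IPv6 outer header + 8 UDP + 16 WireGuard header + 16 Poly1305 tag).
wg_overhead=80
echo $((1500 - wg_overhead))   # standard Ethernet path: 1420
echo $((1492 - wg_overhead))   # PPPoE path: 1412
```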

The tunnel endpoints are not a single server. Our hub infrastructure is distributed across regions:

                ┌──────────────┐
                │  hub-north   │
                │   (Delhi)    │
                └──────┬───────┘
┌──────────────┐       │       ┌──────────────┐
│   hub-west   │───────┼───────│   hub-east   │
│   (Mumbai)   │       │       │  (Kolkata)   │
└──────────────┘       │       └──────────────┘
                ┌──────┴───────┐
                │  hub-south   │
                │  (Chennai)   │
                └──────────────┘
Each hub: WireGuard endpoint + routing + PowerDNS resolver
Devices connect to nearest regional hub
Hubs mesh with each other for inter-region traffic

Each device connects to its nearest regional hub. If a hub goes down, devices can be redirected to an alternate hub — the management plane pushes a config update changing the WireGuard endpoint, and the device re-establishes the tunnel.

WireGuard gives us:

  • 6-7x throughput vs OpenVPN on the same embedded hardware
  • Instant roaming when devices fail over between WAN links
  • Minimal configuration surface — fewer moving parts, fewer misconfigurations
  • Kernel-space processing — no userspace bottleneck, predictable latency
  • Modern cryptography with no cipher negotiation or downgrade risk

The trade-off is that WireGuard does not handle key distribution, authentication infrastructure, or certificate management for you. We built that layer ourselves with Ansible and our provisioning pipeline. For a deployment at our scale, that trade-off is worth it — we would rather own the key management problem explicitly than wrestle with IPsec’s IKE complexity or OpenVPN’s TLS overhead.

WireGuard is not a complete SD-WAN solution. It is the tunnel primitive on top of which we build routing, failover, QoS, and monitoring. But it is the right primitive — fast, simple, and reliable.
