On-Prem - Ingress High Availability

High Availability

If a vCPE HA Gateway is to act as a "default gateway", the pair will use VRRP to float a VIP (Virtual IP) between them. The VIP is used as the default route. The gateways function in active/standby mode. During normal operation, the primary hosts the VIP. If it fails, the secondary takes over until the primary is recovered.
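The active/standby behaviour described above can be sketched in a few lines of Python. This is only an illustration of the election rule (the highest-priority router that is still up hosts the VIP), not how VRRP or keepalived is implemented. The router names and priority values are assumptions chosen for the example.

```python
def vip_holder(priorities, alive):
    """Return the name of the router that should host the VIP.

    priorities: dict mapping router name -> VRRP priority (higher wins)
    alive: set of router names that are currently up
    """
    candidates = {name: prio for name, prio in priorities.items() if name in alive}
    if not candidates:
        return None  # no router is available to host the VIP
    return max(candidates, key=candidates.get)


# Hypothetical pair: the primary is given the higher priority.
priorities = {"primary": 101, "secondary": 100}

print(vip_holder(priorities, {"primary", "secondary"}))  # normal operation
print(vip_holder(priorities, {"secondary"}))             # primary has failed
print(vip_holder(priorities, {"primary", "secondary"}))  # primary has recovered
```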


This guide walks you through a practical demonstration of configuring High Availability between customer-hosted edge routers.

Sample Architecture without HA



For this demonstration, we set up a network between an edge router (ER) running on a VMware VM on-premises at the headquarters (the ingress ER) and an ER in Azure South India, which hosts the target application (the egress ER). The NetFoundry-hosted edge routers are provisioned in Azure and AWS in India.

An endpoint at headquarters can access the web application hosted in Azure. Currently, a route has been configured for the network, pointing to the ingress ER IP at HQ.


Sample Architecture with HA


A second edge router, HQCustomerER2, has been deployed so that High Availability can be configured between the ingress ERs at HQ.


Step 1: Provision Customer-hosted Edge Routers (On-prem)

Customer self-hosted Edge Routers (CERs) act as egress routers through which endpoints or other CERs reach the services terminated on the CER.

Create and Register CERs in a private cloud

Use the below deployment guides to provision a customer-hosted Edge Router into a branch office or a private cloud.


Create and Register CERs on AWS / Azure / OCI / any Public Cloud

Use the below deployment guides to provision a customer-hosted Edge Router into your AWS, Azure, GCP, or OCI environment.



Step 2: Configure the AppWAN

The AppWAN defines the services that one or more client endpoints can access.

To learn more about AppWANs, see the Create and Manage AppWAN article on the NetFoundry support hub.

Add the HQCustomerER2 endpoint to an existing AppWAN to give it access to the web application hosted on Azure.


Note: An endpoint/service/edge router attribute selects all endpoints/services/edge routers that have that attribute. The @ symbol tags an individual endpoint/service/edge router, and the # symbol tags a group of endpoints/services/edge routers.


Step 3: Install and Configure Keepalived

Keepalived is a framework for high availability. It achieves high availability through the Virtual Router Redundancy Protocol (VRRP), a fundamental building block for router failover. To learn more about Keepalived, see the Keepalived documentation.

Update and install Keepalived

Update the package index and install Keepalived on both ERs.

  • Use the below command to update the ER
    $ sudo apt-get update
  • Use the below command to install keepalived
    $ sudo apt-get install keepalived


Configure IP forwarding and non-local binding

Configure IP forwarding and non-local binding on both ERs.

  • To configure IP forwarding and non-local binding, switch to the root user using the below command
    $ sudo su -
  • To allow the ER to forward network packets, enable IP forwarding. The command below uncomments the existing net.ipv4.ip_forward=1 line in /etc/sysctl.conf. Run it on both ERs
    # sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/' /etc/sysctl.conf
  • Similarly, allow Keepalived to bind to a non-local IP address, that is, the failover IP address (floating IP)
    # echo "net.ipv4.ip_nonlocal_bind=1" >> /etc/sysctl.conf
  • Reload sysctl settings
    # sysctl -p
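The sed command above only rewrites a line that reads exactly #net.ipv4.ip_forward=1, so it is worth confirming that both settings are really present. The following is a minimal sketch of such a check, not part of the NetFoundry tooling. The function names and the sample file content are assumptions made for illustration.

```python
def sysctl_settings(conf_text):
    """Parse sysctl.conf-style text into a dict, skipping comments and blank lines."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_ha_settings(conf_text):
    """Return the keys required by this guide that are not set to 1."""
    required = ("net.ipv4.ip_forward", "net.ipv4.ip_nonlocal_bind")
    settings = sysctl_settings(conf_text)
    return [key for key in required if settings.get(key) != "1"]


# Assumed sample: ip_forward is still commented out (with spaces around "="),
# so the sed pattern from the step above would not have matched it.
sample = """
#net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind=1
"""
print(missing_ha_settings(sample))
```

On a router, the same function could be given the real file content, for example missing_ha_settings(open("/etc/sysctl.conf").read()). An empty list means both settings are in place.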




Configure Keepalived

  • Check the network interface name and IP on both servers that will be part of HA setup
    $ ip a


Edge Router      Interface   IP Address
HQCustomerER1    enp3s0
HQCustomerER2    enp3s0


Keepalived reads its configuration from /etc/keepalived/keepalived.conf.

This file is not created by default. Create it on each ER with the content shown below.

  • Exit root shell
    # exit


Configure HQCustomerER1

  • Create a new conf file
    $ sudo nano /etc/keepalived/keepalived.conf
  • Configure Keepalived using the below configuration. Replace each <...> placeholder with the value for your network
vrrp_instance VI_1 {
    interface enp3s0
    state MASTER
    virtual_router_id 51
    priority 101
    advert_int 2
    unicast_src_ip <IP of this device>
    unicast_peer {
        <IP of the peer device>
    }
    authentication {
        auth_type AH
        auth_pass N3tF0undry
    }
    virtual_ipaddress {
        <VIP> dev enp3s0 label enp3s0:vip
    }
}


Configure HQCustomerER2

  • Create a new conf file
    $ sudo nano /etc/keepalived/keepalived.conf
  • Configure Keepalived using the below configuration. Replace each <...> placeholder with the value for your network
vrrp_instance VI_1 {
    interface enp3s0
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 2
    unicast_src_ip <IP of this device>
    unicast_peer {
        <IP of the peer device>
    }
    authentication {
        auth_type AH
        auth_pass N3tF0undry
    }
    virtual_ipaddress {
        <VIP> dev enp3s0 label enp3s0:vip
    }
}
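The two configurations differ only in the state, the priority, and the source/peer addresses. The sketch below generates both from one template to make that explicit. It is an illustration, not a NetFoundry tool: the function name is hypothetical, and the addresses are placeholders from the 192.0.2.0/24 documentation range that you would replace with your own.

```python
def render_vrrp_instance(state, priority, src_ip, peer_ip, vip,
                         interface="enp3s0", router_id=51, advert_int=2,
                         auth_pass="N3tF0undry"):
    """Return a keepalived vrrp_instance block as text."""
    lines = [
        "vrrp_instance VI_1 {",
        f"    interface {interface}",
        f"    state {state}",
        f"    virtual_router_id {router_id}",
        f"    priority {priority}",
        f"    advert_int {advert_int}",
        f"    unicast_src_ip {src_ip}",
        "    unicast_peer {",
        f"        {peer_ip}",
        "    }",
        "    authentication {",
        "        auth_type AH",
        f"        auth_pass {auth_pass}",
        "    }",
        "    virtual_ipaddress {",
        f"        {vip} dev {interface} label {interface}:vip",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Placeholder addresses (assumed); substitute the real ER and VIP addresses.
er1 = render_vrrp_instance("MASTER", 101, "192.0.2.11", "192.0.2.12", "192.0.2.10")
er2 = render_vrrp_instance("BACKUP", 100, "192.0.2.12", "192.0.2.11", "192.0.2.10")

# Show only the lines that differ between the two routers.
differences = [(a.strip(), b.strip())
               for a, b in zip(er1.splitlines(), er2.splitlines()) if a != b]
for er1_line, er2_line in differences:
    print(f"{er1_line:<28} | {er2_line}")
```

Everything else, including virtual_router_id, the authentication block, and the VIP, must be identical on both routers for them to form one VRRP pair.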


Start Keepalived

Start and enable keepalived on both routers

  • Start keepalived on both ERs using the below command
   $ sudo service keepalived start
  • Enable to run at system boot
   $ sudo systemctl enable --now keepalived
  • Use the below command to check the status
   $ sudo systemctl status keepalived
  • HQCustomerER1 keepalived status


  • HQCustomerER2 keepalived status



Check Virtual IPs

By default, the virtual IP is assigned to the master server. If the master goes down, the VIP is automatically reassigned to the backup server.

  • Use the following command to show the assigned virtual IP on the interface.
    $ ip addr show enp3s0
  • HQCustomerER1 assigned VIP


  • HQCustomerER2 assigned VIP
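If you want to script this check rather than read the output by eye, the following sketch looks for the VIP in the text printed by ip addr show. The function name is hypothetical, and the sample output and addresses (from the 192.0.2.0/24 documentation range) are assumptions for illustration, not output captured from the routers in this guide.

```python
import re


def vip_is_assigned(ip_addr_output, vip):
    """Return True if `vip` is listed as an IPv4 address in `ip addr show` output."""
    pattern = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)(?:/\d+)?\b", re.MULTILINE)
    return vip in pattern.findall(ip_addr_output)


# Assumed sample output for a router that currently holds the VIP.
master_output = """\
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:00:5e:00:53:01 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.11/24 brd 192.0.2.255 scope global enp3s0
       valid_lft forever preferred_lft forever
    inet 192.0.2.10/32 scope global enp3s0:vip
       valid_lft forever preferred_lft forever
"""

# Assumed sample output for the standby router, which has only its own address.
backup_output = """\
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:00:5e:00:53:02 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.12/24 brd 192.0.2.255 scope global enp3s0
       valid_lft forever preferred_lft forever
"""

print(vip_is_assigned(master_output, "192.0.2.10"))
print(vip_is_assigned(backup_output, "192.0.2.10"))
```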



Step 4: Change Route

Currently, a route has been configured for the network, pointing to the ingress ER IP at HQ. Change this route so that it points to the VIP of the ingress ERs.

  • Use the below command to delete the route pointing to ER1
    >route delete <network>
  • Add a persistent route through the VIP using the below command
    >route add -P <network> mask <netmask> <VIP>
  • Use the below command to verify the route
    >route print

Note: A route has been configured on the local PC in this demonstration. For wider networks, you can make changes in the route at Gateways/Router/L3 Switches based on the network architecture.

Step 5: Validate

Validate Failover

Shut down the master ER1 and check that the VIP is automatically assigned to the backup ER2.

  • Use the below command to shut down the ER1 interface
    $ sudo ip link set enp3s0 down
  • To check the status of the ER2 interface use the below command
    $ ip addr show enp3s0


You can see that the VIP is automatically assigned to the backup ER2.
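To judge whether the failover you observe is as fast as expected, you can estimate how long the backup waits before taking over. The sketch below assumes VRRP version 2 timers as defined in RFC 3768 (Master_Down_Interval = 3 x Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256). If your keepalived is configured for VRRP version 3, the skew is computed differently, so treat the result as an estimate.

```python
def master_down_interval(advert_int, priority):
    """Seconds a backup waits without advertisements before becoming master.

    Assumes VRRPv2 timers (RFC 3768); advert_int is in seconds.
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


# Values taken from the HQCustomerER2 configuration in this guide.
print(master_down_interval(advert_int=2, priority=100))
```

With advert_int 2 and priority 100, the backup should claim the VIP roughly 6.6 seconds after the master stops advertising.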

Validate Service Accessibility

The application or server is accessed via a private hostname or address that is not reachable via the internet. The application is therefore dark to the outside world and reachable only within the NetFoundry network.




Ability to track loss of controller and/or fabric to trigger local switchover

This feature is available from OpenZiti v0.28.1 onward. The corresponding CloudZiti version is v7.3.91. Each router must have the health check endpoint enabled. A customer-hosted router registered with the NetFoundry-provided registration app has it enabled. To check, view the router config file.

ziggy@ziti-edge-router:~$ cat /opt/netfoundry/ziti/ziti-router/config.yml
v: 3
...
healthChecks:
  ctrlPingCheck:
    interval: 10s
    timeout: 15s
    initialDelay: 15s
  linkCheck:
    minLinks: 1
    interval: 10s
    initialDelay: 15s

Using this health check (HC) endpoint, you can detect when a router has lost communication with the controller or fabric and use that information to trigger a VRRP switchover.
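As a rough illustration of that idea, the sketch below turns a health check response into the exit code that keepalived expects from a tracking script (0 for healthy, non-zero for failed). It is not the logic of erhchecker.pyz. The response layout and the check ids controllerPing and link.health are assumptions about the HC endpoint output, so compare them with what your router actually returns.

```python
import json


def check_status(payload):
    """Map each check id to its healthy flag from a health check response body."""
    checks = payload.get("data", {}).get("checks", [])
    return {check["id"]: bool(check.get("healthy")) for check in checks}


def script_exit_code(payload):
    """Return 0 when every reported check is healthy, otherwise 1."""
    status = check_status(payload)
    return 0 if status and all(status.values()) else 1


# Assumed response: the controller ping has failed while the links are still up.
sample = json.loads("""
{"data": {"healthy": false,
          "checks": [{"id": "controllerPing", "healthy": false},
                     {"id": "link.health", "healthy": true}]}}
""")

print(check_status(sample))
print(script_exit_code(sample))
```

A response with no checks at all is treated as a failure here, on the assumption that a silent health endpoint should not keep the VIP on this router.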

Note: The recommendation is to match the script interval in the VRRP configuration to the health check interval listed in the router configuration file shown in the previous section, or to set it slightly higher so that at least one health check completes before VRRP checks again.

Here is a sample configuration that adds script tracking to the VRRP configuration.

Note: Best practice is to run the script as a non-root user. The script section must be added before the main instance configuration.

global_defs {
      script_user ziggy ziggy
}
vrrp_script wan_check {
      script "/home/ziggy/erhchecker.pyz" # path to script
      interval 12
      fall 3 # times to wait to failover to standby, default 3
      rise 6 # times to wait before clear the failure, default 2 x fall
      user ziggy ziggy
}

vrrp_instance VI_1 {
      # ... remaining instance settings ...
      state MASTER
      # ... remaining instance settings ...
      track_script {
            wan_check
      }
}
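To see what interval, fall, and rise mean in practice, the sketch below replays a series of script results the way a fall/rise counter treats them: the state changes to failed only after fall consecutive failures and returns to healthy only after rise consecutive successes. This is a simplified model written for illustration, not keepalived code, and the function name is hypothetical.

```python
def track(results, fall=3, rise=6, healthy=True):
    """Replay script results (True means exit code 0) and return the state after each run."""
    states = []
    streak = 0  # consecutive results that contradict the current state
    for ok in results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if healthy else rise):
                healthy = ok
                streak = 0
        states.append(healthy)
    return states


INTERVAL = 12  # seconds between script runs, as in the sample configuration

# Three failed runs followed by six successful runs.
states = track([False] * 3 + [True] * 6)
runs_to_fail = states.index(False) + 1
runs_to_recover = len(states) - runs_to_fail

print(states)
print(runs_to_fail * INTERVAL, "seconds of failures before the script is marked failed")
print(runs_to_recover * INTERVAL, "seconds of successes before it is marked healthy again")
```

With the sample values, a failure is declared after about 36 seconds of failed checks, and the failure is cleared after about 72 seconds of successful checks.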


Script Options

Options can be set as environment variables or passed on the command line. Using environment variables is recommended.

./erhchecker.pyz --help

  -h, --help
        show this help message and exit
  -c ROUTERCONFIGFILEPATH, --routerConfigFilePath ROUTERCONFIGFILEPATH
        Specify the edge router config file
  -t SWITCHTIMEOUT, --switchTimeout SWITCHTIMEOUT
        Time to pass to allow for sessions drainage
  -r NOTFLAGROUTERSFILEPATH, --noTFlagRoutersFilePath NOTFLAGROUTERSFILEPATH
        Specify yaml file containing list of router ids that have no-traversable flag set
  -f LOGFILE, --logFile LOGFILE
        Specify the log file
  -v, --version
        show program's version number and exit

Environment variable names with default values:

'ROUTER_CONFIG_FILE_PATH', '/opt/netfoundry/ziti/ziti-router/config.yml'
'LOG_FILE', ""
SWITCH_TIMEOUT delays the switchover to the protection router when the router has lost communication with the controller but at least one fabric link is still active. This gives the active sessions still terminated on the router more time to drain during the controller communication failure. The value is configurable, so you can decide how long those sessions are allowed to continue. Keep in mind that while the router cannot communicate with the controller, new sessions cannot be established.
NO_T_FLAG_ROUTERS_FILE_PATH passes to the script the IDs of routers that have the no-traversable flag set, for example the NC router used for salt communication. If the script finds any links associated with these routers, it excludes them from the link health lookup. The link connected to the router on the controller is excluded by comparing its IP address, provided as part of the HC details, with the controller IP/FQDN given in the router configuration file.
The YAML syntax of the file content must be as follows:
$ cat router_file.yml
- <router id 1>   # e.g. falWSf-KgH
- <router id 2>
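The way these two options interact can be sketched as follows. This is one reading of the behaviour described above, not the actual erhchecker.pyz code. The function names, the destRouterId key, and the choice to switch immediately when no usable link remains are all assumptions made for illustration.

```python
def usable_links(links, no_traversal_router_ids):
    """Drop links whose destination router is in the no-traversable list."""
    excluded = set(no_traversal_router_ids)
    return [link for link in links if link["destRouterId"] not in excluded]


def should_switch(controller_ok, links, no_traversal_router_ids,
                  seconds_since_controller_loss, switch_timeout):
    """Decide whether to fail over to the protection router."""
    active = usable_links(links, no_traversal_router_ids)
    if not active:
        return True  # assumption: no usable fabric link means switch right away
    if controller_ok:
        return False
    # Controller lost but links remain: let sessions drain until the timeout expires.
    return seconds_since_controller_loss >= switch_timeout


# Hypothetical link list: one link to an excluded router, one ordinary link.
links = [{"destRouterId": "falWSf-KgH"}, {"destRouterId": "hypoRtr-01"}]
excluded = ["falWSf-KgH"]

print(should_switch(True, links, excluded, 0, 60))    # everything healthy
print(should_switch(False, links, excluded, 30, 60))  # controller lost, still draining
print(should_switch(False, links, excluded, 60, 60))  # drain time has elapsed
```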


Enable environment variables with VRRP

Add the environment variables to the file /etc/default/keepalived and restart the keepalived service.
Note: The log file must be created in advance and be writable by at least the user that runs the script, as configured in the VRRP configuration.
$ sudo cat /etc/default/keepalived
# Options to pass to keepalived
# DAEMON_ARGS are appended to the keepalived command-line

$ sudo systemctl restart keepalived.service
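For context, the sketch below shows one common way a script can combine command-line options, environment variables, and built-in defaults. Whether erhchecker.pyz resolves its options in exactly this order is an assumption. Only the option names, variable names, and default values are taken from this article, and the log file path in the example is hypothetical.

```python
import argparse

DEFAULTS = {
    "ROUTER_CONFIG_FILE_PATH": "/opt/netfoundry/ziti/ziti-router/config.yml",
    "LOG_FILE": "",
}


def resolve_options(argv, environ):
    """Command line wins over environment, which wins over the defaults (assumed order)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-c", "--routerConfigFilePath",
        default=environ.get("ROUTER_CONFIG_FILE_PATH", DEFAULTS["ROUTER_CONFIG_FILE_PATH"]))
    parser.add_argument(
        "-f", "--logFile",
        default=environ.get("LOG_FILE", DEFAULTS["LOG_FILE"]))
    return parser.parse_args(argv)


# Hypothetical environment, as it might be set through /etc/default/keepalived.
env = {"LOG_FILE": "/var/log/erhchecker.log"}

options = resolve_options([], env)
print(options.routerConfigFilePath)  # falls back to the documented default
print(options.logFile)               # taken from the environment
```

In a real script, the second argument would typically be os.environ, and the first would be the arguments after the script name.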