High Availability
If a vCPE HA Gateway is to act as a "default gateway", the pair will use VRRP to float a VIP (Virtual IP) between them. The VIP is used as the default route. The gateways function in active/standby mode. During normal operation, the primary hosts the VIP. If it fails, the secondary takes over until the primary is recovered.
Introduction
This guide walks you through a practical demonstration of configuring High Availability between customer-hosted edge routers.
Sample Architecture without HA
For this demonstration, the network connects an edge router (ER) running on a VMware VM on-premises at headquarters (the ingress ER) and an ER in Azure South India, where the target application is hosted (the egress ER). The NetFoundry-hosted edge routers are provisioned in Azure and AWS in India.
The endpoint from headquarters is able to access the web application hosted at Azure. Currently, a route has been configured for the network 10.90.1.0/24, pointing to ingress ER IP 192.168.158.151 at HQ.
Sample Architecture with HA
A second edge router, HQCustomerER2, has been deployed at HQ so that High Availability can be configured between the ingress ERs.
Step 1: Provision Customer-hosted Edge Routers (On-prem)
Customer self-hosted Edge Routers (CERs) act as egress routers for endpoints / other CERs to reach the services terminated on the CER endpoint.
Create and Register CERs in a private cloud
Use the below deployment guides to provision a customer-hosted Edge Router into a branch office or a private cloud.
Create and Register CERs on AWS / Azure / OCI / any Public Cloud
Use the below deployment guides to provision a customer-hosted Edge Router into your AWS / Azure / GCP / OCI environment.
Step 2: Configure the AppWAN
The AppWAN defines the services that one or more client endpoints can access.
To learn more about AppWANs, see the Create and Manage AppWAN article on the NetFoundry support hub.
Add the HQCustomerER2 endpoint to the existing AppWAN to grant it access to the web application hosted on Azure.
Note: An endpoint/service/edge router attribute selects all endpoints/services/edge routers that carry that attribute. The @ symbol tags an individual endpoint/service/edge router, and the # symbol tags a group of them.
Step 3: Install and Configure Keepalived
Keepalived is a framework for high availability. It achieves high availability through the Virtual Router Redundancy Protocol (VRRP), a fundamental building block for router failover. To learn more, see the Keepalived documentation.
Update and install Keepalived
Update the package index and install Keepalived on both ERs.
- Use the below command to update the package index on the ER
$ sudo apt-get update
- Use the below command to install Keepalived
$ sudo apt-get install keepalived
Configure IP forwarding and non-local binding
Configure IP forwarding and non-local binding on both ERs
- To configure IP forwarding and non-local binding you need to switch to the root user using the below command
$ sudo su -
- To allow the ERs to forward network packets, enable IP forwarding. Run this command on both ERs
# sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/' /etc/sysctl.conf
- Similarly, allow Keepalived to bind to a non-local IP address, that is, to the failover IP address (floating IP).
# echo "net.ipv4.ip_nonlocal_bind=1" >> /etc/sysctl.conf
- Reload sysctl settings
# sysctl -p
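The `sed` command above only takes effect if the commented-out line is present exactly as written, and the `echo ... >>` command appends a duplicate line each time it is re-run. As a minimal sketch, assuming a hypothetical helper named `set_sysctl_option` (it is not part of Keepalived or the NetFoundry tooling) and GNU `sed`, both settings could be applied idempotently like this:

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a sysctl-style file exactly once,
# whether the key is missing, commented out, or already present.
set_sysctl_option() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*#?[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        # Rewrite the existing (possibly commented-out) line in place.
        sed -i -E "s|^[[:space:]]*#?[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        # The key is not present at all: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example, run as root on both ERs:
#   set_sysctl_option /etc/sysctl.conf net.ipv4.ip_forward 1
#   set_sysctl_option /etc/sysctl.conf net.ipv4.ip_nonlocal_bind 1
#   sysctl -p
```

Running the helper twice leaves a single line per key, so it is safe to repeat when rebuilding an ER.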
Configure Keepalived
- Check the network interface name and IP on both servers that will be part of HA setup
$ ip a
Edge Router | Interface | IP Address |
HQCustomerER1 | enp3s0 | 192.168.158.151 |
HQCustomerER2 | enp3s0 | 192.168.158.152 |
Keepalived reads its configuration from /etc/keepalived/keepalived.conf.
This file is not created by default, so create it on each ER with the content below.
- Exit root shell
# exit
Configure HQCustomerER1
- Create a new conf file
$ sudo nano /etc/keepalived/keepalived.conf
- Configure Keepalived using the below configuration
vrrp_instance VI_1 {
    interface enp3s0
    state MASTER
    virtual_router_id 51
    priority 101
    advert_int 2
    unicast_src_ip 192.168.158.151 # IP of this device
    unicast_peer {
        192.168.158.152 # IP of the peer device
    }
    authentication {
        auth_type AH
        auth_pass N3tF0undry
    }
    virtual_ipaddress {
        192.168.158.150 dev enp3s0 label enp3s0:vip
    }
}
Configure HQCustomerER2
- Create a new conf file
$ sudo nano /etc/keepalived/keepalived.conf
- Configure Keepalived using the below configuration
vrrp_instance VI_1 {
    interface enp3s0
    state BACKUP
    virtual_router_id 51
    priority 100
    advert_int 2
    unicast_src_ip 192.168.158.152 # IP of this device
    unicast_peer {
        192.168.158.151 # IP of the peer device
    }
    authentication {
        auth_type AH
        auth_pass N3tF0undry
    }
    virtual_ipaddress {
        192.168.158.150 dev enp3s0 label enp3s0:vip
    }
}
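The two configurations differ only in `state`, `priority`, and the swapped unicast addresses. To keep the shared values from drifting apart, both files could be produced from one template. The following is a sketch only: `render_keepalived_conf` is a hypothetical helper name, and the interface, router ID, password, and VIP are simply the values used in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Keepalived or the NetFoundry tooling):
# print a keepalived.conf for one node of the HA pair.
# Arguments: STATE PRIORITY LOCAL_IP PEER_IP
render_keepalived_conf() {
    state=$1 priority=$2 local_ip=$3 peer_ip=$4
    cat <<EOF
vrrp_instance VI_1 {
    interface enp3s0
    state ${state}
    virtual_router_id 51
    priority ${priority}
    advert_int 2
    unicast_src_ip ${local_ip}
    unicast_peer {
        ${peer_ip}
    }
    authentication {
        auth_type AH
        auth_pass N3tF0undry
    }
    virtual_ipaddress {
        192.168.158.150 dev enp3s0 label enp3s0:vip
    }
}
EOF
}

# Example (review the output before saving it as /etc/keepalived/keepalived.conf):
#   render_keepalived_conf MASTER 101 192.168.158.151 192.168.158.152   # HQCustomerER1
#   render_keepalived_conf BACKUP 100 192.168.158.152 192.168.158.151   # HQCustomerER2
```

The function only prints text, so the result can be compared against an existing file with `diff` before anything is overwritten.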
Start Keepalived
Start and enable keepalived on both routers
- Start keepalived on both ERs using the below command
$ sudo service keepalived start
- Enable Keepalived to start at system boot
$ sudo systemctl enable --now keepalived
- Use the below command to check the status
$ sudo systemctl status keepalived
- HQCustomerER1 keepalived status
- HQCustomerER2 keepalived status
Check Virtual IPs
By default, the virtual IP is assigned to the master ER. If the master goes down, the VIP is automatically reassigned to the backup ER.
- Use the following command to show the assigned virtual IP on the interface.
$ ip addr show enp3s0
- HQCustomerER1 assigned VIP
- HQCustomerER2 assigned VIP
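To check which ER currently holds the VIP without reading the interface output by eye, the check can be scripted. This is a minimal sketch: `has_vip` is a hypothetical helper name, and it assumes the one-line-per-address format printed by `ip -o -4 addr show`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on standard input.
has_vip() {
    # Escape dots so the address is matched literally.
    vip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    # The trailing slash anchors the match to the full address
    # (so 192.168.158.15 does not match 192.168.158.150).
    grep -Eq "inet ${vip_re}/"
}

# Example, run on either ER:
#   ip -o -4 addr show dev enp3s0 | has_vip 192.168.158.150 && echo "VIP is on this ER"
```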
Step 4: Change Route
So far, the route for the network 10.90.1.0/24 has pointed to the ingress ER IP 192.168.158.151 at HQ. Change it so that the route for 10.90.1.0/24 points to the VIP 192.168.158.150 shared by the ingress ERs.
- Use the below command to delete the route pointing to ER1
>route delete 10.90.1.0
- Add a route using the below command
>route -p add 10.90.1.0 mask 255.255.255.0 192.168.158.150
- Use the below command to verify the route
>route print
Note: In this demonstration, the route is configured on the local PC. For wider networks, change the route on the gateways, routers, or L3 switches, depending on the network architecture.
Step 5: Validate
Validate Failover
Take the master ER1 offline and check that the VIP is automatically assigned to the backup ER2.
- Use the below command to shut down the ER1 interface
$ sudo ip link set enp3s0 down
- To check the status of the ER2 interface use the below command
$ ip addr show enp3s0
You can see that the VIP is automatically assigned to the backup ER2.
Validate Service Accessibility
The application or server is accessed via a private hostname or address that is not reachable via the internet. The application is therefore dark to the outside world and reachable only within the NetFoundry network.
Ability to track loss of controller and/or fabric to trigger local switchover
This feature is available from OpenZiti v0.28.1 onward; the corresponding CloudZiti version is v7.3.91. Each router must have the health check endpoint enabled. Customer-hosted routers registered with the NetFoundry-provided registration app have it enabled already. To check, view the router config file.
ziggy@ziti-edge-router:~$ cat /opt/netfoundry/ziti/ziti-router/config.yml
v: 3
...
healthChecks:
  ctrlPingCheck:
    interval: 10s
    timeout: 15s
    initialDelay: 15s
  linkCheck:
    minLinks: 1
    interval: 10s
    initialDelay: 15s
...
Using this health check endpoint, you can detect when a router has lost communication with the controller or fabric and use that information to trigger a VRRP switchover.
Note: Align the VRRP script interval with the health check intervals listed in the router configuration file shown in the previous section. Setting the script interval slightly higher (for example, 12 seconds against a 10-second health check interval) ensures that at least one health check has run by the time VRRP checks again.
Here is a sample configuration that adds script tracking to the VRRP configuration.
Note: Best practice is to run the script as a non-root user. The script section must be added before the main instance configuration.
global_defs {
    script_user ziggy ziggy
    enable_script_security
}
vrrp_script wan_check {
    script "/home/ziggy/erhchecker.pyz" # path to script
    interval 12
    fall 3 # times to wait to failover to standby, default 3
    rise 6 # times to wait before clear the failure, default 2 x fall
    user ziggy ziggy
}
vrrp_instance VI_1 {
    state MASTER
    ...
    track_script {
        wan_check
    }
    ...
}
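The `interval`, `fall`, and `rise` values together determine how quickly VRRP reacts. Keepalived runs the tracked script every `interval` seconds and changes its view only after `fall` consecutive failures or `rise` consecutive successes. As a rough sketch (the function name is hypothetical, and the figures ignore script run time and the router's own health check interval), the reaction times for the sample values can be estimated as follows:

```shell
#!/bin/sh
# Approximate upper bound, in seconds, for Keepalived to change its view of
# a tracked script: INTERVAL seconds between runs, COUNT identical results
# in a row (COUNT is the `fall` or `rise` value).
script_transition_seconds() {
    interval=$1 count=$2
    echo $(( interval * count ))
}

# Sample values from the configuration above:
#   script_transition_seconds 12 3   # fall: about 36 seconds until failover
#   script_transition_seconds 12 6   # rise: about 72 seconds until the failure clears
```

Lowering `fall` shortens failover time at the cost of reacting to brief, transient health check failures.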
Script Options
Options can be set as environment variables or passed on the command line. Using environment variables is recommended.
./erhchecker.pyz --help
usage: erhchecker.pyz [-h] [-c ROUTERCONFIGFILEPATH] [-t SWITCHTIMEOUT] [-r NOTFLAGROUTERSFILEPATH] [-l {INFO,ERROR,WARNING,DEBUG,CRITICAL}] [-f LOGFILE] [-v]
options:
  -h, --help
        show this help message and exit
  -c ROUTERCONFIGFILEPATH, --routerConfigFilePath ROUTERCONFIGFILEPATH
        Specify the edge router config file
  -t SWITCHTIMEOUT, --switchTimeout SWITCHTIMEOUT
        Time to pass to allow for sessions drainage
  -r NOTFLAGROUTERSFILEPATH, --noTFlagRoutersFilePath NOTFLAGROUTERSFILEPATH
        Specify yaml file containing list of router ids that have no-traversable flag set
  -l {INFO,ERROR,WARNING,DEBUG,CRITICAL}, --logLevel {INFO,ERROR,WARNING,DEBUG,CRITICAL}
        Set the logging level
  -f LOGFILE, --logFile LOGFILE
        Specify the log file
  -v, --version
        show program's version number and exit
Environment variable names with default values:
'ROUTER_CONFIG_FILE_PATH', '/opt/netfoundry/ziti/ziti-router/config.yml'
'SWITCH_TIMEOUT', 600
'NO_T_FLAG_ROUTERS_FILE_PATH', ""
'LOG_FILE', ""
'LOG_LEVEL', "INFO"
The YAML file referenced by NO_T_FLAG_ROUTERS_FILE_PATH lists the router IDs, for example:
$ cat router_file.yml
routerIds:
  - {router id 1}  # e.g. falWSf-KgH
  - {router id 2}
  ...
Enable environment variables with VRRP
$ sudo cat /etc/default/keepalived
# Options to pass to keepalived
NO_T_FLAG_ROUTERS_FILE_PATH="/home/ziggy/router_file.yml"
LOG_FILE="/var/log/cloudziti_healthchecks.log"
LOG_LEVEL="DEBUG"
SWITCH_TIMEOUT=120
# DAEMON_ARGS are appended to the keepalived command-line
DAEMON_ARGS="-D"
$ sudo systemctl restart keepalived.service
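Before restarting Keepalived, it can help to preview which values the health check script would end up with once the documented defaults are applied to anything the file leaves unset. The sketch below is illustrative only: `effective_settings` is a hypothetical helper, it assumes the file contains only simple `KEY=value` or `KEY="value"` lines, and it does not reproduce how `erhchecker.pyz` itself resolves options.

```shell
#!/bin/sh
# Hypothetical helper: print the script settings found in an environment
# file, falling back to the defaults listed in this guide.
# The path must contain a slash (e.g. /etc/default/keepalived or ./file).
effective_settings() (
    # Subshell body: nothing sourced here leaks into the calling shell.
    unset ROUTER_CONFIG_FILE_PATH SWITCH_TIMEOUT NO_T_FLAG_ROUTERS_FILE_PATH LOG_FILE LOG_LEVEL
    . "$1"
    echo "ROUTER_CONFIG_FILE_PATH=${ROUTER_CONFIG_FILE_PATH:-/opt/netfoundry/ziti/ziti-router/config.yml}"
    echo "SWITCH_TIMEOUT=${SWITCH_TIMEOUT:-600}"
    echo "NO_T_FLAG_ROUTERS_FILE_PATH=${NO_T_FLAG_ROUTERS_FILE_PATH:-}"
    echo "LOG_FILE=${LOG_FILE:-}"
    echo "LOG_LEVEL=${LOG_LEVEL:-INFO}"
)

# Example (requires read access to the file):
#   effective_settings /etc/default/keepalived
```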