AWS Auto Failover using Keepalived on a NetFoundry Marketplace gateway

Introduction

! Please read the entire article before you begin deploying.

Keepalived is routing software written in C. The project's main goal is to provide simple and robust facilities for load balancing, high availability and fail-over to Linux systems and Linux-based infrastructures.

See Introduction to Edge High Availability for more information about NetFoundry ingress and egress traffic protection options.

NetFoundry AWS gateways can use keepalived and the VRRP protocol to provide automatic failover by manipulating VPC route tables. During normal operation, the VPC route table routes point to the primary (MASTER) gateway instance. If the primary fails, the backup instance updates those routes to point to itself, minimizing or preventing service downtime.


Installing Keepalived

Keepalived is packaged for the CentOS 7 distribution and can be installed with YUM on each gateway. Its configuration is stored in /etc/keepalived/keepalived.conf, and this file differs between the MASTER and BACKUP gateways.

Installing on a NetFoundry gateway

Run the following to install the keepalived package and all the required dependencies using YUM:

sudo yum install -y keepalived

 

Configuring Keepalived in AWS

As an example, consider the following topology:

Active/Passive configuration (the only configuration supported in AWS)

[Diagram: example Active/Passive gateway topology]

* Note: the diagram is only an example. NetFoundry recommends deploying gateways in a private subnet for most configurations.

Configuring Keepalived on Linux 

1. Sample configuration of /etc/keepalived/keepalived.conf for MASTER (A GW S1)

! Configuration File for keepalived
global_defs {
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 31
    priority 111
    advert_int 1
    unicast_src_ip 10.19.0.121
    unicast_peer {
        10.19.64.121
    }
    notify "/etc/keepalived/gw-master.sh"
}

2. Sample configuration of /etc/keepalived/keepalived.conf for BACKUP (A GW S2)

! Configuration File for keepalived
global_defs {
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 31
    priority 100
    advert_int 1
    unicast_src_ip 10.19.64.121
    unicast_peer {
        10.19.0.121
    }
    notify "/etc/keepalived/gw-master.sh"
}

Apart from the 'state' value and the swapped unicast addresses, the main configuration difference between the master and the backup gateway is the 'priority' setting: the master must have a higher priority than the backup (111 vs. 100). Both gateways must use the same virtual_router_id. Because a VPC does not carry the multicast traffic that VRRP normally uses, each gateway is configured to send its advertisements directly to its peer over unicast (unicast_src_ip and unicast_peer).
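Because the two files are mirror images of each other, they can be generated from one template. The following is a minimal sketch, assuming bash; render_vrrp_conf is a hypothetical helper name, and the priorities, virtual_router_id and interface are copied from the sample configurations.

```shell
#!/bin/bash
# Hypothetical helper (not part of keepalived): print a keepalived.conf for
# one side of the pair. Values mirror the sample configurations.
# Usage: render_vrrp_conf ROLE SELF_IP PEER_IP NOTIFY_SCRIPT
render_vrrp_conf() {
  local role="$1" self_ip="$2" peer_ip="$3" notify="$4" priority
  case "$role" in
    MASTER) priority=111 ;;  # higher priority wins the VRRP election
    BACKUP) priority=100 ;;
    *) echo "ROLE must be MASTER or BACKUP" >&2; return 1 ;;
  esac
  # virtual_router_id must be identical on both gateways;
  # unicast_src_ip and unicast_peer are swapped between them.
  cat <<EOF
! Configuration File for keepalived
global_defs {
}
vrrp_instance VI_1 {
    state ${role}
    interface eth0
    virtual_router_id 31
    priority ${priority}
    advert_int 1
    unicast_src_ip ${self_ip}
    unicast_peer {
        ${peer_ip}
    }
    notify "${notify}"
}
EOF
}
```

For example, render_vrrp_conf BACKUP 10.19.64.121 10.19.0.121 /etc/keepalived/gw-master.sh prints a file equivalent to the BACKUP sample; redirect the output to /etc/keepalived/keepalived.conf on the matching gateway.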

 

keepalived runs the notify script on every VRRP state change, passing the new state (MASTER, BACKUP or FAULT) as the script's third argument. A fail-over therefore triggers the script on the backup node as it becomes MASTER, and a fail-back triggers it on the master once it is online again and reclaims the MASTER role.

The notify script manipulates the AWS VPC route tables so that all routes pointing to the master are updated to point to the backup. The same mechanism is used in reverse when the master comes back online.

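3. Sample notify script (/etc/keepalived/gw-master.sh)

#!/bin/bash
# keepalived calls the notify script as: <script> INSTANCE|GROUP <name> <new state>
STATE="$3"
# append all output to the log file
exec >> /var/log/nat-master.log 2>&1
echo "$(date) VRRP state change: ${STATE}"
# only the node that becomes MASTER should take over the routes
[ "${STATE}" = "MASTER" ] || exit 0
# instance metadata via an IMDSv2 session token (also works where IMDSv1 is allowed)
TOKEN=$(curl -fs -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
imds() { curl -fs -H "X-aws-ec2-metadata-token: ${TOKEN}" "http://169.254.169.254/latest/meta-data/$1"; }
# set env variables
INSTANCE_ID=$(imds instance-id)
ROUTE_TB_ID='rtb-xxxxxxxx'
REGION=$(imds placement/availability-zone | sed 's/[a-z]$//')
OUT_FMT='text'
aws_cmd="/usr/bin/aws --region ${REGION} --output ${OUT_FMT}"
${aws_cmd} ec2 replace-route --route-table-id ${ROUTE_TB_ID} --destination-cidr-block x.x.x.x/x --instance-id ${INSTANCE_ID}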

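* Replace ROUTE_TB_ID and the --destination-cidr-block value (x.x.x.x/x) with valid values for your VPC. Install the script on both gateways, since both configurations reference it, and make it executable: sudo chmod +x /etc/keepalived/gw-master.sh. The script assumes the AWS CLI is installed at /usr/bin/aws.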
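The sample script moves a single route. If several route tables or destinations must follow the MASTER, the replace-route call can be looped. This is a minimal sketch, assuming bash and the AWS CLI; replace_routes and the AWS_CLI override (handy for a dry run with AWS_CLI="echo aws") are hypothetical names, not part of keepalived or the NetFoundry gateway.

```shell
#!/bin/bash
# Hypothetical helper: point every listed destination CIDR in every listed
# route table at INSTANCE_ID. Set AWS_CLI="echo aws" to print the commands
# instead of running them.
# Usage: replace_routes REGION INSTANCE_ID "rtb-a rtb-b" "10.0.0.0/16 192.168.1.0/24"
replace_routes() {
  local region="$1" instance_id="$2" tables="$3" cidrs="$4"
  local aws="${AWS_CLI:-/usr/bin/aws}" rc=0 tb cidr
  for tb in $tables; do
    for cidr in $cidrs; do
      # Keep going on error so one bad route does not block the others
      if ! $aws --region "$region" --output text ec2 replace-route \
           --route-table-id "$tb" \
           --destination-cidr-block "$cidr" \
           --instance-id "$instance_id"; then
        echo "replace-route failed: ${tb} ${cidr}" >&2
        rc=1
      fi
    done
  done
  return $rc
}
```

Inside the notify script, the final replace-route line could then become, for example, replace_routes "${REGION}" "${INSTANCE_ID}" "rtb-xxxxxxxx rtb-yyyyyyyy" "x.x.x.x/x". A non-zero return tells you at least one route did not move.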

4. Start and enable keepalived on both gateways

sudo systemctl start keepalived
sudo systemctl enable keepalived
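keepalived can only run the notify script if the file exists and is executable, and a missing or non-executable script is easy to overlook. The following minimal sketch (check_notify_scripts is a hypothetical helper) scans a keepalived.conf for notify, notify_master, notify_backup and notify_fault lines and reports any referenced script that is missing or not executable; run it on each gateway before (re)starting keepalived.

```shell
#!/bin/bash
# Hypothetical preflight check: confirm every notify script referenced in a
# keepalived config exists and is executable.
# Usage: check_notify_scripts /etc/keepalived/keepalived.conf
check_notify_scripts() {
  local conf="$1" rc=0 script
  [ -r "$conf" ] || { echo "cannot read ${conf}" >&2; return 1; }
  # Extract the first word of the (optionally quoted) value of each
  # notify / notify_master / notify_backup / notify_fault line.
  # Paths containing spaces are not supported by this sketch.
  for script in $(sed -nE 's/^[[:space:]]*notify(_master|_backup|_fault)?[[:space:]]+"?([^" ]+).*/\2/p' "$conf"); do
    if [ ! -x "$script" ]; then
      echo "missing or not executable: ${script}" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example: check_notify_scripts /etc/keepalived/keepalived.conf && sudo systemctl restart keepalived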

 

5. IAM permissions

For the script to update the route table, the EC2 instances must have IAM permission to replace routes. Here is an example policy that could be attached to the IAM role in the instances' instance profile:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:ReplaceRoute"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
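The sample policy allows ec2:ReplaceRoute on every resource ("*"). If you prefer least privilege, the statement can be limited to the route tables the script manages. The sketch below, a hypothetical bash helper named route_policy, prints such a policy. It assumes ec2:ReplaceRoute accepts route-table ARNs of the form arn:aws:ec2:REGION:ACCOUNT_ID:route-table/ROUTE_TABLE_ID; confirm this against the EC2 entry of the AWS Service Authorization Reference before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print an IAM policy that allows ec2:ReplaceRoute only
# on the named route tables instead of "*".
# Usage: route_policy REGION ACCOUNT_ID rtb-aaaa [rtb-bbbb ...]
route_policy() {
  local region="$1" account="$2"
  shift 2
  [ $# -gt 0 ] || { echo "at least one route table ID is required" >&2; return 1; }
  local resources="" sep="" tb
  for tb in "$@"; do
    # Build a comma-separated list of quoted route-table ARNs
    resources="${resources}${sep}\"arn:aws:ec2:${region}:${account}:route-table/${tb}\""
    sep=", "
  done
  cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:ReplaceRoute"],
            "Resource": [${resources}]
        }
    ]
}
EOF
}
```

For example, route_policy us-east-1 123456789012 rtb-xxxxxxxx prints a policy for a single route table (123456789012 is a placeholder account ID).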

 

Automated deployment using CloudFormation

NetFoundry provides a CloudFormation template that packages this setup for a simpler, more maintainable deployment. The template is enhanced to handle multiple route tables. It deploys both EC2 instances and sets up all the IAM permissions needed for fail-over.

 

Sign in to the console: https://nfconsole.io/app-login

Create a new gateway in HA mode; the console generates two registration keys.

Launch the CloudFormation template:

https://us-east-1.console.aws.amazon.com/cloudformation/#/stacks/create/review?templateURL=https://netfoundry-aws-quickstart.s3.amazonaws.com/production/nf-gw-ha.template

 

Provide values for the template parameters:


 

NetFoundry Parameters:

  • Registration keys: Use the keys generated by the console
  • RouteTableToManage: List of route table IDs that the auto-failover mechanism will update

AWS Parameters:

  • VPCIP: The ID of the VPC in which the gateways will be deployed
  • KeyName: The name of the EC2 key pair (public SSH key) to associate with the gateways
  • SubnetId: The subnet in which to place the gateway
  • SSHLocation: The SSH access control list, if needed
  • PrivateIP: The static private IP addresses that will be assigned to the instances

 

This deployment does not create any routes in the route tables. For ingress fail-over to work, you must create the routes manually and point them at the master gateway. During a fail-over, those routes are updated to point to the backup; once the master is back online, they revert to the master.
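The initial routes can be created with the AWS CLI ec2 create-route command, and after a fail-over test you can check which instance a route points to with ec2 describe-route-tables. The following is a minimal sketch, assuming bash and an AWS CLI configured with credentials allowed to create and describe routes; create_routes, route_owner and the AWS_CLI override are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers; set AWS_CLI="echo aws" to print commands (dry run).

# Create one route per (route table, destination CIDR) pair, pointing at the
# master gateway's instance ID.
# Usage: create_routes REGION MASTER_INSTANCE_ID "rtb-a rtb-b" "cidr1 cidr2"
create_routes() {
  local region="$1" master_id="$2" tables="$3" cidrs="$4"
  local aws="${AWS_CLI:-aws}" rc=0 tb cidr
  for tb in $tables; do
    for cidr in $cidrs; do
      $aws --region "$region" ec2 create-route \
        --route-table-id "$tb" \
        --destination-cidr-block "$cidr" \
        --instance-id "$master_id" || rc=1
    done
  done
  return $rc
}

# Print the instance ID that a route currently targets.
# Usage: route_owner REGION ROUTE_TABLE_ID CIDR
route_owner() {
  local region="$1" tb="$2" cidr="$3"
  ${AWS_CLI:-aws} --region "$region" --output text ec2 describe-route-tables \
    --route-table-ids "$tb" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='${cidr}'].InstanceId"
}
```

For example, after create_routes us-east-1 i-xxxxxxxx "rtb-xxxxxxxx" "x.x.x.x/x", running route_owner us-east-1 rtb-xxxxxxxx x.x.x.x/x should print the master's instance ID, and the backup's instance ID after you stop keepalived on the master.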

