Edge Router Health Checks


When operating customer-controlled Edge Routers, you may want to monitor the status of the router process as part of running the network. Ziti Edge Routers provide a configurable REST endpoint that reports whether the process is responsive, along with some additional details (such as controller reachability and the router's links) that can be useful for monitoring and automation.


This endpoint is not enabled by default in customer images. NetFoundry uses it internally to monitor all NetFoundry-hosted Edge Routers (and performs a similar check on Controllers). Enabling the endpoint and monitoring it is straightforward.

To enable the endpoint on the Edge Router, add the following block to the router's configuration file, then restart the router process. The comments can be kept or removed, as preferred.

web:
  # name - required
  # Provides a name for this listener, used for logging output. Not required to be unique, but is highly suggested.
  - name: health-check
    # bindPoints - required
    # One or more bind points are required. A bind point specifies an interface (interface:port string) that defines
    # where on the host machine the webListener will listen and the address (host:port) that should be used to
    # publicly address the webListener (i.e. mydomain.com, localhost, 127.0.0.1). This public address may be used for
    # incoming address resolution as well as used in responses in the API.
    bindPoints:
      # interface - required
      # A host:port string on which network interface to listen on. 0.0.0.0 will listen on all interfaces.
      - interface: 127.0.0.1:8081
        # address - required
        # The public address that external incoming requests will be able to resolve. Used in request processing and
        # response content that requires full host:port/path addresses.
        address: 127.0.0.1:8081
    # apis - required
    # Allows one or more APIs to be bound to this webListener
    apis:
      # binding - required
      # Specifies an API to bind to this webListener. Built-in APIs are
      #   - health-checks
      - binding: health-checks


In this example, the endpoint is bound to the loopback interface. You can bind it to an external interface, or to all interfaces, but the checks are generally performed on the node itself and the results forwarded to a central system. Keeping the endpoint off public interfaces is the safer choice: it requires no authentication to read, and it exposes information about the node and the network.
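A provisioning or audit script could guard against accidentally exposing the endpoint. The sketch below is illustrative only: the helper name `is_loopback_bind` is hypothetical, and it assumes the `interface` value is a plain `host:port` string as in the configuration above (IPv6 hosts written as `[addr]:port`). Anything that is not clearly loopback, including wildcard binds and unresolved hostnames, is treated as exposed.

```python
import ipaddress


def is_loopback_bind(interface: str) -> bool:
    """Return True if a 'host:port' bind string listens only on loopback.

    Hypothetical helper. Accepts 'a.b.c.d:port', 'localhost:port', or a
    bracketed IPv6 literal '[addr]:port'. Wildcards such as 0.0.0.0 return False.
    """
    host, sep, port = interface.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {interface!r}")
    host = host.strip("[]")  # unwrap bracketed IPv6 literals
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames (or malformed addresses) may map to a public
        # interface, so treat them as exposed rather than guessing.
        return False
```

A check like this could run before deploying a router configuration, failing the deployment when the health-check listener would be reachable off-box.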

Below is a sample output of a curl to the endpoint:

ubuntu@ip-172-31-29-93:~/.ziti/quickstart/ip-172-31-29-93$ curl -k
{
    "data": {
        "checks": [
            {
                "details": null,
                "healthy": true,
                "id": "controllerPing",
                "lastCheckDuration": "3.858µs",
                "lastCheckTime": "2023-10-17T11:27:57Z"
            },
            {
                "details": [
                    {
                        "linkId": "aZDCYuV7mjdhBii0if9ba",
                        "destRouterId": "9zjBaSX3h2",
                        "latency": 3064393.125,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:",
                                "remoteAddr": "tcp:"
                            },
                            "payload": {
                                "localAddr": "tcp:",
                                "remoteAddr": "tcp:172.31.32.66:58188"
                            }
                        }
                    },
                    {
                        "linkId": "1a5g17aPFQBIx7ZtDCwcqf",
                        "destRouterId": "9TDBiHXwS",
                        "latency": 70075578,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:",
                                "remoteAddr": "tcp:"
                            },
                            "payload": {
                                "localAddr": "tcp:",
                                "remoteAddr": "tcp:"
                            }
                        }
                    }
                ],
                "healthy": true,
                "id": "link.health",
                "lastCheckDuration": "63.537µs",
                "lastCheckTime": "2023-10-17T11:28:18Z"
            }
        ],
        "healthy": true
    },
    "meta": {}
}


The top-level "healthy" boolean is true, and the router also reports each of its links under the link.health check, including the destination router and latency. This link information can be used in certain high-availability configurations and other processes.
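Because the response is plain JSON, any monitoring script can consume it. The sketch below uses a hypothetical function name, `summarize_health`, and assumes the response shape shown in the sample output above. It extracts the overall flag, the IDs of any failing checks, and the per-link details from the `link.health` check.

```python
import json


def summarize_health(body: str) -> dict:
    """Summarize a health-checks response body (hypothetical helper).

    Assumes the shape of the sample output: data.healthy, plus data.checks as
    a list of objects with "id", "healthy", and "details". Link entries are
    read from the check whose id is "link.health".
    """
    data = json.loads(body)["data"]
    checks = data.get("checks", [])

    # IDs of any individual checks that are not currently healthy.
    failing = [c.get("id") for c in checks if not c.get("healthy", False)]

    links = []
    for check in checks:
        if check.get("id") != "link.health":
            continue
        # "details" is null for checks without extra data, so guard with `or []`.
        for link in check.get("details") or []:
            links.append(
                {
                    "linkId": link.get("linkId"),
                    "destRouterId": link.get("destRouterId"),
                    # Passed through unchanged; the output does not state the unit.
                    "latency": link.get("latency"),
                }
            )

    return {"healthy": data.get("healthy") is True, "failing": failing, "links": links}
```

Run against the sample above, this would report `healthy` as true, no failing checks, and two links.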


There are many ways to stream this data to a centralized system. At NetFoundry, we use Heartbeat, part of the Elastic ecosystem, to poll the endpoint and send the results to our Elasticsearch system for monitoring, alerting, and similar uses.



heartbeat.monitors:
- type: http
  name: "ziti-router-healthcheck"
  schedule: '@every 60s'
  timeout: 5s
  ssl:
    verification_mode: none
  urls: [""]
  check.response:
    status: 200
    json:
      - description: HealthCheck Overall Health
        expression: 'data.healthy == true'

migration.6_to_7.enabled: false
name: "<NODE NAME>"
monitoring.enabled: false

processors:
  - add_cloud_metadata:

fields_under_root: true
fields:
  env: "production"
  inventoryName: "<NODE NAME>"
  networkId: "<NETWORK ID>"
  networkName: "<NETWORK NAME>"
  # All Elasticsearch queries still depend on the old organizationId term for its search
  organizationId: "<ORGANIZATION ID>"
  providerName: "<PROVIDER NAME>"
  node_type: "ER"

output.logstash:
  # The Logstash hosts
  hosts: ["logstash.<MY DOMAIN>"]
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  ssl.certificate_authorities: ["/etc/pki/beats/logstashCA.crt"]
  ssl.certificate: "/etc/pki/beats/beats.crt"
  ssl.key: "/etc/pki/beats/beats.key"
  ssl.verification_mode: "certificate"
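For comparison, the sketch below reproduces the two conditions the Heartbeat monitor applies (HTTP status 200 and `data.healthy == true`, with certificate verification disabled) in plain Python using only the standard library. The function names are hypothetical, and it only prints events to stdout; forwarding them to Logstash or another collector is left out.

```python
import json
import ssl
import time
import urllib.request


def is_up(status: int, body: object) -> bool:
    """Same two conditions as the Heartbeat monitor: HTTP 200 and data.healthy == true."""
    if not isinstance(body, dict):
        return False
    data = body.get("data")
    return status == 200 and isinstance(data, dict) and data.get("healthy") is True


def check_once(url: str, timeout: float = 5.0) -> dict:
    """Poll the endpoint once and return an event dict with an "up" flag."""
    # Skip certificate verification, like `curl -k` and `verification_mode: none`.
    # check_hostname must be disabled before setting CERT_NONE.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    event = {"url": url, "timestamp": time.time(), "up": False}
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            event["status"] = resp.status
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers connection failures, timeouts, and HTTP errors
        # (urllib.error.URLError and HTTPError subclass it); ValueError covers
        # malformed URLs and bodies that are not valid JSON.
        event["error"] = str(exc)
        return event

    event["up"] = is_up(event["status"], body)
    return event


def poll_forever(url: str, interval: float = 60.0) -> None:
    """Print one JSON event per interval, similar to schedule '@every 60s'."""
    while True:
        print(json.dumps(check_once(url)), flush=True)
        time.sleep(interval)
```

For example, `poll_forever("https://127.0.0.1:8081/health-checks")`; the `/health-checks` path is an assumption, so substitute the URL you use with curl.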


A more general option is Fluentd, a log collector comparable to the Beats ecosystem. In this example, the output is written to stdout, which is redirected into Fluentd's log file. Fluentd can send data to many other targets through built-in and plugin output modules.


<source>
  @type exec
  <parse>
    @type json
  </parse>
  tag health-check
  command curl -k
  run_interval 5m
</source>

<match health-check>
  @type stdout
</match>


There are many other ways to read and emit the health check status. Avoid using a Ziti service to carry this data: if the router has failed, the service used to send or retrieve the data is likely to be down as well, so the check would provide little value.


