When operating customer-controlled Edge Routers, it may be desirable to monitor the status of the router process in order to manage the network. Ziti Edge Routers provide a configurable REST endpoint that tests the responsiveness of the process and also reports information, such as the router's links, that may be useful for various implementations.
This endpoint is not enabled by default in customer images. NetFoundry uses it internally to monitor all NetFoundry-hosted Edge Routers (and performs a similar check on Controllers). Enabling the endpoint and monitoring it are relatively easy tasks.
To enable the endpoint on the Edge Router, place the following block in the router's configuration file, then restart the router process. The comments can be kept or removed, as preferred.
web:
  # name - required
  # Provides a name for this listener, used for logging output. Not required to be unique, but is highly suggested.
  - name: health-check
    # bindPoints - required
    # One or more bind points are required. A bind point specifies an interface (interface:port string) that defines
    # where on the host machine the webListener will listen and the address (host:port) that should be used to
    # publicly address the webListener (i.e. mydomain.com, localhost, 127.0.0.1). This public address may be used for
    # incoming address resolution as well as used in responses in the API.
    bindPoints:
      # interface - required
      # A host:port string for the network interface to listen on. 0.0.0.0 will listen on all interfaces.
      - interface: 127.0.0.1:8081
        # address - required
        # The public address that external incoming requests will be able to resolve. Used in request processing and
        # response content that requires full host:port/path addresses.
        address: 127.0.0.1:8081
    # apis - required
    # Allows one or more APIs to be bound to this webListener
    apis:
      # binding - required
      # Specifies an API to bind to this webListener. Built-in APIs are
      # - health-checks
      - binding: health-checks
In this case, the endpoint is bound to the loopback interface. An external interface, or 0.0.0.0 to bind to all interfaces, can be used instead. Generally, however, the checks are performed on the node itself and the results are sent to a central system. It is therefore better not to expose the interface publicly: the endpoint requires no authentication to read and can reveal information about the node and the network.
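As a quick sanity check before restarting the router, a small script can flag a bindPoints interface value that is not loopback-only. This is a minimal sketch under stated assumptions: the helper name `is_loopback_bind` is hypothetical and not part of Ziti, it only inspects the host:port string, and it does not read the router configuration file or resolve DNS names (any name other than `localhost` is treated as potentially public).

```python
import ipaddress


def is_loopback_bind(interface: str) -> bool:
    """Return True if a bindPoints 'interface' value (host:port) listens only on loopback.

    Hypothetical helper for illustration; it is not part of Ziti.
    """
    # Split on the last colon so the port is separated from the host.
    host, _, port = interface.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {interface!r}")
    # Strip brackets from IPv6 literals such as [::1]:8081.
    host = host.strip("[]")
    if host == "localhost":
        return True
    try:
        # 127.0.0.0/8 and ::1 are loopback; 0.0.0.0 (all interfaces) is not.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name (or a malformed address) may resolve to a public interface.
        return False
```

With the configuration above, `is_loopback_bind("127.0.0.1:8081")` is true, while `is_loopback_bind("0.0.0.0:8081")` is false, which is the case the paragraph above warns about.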
Below is sample output from running curl against the endpoint:
ubuntu@ip-172-31-29-93:~/.ziti/quickstart/ip-172-31-29-93$ curl -k https://127.0.0.1:8081/health-checks
{
  "data": {
    "checks": [
      {
        "details": null,
        "healthy": true,
        "id": "controllerPing",
        "lastCheckDuration": "3.858µs",
        "lastCheckTime": "2023-10-17T11:27:57Z"
      },
      {
        "details": [
          {
            "linkId": "aZDCYuV7mjdhBii0if9ba",
            "destRouterId": "9zjBaSX3h2",
            "latency": 3064393.125,
            "addresses": {
              "ack": {
                "localAddr": "tcp:172.31.29.93:10080",
                "remoteAddr": "tcp:172.31.32.66:58196"
              },
              "payload": {
                "localAddr": "tcp:172.31.29.93:10080",
                "remoteAddr": "tcp:172.31.32.66:58188"
              }
            }
          },
          {
            "linkId": "1a5g17aPFQBIx7ZtDCwcqf",
            "destRouterId": "9TDBiHXwS",
            "latency": 70075578,
            "addresses": {
              "ack": {
                "localAddr": "tcp:172.31.29.93:10080",
                "remoteAddr": "tcp:172.31.83.27:35474"
              },
              "payload": {
                "localAddr": "tcp:172.31.29.93:10080",
                "remoteAddr": "tcp:172.31.83.27:35468"
              }
            }
          }
        ],
        "healthy": true,
        "id": "link.health",
        "lastCheckDuration": "63.537µs",
        "lastCheckTime": "2023-10-17T11:28:18Z"
      }
    ],
    "healthy": true
  },
  "meta": {}
}
As you can see, the top-level "healthy" boolean is true, and the links seen by the router are reported as well. The link information can be leveraged for certain high-availability configurations and other processes.
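Before wiring the endpoint into a monitoring system, it can help to reduce the response to the few values usually alerted on. The following is a minimal sketch that assumes the response shape shown in the sample above (a `data.checks` list, a top-level `data.healthy`, and link details under the `link.health` check). Other router versions may report additional checks or fields, and the latency unit is not stated here, so the number is passed through unchanged. The function name `summarize` is illustrative.

```python
from typing import Any


def summarize(payload: dict[str, Any]) -> dict[str, Any]:
    """Condense a health-checks response into overall status, failing checks, and links."""
    data = payload.get("data", {})
    checks = data.get("checks", [])
    # Every individual check that does not report healthy == true.
    failing = [c.get("id") for c in checks if c.get("healthy") is not True]
    links = []
    for check in checks:
        # The link check carries a list of links in "details"; other checks may report null.
        if check.get("id") == "link.health" and isinstance(check.get("details"), list):
            for link in check["details"]:
                links.append((link.get("linkId"), link.get("destRouterId"), link.get("latency")))
    # A missing or non-true top-level flag is treated as unhealthy.
    return {"healthy": data.get("healthy") is True, "failing": failing, "links": links}
```

For example, a script could call `summarize(json.load(sys.stdin))` on the output of `curl -sk https://127.0.0.1:8081/health-checks` piped into it. Treating anything other than a literal `true` as unhealthy keeps a truncated or unexpected response from being reported as healthy.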
There are many ways to stream this data to a centralized system. Within NetFoundry, we use Heartbeat, part of the Elastic ecosystem, to read the endpoint and send the results to our Elasticsearch system for monitoring, alerting, and so on.
Example:
heartbeat.monitors:
  - type: http
    name: "ziti-router-healthcheck"
    schedule: '@every 60s'
    timeout: 5s
    ssl:
      verification_mode: none
    urls: ["https://127.0.0.1:8081/health-checks"]
    check.response:
      status: 200
      json:
        - description: HealthCheck Overall Health
          expression: 'data.healthy == true'
migration.6_to_7.enabled: false
name: "<NODE NAME>"
monitoring.enabled: false
processors:
  - add_cloud_metadata: ~
fields_under_root: true
fields:
  env: "production"
  inventoryName: "<NODE NAME>"
  networkId: "<NETWORK ID>"
  networkName: "<NETWORK NAME>"
  # All Elasticsearch queries still depend on the old organizationId term for their search
  organizationId: "<ORGANIZATION ID>"
  providerId: "<PROVIDER INSTANCE ID>"
  providerName: "<PROVIDER NAME>"
  node_type: "ER"
output.logstash:
  # The Logstash hosts
  hosts: ["logstash.<MY DOMAIN>"]
  # Optional SSL. Off by default.
  # List of root certificates for HTTPS server verifications
  ssl.certificate_authorities: ["/etc/pki/beats/logstashCA.crt"]
  ssl.certificate: "/etc/pki/beats/beats.crt"
  ssl.key: "/etc/pki/beats/beats.key"
  ssl.verification_mode: "certificate"
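The Heartbeat monitor above performs an HTTPS GET with certificate verification disabled, expects a 200 status, and evaluates `data.healthy == true`. For environments without Beats, the same single check can be sketched with the Python standard library. This is an illustrative sketch, not NetFoundry tooling: `HEALTH_URL` and `check_once` are hypothetical names, and the 5-second timeout simply mirrors the Heartbeat `timeout` setting. It deliberately reports a refused connection or timeout as unhealthy, because a router whose process has died will not answer at all.

```python
import json
import ssl
import urllib.request

# Hypothetical default; matches the loopback bind point configured on the router.
HEALTH_URL = "https://127.0.0.1:8081/health-checks"


def check_once(url: str = HEALTH_URL, timeout: float = 5.0) -> tuple[bool, str]:
    """Poll the health-checks endpoint once; unreachable or malformed responses count as unhealthy."""
    # Equivalent to curl -k / verification_mode: none: skip certificate verification.
    # Reasonable only because the request stays on the loopback interface.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            if resp.status != 200:
                return False, f"unexpected HTTP status {resp.status}"
            body = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers refused connections, timeouts, and HTTPError (raised by urlopen
        # for error statuses); ValueError covers a bad URL or a body that is not JSON.
        return False, f"endpoint unavailable: {exc}"
    data = body.get("data") if isinstance(body, dict) else None
    if isinstance(data, dict) and data.get("healthy") is True:
        return True, "healthy"
    return False, "data.healthy is not true"
```

A scheduler such as cron or a systemd timer could call `check_once()` every 60 seconds and forward the result to a central system; that forwarding step is not shown.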
A more general option is Fluentd, a log collector comparable to the Beats ecosystem. In this example, the output is written to stdout, which is redirected into Fluentd's log file. Fluentd can send data to a multitude of targets via built-in and plugin modules.
<source>
  @type exec
  <parse>
    @type json
  </parse>
  tag health-check
  command curl -k https://127.0.0.1:8081/health-checks
  run_interval 5m
</source>

<match health-check>
  @type stdout
</match>
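The configuration above produces one record per poll containing the whole response. If per-check records are preferred, for example to alert on `link.health` separately from `controllerPing`, the response could be flattened before it reaches the log pipeline, such as by pointing `command` at a small wrapper script. This is a minimal sketch assuming the response shape shown earlier; the record field names (`node`, `check`, `healthy`, `lastCheckTime`) are illustrative choices, not a Ziti or Fluentd schema, and the functions are hypothetical helpers.

```python
import json
from typing import Any, Iterator


def check_records(payload: dict[str, Any], node: str) -> Iterator[dict[str, Any]]:
    """Yield one flat record per health check in a health-checks response."""
    for check in payload.get("data", {}).get("checks", []):
        yield {
            "node": node,  # hypothetical label, e.g. the <NODE NAME> used in the Heartbeat example
            "check": check.get("id"),
            "healthy": check.get("healthy") is True,
            "lastCheckTime": check.get("lastCheckTime"),
        }


def to_ndjson(payload: dict[str, Any], node: str) -> str:
    """Render the records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in check_records(payload, node))
```

A wrapper script could read the curl output with `json.load(sys.stdin)` and `print(to_ndjson(...))`; one JSON object per line is a format that line-oriented JSON parsers generally handle well.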
There are many other methods to read and emit the health check status. A Ziti service should generally not be used to carry this data: if the router is in a failed state, the service used to send or retrieve the data is likely to be inoperable as well, so it provides limited value.