Link Health Monitoring

Version: v1.0.0 | Published: 2026-04-10 | Updated: 2026-04-28 | Author: Admin Team
Tags: concepts, sms-gateway-proxy, monitoring, health, smpp, failover

Overview

The SMS Gateway Proxy continuously monitors the state of every configured SMPP link and automatically adjusts rate limits when connections go up or down. This mechanism ensures that the platform never attempts to send messages through a disconnected link and dynamically adapts pool throughput to the number of healthy connections.

No manual intervention is required — the system detects link failures and recoveries automatically.


How It Works

Health Check Cycle

The proxy checks the status of each physical SMPP link every 10 seconds by verifying the Enquire Link response from the SMSC. Each check results in one of two states:

  • Connected — the SMPP session is active and the SMSC is responding to Enquire Link requests.
  • Disconnected — the SMSC is not responding or the TCP connection has been lost.

The system only reacts to state changes — if a link remains in the same state between checks, no action is taken. This prevents unnecessary recalculations and log noise.
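
As a rough illustration of reacting only to state changes, the sketch below compares two consecutive check results and reports just the links whose state differs. The `LinkState` enum and `detect_changes` function are hypothetical names chosen for this example, not part of the proxy's code or API.

```python
from enum import Enum


class LinkState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


def detect_changes(previous, current):
    """Return {esme_id: new_state} for links whose state differs from the last check.

    A link missing from `previous` (for example on the very first cycle)
    is reported as a change, so its limits get initialised once.
    """
    changes = {}
    for esme_id, state in current.items():
        if previous.get(esme_id) != state:
            changes[esme_id] = state
    return changes


# Two consecutive check cycles for links 100 and 101.
previous = {100: LinkState.CONNECTED, 101: LinkState.CONNECTED}
current = {100: LinkState.DISCONNECTED, 101: LinkState.CONNECTED}

# Only link 100 changed, so only link 100 would trigger a recalculation.
print(detect_changes(previous, current))
```

Links that stay in the same state produce an empty result, which is what keeps recalculations and log entries to a minimum.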

Automatic Rate Limit Adjustment

When a link's state changes, the proxy immediately adjusts rate limits:

Link goes down:

  1. The link's rate limit (RPS) and burst are set to zero — no messages will be routed to this link.
  2. If the link belongs to a pool, the pool's rate limit is recalculated as the sum of all remaining healthy links' rates.

Link recovers:

  1. The link's rate limit is restored to its configured value from smpp_rules.toml.
  2. The pool's rate limit is recalculated to include the recovered link.

Rate limits are always set to absolute values (not incremental deltas), which prevents accumulation of errors over multiple state changes.
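
The following minimal sketch shows why absolute values are robust: the pool rate is recomputed from scratch on every change, so repeated flapping cannot cause drift. The `Link` class, the `pool_rps` helper, and the burst value of 10 are assumptions made for illustration; the proxy's real data model is not described here.

```python
from dataclasses import dataclass


@dataclass
class Link:
    configured_rps: int      # illustrative stand-in for the value in smpp_rules.toml
    configured_burst: int    # hypothetical burst value
    connected: bool = True

    def limits(self):
        """Return (rps, burst): configured values when up, zeros when down."""
        if self.connected:
            return self.configured_rps, self.configured_burst
        return 0, 0


def pool_rps(links):
    """Recompute the pool rate from scratch as the sum over its links."""
    return sum(link.limits()[0] for link in links)


pool = {100: Link(50, 10), 101: Link(50, 10)}

# Flap link 100 several times. The result never drifts because nothing is
# added to or subtracted from a running total.
for _ in range(5):
    pool[100].connected = False
    assert pool_rps(pool.values()) == 50
    pool[100].connected = True
    assert pool_rps(pool.values()) == 100

print(pool_rps(pool.values()))
```

An incremental design (subtract 50 on failure, add 50 on recovery) would give the same numbers only as long as no event is ever missed or applied twice.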


Example Scenario

Consider a pool with two SMPP links:

Configuration:

  • Link 100: Rate = 50 RPS
  • Link 101: Rate = 50 RPS
  • Pool 1001: Total = 100 RPS (sum of links)

Sequence of events:

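| Event                       | Link 100 | Link 101 | Pool 1001 |
|-----------------------------|----------|----------|-----------|
| Initial state, all healthy  | 50 RPS   | 50 RPS   | 100 RPS   |
| Link 100 goes down          | 0 RPS    | 50 RPS   | 50 RPS    |
| Link 101 also goes down     | 0 RPS    | 0 RPS    | 0 RPS     |
| Link 100 recovers           | 50 RPS   | 0 RPS    | 50 RPS    |
| Link 101 recovers           | 50 RPS   | 50 RPS   | 100 RPS   |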

When the pool rate drops to zero (all links down), the proxy will reject new messages to this pool with an error until at least one link recovers.
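
The scenario can be replayed with a few lines of code. This is a sketch under stated assumptions: the `replay`, `snapshot`, and `admit` functions and the `PoolUnavailable` exception are hypothetical names, and the actual error the proxy returns is not specified in this document.

```python
# Configured rates from the example; the proxy would read them from its config.
CONFIGURED_RPS = {100: 50, 101: 50}


class PoolUnavailable(Exception):
    """Hypothetical error raised when a pool has no healthy links."""


def pool_rate(healthy):
    """Absolute pool rate: sum of the configured rates of healthy links."""
    return sum(rps for esme, rps in CONFIGURED_RPS.items() if esme in healthy)


def snapshot(healthy):
    """Return (link 100 rate, link 101 rate, pool rate) for the current state."""
    link_rates = tuple(rps if esme in healthy else 0
                       for esme, rps in CONFIGURED_RPS.items())
    return link_rates + (pool_rate(healthy),)


def replay(events):
    """Apply (action, esme_id) events and return one row per step."""
    healthy = set(CONFIGURED_RPS)
    rows = [snapshot(healthy)]
    for action, esme in events:
        if action == "down":
            healthy.discard(esme)
        else:
            healthy.add(esme)
        rows.append(snapshot(healthy))
    return rows


def admit(healthy):
    """Reject a new message when the pool rate is zero."""
    if pool_rate(healthy) == 0:
        raise PoolUnavailable("pool 1001 has no healthy links")


events = [("down", 100), ("down", 101), ("up", 100), ("up", 101)]
for row in replay(events):
    print(row)
```

Each printed row corresponds to one line of the event sequence, and `admit` raises only in the state where both links are down.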


Standalone Links and Pool Members

The behavior differs slightly depending on whether a link is part of a pool:

Standalone link (not in any pool):

  • When it goes down, its rate limit is set to zero. Messages targeted at this specific ESME ID will be rejected.
  • When it recovers, its configured rate is restored.

Pool member:

  • When it goes down, both the link's rate and the parent pool's rate are adjusted.
  • The pool continues to operate at reduced capacity through remaining healthy links.
  • Traffic is automatically redistributed among healthy members using the load balancer's weighted algorithm.
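
The load balancer's weighting scheme is not described in this document. As one possible reading, the sketch below assumes weights proportional to each link's configured rate and uses a weighted random choice over the healthy members only. The `pick_link` function and the data layout are hypothetical.

```python
import random


def pick_link(links, rng=random):
    """Pick an ESME ID among healthy links, weighted by configured rate.

    `links` maps esme_id -> (weight, connected). Disconnected links are
    excluded, so their share of traffic moves to the remaining members.
    """
    healthy = [(esme, weight) for esme, (weight, connected) in links.items()
               if connected]
    if not healthy:
        raise RuntimeError("no healthy links in pool")
    ids, weights = zip(*healthy)
    return rng.choices(ids, weights=weights, k=1)[0]


# Link 100 is down, so every message goes to link 101.
pool = {100: (50, False), 101: (50, True)}
print(pick_link(pool))
```

Other strategies, such as weighted round-robin, would redistribute traffic in the same way: by dropping unhealthy members from the candidate set before selection.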

Monitoring Link Status

Via REST API

Check the status of all SMPP links:

curl http://localhost:8080/api/v1/dashboard/smpp-links

Check a specific link:

curl -X POST http://localhost:8080/api/v1/sms/check-esme \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"esme": 100}'

Response:

{
  "persist": true,
  "system_id": "smpp-client-1",
  "error": ""
}

A persist: false response indicates the link is currently disconnected.
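
A monitoring script could interpret this response as sketched below. The example parses a response body like the sample shown here rather than calling the API, and the `link_is_connected` helper is a hypothetical name. Treating a missing `persist` field as disconnected is a cautious assumption, not documented behavior.

```python
import json


def link_is_connected(body: str) -> bool:
    """Return True when the check-esme response reports an active link."""
    data = json.loads(body)
    # Assumption: a missing "persist" field is treated as disconnected.
    return bool(data.get("persist", False))


sample = '{"persist": true, "system_id": "smpp-client-1", "error": ""}'
print(link_is_connected(sample))
```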

Via Prometheus Metrics

The proxy exposes connection state metrics at the /metrics endpoint, which can be scraped by Prometheus and visualized in Grafana.

Via Dashboard

The /dashboard/stats endpoint provides real-time statistics, including the current RPS per link and per pool, which reflect the dynamic rate adjustments.


When a Link Goes Down

  1. Check the dashboard: verify which links are affected via /api/v1/dashboard/smpp-links.
  2. Review the SMSC side: the most common cause is a problem at the SMSC or network level. Verify that the remote SMSC is reachable on the configured esme_addr and esme_port.
  3. Check credentials: if the link was recently reconfigured, verify esme_systemid and esme_password in smpp_rules.toml.
  4. Wait for auto-recovery: the proxy will automatically reconnect and restore rate limits when the SMSC becomes available again. No restart is required.
  5. Monitor pool capacity: if the affected link is part of a pool, the pool continues to operate at reduced capacity. Ensure the remaining links can handle the traffic load.

For more diagnostic steps, see the Troubleshooting Guide.