Link Health Monitoring
Overview
The SMS Gateway Proxy continuously monitors the state of every configured SMPP link and automatically adjusts rate limits when connections go up or down. This mechanism ensures that the platform never attempts to send messages through a disconnected link and dynamically adapts pool throughput to the number of healthy connections.
No manual intervention is required — the system detects link failures and recoveries automatically.
How It Works
Health Check Cycle
The proxy checks the status of each physical SMPP link every 10 seconds by verifying the Enquire Link response from the SMSC. Each check results in one of two states:
- Connected — the SMPP session is active and the SMSC is responding to Enquire Link requests.
- Disconnected — the SMSC is not responding or the TCP connection has been lost.
The system only reacts to state changes — if a link remains in the same state between checks, no action is taken. This prevents unnecessary recalculations and log noise.
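The change-detection rule above can be illustrated with a minimal sketch. It assumes each check cycle produces a mapping from ESME ID to link state; the names `LinkState` and `detect_transitions` are hypothetical and are not taken from the proxy's code.

```python
from enum import Enum
from typing import Dict, List, Tuple


class LinkState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


def detect_transitions(
    previous: Dict[int, LinkState],
    current: Dict[int, LinkState],
) -> List[Tuple[int, LinkState]]:
    """Return (esme_id, new_state) for every link whose state changed.

    A link that was absent from the previous snapshot counts as changed.
    """
    return [
        (esme_id, state)
        for esme_id, state in sorted(current.items())
        if previous.get(esme_id) != state
    ]


# Two consecutive check results (one check interval apart in the proxy).
first = {100: LinkState.CONNECTED, 101: LinkState.CONNECTED}
second = {100: LinkState.DISCONNECTED, 101: LinkState.CONNECTED}

changes = detect_transitions(first, second)      # only link 100 changed
no_changes = detect_transitions(second, second)  # steady state: nothing to do
```

Only the links returned by `detect_transitions` would trigger a rate limit recalculation; an empty result means the cycle ends without any action.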
Automatic Rate Limit Adjustment
When a link's state changes, the proxy immediately adjusts rate limits:
Link goes down:
- The link's rate limit (RPS) and burst are set to zero — no messages will be routed to this link.
- If the link belongs to a pool, the pool's rate limit is recalculated as the sum of all remaining healthy links' rates.
Link recovers:
- The link's rate limit is restored to its configured value from smpp_rules.toml.
- The pool's rate limit is recalculated to include the recovered link.
Rate limits are always set to absolute values (not incremental deltas), which prevents accumulation of errors over multiple state changes.
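The following sketch shows one way to derive absolute rate limits from the current link states. It is an illustration under stated assumptions, not the proxy's implementation: the function name `effective_rates` and the dictionary layout are hypothetical, and burst handling is omitted for brevity.

```python
from typing import Dict, List, Set, Tuple


def effective_rates(
    configured: Dict[int, int],
    down: Set[int],
    pools: Dict[int, List[int]],
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Compute absolute per-link and per-pool RPS from the current state.

    configured: ESME ID -> configured RPS
    down:       ESME IDs currently disconnected
    pools:      pool ID -> member ESME IDs
    """
    # A disconnected link gets 0 RPS; a healthy link gets its configured rate.
    link_rates = {
        esme: (0 if esme in down else rps) for esme, rps in configured.items()
    }
    # A pool's rate is the sum of its members' effective rates.
    pool_rates = {
        pool: sum(link_rates[member] for member in members)
        for pool, members in pools.items()
    }
    return link_rates, pool_rates


configured = {100: 50, 101: 50}
pools = {1001: [100, 101]}

links, pool = effective_rates(configured, {100}, pools)
# Recomputing from the same state yields the same values, so a repeated
# "link down" notification cannot subtract the rate twice.
links_again, pool_again = effective_rates(configured, {100}, pools)
```

Because the result depends only on the configured rates and the set of disconnected links, a missed or duplicated state-change event does not leave a lasting error in the limits.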
Example Scenario
Consider a pool with two SMPP links:
Configuration:
- Link 100: Rate = 50 RPS
- Link 101: Rate = 50 RPS
- Pool 1001: Total = 100 RPS (sum of links)
Sequence of events:
| Event | Link 100 | Link 101 | Pool 1001 |
|---|---|---|---|
| Initial state — all healthy | 50 RPS | 50 RPS | 100 RPS |
| Link 100 goes down | 0 RPS | 50 RPS | 50 RPS |
| Link 101 also goes down | 0 RPS | 0 RPS | 0 RPS |
| Link 100 recovers | 50 RPS | 0 RPS | 50 RPS |
| Link 101 recovers | 50 RPS | 50 RPS | 100 RPS |
When the pool rate drops to zero (all links down), the proxy will reject new messages to this pool with an error until at least one link recovers.
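As a rough check of the table, the sequence of events can be replayed in a few lines. This sketch assumes the pool accepts messages whenever its rate is above zero; `CONFIGURED`, `pool_rate`, and `replay` are hypothetical names used only for illustration.

```python
from typing import List, Set, Tuple

CONFIGURED = {100: 50, 101: 50}  # ESME ID -> configured RPS (pool 1001 members)


def pool_rate(down: Set[int]) -> int:
    """Sum the configured rates of the links that are not down."""
    return sum(rps for esme, rps in CONFIGURED.items() if esme not in down)


def replay(events: List[Tuple[int, bool]]) -> List[Tuple[int, bool]]:
    """Apply (esme_id, is_up) events; return (pool RPS, accepts traffic) per step."""
    down: Set[int] = set()
    rows = []
    for esme, is_up in events:
        if is_up:
            down.discard(esme)
        else:
            down.add(esme)
        rate = pool_rate(down)
        rows.append((rate, rate > 0))
    return rows


# Link 100 down, link 101 down, link 100 recovers, link 101 recovers.
EVENTS = [(100, False), (101, False), (100, True), (101, True)]
rows = replay(EVENTS)
```

The second entry of `rows` corresponds to the row where both links are down: the pool rate is zero and new messages would be rejected.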
Standalone Links vs. Pool Members
The behavior differs slightly depending on whether a link is part of a pool:
Standalone link (not in any pool):
- When it goes down, its rate limit is set to zero. Messages targeted at this specific ESME ID will be rejected.
- When it recovers, its configured rate is restored.
Pool member:
- When it goes down, both the link's rate and the parent pool's rate are adjusted.
- The pool continues to operate at reduced capacity through remaining healthy links.
- Traffic is automatically redistributed among healthy members using the load balancer's weighted algorithm.
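The weighted algorithm itself is not described here. As a rough illustration only, the sketch below assumes that each healthy member is weighted by its effective rate and that links at 0 RPS are excluded from selection; `pick_link` is a hypothetical helper, not part of the proxy.

```python
import random
from typing import Dict, Optional


def pick_link(effective_rates: Dict[int, int], rng: random.Random) -> Optional[int]:
    """Pick a healthy link with probability proportional to its rate.

    Returns None when every link is at 0 RPS (all members down).
    """
    healthy = {esme: rps for esme, rps in effective_rates.items() if rps > 0}
    if not healthy:
        return None
    esmes = list(healthy)
    weights = [healthy[esme] for esme in esmes]
    return rng.choices(esmes, weights=weights, k=1)[0]


rng = random.Random(0)               # fixed seed so the example is repeatable
rates = {100: 0, 101: 50, 102: 25}   # link 100 is down
picks = [pick_link(rates, rng) for _ in range(1000)]
```

In this sketch link 100 is never selected while it is down, and link 101 is expected to receive roughly twice as much traffic as link 102.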
Monitoring Link Health
Via REST API
Check the status of all SMPP links:
curl http://localhost:8080/api/v1/dashboard/smpp-links
Check a specific link:
curl -X POST http://localhost:8080/api/v1/sms/check-esme \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"esme": 100}'
Response:
{
  "persist": true,
  "system_id": "smpp-client-1",
  "error": ""
}
A persist: false response indicates the link is currently disconnected.
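The same check can be scripted. The sketch below uses only the Python standard library and the endpoint, request body, and `persist` field shown above; the helper names `build_check_request`, `is_connected`, and `check_link` are hypothetical, and the base URL and token are placeholders.

```python
import json
import urllib.request


def build_check_request(base_url: str, token: str, esme: int) -> urllib.request.Request:
    """Build the POST request for the check-esme endpoint."""
    body = json.dumps({"esme": esme}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/sms/check-esme",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def is_connected(response_body: str) -> bool:
    """Interpret the response: only persist == true means the link is connected."""
    return json.loads(response_body).get("persist") is True


def check_link(base_url: str, token: str, esme: int, timeout: float = 5.0) -> bool:
    """Send the request and return True if the link is reported as connected."""
    request = build_check_request(base_url, token, esme)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_connected(response.read().decode("utf-8"))


# Parsing the sample response from this page (no network call involved).
sample = '{"persist": true, "system_id": "smpp-client-1", "error": ""}'
sample_connected = is_connected(sample)
```

Calling `check_link("http://localhost:8080", "YOUR_TOKEN", 100)` would perform the actual request against a running proxy.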
Via Prometheus Metrics
The proxy exposes connection state metrics at the /metrics endpoint, which can be scraped by Prometheus and visualized in Grafana.
Via Dashboard
The /dashboard/stats endpoint provides real-time statistics, including the current RPS per link and per pool, which reflect the dynamic rate adjustments.
What to Do When a Link Goes Down
- Check the dashboard: verify which links are affected via /api/v1/dashboard/smpp-links.
- Review the SMSC side: the most common cause is a problem at the SMSC or network level. Verify that the remote SMSC is reachable on the configured esme_addr and esme_port.
- Check credentials: if the link was recently reconfigured, verify esme_systemid and esme_password in smpp_rules.toml.
- Wait for auto-recovery: the proxy will automatically reconnect and restore rate limits when the SMSC becomes available again. No restart is required.
- Monitor pool capacity: if the affected link is part of a pool, the pool continues to operate at reduced capacity. Ensure the remaining links can handle the traffic load.
For more diagnostic steps, see the Troubleshooting Guide.