Capacity Planning
Message Center's resource consumption is modest for typical deployments. This page covers the variables that have the largest impact on throughput and memory under load.
CPU and Memory Baselines
| Scenario | CPU | Memory |
|---|---|---|
| Idle (no requests) | ~5m | ~180 MB |
| Steady-state web traffic | ~50–150m | ~250–320 MB |
| Large file upload in progress | ~100–200m | ~250 MB (streaming — upload does not spike heap) |
| Peak: 10 concurrent campaign list pages | ~200–350m | ~300 MB |
Memory consumption is dominated by the MongoDB driver connection pool and Next.js SSR cache. The streaming upload path is O(chunk size) and does not accumulate file data in heap.
Recommended minimums (Kubernetes):
```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
Raise the memory limit to 1 Gi if you see OOM kills or if CORE_AGENT_CONNECTIONS is set above 64.
Core HTTP Client Tuning
CORE_AGENT_CONNECTIONS (default: 32)
Controls how many keep-alive connections the BFF maintains to Core. Each concurrent HTTP request to Core (including uploads) consumes one connection.
| Scenario | Recommended value |
|---|---|
| Low-traffic deployment (< 10 concurrent users) | 8 |
| Standard deployment | 32 (default) |
| High-throughput (frequent large campaign lists, heavy audit queries) | 64 |
| Core is rate-limiting or showing high load | Reduce to 8–16 |
Every additional connection holds ~2 KB of socket state. 32 connections ≈ 64 KB overhead — negligible.
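For a high-throughput deployment, a minimal sketch of raising the pool via the Deployment's environment block, assuming the variable is read from the process environment at startup; the container name is a placeholder:

```yaml
# deployment.yaml (excerpt) -- high-throughput Core connection pool
spec:
  template:
    spec:
      containers:
        - name: core-admin          # placeholder container name
          env:
            - name: CORE_AGENT_CONNECTIONS
              value: "64"           # raise the memory limit to 1 Gi if you go above 64
```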
CORE_BODY_TIMEOUT_MS (default: 60000)
The maximum idle time between response body chunks for non-upload calls. A value of 60 seconds is sufficient for all standard endpoints. Raise if:
- You see `[core-slow]` warnings followed by timeouts for legitimate large responses
- Core is deployed on a high-latency WAN link
CORE_MAX_RESPONSE_BYTES (default: 16 MB)
Hard cap on response body size. The default 16 MB covers all standard list endpoints. Admin fanout routes (/api/admin/aerospike) can request up to 64 MB per call. Raise the default only if you see PAYLOAD_TOO_LARGE errors on standard endpoints.
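As a rough sketch, the body timeout and the response-size cap can be raised together for a high-latency WAN deployment; the values below are illustrative, not recommendations, and both variables are assumed to be read from the process environment:

```yaml
# deployment.yaml (excerpt) -- illustrative values for a high-latency WAN link
env:
  - name: CORE_BODY_TIMEOUT_MS
    value: "120000"        # 2 minutes between body chunks instead of the 60 s default
  - name: CORE_MAX_RESPONSE_BYTES
    value: "33554432"      # 32 MB; only raise if standard endpoints hit PAYLOAD_TOO_LARGE
```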
Upload Sizing
Server-side cap (UPLOAD_MAX_BYTES, default: 1 GB)
The policy limit for recipient file uploads via POST /api/uploads. The streaming implementation keeps heap usage at O(chunk size) regardless of file size, so the cap is purely a policy choice.
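If you want a tighter policy cap than the 1 GB default, a minimal sketch (assuming the value is interpreted as bytes, per the variable name):

```yaml
env:
  - name: UPLOAD_MAX_BYTES
    value: "104857600"     # 100 MB policy cap (value assumed to be in bytes)
```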
Temp disk space
During upload, a copy of the file is written to $TMPDIR/core-admin-uploads/. Ensure the pod has at least 2× the maximum file size of free space on /tmp.
| Max file size | Min /tmp space |
|---|---|
| 50 MB (UI wizard default) | 500 MB |
| 500 MB | 1 GB |
| 1 GB | 2 GB |
In Kubernetes, /tmp is backed by an emptyDir volume by default (memory-backed or disk-backed depending on cluster config). Prefer disk-backed emptyDir for upload workloads:
```yaml
volumes:
  - name: tmp
    emptyDir:
      medium: ""        # disk-backed (not Memory)
      sizeLimit: 10Gi
```
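The volume also needs to be mounted at /tmp (or wherever $TMPDIR points) inside the container; a minimal sketch, with the container name as a placeholder:

```yaml
containers:
  - name: core-admin            # placeholder container name
    volumeMounts:
      - name: tmp
        mountPath: /tmp         # backs $TMPDIR/core-admin-uploads/ during uploads
```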
Audit Retention
AUDIT_RETENTION_DAYS (default: 90)
The audit_logs collection grows by one document per auditable action. For a deployment with 50 active users performing 20 actions per day, this is ~90,000 documents per 90-day retention window.
| Active users | Actions/day | 90-day volume | Est. collection size |
|---|---|---|---|
| 10 | 100 | 9,000 docs | ~2 MB |
| 50 | 500 | 45,000 docs | ~15 MB |
| 200 | 2,000 | 180,000 docs | ~60 MB |
After changing AUDIT_RETENTION_DAYS, run make migrate to update the TTL index — the TTL change does not take effect without the migration.
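A sketch of shortening retention, with the migration reminder inline (the value is in days, per the variable name):

```yaml
env:
  - name: AUDIT_RETENTION_DAYS
    value: "30"    # shorter retention; run `make migrate` afterwards to update the TTL index
```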
MongoDB Index Sizing
The most frequently read indexes (campaigns and audit_logs) are sized for workspaces of up to 200,000 campaigns and 500,000 audit entries. Beyond these scales, consider:
- Separate MongoDB cluster per workspace (for very large tenants)
- Increasing `cursor.maxTimeMS` indirectly via `CORE_BODY_TIMEOUT_MS` (though this is a Core-side concern)
- Index-only queries: all list queries in the DAOs are covered by the compound indexes created in migrations v1–v8
Horizontal Scaling
Message Center is stateless (session state is in the next-auth JWT cookie; the BFF service JWT is cached per-process). Running multiple replicas is safe with these caveats:
- Each replica maintains its own in-memory service JWT cache. On startup, all replicas will make one proxy login call simultaneously — this is harmless but expected.
- The audit fallback file (`$TMPDIR/...`) is per-pod. If a pod dies with a non-empty fallback file before draining, those events are lost. Use a `PodDisruptionBudget` with `minAvailable: 1` and graceful shutdown to minimize this risk.
```yaml
# k8s/pdb.yaml (example)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: core-admin
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: core-admin
```
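Graceful shutdown is configured on the Deployment side; a minimal sketch, assuming the image ships a `sleep` binary and that the app drains the audit fallback file while shutting down (the grace period and preStop delay are illustrative):

```yaml
# deployment.yaml (excerpt) -- give the pod time to drain before SIGKILL
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: core-admin                  # placeholder container name
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]    # let in-flight requests and the audit drain finish
```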
Next Steps
- Environment Variables Reference — full env var reference with defaults
- Security Hardening — production security configuration