AuditRubric
pr-ir-4 high Protect / Technology Infrastructure Resilience

Adequate resource capacity to ensure availability is maintained

Systems that run near capacity are fragile: a traffic spike, a DDoS attack, or an infrastructure event can push them into unavailability. Capacity management is a proactive discipline: it means maintaining enough headroom that the system can absorb unexpected load and degrade gracefully rather than fail completely. This is both a reliability and a security concern, since availability is one of the three properties of the CIA triad (confidentiality, integrity, availability).

Estimated effort: 4h
capacity, autoscaling, ddos-protection, availability, load-balancing

Implementation steps

  1. Monitor resource utilization across critical systems

    Track CPU, memory, storage, network bandwidth, and request throughput for critical services. Set alert thresholds that fire before reaching capacity limits, giving you time to respond before performance degrades. Review utilization trends monthly to anticipate growth-driven capacity needs.

    datadog, aws-cloudwatch, prometheus, grafana, new-relic
  2. Implement auto-scaling for cloud-hosted services

    Configure auto-scaling policies for compute and database resources so that capacity expands automatically in response to demand. Define scaling policies based on CPU and memory thresholds or request rate. Test auto-scaling behavior under load to confirm it works as expected before a production traffic spike triggers it.

    aws, gcp, azure, kubernetes
  3. Implement DDoS protection for public-facing services

    Deploy DDoS mitigation in front of public-facing services: a CDN or cloud-based DDoS scrubbing service that can absorb volumetric attacks before they reach your origin. Define a playbook for what to do if a DDoS attack exceeds your protection capacity, including who to contact and what manual rate-limiting measures are available.

    cloudflare, aws-shield, fastly, akamai
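
One of the manual rate-limiting measures mentioned in step 3 could be a token bucket applied at the application or proxy layer. The following is a minimal sketch under stated assumptions, not a production limiter: the `TokenBucket` class, its `rate` and `burst` parameters, and the optional `now` argument (included only so the behavior can be tested deterministically) are hypothetical names chosen for illustration.

```python
import time
from typing import Optional


class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts up to `burst`.

    Hypothetical illustration: names and defaults are not taken from any
    specific product or library.
    """

    def __init__(self, rate: float, burst: float, now: Optional[float] = None):
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst must be >= 1")
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = burst         # start full so a normal burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if one request may proceed, False if it should be shed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice a limiter like this would usually be kept per client (for example, per source IP or API key) and would sit behind the CDN or scrubbing service rather than replace it. Rejected requests are typically answered with HTTP 429 so that well-behaved clients back off.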

Evidence required

Capacity monitoring dashboards

Evidence of ongoing capacity monitoring for critical systems with defined alert thresholds.

  • Datadog or CloudWatch dashboard showing resource utilization with alert thresholds
  • Capacity report showing current utilization vs. defined limits
  • Alerting configuration for capacity-related thresholds

Auto-scaling or capacity management configuration

Evidence that capacity management mechanisms are in place for critical services.

  • AWS Auto Scaling or Kubernetes HPA configuration
  • Cloudflare or AWS Shield configuration for DDoS protection
  • Load test results showing system behavior under peak load conditions
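
When reviewing load test results against an HPA configuration, it can help to compute the replica count the autoscaler would be expected to reach. The sketch below follows the scaling formula published in the Kubernetes Horizontal Pod Autoscaler documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), along with its default tolerance of 0.1. It is a simplification: the function name and the default `min_replicas` and `max_replicas` values are assumptions, and the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Estimate the replica count an HPA-style target-tracking policy requests.

    min_replicas and max_replicas are hypothetical defaults for illustration;
    use the bounds from the actual autoscaler configuration.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling action.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

As a usage example, four replicas averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60)`, which scales out to six. If a load test shows the service saturating while the computed value is pinned at `max_replicas`, that is a signal the upper bound, not the policy, is the capacity limit.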

Related controls