Environmental and infrastructure protections support system availability
Physical and environmental risks such as power failures, hardware failures, natural disasters, and network outages can take systems offline regardless of software controls. For organizations operating their own infrastructure, mitigating these risks means physical protections. For cloud-native organizations, it means architectural choices: redundancy, multi-availability-zone deployments, and managed services with built-in resilience.
Implementation steps
- 1
Deploy across multiple availability zones
Architect your infrastructure to run across multiple availability zones (AZs) within your primary region. An AZ failure should not cause a service outage. Use load balancers that distribute traffic across AZs. Ensure databases have read replicas or standby instances in a secondary AZ. Document your multi-AZ architecture.
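As a rough illustration of the multi-AZ requirement, the sketch below checks an instance inventory for tiers that run in only one availability zone. The inventory format, tier names, zone names, and the `tiers_confined_to_one_az` helper are all hypothetical; in practice the data would come from your cloud provider's API or your Terraform state.

```python
from collections import defaultdict

# Hypothetical inventory: (tier name, availability zone) for each running instance.
INSTANCES = [
    ("load-balancer", "us-east-1a"),
    ("load-balancer", "us-east-1b"),
    ("app-server", "us-east-1a"),
    ("app-server", "us-east-1b"),
    ("database", "us-east-1a"),
]


def tiers_confined_to_one_az(instances, min_zones=2):
    """Return {tier: sorted zones} for tiers present in fewer than min_zones AZs."""
    zones_by_tier = defaultdict(set)
    for tier, zone in instances:
        zones_by_tier[tier].add(zone)
    return {
        tier: sorted(zones)
        for tier, zones in zones_by_tier.items()
        if len(zones) < min_zones
    }


if __name__ == "__main__":
    # Report every tier that an AZ failure could take offline entirely.
    for tier, zones in tiers_confined_to_one_az(INSTANCES).items():
        print(f"{tier}: only in {', '.join(zones)}")
```

With the sample inventory, only the database tier is flagged, since it has no instance in a second AZ.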
aws google-cloud azure terraform
- 2
Implement redundancy for critical components
Identify single points of failure in your architecture and eliminate them. Critical components such as load balancers, application servers, databases, cache layers, and message queues should have redundant instances. Use managed services where possible (RDS Multi-AZ, ElastiCache with replication) rather than managing redundancy manually.
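One way to look for a database single point of failure is to check the `MultiAZ` flag on each RDS instance. The sketch below assumes input shaped like the result of boto3's `describe_db_instances` call; the sample identifiers and the `single_az_databases` helper are made up for illustration.

```python
def single_az_databases(response):
    """List identifiers of DB instances whose MultiAZ flag is not set.

    `response` is assumed to have the shape of a boto3
    `describe_db_instances` result: {"DBInstances": [{...}, ...]}.
    """
    return [
        db["DBInstanceIdentifier"]
        for db in response.get("DBInstances", [])
        if not db.get("MultiAZ", False)
    ]


# Sample data for illustration only; the identifiers are hypothetical.
SAMPLE_RESPONSE = {
    "DBInstances": [
        {"DBInstanceIdentifier": "orders-db", "MultiAZ": True},
        {"DBInstanceIdentifier": "reports-db", "MultiAZ": False},
    ]
}

if __name__ == "__main__":
    print(single_az_databases(SAMPLE_RESPONSE))
```

Against a real account, the input could come from `boto3.client("rds").describe_db_instances()`, keeping in mind that results are paginated and that cluster-based deployments may need a different check.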
aws-rds aws-elasticache aws-elb cloudflare
- 3
Test and document recovery from infrastructure failures
Conduct chaos engineering exercises or infrastructure failure simulations to verify that redundancy works as expected. Test AZ failover, database failover, and application server replacement. Document the results and any gaps. Finding a gap during a planned test is far less costly than discovering it during a real outage.
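Before running a live exercise, a tabletop model can help predict which tiers an AZ failure would break. The sketch below is only a paper simulation under assumed data, not a substitute for real fault injection: the deployment map, the per-tier minimums, and the `simulate_az_loss` helper are all hypothetical.

```python
# Hypothetical deployment: healthy instance count per tier, per AZ.
DEPLOYMENT = {
    "app-server": {"az-a": 2, "az-b": 2},
    "database": {"az-a": 1, "az-b": 1},
    "cache": {"az-a": 1},
}

# Hypothetical minimum number of instances each tier needs to keep serving.
MIN_HEALTHY = {"app-server": 2, "database": 1, "cache": 1}


def simulate_az_loss(deployment, min_healthy):
    """For each AZ, list the tiers that would fall below their minimum if it failed."""
    zones = sorted({zone for per_zone in deployment.values() for zone in per_zone})
    results = {}
    for failed in zones:
        gaps = []
        for tier, per_zone in deployment.items():
            # Count instances that survive because they run outside the failed AZ.
            surviving = sum(n for zone, n in per_zone.items() if zone != failed)
            if surviving < min_healthy.get(tier, 1):
                gaps.append(tier)
        results[failed] = sorted(gaps)
    return results


if __name__ == "__main__":
    for zone, gaps in simulate_az_loss(DEPLOYMENT, MIN_HEALTHY).items():
        status = "PASS" if not gaps else "GAP: " + ", ".join(gaps)
        print(f"loss of {zone}: {status}")
```

The predicted gaps can then be compared with what the live failover test actually shows, and both recorded in the test documentation.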
aws-fault-injection-simulator chaos-monkey terraform
Evidence required
Multi-AZ architecture documentation
Evidence that infrastructure is deployed with redundancy across failure domains.
- Architecture diagram showing multi-AZ deployment
- AWS RDS Multi-AZ configuration
- Terraform code showing resources spread across availability zones
Infrastructure resilience testing records
Evidence that redundancy has been tested.
- Failover test results showing successful AZ recovery
- Chaos engineering exercise report
- Database failover test documentation