“The safest infra is the one with no forgotten parts — and nothing unnecessary still running.”
Modern infrastructure often resembles a sprawling city — full of ports, services, APIs, credentials, and control planes. Every open port, daemon, or misconfigured identity role is an opportunity for an attacker.
Infrastructure and Runtime ASR is the practice of deliberately shrinking, isolating, and simplifying both what you deploy and what continues to run, so there are fewer ways in, fewer lateral paths, and fewer moving pieces.
This section combines principles from both infrastructure and runtime attack surface reduction to form a holistic approach to systems that are not just secure on paper, but hardened in operation.
1. Expose Nothing by Default
The principle of “default deny” should apply to everything: network ports, API endpoints, service access, and egress traffic. If it’s not explicitly needed and documented, it should be blocked.
Network Segmentation
The problem: Many infrastructures still operate on a “flat network” model where once you’re in, you can reach everything. This dramatically increases lateral movement opportunities for attackers.
The solution: Implement network segmentation with deny-all defaults:
# Terraform example: Deny-all security group
resource "aws_security_group" "default_deny" {
name = "default-deny"
description = "Default deny all traffic"
vpc_id = aws_vpc.main.id
# No ingress rules - all blocked by default
# Minimal egress for updates only
egress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTPS for package updates only"
}
}
Then explicitly open only what’s needed:
resource "aws_security_group_rule" "web_server_https" {
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [var.load_balancer_cidr]
security_group_id = aws_security_group.web_servers.id
description = "HTTPS from load balancer only"
}
Block Unnecessary Egress
Outbound traffic is often overlooked, but it’s critical for preventing data exfiltration and command-and-control communication.
Best practices:
- Web servers shouldn’t need to connect to random internet IPs
- Database servers shouldn’t need any internet access
- Application servers should only reach known APIs
# iptables example: Allowlist-only egress
iptables -P OUTPUT DROP # Default: drop all outbound
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT # Replies to existing connections
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT # Internal network
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT # DNS, so allowed hostnames can be resolved at runtime
iptables -A OUTPUT -d api.trusted-service.com -p tcp --dport 443 -j ACCEPT # Note: hostname is resolved once, when the rule is added
Isolation Strategies
Implementation patterns (a small audit sketch follows this list):
- Private subnets for databases and application servers
- VPNs or bastion hosts for administrative access (time-limited)
- Service mesh for microservice communication (mTLS)
- Isolated environments for different trust levels (dev/staging/prod)
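To make “private subnet” more than a naming convention, you can audit route tables directly. Below is a minimal boto3 sketch; the `Tier=private` tag is an illustrative convention, not an AWS default, and subnets that fall back to the VPC main route table would need an extra check:
# Audit sketch: verify that subnets tagged Tier=private have no route to an internet gateway
import boto3

ec2 = boto3.client('ec2')

# Subnets we expect to be isolated (tag convention is an assumption for this example)
private_subnets = ec2.describe_subnets(
    Filters=[{'Name': 'tag:Tier', 'Values': ['private']}]
)['Subnets']

for subnet in private_subnets:
    # Route tables explicitly associated with this subnet
    route_tables = ec2.describe_route_tables(
        Filters=[{'Name': 'association.subnet-id', 'Values': [subnet['SubnetId']]}]
    )['RouteTables']
    for rt in route_tables:
        for route in rt.get('Routes', []):
            gateway = route.get('GatewayId', '')
            if gateway.startswith('igw-'):
                print(f"WARNING: {subnet['SubnetId']} routes to internet gateway {gateway}")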
Runtime Daemon Audit
Don’t just secure the network — audit what’s listening:
# Linux: Check listening TCP and UDP sockets (-l already limits output to listeners)
ss -tulpn
# Check for unexpected daemons
systemctl list-units --type=service --state=running
# Disable unnecessary services
systemctl disable cups.service # Printing service on a server?
systemctl disable bluetooth.service # Bluetooth on a cloud instance?
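To turn that one-off check into a repeatable audit, compare what is actually listening against an approved baseline. A small Python sketch that wraps the same ss tool (the baseline ports are illustrative, and -H assumes a reasonably recent iproute2):
# Compare listening TCP/UDP ports against an approved baseline
import subprocess

# Ports you have consciously decided to expose (illustrative values)
APPROVED_PORTS = {22, 443}

# -H: no header, -tuln: TCP/UDP listening sockets, numeric output
output = subprocess.run(
    ['ss', '-H', '-tuln'],
    capture_output=True, text=True, check=True
).stdout

listening = set()
for line in output.splitlines():
    fields = line.split()
    if len(fields) >= 5:
        # Local address looks like 0.0.0.0:443 or [::]:22 -- take the part after the last colon
        port = fields[4].rsplit(':', 1)[-1]
        if port.isdigit():
            listening.add(int(port))

unexpected = listening - APPROVED_PORTS
if unexpected:
    print(f"Unexpected listening ports: {sorted(unexpected)}")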
Real-World Impact
The Target breach of 2013 started with compromised HVAC vendor credentials. The attackers moved laterally from the HVAC network to the payment system network because proper network segmentation wasn’t in place. 40 million credit cards were stolen.
The Equifax breach exploited an unpatched Apache Struts vulnerability, but lateral movement was possible because internal segmentation was insufficient.
Further reading:
- NIST SP 800-41: Guidelines on Firewalls and Firewall Policy
- CIS Controls: Network Infrastructure Management
- AWS VPC Security Best Practices
- Zero Trust Architecture - NIST SP 800-207
2. Reduce Control Plane & Runtime Complexity
Every cloud service you enable, every management API you expose, and every runtime agent you deploy increases your attack surface. Complexity is the enemy of security.
The Multi-Cloud Trap
The problem: Organizations often adopt multi-cloud “for resilience” but end up with:
- Multiple IAM systems to secure
- Different security models to understand
- Duplicated services across providers
- Increased operational complexity
The reality: Most multi-cloud setups are multi-cloud by accident, not design. Unless you have a specific architectural reason, stick with one provider and master its security model.
When multi-cloud makes sense:
- Geographic compliance requirements (data residency)
- True disaster recovery across providers
- Leveraging unique capabilities (e.g., AWS Sagemaker + Azure AD)
When it doesn’t:
- “We might want to switch someday”
- “We don’t want vendor lock-in” (you’re trading vendor lock-in for complexity)
- “Different teams prefer different clouds”
Disable Unused Services and APIs
Cloud providers enable many services by default. Each one is a potential misconfiguration or attack vector.
# AWS: List services (via Service Quotas) as a starting inventory of what could be in use
aws service-quotas list-services | jq -r '.Services[].ServiceName'
# Disable unused AWS services at the organization level
# Example: If you don't use AWS IoT
aws organizations create-policy \
--content '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"iot:*","Resource":"*"}]}' \
--description "Disable IoT services" \
--name "DenyIoT" \
--type SERVICE_CONTROL_POLICY
# The SCP only takes effect once attached: aws organizations attach-policy --policy-id <id> --target-id <root-or-ou-id>
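Before denying a service organization-wide, it is worth checking whether anything still uses it. IAM’s “service last accessed” data can answer that per role; a minimal boto3 sketch (the role ARN is a placeholder):
# Check which services a role has actually used recently (IAM service last accessed data)
import time

import boto3

iam = boto3.client('iam')
role_arn = 'arn:aws:iam::123456789012:role/example-role'  # placeholder

job_id = iam.generate_service_last_accessed_details(Arn=role_arn)['JobId']

# The report is generated asynchronously; poll until it completes
while True:
    report = iam.get_service_last_accessed_details(JobId=job_id)
    if report['JobStatus'] != 'IN_PROGRESS':
        break
    time.sleep(2)

for service in report['ServicesLastAccessed']:
    last_used = service.get('LastAuthenticated', 'never')
    print(f"{service['ServiceNamespace']}: last used {last_used}")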
Audit runtime agents: Many systems ship with agents you don’t need:
- Cloud monitoring agents collecting more than necessary
- APM tools with full system access
- Log shippers with root privileges
- Vulnerability scanners that never run
# Check what's really running
ps aux | grep -E 'agent|daemon' | grep -v grep
# Review installed packages
dpkg -l | grep -E 'agent|monitor' # Debian/Ubuntu
rpm -qa | grep -E 'agent|monitor' # RedHat/CentOS
Avoid Serverless Sprawl
Serverless isn’t automatically more secure. In fact, it often creates:
- Function sprawl (100+ small functions instead of 5 services)
- Permission sprawl (each function needs its own IAM role)
- Debugging complexity (distributed tracing nightmares)
- Cold start security issues (initialization vulnerabilities)
Ask before going serverless (an inventory sketch follows this list):
- Would a simple API server be clearer?
- Can we use 1 function instead of 10?
- Have we audited each function’s IAM permissions?
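One way to start answering the last question is simply to enumerate the functions and the role each one assumes. A minimal, read-only boto3 sketch:
# Inventory Lambda functions and the IAM role each one assumes
import boto3

lam = boto3.client('lambda')
paginator = lam.get_paginator('list_functions')

functions = []
for page in paginator.paginate():
    functions.extend(page['Functions'])

print(f"Total functions: {len(functions)}")
for fn in functions:
    # Each execution role is a separate permission boundary that needs auditing
    print(f"{fn['FunctionName']}: {fn['Role']}")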
Runtime Security Hardening
Use modern kernel security features to limit what processes can do:
# Docker Compose example: Hardened container
services:
webapp:
image: myapp:latest
security_opt:
- no-new-privileges:true # Prevent privilege escalation
- seccomp:path/to/seccomp-profile.json # Limit syscalls
cap_drop:
- ALL # Drop all capabilities
cap_add:
- NET_BIND_SERVICE # Only add what's needed
read_only: true # Read-only root filesystem
user: "1000:1000" # Non-root user
tmpfs:
- /tmp # Temporary writable space
AppArmor profile example:
# Restrict what the application can access
profile myapp {
#include <abstractions/base>
/usr/bin/myapp mr,
/etc/myapp/** r,
/var/log/myapp/** w,
# Deny everything else
deny /** wx,
}
Real-World Example
The 2021 Kaseya ransomware attack exploited the company’s VSA remote monitoring tool. This agent had extensive permissions across customer networks. When compromised, it became the perfect lateral movement tool, affecting 1,500+ businesses.
Further reading:
- CIS Docker Benchmark
- Kubernetes Security Best Practices
- Seccomp Profile Guide
- Linux Kernel Security Features
3. Minimalism at the Image and Execution Level
Container images and executables should contain only what’s absolutely necessary to run the application. Everything else is potential attack surface.
Minimal Base Images
The problem: Using full OS images (Ubuntu, CentOS) as container bases includes hundreds of packages you don’t need:
# Size comparison (approximate)
docker images
ubuntu:22.04 ~78MB
alpine:latest ~8MB
scratch 0B (empty; never listed by docker images)
The solution: Use minimal base images:
# BAD: Full Ubuntu with kitchen sink
FROM ubuntu:22.04
COPY requirements.txt .
RUN apt-get update && apt-get install -y \
python3 python3-pip curl wget git vim \
&& pip3 install -r requirements.txt

# GOOD: Distroless Python (builder and runtime must share the same Python minor version)
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# distroless python3-debian12 ships Python 3.11, matching the builder stage above
FROM gcr.io/distroless/python3-debian12
COPY --from=builder /root/.local /root/.local
COPY app.py /app/
WORKDIR /app
ENV PYTHONPATH=/root/.local/lib/python3.11/site-packages
CMD ["app.py"]
Benefits of distroless:
- No shell (prevents docker exec attacks)
- No package managers (can’t install tools after compromise)
- Minimal CVE surface (fewer packages = fewer vulnerabilities)
- Smaller attack surface for container escapes
Strip Unnecessary Tools
Even in production, many images include:
- curl, wget - used for data exfiltration
- bash, sh - used for command execution
- nc, netcat - used for reverse shells
- Package managers (apt, yum) - used to install attack tools
# Multi-stage build: Keep build tools out of final image
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o app
# Final stage: scratch is literally empty (Dockerfile instructions don't allow trailing comments)
FROM scratch
COPY --from=builder /src/app /app
ENTRYPOINT ["/app"]
# This image has NOTHING but your binary
Static Binaries
When possible, compile static binaries that need no dependencies:
# Go: Static compilation
CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o myapp
# Rust: Static compilation
cargo build --release --target x86_64-unknown-linux-musl
# Result: Single binary, no dependencies
ldd myapp
# not a dynamic executable (i.e., statically linked)
Container Runtime Best Practices
# Kubernetes Pod Security
apiVersion: v1
kind: Pod
metadata:
name: secure-app
spec:
securityContext:
runAsNonRoot: true
runAsUser: 10000
fsGroup: 10000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir: {}
Secrets and Configuration
Bad practice: Secrets in environment variables or config files
# DON'T DO THIS
ENV DATABASE_PASSWORD=super_secret_123
Good practice: Runtime secret injection
# Kubernetes: Use secrets and mount them
env:
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
# Better: Use secret management
# AWS Secrets Manager, HashiCorp Vault, etc.
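As a concrete example of the runtime-injection pattern, a service can pull its credentials from a secret manager at startup instead of baking them into the image. A minimal boto3 sketch, assuming an AWS Secrets Manager secret named prod/db-credentials (the name and JSON layout are illustrative):
# Fetch a database password from AWS Secrets Manager at startup
import json
import boto3

def get_db_password(secret_name: str = 'prod/db-credentials') -> str:
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    # Secrets are commonly stored as a JSON document of key/value pairs
    secret = json.loads(response['SecretString'])
    return secret['password']

if __name__ == '__main__':
    password = get_db_password()
    # Keep the value in memory only; never log it or write it to disk
    print('Fetched database credentials (value not shown)')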
Real-World Impact
The 2019 Docker Hub breach exposed data for roughly 190,000 accounts, including access tokens used for automated builds. More broadly, public images are regularly found to contain embedded credentials and secrets, often because developers build from full OS images that drag in their entire development context.
Research by Sysdig found that 75% of container images have high or critical vulnerabilities, mostly from unnecessary packages in bloated base images.
Further reading:
- Distroless Container Images - Google
- Docker Security Best Practices - OWASP
- CIS Docker Benchmark v1.6
- Container Security Guide - NIST SP 800-190
4. Prune Zombie Infra & Orphaned Services
Every organization has infrastructure graveyards: forgotten EC2 instances, abandoned S3 buckets, orphaned DNS records. Each is a potential entry point.
The Zombie Problem
Common zombies:
- Old EC2 instances someone spun up for “testing”
- S3 buckets from abandoned projects
- RDS databases from previous versions
- Load balancers pointing to deleted instances
- DNS records for decommissioned services
- Container images from 3 years ago
- Staging environments that became “production-like”
Why they’re dangerous:
- Often forgotten in security updates
- May have weak or default credentials
- Might contain old copies of sensitive data
- Create confusion during incident response
- Cost money and create compliance obligations
Discovery and Cleanup
AWS cleanup script example:
#!/bin/bash
# Find instances not tagged or not accessed in 90 days
# Untagged EC2 instances
aws ec2 describe-instances \
--query 'Reservations[].Instances[?!Tags].[InstanceId]' \
--output text
# S3 buckets: there is no direct "last accessed" API;
# check server access logs or CloudTrail (a rough last-write sketch follows this script)
aws s3api list-buckets --query 'Buckets[].[Name]' --output text | \
while read bucket; do
echo "Bucket: $bucket - check access logs for last access"
done
# Unused EBS volumes
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[*].[VolumeId,Size,CreateTime]' \
--output table
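As a rough proxy for “is this bucket still in use”, you can look at the most recent object write. A small boto3 sketch; note it only captures writes (reads require server access logs or CloudTrail data events) and listing every object can be slow on very large buckets:
# Rough staleness check: most recent object modification per bucket
import boto3

s3 = boto3.client('s3')

for bucket in s3.list_buckets()['Buckets']:
    name = bucket['Name']
    newest = None
    # Buckets in other regions may need a region-specific client
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=name):
        for obj in page.get('Contents', []):
            if newest is None or obj['LastModified'] > newest:
                newest = obj['LastModified']
    print(f"{name}: last write {newest or 'no objects found'}")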
Kubernetes cleanup:
# Find old unused resources
kubectl get pods --all-namespaces -o json | \
jq -r '.items[] | select(.status.phase == "Failed" or .status.phase == "Unknown") |
"\(.metadata.namespace) \(.metadata.name)"'
# Check for old deployments with 0 replicas
kubectl get deployments --all-namespaces -o json | \
jq -r '.items[] | select(.spec.replicas == 0) |
"\(.metadata.namespace) \(.metadata.name) \(.metadata.creationTimestamp)"'
Implement Resource Tagging
Mandatory tags policy (note: condition keys inside a single Condition block are ANDed, so this statement only denies when every tag check fails; production policies usually give each required tag its own Deny statement):
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Deny",
"Action": ["ec2:RunInstances"],
"Resource": ["arn:aws:ec2:*:*:instance/*"],
"Condition": {
"StringNotLike": {
"aws:RequestTag/Owner": "*",
"aws:RequestTag/Project": "*",
"aws:RequestTag/Environment": ["prod", "staging", "dev"]
}
}
}]
}
Automated cleanup based on tags:
# Example: Delete resources tagged for deletion
import boto3
from datetime import datetime
ec2 = boto3.client('ec2')
# Find instances tagged "DeleteAfter"
instances = ec2.describe_instances(
Filters=[
{'Name': 'tag-key', 'Values': ['DeleteAfter']}
]
)
for reservation in instances['Reservations']:
for instance in reservation['Instances']:
for tag in instance['Tags']:
if tag['Key'] == 'DeleteAfter':
delete_date = datetime.fromisoformat(tag['Value'])
if datetime.now() > delete_date:
print(f"Terminating {instance['InstanceId']}")
ec2.terminate_instances(InstanceIds=[instance['InstanceId']])
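The SCP above prevents new untagged instances; for what already exists, you can sweep the account with the Resource Groups Tagging API. A minimal sketch that flags resources missing an Owner tag (mirroring the policy above); note this API only returns resources that are or were tagged, so never-tagged resources still need per-service checks:
# Flag resources that are missing the mandatory Owner tag
import boto3

tagging = boto3.client('resourcegroupstaggingapi')
paginator = tagging.get_paginator('get_resources')

for page in paginator.paginate():
    for resource in page['ResourceTagMappingList']:
        tags = {t['Key']: t['Value'] for t in resource.get('Tags', [])}
        if 'Owner' not in tags:
            print(f"Missing Owner tag: {resource['ResourceARN']}")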
Sidecar and Agent Audit
Many systems accumulate monitoring and logging sidecars:
# Kubernetes: Check sidecar containers
kubectl get pods --all-namespaces -o json | \
jq -r '.items[] |
select(.spec.containers | length > 1) |
"\(.metadata.namespace)/\(.metadata.name): \(.spec.containers | length) containers"'
# Ask: Do we need all these?
# Common cruft: Old metrics exporters, unused log shippers, abandoned debug containers
Real-World Example
The 2019 Capital One breach involved a misconfigured web application firewall (WAF) on infrastructure that was part of a migration. The old infrastructure wasn’t fully decommissioned, creating a security gap.
Further reading:
- AWS Well-Architected Framework: Cost Optimization
- Cloud Asset Inventory Management
- Azure Resource Graph for Cleanup
5. Reduce Lateral Movement Paths
Once attackers get initial access, their next goal is lateral movement — compromising additional systems to find valuable data or establish persistence.
Network Segmentation (Deep Dive)
Zero Trust principle: “Never trust, always verify”
Implement micro-segmentation where every service-to-service connection is authenticated and authorized:
# Kubernetes Network Policy: Default deny
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
# Then explicitly allow only required connections
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-webapp-to-api
namespace: production
spec:
podSelector:
matchLabels:
app: webapp
policyTypes:
- Egress
egress:
- to:
- podSelector:
matchLabels:
app: api
ports:
- protocol: TCP
port: 8080
Block Inter-Service Communication by Default
Service mesh approach (Istio, Linkerd):
# Istio: Require mTLS for all traffic
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: production
spec:
mtls:
mode: STRICT
---
# Authorization: Only API can access database
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: db-access-policy
namespace: production
spec:
selector:
matchLabels:
app: database
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/production/sa/api-service"]
No Wildcard IAM Permissions
Bad IAM policy:
{
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
Good IAM policy (scoped):
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-specific-bucket/app-uploads/*"
}
Audit for wildcards:
# AWS: Find policies with wildcard actions
aws iam list-policies --scope Local --query 'Policies[*].[PolicyName,Arn]' --output text | \
while read name arn; do
aws iam get-policy-version --policy-arn "$arn" \
--version-id $(aws iam get-policy --policy-arn "$arn" --query 'Policy.DefaultVersionId' --output text) \
--query 'PolicyVersion.Document' | \
grep -q '"Action":\s*"\*"' && echo "Wildcard found in: $name"
done
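The grep above only catches the literal "Action": "*" string; wildcards inside action lists or patterns like s3:* slip through. A boto3 sketch that inspects the parsed policy document instead (flagging any Allow action ending in *):
# Flag customer-managed policies whose statements allow wildcard actions
import boto3

iam = boto3.client('iam')

for page in iam.get_paginator('list_policies').paginate(Scope='Local'):
    for policy in page['Policies']:
        version = iam.get_policy_version(
            PolicyArn=policy['Arn'],
            VersionId=policy['DefaultVersionId']
        )
        document = version['PolicyVersion']['Document']  # boto3 returns this already parsed
        statements = document.get('Statement', [])
        if isinstance(statements, dict):
            statements = [statements]
        for stmt in statements:
            if stmt.get('Effect') != 'Allow':
                continue
            actions = stmt.get('Action', [])
            if isinstance(actions, str):
                actions = [actions]
            wildcards = [a for a in actions if a.endswith('*')]
            if wildcards:
                print(f"{policy['PolicyName']}: wildcard actions {wildcards}")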
Secure Build Systems
CI/CD systems often have excessive permissions. They become prime targets because they:
- Have access to production
- Store secrets
- Can deploy code
- Often lack proper monitoring
Secure CI/CD practices:
# GitHub Actions: Minimal permissions
name: Deploy
on: [push]
permissions:
contents: read # Only read code
id-token: write # For OIDC
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# Use OIDC instead of static credentials
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
aws-region: us-east-1
role-session-name: GitHubActions
# Deploy with scoped permissions
- name: Deploy
run: ./deploy.sh
Ephemeral CI/CD environments:
- Use short-lived credentials (OIDC tokens)
- Build in isolated, disposable environments
- Don’t persist secrets between builds
- Audit all CI/CD plugins and actions
Real-World Impact
The SolarWinds supply chain attack succeeded because:
- Build environment had excessive network access
- No segmentation between build and distribution systems
- Lateral movement from compromised developer workstation to build servers was possible
The 2021 Codecov breach compromised CI/CD secrets because the Codecov bash uploader script had broad access to environment variables.
Further reading:
- NIST Zero Trust Architecture
- MITRE ATT&CK: Lateral Movement
- CISA: Securing the Software Supply Chain
- SLSA Framework
6. Runtime Cleanup & Reboot Culture
Systems accumulate cruft over time: zombie processes, stale connections, memory leaks, temporary files. Regular cleanup and restarts reduce this accumulation.
Auto-Restart and Self-Healing
Kubernetes: Embrace ephemeral infrastructure
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
spec:
containers:
- name: app
image: myapp:latest
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
periodSeconds: 5
Auto-scaling continuously creates and terminates pods as load changes, which keeps individual instances short-lived:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: webapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: webapp
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Reboot After Patching
Scheduled reboots:
# Systemd timer for monthly reboots
# /etc/systemd/system/monthly-reboot.timer
[Unit]
Description=Monthly reboot for cleanup
[Timer]
OnCalendar=monthly
Persistent=true
[Install]
WantedBy=timers.target
# /etc/systemd/system/monthly-reboot.service
[Unit]
Description=Reboot system
[Service]
Type=oneshot
ExecStart=/sbin/reboot
# Enable with: systemctl enable --now monthly-reboot.timer
Cleanup Zombie Processes
Monitor process table:
# Find zombie processes (state starts with Z, e.g. "Z" or "Z+")
ps aux | awk '$8 ~ /^Z/ {print}'
# Zombies disappear only when their parent reaps them or is restarted;
# refreshing long-running system daemons is general hygiene
systemctl restart systemd-journald # Restart logging
systemctl daemon-reexec # Re-execute systemd
# Find long-running processes
ps -eo pid,etime,cmd --sort=-etime | head -20
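For a machine-readable version of the same check, /proc can be scanned directly; the parent PID tells you which service to restart so the zombie gets reaped. A small stdlib-only sketch (Linux-specific):
# List zombie processes and their parents by reading /proc directly (Linux)
import os

for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open(f'/proc/{pid}/stat') as f:
            stat = f.read()
    except OSError:
        continue  # process exited while we were scanning
    # The comm field may contain spaces, so split after the closing parenthesis
    rest = stat.rsplit(')', 1)[1].split()
    state, ppid = rest[0], rest[1]
    if state == 'Z':
        print(f"zombie pid={pid} parent pid={ppid}")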
Disable Debug Flags
Configuration management to ensure production hardening:
# Ansible playbook example
- name: Ensure debug flags are disabled in production
hosts: production
tasks:
- name: Check application config
lineinfile:
path: /etc/myapp/config.yaml
regexp: '^debug:'
line: 'debug: false'
state: present
- name: Ensure verbose logging is off
lineinfile:
path: /etc/myapp/config.yaml
regexp: '^log_level:'
line: 'log_level: warn'
state: present
- name: Restart service
systemd:
name: myapp
state: restarted
Immutable infrastructure:
- Don’t patch running systems
- Build new images with patches
- Deploy new instances
- Terminate old instances
This ensures no accumulated cruft and consistent state.
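A simple way to keep this honest is to flag instances that have been running longer than your image rotation window. A boto3 sketch, assuming a 30-day rotation policy (the threshold is illustrative):
# Flag long-lived instances as candidates for replacement rather than in-place patching
from datetime import datetime, timedelta, timezone

import boto3

MAX_AGE = timedelta(days=30)  # illustrative rotation window
ec2 = boto3.client('ec2')
now = datetime.now(timezone.utc)

for page in ec2.get_paginator('describe_instances').paginate(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
):
    for reservation in page['Reservations']:
        for instance in reservation['Instances']:
            age = now - instance['LaunchTime']  # LaunchTime is timezone-aware
            if age > MAX_AGE:
                print(f"{instance['InstanceId']} running for {age.days} days - rebuild from a fresh image")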
Real-World Benefit
The Heartbleed vulnerability (2014) affected systems for months because administrators didn’t restart services after patching. The vulnerability was patched, but running processes still had the vulnerable code loaded in memory.
Many compromises persist through memory-resident malware that would be cleared by a simple restart.
7. Infrastructure-as-Code & Deployment Discipline
IaC promises reproducibility and consistency, but poorly written IaC can codify vulnerabilities across your entire infrastructure.
IaC Must Be Explicit
Bad IaC (implicit, unclear):
# What does this actually allow?
resource "aws_security_group" "app" {
name = "app-sg"
ingress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
Good IaC (explicit, documented):
# HTTPS only from load balancer
resource "aws_security_group" "app" {
name = "app-servers"
description = "Allow HTTPS from ALB only"
vpc_id = aws_vpc.main.id
ingress {
description = "HTTPS from ALB security group"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
description = "HTTPS to RDS only"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.rds.id]
}
tags = {
Name = "app-servers"
Environment = "production"
ManagedBy = "terraform"
}
}
No Insecure Defaults
Terraform security scanning:
# Use tfsec to scan for security issues
tfsec .
# Example issues it catches:
# - Public S3 buckets
# - Unencrypted storage
# - Overly permissive security groups
# - Missing logging
Policy as Code:
# OPA policy: Deny public S3 buckets
package terraform.s3
deny[msg] {
resource := input.resource.aws_s3_bucket[name]
resource.acl == "public-read"
msg := sprintf("S3 bucket '%s' has public-read ACL", [name])
}
deny[msg] {
resource := input.resource.aws_s3_bucket[name]
not versioning_enabled(resource)
msg := sprintf("S3 bucket '%s' does not have versioning enabled", [name])
}
# Helper rule: true if any versioning block is enabled (avoids an unsafe wildcard inside "not")
versioning_enabled(resource) {
resource.versioning[_].enabled
}
Remove Unused Modules
IaC codebases accumulate unused modules and configurations:
# Find unused Terraform modules (heuristic: module block names match directory names under modules/)
grep -rhoE 'module\s*"[^"]*"' --include='*.tf' . | sed 's/.*"\(.*\)"/\1/' | sort -u > used_modules.txt
ls -d modules/*/ | awk -F/ '{print $2}' | sort > all_modules.txt
comm -13 used_modules.txt all_modules.txt # Shows modules defined on disk but never referenced
Only Deploy What You Understand
Pre-deployment checklist:
- Can you explain what this infrastructure does?
- Have you reviewed all security group rules?
- Are all IAM policies scoped to minimum necessary?
- Is logging enabled for audit trails?
- Are secrets stored securely (not hardcoded)?
- Is there a rollback plan?
Automated policy enforcement:
# GitHub Actions: Policy check before apply
name: Terraform
on: [pull_request]
jobs:
policy-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
- name: Terraform Init
run: terraform init
- name: Terraform Plan
run: terraform plan -out=tfplan
- name: Security Scan
run: tfsec .
- name: Policy Check
uses: open-policy-agent/opa-action@v2
with:
policy: policies/
input: tfplan
Real-World Impact
In 2019, research found that misconfigurations in IaC templates led to 67% of cloud security incidents. Common issues:
- Default admin passwords in templates
- Public exposure set as default
- Missing encryption configurations
- Overly permissive IAM in examples
The 2019 Docker Hub breach exposed GitHub and Bitbucket tokens linked to automated builds, a reminder that credentials wired into build and IaC tooling multiply the blast radius of a single compromise.
Further reading:
- Terraform Security Best Practices
- tfsec - Terraform Security Scanner
- Checkov - IaC Security Scanner
- OPA - Open Policy Agent
- NIST SP 800-204: Security for Microservices
8. Guidelines for Infra + Runtime ASR
| Area | ASR Practice | Implementation |
|---|---|---|
| Network | Deny by default, isolate services | Security groups with explicit allowlists; network policies; private subnets |
| Containers | Run minimal images, non-root, limit capabilities | Distroless bases; drop ALL caps; read-only filesystem; seccomp profiles |
| Secrets | Don’t mount unused secrets, rotate often | Secret managers (Vault, AWS Secrets); short-lived credentials; OIDC |
| Agents | Audit or remove sidecars and runtime daemons | Regular review of ps output; remove unused monitoring agents |
| Infra Bloat | Delete what’s stale or shadowed | Tag-based cleanup automation; monthly audits; cost reports |
| Build Infra | Keep CI/CD isolated and short-lived | Ephemeral build environments; scoped credentials; SLSA compliance |
| Runtime State | Restart regularly, audit logs and zombie processes | Immutable infrastructure; automated restarts; process monitoring |
| IaC | Simplify, document, and trim templates regularly | Security scanning (tfsec); policy enforcement (OPA); code review |
9. Final Thought
“Every port closed, every daemon killed, every stale node decommissioned — that’s risk erased.”
True infrastructure and runtime ASR isn’t about reactive monitoring. It’s about intentional architecture:
- Reduce what you expose
- Reduce what keeps running
- Reduce what attackers can reach
- Reduce what they could do if they got in
The ASR infrastructure audit (quarterly):
- Network audit: Can anything talk to anything? Fix it.
- Image audit: Are we running bloated containers? Slim them.
- Zombie audit: What’s running that shouldn’t be? Kill it.
- Permission audit: Who/what has more access than needed? Scope it.
- Agent audit: What’s running on our boxes? Justify or remove.
Start small, but start today. Pick one service, one container, one environment. Apply these principles. Then expand.
The infrastructure you don’t deploy can’t be exploited. The service you don’t run can’t be compromised. The simplicity you achieve is security you gain.