
“The safest infra is the one with no forgotten parts — and nothing unnecessary still running.”

Modern infrastructure often resembles a sprawling city — full of ports, services, APIs, credentials, and control planes. Every open port, daemon, or misconfigured identity role is an opportunity for an attacker.

Infrastructure and Runtime ASR is the practice of deliberately shrinking, isolating, and simplifying both what you deploy and what continues to run, so there are fewer ways in, fewer lateral paths, and fewer moving pieces.

This section combines principles from both infrastructure and runtime attack surface reduction to form a holistic approach to systems that are not just secure on paper, but hardened in operation.


1. Expose Nothing by Default

The principle of “default deny” should apply to everything: network ports, API endpoints, service access, and egress traffic. If it’s not explicitly needed and documented, it should be blocked.

Network Segmentation

The problem: Many infrastructures still operate on a “flat network” model where once you’re in, you can reach everything. This dramatically increases lateral movement opportunities for attackers.

The solution: Implement network segmentation with deny-all defaults:

# Terraform example: Deny-all security group
resource "aws_security_group" "default_deny" {
  name        = "default-deny"
  description = "Default deny all traffic"
  vpc_id      = aws_vpc.main.id

  # No ingress rules - all blocked by default
  
  # Minimal egress for updates only
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS for package updates only"
  }
}

Then explicitly open only what’s needed:

resource "aws_security_group_rule" "web_server_https" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = [var.load_balancer_cidr]
  security_group_id = aws_security_group.web_servers.id
  description       = "HTTPS from load balancer only"
}

Block Unnecessary Egress

Outbound traffic is often overlooked, but it’s critical for preventing data exfiltration and command-and-control communication.

Best practices:

  • Web servers shouldn’t need to connect to random internet IPs
  • Database servers shouldn’t need any internet access
  • Application servers should only reach known APIs

# iptables example: Allowlist-only egress
iptables -P OUTPUT DROP  # Default: drop all outbound
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  # Allow replies
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT  # Internal network
# Note: hostnames in -d are resolved once, at rule-insertion time
iptables -A OUTPUT -d api.trusted-service.com -p tcp --dport 443 -j ACCEPT

Isolation Strategies

Implementation patterns:

  • Private subnets for databases and application servers
  • VPNs or bastion hosts for administrative access (time-limited)
  • Service mesh for microservice communication (mTLS)
  • Isolated environments for different trust levels (dev/staging/prod)
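A minimal sketch of the private-subnet pattern using the AWS CLI (the VPC, subnet, and instance IDs are placeholders; SSM Session Manager is shown as one way to avoid a standing bastion):

# Create a subnet that never assigns public IPs (IDs are placeholders)
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.0.2.0/24
aws ec2 modify-subnet-attribute --subnet-id subnet-0def456 \
  --no-map-public-ip-on-launch

# Time-limited admin access without an exposed bastion (requires the SSM
# agent and an instance profile with SSM permissions)
aws ssm start-session --target i-0123456789abcdef0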

Runtime Daemon Audit

Don’t just secure the network — audit what’s listening:

# Linux: Check listening ports
ss -tulpn | grep LISTEN

# Check for unexpected daemons
systemctl list-units --type=service --state=running

# Disable and stop unnecessary services
systemctl disable --now cups.service  # Printing service on a server?
systemctl disable --now bluetooth.service  # Bluetooth on a cloud instance?

Real-World Impact

The Target breach of 2013 started with compromised HVAC vendor credentials. The attackers moved laterally from the HVAC network to the payment system network because proper network segmentation wasn’t in place. 40 million credit cards were stolen.

The 2017 Equifax breach exploited an unpatched Apache Struts vulnerability, but lateral movement was possible because internal segmentation was insufficient.



2. Reduce Control Plane & Runtime Complexity

Every cloud service you enable, every management API you expose, and every runtime agent you deploy increases your attack surface. Complexity is the enemy of security.

The Multi-Cloud Trap

The problem: Organizations often adopt multi-cloud “for resilience” but end up with:

  • Multiple IAM systems to secure
  • Different security models to understand
  • Duplicated services across providers
  • Increased operational complexity

The reality: Most multi-cloud setups are multi-cloud by accident, not design. Unless you have a specific architectural reason, stick with one provider and master its security model.

When multi-cloud makes sense:

  • Geographic compliance requirements (data residency)
  • True disaster recovery across providers
  • Leveraging unique capabilities (e.g., AWS SageMaker + Azure AD)

When it doesn’t:

  • “We might want to switch someday”
  • “We don’t want vendor lock-in” (you’re trading vendor lock-in for complexity)
  • “Different teams prefer different clouds”

Disable Unused Services and APIs

Cloud providers enable many services by default. Each one is a potential misconfiguration or attack vector.

# AWS: Audit enabled services
aws service-quotas list-services | jq -r '.Services[].ServiceName'

# Disable unused AWS services at the organization level
# Example: If you don't use AWS IoT
aws organizations create-policy \
  --content '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"iot:*","Resource":"*"}]}' \
  --description "Disable IoT services" \
  --name "DenyIoT" \
  --type SERVICE_CONTROL_POLICY

# The SCP takes effect only after it is attached to a root, OU, or account:
# aws organizations attach-policy --policy-id <id> --target-id <target>

Audit runtime agents: Many systems ship with agents you don’t need:

  • Cloud monitoring agents collecting more than necessary
  • APM tools with full system access
  • Log shippers with root privileges
  • Vulnerability scanners that never run

# Check what's really running
ps aux | grep -E 'agent|daemon' | grep -v grep

# Review installed packages
dpkg -l | grep -E 'agent|monitor' # Debian/Ubuntu
rpm -qa | grep -E 'agent|monitor' # RedHat/CentOS

Avoid Serverless Sprawl

Serverless isn’t automatically more secure. In fact, it often creates:

  • Function sprawl (100+ small functions instead of 5 services)
  • Permission sprawl (each function needs its own IAM role)
  • Debugging complexity (distributed tracing nightmares)
  • Cold start security issues (initialization vulnerabilities)

Ask before going serverless:

  • Would a simple API server be clearer?
  • Can we use 1 function instead of 10?
  • Have we audited each function’s IAM permissions?
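The third question can be partly scripted. A hedged sketch using the AWS CLI (assumes the standard role-ARN format when extracting the role name):

# List every Lambda function with its execution role
aws lambda list-functions \
  --query 'Functions[].[FunctionName,Role]' --output text | \
while read -r fn role_arn; do
  role_name="${role_arn##*/}"  # strip everything up to the last '/'
  echo "== $fn ($role_name)"
  aws iam list-attached-role-policies --role-name "$role_name" \
    --query 'AttachedPolicies[].PolicyName' --output text
done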

Runtime Security Hardening

Use modern kernel security features to limit what processes can do:

# Docker Compose example: Hardened container
services:
  webapp:
    image: myapp:latest
    security_opt:
      - no-new-privileges:true  # Prevent privilege escalation
      - seccomp:path/to/seccomp-profile.json  # Limit syscalls
    cap_drop:
      - ALL  # Drop all capabilities
    cap_add:
      - NET_BIND_SERVICE  # Only add what's needed
    read_only: true  # Read-only root filesystem
    user: "1000:1000"  # Non-root user
    tmpfs:
      - /tmp  # Temporary writable space

AppArmor profile example:

# Restrict what the application can access; the attachment path makes
# the profile apply automatically when /usr/bin/myapp is executed
profile myapp /usr/bin/myapp {
  #include <abstractions/base>
  
  /usr/bin/myapp mr,
  /etc/myapp/** r,
  /var/log/myapp/** w,
  
  # Deny everything else
  deny /** wx,
}
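Loading and verifying the profile (a sketch; assumes the file above is installed as /etc/apparmor.d/myapp):

apparmor_parser -r /etc/apparmor.d/myapp  # Load (or reload) into the kernel
aa-status | grep myapp                    # Confirm it is active and enforcing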

Real-World Example

The 2021 Kaseya ransomware attack exploited the company’s VSA remote monitoring tool. This agent had extensive permissions across customer networks. When compromised, it became the perfect lateral movement tool, affecting 1,500+ businesses.



3. Minimalism at the Image and Execution Level

Container images and executables should contain only what’s absolutely necessary to run the application. Everything else is potential attack surface.

Minimal Base Images

The problem: Using full OS images (Ubuntu, CentOS) as container bases includes hundreds of packages you don’t need:

# Approximate sizes (vary by tag and architecture)
docker images
ubuntu:latest         ~78MB
alpine:latest         ~7.5MB
scratch               0MB (empty!)

The solution: Use minimal base images:

# BAD: Full Ubuntu with kitchen sink
FROM ubuntu:22.04
COPY requirements.txt .
RUN apt-get update && apt-get install -y \
    python3 python3-pip curl wget git vim \
    && pip3 install -r requirements.txt

# GOOD: Distroless Python
# (builder Python version must match the distroless image: Debian 11 ships 3.9)
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

FROM gcr.io/distroless/python3-debian11
COPY --from=builder /root/.local /root/.local
COPY app.py /app/
WORKDIR /app
CMD ["app.py"]

Benefits of distroless:

  • No shell (prevents docker exec attacks)
  • No package managers (can’t install tools after compromise)
  • Minimal CVE surface (fewer packages = fewer vulnerabilities)
  • Smaller attack surface for container escapes
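One way to make the CVE-surface point concrete is to scan a full base image and a distroless one with a scanner such as Trivy (a sketch; counts vary with the vulnerability database on the day you run it):

# Compare findings between a full OS base and a distroless base
trivy image ubuntu:22.04
trivy image gcr.io/distroless/python3-debian11

# Fail CI if the final application image has HIGH/CRITICAL findings
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest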

Strip Unnecessary Tools

Even in production, many images include:

  • curl, wget - used for data exfiltration
  • bash, sh - used for command execution
  • nc, netcat - used for reverse shells
  • Package managers (apt, yum) - used to install attack tools

# Multi-stage build: Keep build tools out of final image
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o app

# Literally empty base image (Dockerfile comments must be on their own line)
FROM scratch
COPY --from=builder /src/app /app
ENTRYPOINT ["/app"]
# This image has NOTHING but your binary

Static Binaries

When possible, compile static binaries that need no dependencies:

# Go: Static compilation
CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o myapp

# Rust: Static compilation (first: rustup target add x86_64-unknown-linux-musl)
cargo build --release --target x86_64-unknown-linux-musl

# Result: Single binary, no dependencies
ldd myapp
# Output: "not a dynamic executable"

Container Runtime Best Practices

# Kubernetes Pod Security
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Secrets and Configuration

Bad practice: Secrets in environment variables or config files

# DON'T DO THIS
ENV DATABASE_PASSWORD=super_secret_123

Good practice: Runtime secret injection

# Kubernetes: Use secrets and mount them
env:
- name: DATABASE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password

# Better: Use secret management
# AWS Secrets Manager, HashiCorp Vault, etc.
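A sketch of fetching a secret at startup from AWS Secrets Manager (db-credentials is a placeholder; the caller should hold secretsmanager:GetSecretValue on that one ARN only):

# Pull the secret at runtime instead of baking it into the image
DB_PASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id db-credentials \
  --query SecretString --output text)
export DB_PASSWORD

# Rotate regularly (assumes a rotation function is configured)
aws secretsmanager rotate-secret --secret-id db-credentials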

Real-World Impact

The 2019 Docker Hub breach exposed approximately 190,000 user accounts, including GitHub and Bitbucket access tokens. One contributing factor was that many public images contained embedded credentials and secrets because developers used full OS images with their entire development context.

Research by Sysdig found that 75% of container images have high or critical vulnerabilities, mostly from unnecessary packages in bloated base images.



4. Prune Zombie Infra & Orphaned Services

Every organization has infrastructure graveyards: forgotten EC2 instances, abandoned S3 buckets, orphaned DNS records. Each is a potential entry point.

The Zombie Problem

Common zombies:

  • Old EC2 instances someone spun up for “testing”
  • S3 buckets from abandoned projects
  • RDS databases from previous versions
  • Load balancers pointing to deleted instances
  • DNS records for decommissioned services
  • Container images from 3 years ago
  • Staging environments that became “production-like”

Why they’re dangerous:

  • Often forgotten in security updates
  • May have weak or default credentials
  • Might contain old copies of sensitive data
  • Create confusion during incident response
  • Cost money and create compliance obligations

Discovery and Cleanup

AWS cleanup script example:

#!/bin/bash
# Find instances not tagged or not accessed in 90 days

# Untagged EC2 instances
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?!Tags].[InstanceId]' \
  --output text

# S3 last-access data requires server access logs or CloudTrail;
# list buckets as the starting point for review
aws s3api list-buckets --query 'Buckets[].[Name]' --output text | \
while read -r bucket; do
  echo "Bucket: $bucket - check access logs / CloudTrail for last access"
done

# Unused EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

Kubernetes cleanup:

# Find old unused resources
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.status.phase == "Failed" or .status.phase == "Unknown") | 
  "\(.metadata.namespace) \(.metadata.name)"'

# Check for old deployments with 0 replicas
kubectl get deployments --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.replicas == 0) | 
  "\(.metadata.namespace) \(.metadata.name) \(.metadata.creationTimestamp)"'

Implement Resource Tagging

Mandatory tags policy (note: IAM conditions on absent request tags evaluate to false, so each required tag needs its own Null-condition Deny to catch untagged launches):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRunInstancesWithoutOwnerTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances"],
      "Resource": ["arn:aws:ec2:*:*:instance/*"],
      "Condition": {
        "Null": { "aws:RequestTag/Owner": "true" }
      }
    },
    {
      "Sid": "DenyInvalidEnvironmentTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances"],
      "Resource": ["arn:aws:ec2:*:*:instance/*"],
      "Condition": {
        "StringNotEquals": {
          "aws:RequestTag/Environment": ["prod", "staging", "dev"]
        }
      }
    }
  ]
}

Repeat the Null statement for each required tag (Project, Environment, and so on).

Automated cleanup based on tags:

# Example: Delete resources tagged for deletion
import boto3
from datetime import datetime

ec2 = boto3.client('ec2')

# Find instances tagged "DeleteAfter"
instances = ec2.describe_instances(
    Filters=[
        {'Name': 'tag-key', 'Values': ['DeleteAfter']}
    ]
)

for reservation in instances['Reservations']:
    for instance in reservation['Instances']:
        for tag in instance['Tags']:
            if tag['Key'] == 'DeleteAfter':
                delete_date = datetime.fromisoformat(tag['Value'])
                if datetime.now() > delete_date:
                    print(f"Terminating {instance['InstanceId']}")
                    ec2.terminate_instances(InstanceIds=[instance['InstanceId']])

Sidecar and Agent Audit

Many systems accumulate monitoring and logging sidecars:

# Kubernetes: Check sidecar containers
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | 
  select(.spec.containers | length > 1) | 
  "\(.metadata.namespace)/\(.metadata.name): \(.spec.containers | length) containers"'

# Ask: Do we need all these?
# Common cruft: Old metrics exporters, unused log shippers, abandoned debug containers

Real-World Example

The 2019 Capital One breach exploited a misconfigured web application firewall on infrastructure that was part of a migration. The old environment wasn’t fully decommissioned, creating a security gap.



5. Reduce Lateral Movement Paths

Once attackers get initial access, their next goal is lateral movement — compromising additional systems to find valuable data or establish persistence.

Network Segmentation (Deep Dive)

Zero Trust principle: “Never trust, always verify”

Implement micro-segmentation where every service-to-service connection is authenticated and authorized:

# Kubernetes Network Policy: Default deny
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
# Then explicitly allow only required connections
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-webapp-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 8080

Block Inter-Service Communication by Default

Service mesh approach (Istio, Linkerd):

# Istio: Require mTLS for all traffic
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

---
# Authorization: Only API can access database
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: db-access-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/api-service"]

No Wildcard IAM Permissions

Bad IAM policy:

{
  "Effect": "Allow",
  "Action": "*",
  "Resource": "*"
}

Good IAM policy (scoped):

{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject"
  ],
  "Resource": "arn:aws:s3:::my-specific-bucket/app-uploads/*"
}

Audit for wildcards:

# AWS: Find policies with wildcard actions
aws iam list-policies --scope Local --query 'Policies[*].[PolicyName,Arn]' --output text | \
while read name arn; do
  aws iam get-policy-version --policy-arn "$arn" \
    --version-id $(aws iam get-policy --policy-arn "$arn" --query 'Policy.DefaultVersionId' --output text) \
    --query 'PolicyVersion.Document' | \
  grep -q '"Action":\s*"\*"' && echo "Wildcard found in: $name"
done

Secure Build Systems

CI/CD systems often have excessive permissions. They become prime targets because they:

  • Have access to production
  • Store secrets
  • Can deploy code
  • Often lack proper monitoring

Secure CI/CD practices:

# GitHub Actions: Minimal permissions
name: Deploy
on: [push]

permissions:
  contents: read  # Only read code
  id-token: write # For OIDC

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      # Use OIDC instead of static credentials
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: us-east-1
          role-session-name: GitHubActions
      
      # Deploy with scoped permissions
      - name: Deploy
        run: ./deploy.sh

Ephemeral CI/CD environments:

  • Use short-lived credentials (OIDC tokens)
  • Build in isolated, disposable environments
  • Don’t persist secrets between builds
  • Audit all CI/CD plugins and actions

Real-World Impact

The SolarWinds supply chain attack succeeded because:

  • Build environment had excessive network access
  • No segmentation between build and distribution systems
  • Lateral movement from compromised developer workstation to build servers was possible

The 2021 Codecov breach compromised CI/CD secrets because the Codecov bash uploader script had broad access to environment variables.



6. Runtime Cleanup & Reboot Culture

Systems accumulate cruft over time: zombie processes, stale connections, memory leaks, temporary files. Regular cleanup and restarts reduce this accumulation.

Auto-Restart and Self-Healing

Kubernetes: Embrace ephemeral infrastructure

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5

Auto-scaling continually replaces pods, which doubles as cleanup:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Reboot After Patching

Scheduled reboots:

# Systemd timer for monthly reboots
# /etc/systemd/system/monthly-reboot.timer
[Unit]
Description=Monthly reboot for cleanup

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/monthly-reboot.service
[Unit]
Description=Reboot system

[Service]
Type=oneshot
ExecStart=/sbin/reboot
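Activating the timer (assumes the two unit files above are installed):

systemctl daemon-reload                       # Pick up the new unit files
systemctl enable --now monthly-reboot.timer   # Start and persist the timer
systemctl list-timers monthly-reboot.timer    # Verify the next scheduled run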

Cleanup Zombie Processes

Monitor process table:

# Find zombie processes (process state starts with Z)
ps aux | awk '$8 ~ /^Z/ {print}'

# Zombies persist until their parent reaps them; restart the parent
# service to clear them. Restarting system daemons can also help:
systemctl restart systemd-journald  # Restart logging
systemctl daemon-reexec  # Reload systemd itself

# Find long-running processes
ps -eo pid,etime,cmd --sort=-etime | head -20

Disable Debug Flags

Configuration management to ensure production hardening:

# Ansible playbook example
- name: Ensure debug flags are disabled in production
  hosts: production
  tasks:
    - name: Check application config
      lineinfile:
        path: /etc/myapp/config.yaml
        regexp: '^debug:'
        line: 'debug: false'
        state: present
    
    - name: Ensure verbose logging is off
      lineinfile:
        path: /etc/myapp/config.yaml
        regexp: '^log_level:'
        line: 'log_level: warn'
        state: present
    
    - name: Restart service
      systemd:
        name: myapp
        state: restarted

Immutable infrastructure:

  • Don’t patch running systems
  • Build new images with patches
  • Deploy new instances
  • Terminate old instances

This ensures no accumulated cruft and consistent state.
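On AWS, an Auto Scaling group instance refresh is one way to roll this out (a sketch; my-asg is a placeholder, and the group's launch template must already reference the newly built image):

# Replace every instance in the group with ones built from the new image
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-asg \
  --preferences '{"MinHealthyPercentage": 90}'

# Watch progress
aws autoscaling describe-instance-refreshes --auto-scaling-group-name my-asg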

Real-World Benefit

The Heartbleed vulnerability (2014) affected systems for months because administrators didn’t restart services after patching. The vulnerability was patched, but running processes still had the vulnerable code loaded in memory.

Many compromises persist through memory-resident malware that would be cleared by a simple restart.



7. Infrastructure-as-Code & Deployment Discipline

IaC promises reproducibility and consistency, but poorly written IaC can codify vulnerabilities across your entire infrastructure.

IaC Must Be Explicit

Bad IaC (implicit, unclear):

# What does this actually allow?
resource "aws_security_group" "app" {
  name = "app-sg"
  
  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Good IaC (explicit, documented):

# HTTPS only from load balancer
resource "aws_security_group" "app" {
  name        = "app-servers"
  description = "Allow HTTPS from ALB only"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "HTTPS from ALB security group"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    description = "PostgreSQL to RDS only"
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    security_groups = [aws_security_group.rds.id]
  }

  tags = {
    Name        = "app-servers"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

No Insecure Defaults

Terraform security scanning:

# Use tfsec to scan for security issues
tfsec .

# Example issues it catches:
# - Public S3 buckets
# - Unencrypted storage
# - Overly permissive security groups
# - Missing logging

Policy as Code:

# OPA policy: Deny public S3 buckets
package terraform.s3

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' has public-read ACL", [name])
}

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  not resource.versioning[0].enabled
  msg := sprintf("S3 bucket '%s' does not have versioning enabled", [name])
}
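To run these rules against a Terraform file with conftest (a sketch; conftest queries the main namespace by default, so the package above has to be selected explicitly):

conftest test main.tf --policy policies/ --namespace terraform.s3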

Remove Unused Modules

IaC codebases accumulate unused modules and configurations:

# Find unused Terraform modules (assumes local modules live under modules/)
grep -rh 'module\s*"' --include='*.tf' . | \
  sed -E 's/.*module[[:space:]]*"([^"]+)".*/\1/' | sort -u > used_modules.txt
ls -d modules/*/ | awk -F/ '{print $2}' | sort > all_modules.txt
comm -13 used_modules.txt all_modules.txt  # Shows unused modules

Only Deploy What You Understand

Pre-deployment checklist:

  • Can you explain what this infrastructure does?
  • Have you reviewed all security group rules?
  • Are all IAM policies scoped to minimum necessary?
  • Is logging enabled for audit trails?
  • Are secrets stored securely (not hardcoded)?
  • Is there a rollback plan?

Automated policy enforcement:

# GitHub Actions: Policy check before apply
name: Terraform
on: [pull_request]

jobs:
  policy-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      
      - name: Security Scan
        run: tfsec .  # assumes tfsec is installed on the runner

      - name: Policy Check
        # Evaluate the OPA rules with conftest (assumes the binary is
        # installed; the namespace matches the package in policies/)
        run: conftest test . --policy policies/ --namespace terraform.s3

Real-World Impact

In 2019, research found that misconfigurations in IaC templates led to 67% of cloud security incidents. Common issues:

  • Default admin passwords in templates
  • Public exposure set as default
  • Missing encryption configurations
  • Overly permissive IAM in examples

The 2019 Docker Hub breach also exposed GitHub and Bitbucket access tokens linked to automated builds, putting downstream source and IaC repositories at risk.



8. Guidelines for Infra + Runtime ASR

| Area          | ASR Practice                                        | Implementation                                                               |
|---------------|-----------------------------------------------------|------------------------------------------------------------------------------|
| Network       | Deny by default, isolate services                   | Security groups with explicit allowlists; network policies; private subnets  |
| Containers    | Run minimal images, non-root, limit capabilities    | Distroless bases; drop ALL caps; read-only filesystem; seccomp profiles      |
| Secrets       | Don’t mount unused secrets, rotate often            | Secret managers (Vault, AWS Secrets); short-lived credentials; OIDC          |
| Agents        | Audit or remove sidecars and runtime daemons        | Regular review of ps output; remove unused monitoring agents                 |
| Infra Bloat   | Delete what’s stale or shadowed                     | Tag-based cleanup automation; monthly audits; cost reports                   |
| Build Infra   | Keep CI/CD isolated and short-lived                 | Ephemeral build environments; scoped credentials; SLSA compliance            |
| Runtime State | Restart regularly, audit logs and zombie processes  | Immutable infrastructure; automated restarts; process monitoring             |
| IaC           | Simplify, document, and trim templates regularly    | Security scanning (tfsec); policy enforcement (OPA); code review             |

9. Final Thought

“Every port closed, every daemon killed, every stale node decommissioned — that’s risk erased.”

True infrastructure and runtime ASR isn’t about reactive monitoring. It’s about intentional architecture:

  • Reduce what you expose
  • Reduce what keeps running
  • Reduce what attackers can reach
  • Reduce what they could do if they got in

The ASR infrastructure audit (quarterly):

  1. Network audit: Can anything talk to anything? Fix it.
  2. Image audit: Are we running bloated containers? Slim them.
  3. Zombie audit: What’s running that shouldn’t be? Kill it.
  4. Permission audit: Who/what has more access than needed? Scope it.
  5. Agent audit: What’s running on our boxes? Justify or remove.

Start small, but start today. Pick one service, one container, one environment. Apply these principles. Then expand.

The infrastructure you don’t deploy can’t be exploited. The service you don’t run can’t be compromised. The simplicity you achieve is security you gain.