KWOK: Simulate a 1000-Node Kubernetes Cluster on Your Laptop

TL;DR: KWOK is a Kubernetes SIG tool that simulates thousands of fake nodes and pods by manipulating API server objects directly – no real containers, no kubelet, no expensive infrastructure. It can spin up a 1000-node cluster in seconds on a laptop. It’s the most underrated tool in the Kubernetes ecosystem for controller developers, platform engineers, and anyone who needs to test at scale without a cloud bill to match.

There’s a very common problem in the Kubernetes world: you want to test something at scale – how your autoscaler behaves at 500 nodes, how your custom controller handles 10,000 pods, whether your scheduling logic holds up under real load – but spinning up that infrastructure costs a fortune and takes forever. You can’t exactly throw 1000 nodes at a GitHub Actions runner.

KWOK solves this problem elegantly. Instead of running real nodes with real kubelets, it pretends to be the kubelet for every node you create. The API server is happy, the scheduler is happy, your controller is happy – and you spend zero on cloud infrastructure.

In this post, I’ll cover:

What KWOK is and where it came from
How it works under the hood – the simulation engine
kwok vs kwokctl – what each piece does
Installation and a working quick start
Advanced features: Stages, resource usage simulation, CRD support
Real-world use cases: controller testing, CI/CD, autoscaling simulation
Comparison with kind, k3d, minikube, and vcluster
Limitations – what you simply cannot test with KWOK

What is KWOK?

KWOK stands for Kubernetes WithOut Kubelet (pronounced /kwɔk/). It’s a toolkit from the kubernetes-sigs organization that enables you to set up a cluster of thousands of nodes in seconds, simulating the full lifecycle of nodes, pods, and other Kubernetes API objects – without running a single real container.

The project was created by Shiming Zhang (DaoCloud), Wei Huang (Apple), and Yibo Zhuang (Apple). It was officially announced on the Kubernetes blog on March 1, 2023, though the repository had been active since July 2022. Under the hood, KWOK 0.1.0 was a merger of two earlier projects: fake-kubelet and fake-k8s, which had already been battle-tested internally.

It lives at kwok.sigs.k8s.io and is part of the official Kubernetes SIG ecosystem – not a random side project. The Kubernetes SIG Scalability team uses it for benchmark testing, and it has been adopted by projects like Karpenter, Apache YuniKorn, and OpenTelemetry.

How It Works: The Simulation Engine

To understand KWOK, you first need to understand what kubelet actually does. In a real Kubernetes cluster, kubelet is the agent that runs on every node. Its job is to:

Register the node with the API server
Watch for pods scheduled to that node
Start containers and report their status back
Send regular heartbeats so the node doesn’t appear NotReady

KWOK replaces all of this with a controller that operates purely on Kubernetes API objects.


graph TD
    subgraph "Real Kubernetes Node"
        A[kubelet] --> B[Container Runtime]
        B --> C[Real Containers]
        A --> D[API Server: node status / pod status]
    end

    subgraph "KWOK Simulated Node"
        E[kwok-controller] --> F[API Server: node status / pod status]
        G[No containers]
    end

    style G fill:#f9f9f9,stroke:#ccc,stroke-dasharray: 5 5

The kwok-controller watches for node objects that it owns (marked with a specific annotation) and takes over all responsibilities kubelet would normally handle:

Node Controller – Updates the node’s status field: conditions (Ready, MemoryPressure, etc.), addresses, capacity, allocatable resources, and node info.
Pod Controller – Simulates pod lifecycle on managed nodes. When the scheduler places a pod on a fake node, KWOK moves its status from Pending to Running and then sets the Ready condition to True.
Node Lease Controller – Creates and renews Lease objects in the kube-node-lease namespace. This is what tells the control plane the node is alive – without it, the node lifecycle controller would mark the node NotReady after ~40 seconds.
Stage Controller – The most powerful component. It defines event-driven lifecycle transitions for any Kubernetes resource type, not just nodes and pods.

The key insight: from the API server’s perspective, a node managed by KWOK is indistinguishable from a real node. The scheduler schedules pods to it, the controller manager tracks it, your custom controllers interact with it – all through the standard Kubernetes API.
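
To see this from the API side, you can read back the two signals KWOK writes for a node. The helper below is a minimal sketch: the function name show_node_liveness is my own, and it assumes kubectl already points at a cluster with a KWOK-managed node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KWOK): print the two signals that make a
# simulated node look alive to the control plane.
show_node_liveness() {
  local node="$1"
  # Lease in kube-node-lease, renewed by KWOK instead of a kubelet.
  kubectl get lease "$node" -n kube-node-lease \
    -o jsonpath='{.spec.renewTime}{"\n"}'
  # Ready condition that KWOK patches into the node status.
  kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'
}

# Usage (needs a running cluster):
#   show_node_liveness fake-node-0
```

Run it twice a few seconds apart: if the renew time advances and the Ready status stays True, the simulation is doing the kubelet's heartbeat work.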

Stages: The Event-Driven Lifecycle Engine

Stages are KWOK’s most sophisticated feature. They define how resources transition between states over time.

A stage watches for Kubernetes resource events (Create, Update, Delete) and responds with status patches or deletions, with configurable delays and jitter.

apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: pod-ready
spec:
  resourceRef:
    apiGroup: v1
    kind: Pod
  selector:
    matchExpressions:
      - key: '.metadata.deletionTimestamp'
        operator: DoesNotExist
      - key: '.status.podIP'
        operator: DoesNotExist
  delay:
    # The delay is drawn between durationMilliseconds and jitterDurationMilliseconds.
    durationMilliseconds: 1000
    jitterDurationMilliseconds: 1200
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        status: "True"
        type: Ready
      phase: Running
      podIP: {{ PodIP }}

This stage fires between 1 and 1.2 seconds after a pod lands on a KWOK node and patches it to Running with a simulated pod IP. KWOK treats jitterDurationMilliseconds as the upper bound of the delay rather than a plus-or-minus offset, so it needs to be larger than durationMilliseconds. The spread is important: it keeps the simulation realistic by avoiding perfectly synchronized state transitions across thousands of pods.
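
The two delay fields are easy to misread, so here is a small sketch of how I understand them from the comments in the KWOK API types: the delay is drawn uniformly between durationMilliseconds and jitterDurationMilliseconds, and if the jitter value is the smaller of the two, it is used as the delay. The function name stage_delay_ms is made up for illustration, and the behaviour is my reading rather than a guarantee, so check it against your KWOK version.

```shell
#!/usr/bin/env bash
# Illustration only: mimic how a Stage delay appears to be chosen.
stage_delay_ms() {
  local duration="$1" jitter="$2"
  # A jitter value at or below the duration is used as the delay itself.
  if [ "$jitter" -le "$duration" ]; then
    echo "$jitter"
    return
  fi
  local span=$(( jitter - duration ))
  # Combine two 15-bit $RANDOM draws so spans above 32767 ms still work.
  echo $(( duration + ( (RANDOM << 15 | RANDOM) % span ) ))
}

stage_delay_ms 1000 1200   # some value from 1000 up to 1199
stage_delay_ms 1000 200    # 200
```

Under this reading, a jitter of 200 next to a duration of 1000 does not mean "1 second, give or take 200ms"; it shortens the delay to 200ms.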

You can chain stages to simulate complex scenarios:

Pod fails after N seconds, then recovers
Node becomes NotReady briefly, then returns
A pod gets stuck in Terminating for a configurable duration

kwok vs kwokctl: Two Tools, One Ecosystem

KWOK ships as two separate binaries with distinct responsibilities.

`kwok` – The Simulation Engine

kwok is the core controller. It:

Connects to an existing Kubernetes cluster via kubeconfig or in-cluster ServiceAccount
Takes ownership of nodes annotated with kwok.x-k8s.io/node=fake
Runs the simulation logic for those nodes and their pods
Supports configuration via CRDs (Stage, ClusterResourceUsage, etc.)

You would use kwok standalone when you want to add fake nodes to an existing cluster – for example, to simulate scale in a dev cluster that already has some real nodes.
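
For that standalone case, the sketch below applies the controller manifest and the default fast stages to whatever cluster your current kubeconfig context points at. It assumes the release you pick publishes kwok.yaml and stage-fast.yaml as assets, which is what the KWOK installation docs describe at the time of writing; the function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: install the kwok controller and default stages into an existing
# cluster. Assumption: the release publishes kwok.yaml and stage-fast.yaml.
deploy_kwok_in_cluster() {
  local tag="$1"   # a release tag, e.g. the value of KWOK_LATEST_RELEASE
  local base="https://github.com/kubernetes-sigs/kwok/releases/download/${tag}"
  kubectl apply -f "${base}/kwok.yaml"
  kubectl apply -f "${base}/stage-fast.yaml"
}

# Usage (needs cluster-admin on the target cluster):
#   deploy_kwok_in_cluster "$KWOK_LATEST_RELEASE"
```

Try this on a throwaway dev cluster first, since the controller will start managing any node that matches its annotation selector.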

`kwokctl` – The Cluster Lifecycle Manager

kwokctl is the management CLI that provisions entire KWOK clusters from scratch. It:

Spins up a full Kubernetes control plane (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kwok-controller) through a runtime such as Docker, Podman, nerdctl, kind, or plain local binaries
Creates and deletes clusters with a single command
Manages cluster lifecycle (start, stop, list)
Handles kubeconfig management

Think of kwokctl as analogous to kind or k3d – but instead of a full cluster with real node agents, it gives you a control plane with a KWOK simulation engine attached.

kwokctl → provisions control plane → uses kwok to simulate all worker nodes
k3d     → provisions control plane + k3s agents on real containers
kind    → provisions full control plane + kubelet on Docker nodes

Installation

Homebrew (recommended for macOS/Linux)

brew install kwok
# This installs both kwok and kwokctl

Binary download

KWOK_REPO=kubernetes-sigs/kwok
KWOK_LATEST_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')

# kwokctl
wget -O kwokctl "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwokctl-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwokctl && sudo mv kwokctl /usr/local/bin/

# kwok
wget -O kwok "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwok-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwok && sudo mv kwok /usr/local/bin/
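
The commands above call go env, so they need a Go toolchain. If you don’t have one, the asset name can be derived from uname instead. This is a sketch under the assumption that release assets follow the same name-os-arch pattern used above; kwok_asset_url is a name I made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a release asset URL from uname output, so the
# download works on machines without Go installed.
kwok_asset_url() {
  local binary="$1" tag="$2" os arch
  os="$(uname -s | tr '[:upper:]' '[:lower:]')"   # Linux -> linux, Darwin -> darwin
  case "$(uname -m)" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $(uname -m)" >&2; return 1 ;;
  esac
  echo "https://github.com/kubernetes-sigs/kwok/releases/download/${tag}/${binary}-${os}-${arch}"
}

# Usage:
#   wget -O kwokctl "$(kwok_asset_url kwokctl "$KWOK_LATEST_RELEASE")"
```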

Go

go install sigs.k8s.io/kwok/cmd/kwok@latest
go install sigs.k8s.io/kwok/cmd/kwokctl@latest

Quick Start: From Zero to 1000 Nodes in Under a Minute

Let’s build a real example step by step.

Step 1: Create a KWOK cluster

kwokctl create cluster --name=kwok-demo

This spins up a full Kubernetes control plane in Docker containers. When it’s done, your kubeconfig is automatically updated.

kubectl cluster-info --context kwok-kwok-demo
# Kubernetes control plane is running at https://127.0.0.1:32764

Step 2: Create fake nodes

KWOK doesn’t pre-populate nodes – you create them yourself. This is by design: it lets you control exactly what hardware profile each node presents to the scheduler.

# Create a single fake node
kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: fake-node-0
    kubernetes.io/os: linux
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: fake-node-0
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF

The key is the annotation kwok.x-k8s.io/node: fake – this is how kwok-controller identifies which nodes it should manage.

Step 3: Scale to 100 nodes with a script

for i in $(seq 1 100); do
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: fake-node-${i}
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
  name: fake-node-${i}
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
done

Within seconds, the new nodes report Ready (101 in total, counting fake-node-0 from Step 2):

kubectl get nodes | head -10
# NAME           STATUS   ROLES    AGE   VERSION
# fake-node-1    Ready    <none>   8s    fake
# fake-node-2    Ready    <none>   7s    fake
# fake-node-3    Ready    <none>   7s    fake
# ...
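
The loop above starts one kubectl process per node, which gets slow as the count grows. To reach the 1000 nodes from the title, one option is to print all manifests as a single YAML stream and apply them in one call. A minimal sketch, with an arbitrary bulk-node- name prefix chosen so the names don’t clash with the nodes already created:

```shell
#!/usr/bin/env bash
# Sketch: print N Node manifests as one multi-document YAML stream.
gen_fake_nodes() {
  local count="$1" i
  for i in $(seq 1 "$count"); do
    cat <<EOF
---
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: bulk-node-${i}
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
  name: bulk-node-${i}
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
  done
}

# Usage (needs a running cluster):
#   gen_fake_nodes 1000 | kubectl apply -f -
```

Because the generator only writes to stdout, you can also redirect it to a file and commit that as a test fixture.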

Step 4: Schedule workloads

Pods scheduled to KWOK nodes need a toleration for the kwok.x-k8s.io/node=fake:NoSchedule taint:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fake-workload
spec:
  replicas: 500
  selector:
    matchLabels:
      app: fake-workload
  template:
    metadata:
      labels:
        app: fake-workload
    spec:
      tolerations:
      - key: "kwok.x-k8s.io/node"
        operator: "Exists"
        effect: "NoSchedule"
      # No nodeAffinity here: node affinity matches labels, and
      # kwok.x-k8s.io/node is set as an annotation on the nodes above.
      # Every node in this cluster is simulated, so the toleration is enough.
      containers:
      - name: app
        image: fake.registry/fake:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
EOF

500 pods will be scheduled and transitioned to Running – virtually instantly, with zero actual container starts.
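
To confirm that without scrolling through 500 rows, you can tally the STATUS column. A small sketch (count_pod_status is my own name) that reads kubectl’s no-header output from stdin:

```shell
#!/usr/bin/env bash
# Sketch: tally pod statuses from `kubectl get pods --no-headers` on stdin.
# Column 3 of that output is STATUS.
count_pod_status() {
  awk '{ n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage (needs a running cluster):
#   kubectl get pods -l app=fake-workload --no-headers | count_pod_status
```

If some pods stay Pending, kubectl describe pod on one of them usually shows whether a taint, affinity rule, or resource request is blocking the scheduler.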

Step 5: Clean up

kwokctl delete cluster --name=kwok-demo

That’s it. No lingering containers, no orphaned volumes.

Advanced Use Cases

Testing Autoscaling: HPA, VPA, and Karpenter

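One of KWOK’s most powerful applications is testing autoscaling logic. To make HPA and VPA work, pods need to report CPU and memory metrics. KWOK can simulate these through its ResourceUsage and ClusterResourceUsage CRDs, either with values read from pod annotations or with expressions evaluated over time. The annotations shown here only take effect when a usage configuration that reads them is loaded, and metrics-server has to be running for HPA to see the result.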

Simulating CPU usage via annotations:

apiVersion: v1
kind: Pod
metadata:
  name: high-cpu-pod
  annotations:
    kwok.x-k8s.io/usage-cpu: "850m"
    kwok.x-k8s.io/usage-memory: "256Mi"
spec:
  # ...

Simulating dynamic usage via ClusterResourceUsage:

apiVersion: kwok.x-k8s.io/v1alpha1
kind: ClusterResourceUsage
metadata:
  name: simulate-growing-load
spec:
  usages:
  - usage:
      cpu:
        expression: |
          Quantity("100m") * (pod.SinceSecond() / 60.0)
      memory:
        expression: |
          Quantity("64Mi") + Quantity("1Mi") * (pod.SinceSecond() / 30.0)

This expression simulates a pod whose CPU usage grows linearly over time – perfect for testing HPA scale-out triggers without any real workload.
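
Taking the CPU expression at face value, usage is 100m multiplied by the pod’s age in minutes, so you can work out in advance when an HPA should react. The sketch below does that arithmetic in integer millicores; the 80m threshold in the example assumes a 100m CPU request and an HPA target of 80% utilization, both of which are illustrative values rather than anything KWOK sets.

```shell
#!/usr/bin/env bash
# Worked arithmetic for cpu = 100m * (age_seconds / 60), in millicores.
cpu_millicores_at() {
  local seconds="$1"
  echo $(( 100 * seconds / 60 ))
}

# Smallest pod age (seconds) at which usage reaches the target, rounded up.
seconds_to_reach() {
  local target_m="$1"
  echo $(( (target_m * 60 + 99) / 100 ))
}

cpu_millicores_at 300   # 500
seconds_to_reach 80     # 48
```

So with those assumptions each pod crosses the scale-out threshold about 48 seconds after it starts. Note that the expression grows without bound, so long-lived pods will eventually report far more CPU than any node offers.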

Karpenter integration:

KWOK has official support in the Karpenter project via a kwok provider. You can run the full Karpenter control loop, pointing it at a KWOK cluster, and test node provisioning decisions at scale without touching EC2:

kwokctl create cluster --name=karpenter-test \
  --config=./karpenter-kwok-config.yaml

The Karpenter project maintains this provider so that its provisioning logic can be exercised without real cloud instances.

Controller Development and Testing

If you’re writing a Kubernetes operator or controller, KWOK gives you a realistic environment to test reconciliation logic at scale. You can:

Create 10,000 pods in a specific state and watch your controller process them
Simulate node failures and verify your controller responds correctly
Test finalizer logic and deletion workflows at scale
Validate that your controller doesn’t hammer the API server under load

Simulating a node failure:

Using a custom Stage, you can make a node’s Ready condition flip to False after a delay, simulating a network partition or hardware failure:

apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: node-simulated-failure
spec:
  resourceRef:
    apiGroup: v1
    kind: Node
  selector:
    matchExpressions:
    - key: '.metadata.annotations["kwok.x-k8s.io/node"]'
      operator: In
      values:
      - fake
  delay:
    durationMilliseconds: 30000
    jitterDurationMilliseconds: 5000
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        message: "Simulated node failure"
        reason: KwokSimulation
        status: "False"
        type: Ready
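
To check that the failure actually shows up, and that your controller has something to react to, you can filter the node list for anything that is not Ready. A minimal sketch, with not_ready_nodes as a made-up helper name, reading kubectl’s no-header output from stdin:

```shell
#!/usr/bin/env bash
# Sketch: list nodes whose STATUS column does not start with "Ready".
# Column 2 of `kubectl get nodes --no-headers` is STATUS; it can carry
# suffixes such as "Ready,SchedulingDisabled".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (needs a running cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```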

CI/CD Integration

KWOK is excellent for CI pipelines. A complete cluster creation and test run takes seconds rather than minutes, and requires only a Docker daemon – no cloud credentials, no cloud costs.

Example GitHub Actions workflow:

name: Controller Integration Tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

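    - name: Install kwok and kwokctl
      run: |
        # Download release binaries; Homebrew is not on PATH on ubuntu runners.
        TAG=$(curl -s https://api.github.com/repos/kubernetes-sigs/kwok/releases/latest | jq -r '.tag_name')
        for bin in kwok kwokctl; do
          curl -fsSL -o "$bin" "https://github.com/kubernetes-sigs/kwok/releases/download/${TAG}/${bin}-linux-amd64"
          chmod +x "$bin" && sudo mv "$bin" /usr/local/bin/
        done

    - name: Create KWOK cluster
      run: |
        kwokctl create cluster --name=ci-test

    - name: Deploy simulated nodes
      run: |
        kubectl apply -f ./test/fixtures/nodes.yaml
        # Wait only after nodes exist; kubectl wait fails when nothing matches.
        kubectl wait --for=condition=Ready node --all --timeout=60s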

    - name: Run integration tests
      run: |
        go test ./... -v -tags=integration

    - name: Cleanup
      if: always()
      run: kwokctl delete cluster --name=ci-test

The entire cluster lifecycle fits in a single CI job, with no persistent infrastructure.
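
One thing to watch in CI is timing: kubectl wait exits with an error if no nodes match yet, so it has to run after the node fixtures are applied. If you would rather poll for an exact node count, a small helper works too. This is a sketch with hypothetical names and a one-second poll interval picked arbitrarily:

```shell
#!/usr/bin/env bash
# Sketch: poll until at least N nodes report Ready, or give up after a timeout.
wait_for_ready_nodes() {
  local expected="$1" timeout="${2:-60}" waited=0 ready
  while :; do
    # Count rows whose STATUS column is exactly "Ready".
    ready="$(kubectl get nodes --no-headers 2>/dev/null | awk '$2 == "Ready"' | wc -l | tr -d ' ')"
    if [ "$ready" -ge "$expected" ]; then
      echo "ready nodes: $ready"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out with $ready/$expected nodes Ready" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Usage in a workflow step (needs a running cluster):
#   wait_for_ready_nodes 100 60
```

Checking for a count rather than "all nodes" also catches the case where the fixture applied fewer nodes than the tests expect.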

KWOK vs. The Alternatives

This is the question I get most often: “when do I use KWOK instead of kind/k3d/minikube?”

The answer is simple: KWOK and these tools solve different problems. They are not fully interchangeable.

Feature	KWOK	kind	k3d	minikube	vcluster
Real containers	No	Yes	Yes	Yes	Yes
Startup time	~1 sec	~30-60 sec	~10-20 sec	~60-90 sec	~30 sec
Resource usage	Minimal	Medium	Low	Medium-High	Low
Max simulated nodes	1000+	~10-20	~10-20	~5	N/A
Pod networking	Simulated	Real	Real	Real	Real
Metrics (HPA/VPA)	Simulated	Real	Real	Real	Real
CRD / controller testing	Excellent	Good	Good	Good	Good
Autoscaling testing	Excellent	Limited	Limited	Limited	Limited
Actual workload testing	No	Yes	Yes	Yes	Yes
Cloud provider simulation	Yes	Limited	Limited	Limited	No


graph LR
    A[What do you need?]
    A --> B{Real containers?}
    B -- Yes --> C{Resource-constrained?}
    B -- No --> D[KWOK]

    C -- Yes --> E[k3d / kind]
    C -- No --> F[minikube]

    D --> G[Scale testing, controller dev,\nautoscaling simulation, CI/CD]
    E --> H[CI/CD with real workloads,\nlocal development]
    F --> I[Learning, local development,\nfull dashboard UX]

Use KWOK when:

You need to simulate 10, 100, or 1000+ nodes
You’re testing controller or operator logic
You’re validating autoscaling triggers (HPA, VPA, Karpenter, Cluster Autoscaler)
You need fast cluster lifecycle in CI/CD
You want to test scheduling decisions and affinity rules at scale

Use kind or k3d when:

You need containers to actually start and run
You’re testing pod networking, CNI plugins, or network policies
You’re testing ingress controllers with real traffic
You’re testing volume mounting and persistence
You need a local development environment for a real application

The tools complement each other well. I personally use k3d for my homelab (as covered in Part 1 of this series) and KWOK for controller testing and CI pipelines.

Limitations: What KWOK Cannot Do

This is crucial to understand before adopting KWOK. The simulation has real limits.

Pods don’t actually run. This is the most important limitation. If you deploy nginx to a KWOK cluster and try to curl its cluster IP, nothing happens. There is no running container, no listening socket, no real process. The API server shows the pod as Running, but that’s it.

This means you cannot test:

Pod networking – no real traffic flows between pods
Volume mounting – PVCs bind (if you configure a storage provisioner), but nothing actually mounts
Container behavior – init containers, sidecar logic, actual application code
Device plugins – GPU allocation, specialized hardware
CNI plugin behavior – network policies are purely theoretical
Service load balancing – no real iptables rules are written, no real traffic is forwarded
Kubelet-specific features – eviction based on actual node pressure, OOM handling, etc.

The divergence risk. Because KWOK simulates kubelet behavior, there may be subtle edge cases where its simulation diverges from how a real kubelet would behave. For critical control-plane logic, always validate on a real cluster before shipping.

No persistent cluster by default. When you run kwokctl delete cluster, everything is gone, including every object you created in it. This is usually what you want, but keep it in mind.

Real-World Adoption

KWOK is not a toy. Here’s who is using it:

Kubernetes SIG Scalability – Uses KWOK for the official Kubernetes scalability benchmark tests
Karpenter – Official kwok provider for testing node autoscaling logic
Apache YuniKorn – Performance evaluation and scalability testing
OpenTelemetry Collector – Performance testing of Kubernetes-related components at scale
Clusterpedia – E2E testing by importing KWOK simulation clusters
NVIDIA (Knavigator) – Virtual node testing for GPU cluster simulation
Headlamp / Aptakube – Load testing their Kubernetes dashboard UIs against large clusters
AWS – Data plane cost modeling with Karpenter and KWOK
IBM – Tutorials on using KWOK for OpenShift simulation

The Karpenter and SIG Scalability integrations are particularly meaningful – these are teams that need to simulate cluster behavior at extreme scale, and they chose KWOK as the foundation.

Summary

KWOK is one of those tools that, once you discover it, makes you wonder how you ever lived without it. The table below summarizes where it fits:

Scenario	KWOK verdict
Testing a Kubernetes controller at 1000 nodes	Perfect fit
Validating HPA/VPA autoscaling triggers	Perfect fit
Testing Karpenter node provisioning decisions	Perfect fit (official integration)
CI/CD: fast cluster spin-up for integration tests	Excellent
Simulating node failures for chaos testing	Good (with custom Stages)
Testing an actual containerized application	Not suitable
Testing pod networking / CNI plugins	Not suitable
Local development of a microservices app	Not suitable

If you work with Kubernetes at scale – whether building operators, platform tooling, or testing autoscaling logic – KWOK deserves a place in your toolkit. It’s fast, lightweight, free, and backed by the Kubernetes SIG organization.

The entire source is at github.com/kubernetes-sigs/kwok, and the documentation at kwok.sigs.k8s.io is excellent. Start with kwokctl create cluster and go from there.

This post is part of my ongoing Homelab Series. For the cluster foundation this tooling builds on, see Part 1: Building a Production-Grade Kubernetes Homelab on macOS.