KWOK: Simulate a 1000-Node Kubernetes Cluster on Your Laptop

May 3, 2026

TL;DR: KWOK is a Kubernetes SIG tool that simulates thousands of fake nodes and pods by manipulating API server objects directly – no real containers, no kubelet, no expensive infrastructure. It can spin up a 1000-node cluster in seconds on a laptop. It’s the most underrated tool in the Kubernetes ecosystem for controller developers, platform engineers, and anyone who needs to test at scale without a cloud bill to match.


There’s a very common problem in the Kubernetes world: you want to test something at scale – how your autoscaler behaves at 500 nodes, how your custom controller handles 10,000 pods, whether your scheduling logic holds up under real load – but spinning up that infrastructure costs a fortune and takes forever. You can’t exactly throw 1000 nodes at a GitHub Actions runner.

KWOK solves this problem elegantly. Instead of running real nodes with real kubelets, it pretends to be the kubelet for every node you create. The API server is happy, the scheduler is happy, your controller is happy – and you spend zero on cloud infrastructure.

In this post, I’ll cover:

  • What KWOK is and where it came from
  • How it works under the hood – the simulation engine
  • kwok vs kwokctl – what each piece does
  • Installation and a working quick start
  • Advanced features: Stages, resource usage simulation, CRD support
  • Real-world use cases: controller testing, CI/CD, autoscaling simulation
  • Comparison with kind, k3d, minikube, and vcluster
  • Limitations – what you simply cannot test with KWOK

What is KWOK?

KWOK stands for Kubernetes WithOut Kubelet (pronounced /kwɔk/). It’s a toolkit from the kubernetes-sigs organization that enables you to set up a cluster of thousands of nodes in seconds, simulating the full lifecycle of nodes, pods, and other Kubernetes API objects – without running a single real container.

The project was created by Shiming Zhang (DaoCloud), Wei Huang (Apple), and Yibo Zhuang (Apple). It was officially announced on the Kubernetes blog on March 1, 2023, though the repository had been active since July 2022. Under the hood, KWOK 0.1.0 was a merger of two earlier projects: fake-kubelet and fake-k8s, which had already been battle-tested internally.

It lives at kwok.sigs.k8s.io and is part of the official Kubernetes SIG ecosystem – not a random side project. It’s used in production by the Kubernetes SIG Scalability team for benchmark testing and has been adopted by projects like Karpenter, Apache YuniKorn, and OpenTelemetry.

How It Works: The Simulation Engine

To understand KWOK, you first need to understand what kubelet actually does. In a real Kubernetes cluster, kubelet is the agent that runs on every node. Its job is to:

  1. Register the node with the API server
  2. Watch for pods scheduled to that node
  3. Start containers and report their status back
  4. Send regular heartbeats so the node doesn’t appear NotReady

KWOK replaces all of this with a controller that operates purely on Kubernetes API objects.


graph TD
    subgraph "Real Kubernetes Node"
        A[kubelet] --> B[Container Runtime]
        B --> C[Real Containers]
        A --> D[API Server: node status / pod status]
    end

    subgraph "KWOK Simulated Node"
        E[kwok-controller] --> F[API Server: node status / pod status]
        G[No containers]
    end

    style G fill:#f9f9f9,stroke:#ccc,stroke-dasharray: 5 5

The kwok-controller watches for node objects that it owns (marked with a specific annotation) and takes over all responsibilities kubelet would normally handle:

  • Node Controller – Updates the node’s status field: conditions (Ready, MemoryPressure, etc.), addresses, capacity, allocatable resources, and node info.
  • Pod Controller – Simulates pod lifecycle on managed nodes. When the scheduler places a pod on a fake node, KWOK updates its status through the expected phases: Pending → Running → Ready.
  • Node Lease Controller – Creates and renews Lease objects in the kube-node-lease namespace. This heartbeat is what tells the control plane the node is alive – without it, the node lifecycle controller would mark the node NotReady after ~40 seconds.
  • Stage Controller – The most powerful component. It defines event-driven lifecycle transitions for any Kubernetes resource type, not just nodes and pods.

The key insight: from the API server’s perspective, a node managed by KWOK is indistinguishable from a real node. The scheduler schedules pods to it, the controller manager tracks it, your custom controllers interact with it – all through the standard Kubernetes API.
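To make the heartbeat part concrete, here is a minimal sketch of the kind of Lease object that a kubelet (or a KWOK stand-in) keeps renewing for each node. The function name render_node_lease is my own, and the 40-second lease duration is an assumed typical default rather than something KWOK guarantees.

```shell
# Print a Lease manifest shaped like a node heartbeat.
# render_node_lease is a hypothetical helper, not part of KWOK.
render_node_lease() {
  node_name="$1"
  # renewTime uses microsecond precision; the fraction is zeroed for portability.
  renew_time="$(date -u +%Y-%m-%dT%H:%M:%S.000000Z)"
  cat <<EOF
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: ${node_name}
  namespace: kube-node-lease
spec:
  holderIdentity: ${node_name}
  leaseDurationSeconds: 40
  renewTime: "${renew_time}"
EOF
}
```

On a running cluster, comparing this with the output of kubectl get lease -n kube-node-lease fake-node-0 -o yaml should show the same fields, with renewTime advancing as the lease is renewed.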

Stages: The Event-Driven Lifecycle Engine

Stages are KWOK’s most sophisticated feature. They define how resources transition between states over time.

A stage watches for Kubernetes resource events (Create, Update, Delete) and responds with status patches or deletions, with configurable delays and jitter.

apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: pod-ready
spec:
  resourceRef:
    apiGroup: v1
    kind: Pod
  selector:
    matchExpressions:
      - key: '.metadata.deletionTimestamp'
        operator: DoesNotExist
      - key: '.status.podIP'
        operator: DoesNotExist
  delay:
    durationMilliseconds: 1000
    jitterDurationMilliseconds: 1200
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        status: "True"
        type: Ready
      phase: Running
      podIP: 10.0.{{ rand 0 255 }}.{{ rand 0 255 }}

This stage fires between 1 and 1.2 seconds after a pod is created on a KWOK node and patches it to Running with a fake pod IP. Note that jitterDurationMilliseconds is an upper bound on the total delay rather than a ± offset: KWOK picks a random delay between durationMilliseconds and jitterDurationMilliseconds. The jitter matters because it keeps thousands of pods from changing state in perfect lockstep, which makes the simulation behave more like a real cluster.

You can chain stages to simulate complex scenarios:

  • Pod fails after N seconds, then recovers
  • Node becomes NotReady briefly, then returns
  • A pod gets stuck in Terminating for a configurable duration
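As a rough sketch of the last scenario, the helper below prints a Stage that holds a deleted pod in Terminating for a chosen number of milliseconds before removing it. The helper name and the stage name pod-slow-delete are made up for illustration, and I am assuming the next.finalizers.empty and next.delete fields behave as they do in KWOK's built-in deletion stage; check the Stage reference for your KWOK version before relying on it.

```shell
# Print a Stage that delays pod deletion by <delay_ms> milliseconds.
# render_slow_delete_stage and the name pod-slow-delete are hypothetical.
render_slow_delete_stage() {
  delay_ms="$1"
  cat <<EOF
apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: pod-slow-delete
spec:
  resourceRef:
    apiGroup: v1
    kind: Pod
  selector:
    matchExpressions:
    - key: '.metadata.deletionTimestamp'
      operator: Exists
  delay:
    durationMilliseconds: ${delay_ms}
  next:
    finalizers:
      empty: true
    delete: true
EOF
}
```

Something like render_slow_delete_stage 30000 | kubectl apply -f - would apply it. A stage like this probably has to replace the cluster's existing pod deletion stage rather than sit alongside it, otherwise the faster one wins.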

kwok vs kwokctl: Two Tools, One Ecosystem

KWOK ships as two separate binaries with distinct responsibilities.

kwok – The Simulation Engine

kwok is the core controller. It:

  • Connects to an existing Kubernetes cluster via kubeconfig or in-cluster ServiceAccount
  • Takes ownership of nodes annotated with kwok.x-k8s.io/node=fake
  • Runs the simulation logic for those nodes and their pods
  • Supports configuration via CRDs (Stage, ClusterResourceUsage, etc.)

You would use kwok standalone when you want to add fake nodes to an existing cluster – for example, to simulate scale in a dev cluster that already has some real nodes.
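For that standalone case, the KWOK releases publish manifests that deploy the controller into an existing cluster. The sketch below only builds the two URLs for a given release tag; the function name is my own, and the asset names kwok.yaml and stage-fast.yaml are what I understand the project to publish, so verify them against the release page.

```shell
# Print the manifest URLs for deploying kwok into an existing cluster.
# kwok_manifest_urls is a hypothetical helper; pass a release tag as $1.
kwok_manifest_urls() {
  tag="$1"
  base="https://github.com/kubernetes-sigs/kwok/releases/download/${tag}"
  printf '%s\n' "${base}/kwok.yaml" "${base}/stage-fast.yaml"
}
```

Each printed URL can then be passed to kubectl apply -f, first the controller manifest and then the default stages.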

kwokctl – The Cluster Lifecycle Manager

kwokctl is the management CLI that provisions entire KWOK clusters from scratch. It:

  • Spins up a full Kubernetes control plane (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kwok-controller) using one of several runtimes: local binaries, Docker, Podman, nerdctl, or kind
  • Creates and deletes clusters with a single command
  • Manages cluster lifecycle (start, stop, list)
  • Handles kubeconfig management

Think of kwokctl as analogous to kind or k3d – but instead of a full cluster with real node agents, it gives you a control plane with a KWOK simulation engine attached.

kwokctl → provisions control plane → uses kwok to simulate all worker nodes
k3d     → provisions control plane + k3s agents on real containers
kind    → provisions full control plane + kubelet on Docker nodes

Installation

Homebrew

brew install kwok
# This installs both kwok and kwokctl

Binary download

KWOK_REPO=kubernetes-sigs/kwok
KWOK_LATEST_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')

# kwokctl
wget -O kwokctl "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwokctl-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwokctl && sudo mv kwokctl /usr/local/bin/

# kwok
wget -O kwok "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwok-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwok && sudo mv kwok /usr/local/bin/
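The download commands above call go env, which assumes a Go toolchain is installed. On a machine without Go, the same asset names can be derived from uname instead. This is a small sketch with helper names of my own choosing, covering only the amd64 and arm64 cases.

```shell
# Map `uname -m` output to the architecture suffix used in release assets.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build an asset name such as kwokctl-linux-amd64.
# Arguments: <binary> <output of uname -s> <output of uname -m>
kwok_asset_name() {
  os="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  arch="$(normalize_arch "$3")" || return 1
  printf '%s-%s-%s\n' "$1" "$os" "$arch"
}
```

In the wget lines, kwokctl-$(go env GOOS)-$(go env GOARCH) could then be swapped for $(kwok_asset_name kwokctl "$(uname -s)" "$(uname -m)").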

Go

go install sigs.k8s.io/kwok/cmd/kwok@latest
go install sigs.k8s.io/kwok/cmd/kwokctl@latest

Quick Start: From Zero to 1000 Nodes in Under a Minute

Let’s build a real example step by step.

Step 1: Create a KWOK cluster

kwokctl create cluster --name=kwok-demo

This spins up a full Kubernetes control plane in Docker containers. When it’s done, your kubeconfig is automatically updated.

kubectl cluster-info --context kwok-kwok-demo
# Kubernetes control plane is running at https://127.0.0.1:32764

Step 2: Create fake nodes

KWOK doesn’t pre-populate nodes – you create them yourself. This is by design: it lets you control exactly what hardware profile each node presents to the scheduler.

# Create a single fake node
kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: fake-node-0
    kubernetes.io/os: linux
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: fake-node-0
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF

The key is the annotation kwok.x-k8s.io/node: fake – this is how kwok-controller identifies which nodes it should manage.

Step 3: Scale to 100 nodes with a script

for i in $(seq 1 100); do
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: fake-node-${i}
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
  name: fake-node-${i}
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
done

Within seconds, the 100 new nodes report Ready (raise the seq upper bound to 1000 to reach the thousand-node mark):

kubectl get nodes | head -10
# NAME           STATUS   ROLES    AGE   VERSION
# fake-node-1    Ready    <none>   8s    fake
# fake-node-2    Ready    <none>   7s    fake
# fake-node-3    Ready    <none>   7s    fake
# ...
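The loop above starts one kubectl process per node, which gets slow as the count grows. A variation, sketched here with a helper name of my own, renders all the Node manifests as a single multi-document YAML stream so one kubectl apply can create them together. The manifest fields are the same ones used in Step 3.

```shell
# Print <count> fake Node manifests as one multi-document YAML stream.
# render_fake_nodes is a hypothetical helper, not a KWOK command.
render_fake_nodes() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: fake-node-${i}
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
  name: fake-node-${i}
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
    i=$((i + 1))
  done
}
```

Usage would look like render_fake_nodes 1000 | kubectl apply -f - against the kwok-demo context.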

Step 4: Schedule workloads

Pods scheduled to KWOK nodes need a toleration for the kwok.x-k8s.io/node=fake:NoSchedule taint:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fake-workload
spec:
  replicas: 500
  selector:
    matchLabels:
      app: fake-workload
  template:
    metadata:
      labels:
        app: fake-workload
    spec:
      tolerations:
      - key: "kwok.x-k8s.io/node"
        operator: "Exists"
        effect: "NoSchedule"
      # No nodeAffinity here: affinity matches node labels, and the nodes above
      # carry kwok.x-k8s.io/node only as an annotation. Every node in this
      # kwokctl cluster is simulated, so the toleration alone is enough.
      containers:
      - name: app
        image: fake.registry/fake:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
EOF

500 pods will be scheduled and transitioned to Running – virtually instantly, with zero actual container starts.
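A quick way to confirm the rollout is to count pods by status. The filter below is a small sketch that reads the output of kubectl get pods --no-headers on standard input and assumes the default column layout, where the third column is STATUS. The function name is hypothetical.

```shell
# Count lines whose third column (STATUS) equals the wanted value.
# Expects `kubectl get pods --no-headers` output on stdin.
count_pods_with_status() {
  awk -v want="$1" '$3 == want { n++ } END { print n + 0 }'
}
```

For example: kubectl get pods -l app=fake-workload --no-headers | count_pods_with_status Running.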

Step 5: Clean up

kwokctl delete cluster --name=kwok-demo

That’s it. No lingering containers, no orphaned volumes.

Advanced Use Cases

Testing Autoscaling: HPA, VPA, and Karpenter

One of KWOK’s most powerful applications is testing autoscaling logic. To make HPA and VPA work, pods need to report CPU and memory metrics. KWOK supports this via annotations or the ClusterResourceUsage CRD.

Simulating CPU usage via annotations:

apiVersion: v1
kind: Pod
metadata:
  name: high-cpu-pod
  annotations:
    kwok.x-k8s.io/usage-cpu: "850m"
    kwok.x-k8s.io/usage-memory: "256Mi"
spec:
  # ...

Simulating dynamic usage via ClusterResourceUsage:

apiVersion: kwok.x-k8s.io/v1alpha1
kind: ClusterResourceUsage
metadata:
  name: simulate-growing-load
spec:
  usages:
  - usage:
      cpu:
        expression: |
          Quantity("100m") * (pod.SinceSecond() / 60.0)
      memory:
        expression: |
          Quantity("64Mi") + Quantity("1Mi") * (pod.SinceSecond() / 30.0)

This expression simulates a pod whose CPU usage grows linearly over time – perfect for testing HPA scale-out triggers without any real workload.
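To get a feel for the numbers, the two expressions can be evaluated by hand for a given pod age. The helpers below are my own approximation in integer shell arithmetic (the real expressions use floating point), so treat the results as rounded-down estimates.

```shell
# Approximate the simulated CPU usage in millicores for a pod aged <seconds>:
# 100m * (age / 60)
simulated_cpu_millicores() {
  echo $(( 100 * $1 / 60 ))
}

# Approximate the simulated memory usage in MiB for a pod aged <seconds>:
# 64Mi + 1Mi * (age / 30)
simulated_memory_mib() {
  echo $(( 64 + $1 / 30 ))
}
```

After five minutes (300 seconds) the pod would report roughly 500m of CPU and 74Mi of memory. Against the 100m request in the earlier Deployment, that is well past any typical HPA utilization target.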

Karpenter integration:

KWOK has official support in the Karpenter project via a kwok provider. You can run the full Karpenter control loop, pointing it at a KWOK cluster, and test node provisioning decisions at scale without touching EC2:

kwokctl create cluster --name=karpenter-test \
  --config=./karpenter-kwok-config.yaml

The Karpenter project uses its kwok provider in a similar way, to exercise provisioning logic without launching real cloud instances.

Controller Development and Testing

If you’re writing a Kubernetes operator or controller, KWOK gives you a realistic environment to test reconciliation logic at scale. You can:

  • Create 10,000 pods in a specific state and watch your controller process them
  • Simulate node failures and verify your controller responds correctly
  • Test finalizer logic and deletion workflows at scale
  • Validate that your controller doesn’t hammer the API server under load

Simulating a node failure:

Using a custom Stage, you can make a node’s Ready condition flip to False after a delay, simulating a network partition or hardware failure:

apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: node-simulated-failure
spec:
  resourceRef:
    apiGroup: v1
    kind: Node
  selector:
    matchExpressions:
    - key: '.metadata.annotations["kwok.x-k8s.io/node"]'
      operator: In
      values:
      - fake
  delay:
    durationMilliseconds: 30000
    jitterDurationMilliseconds: 5000
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        message: "Simulated node failure"
        reason: KwokSimulation
        status: "False"
        type: Ready
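To see whether a failure stage like this has taken effect, it helps to list only the nodes that are no longer Ready. The filter below is a sketch that reads kubectl get nodes --no-headers on standard input and assumes the default column layout, where the second column is STATUS. The function name is hypothetical.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
list_not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

For example: kubectl get nodes --no-headers | list_not_ready_nodes. A cordoned but healthy node shows as Ready,SchedulingDisabled and is deliberately not reported.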

CI/CD Integration

KWOK is excellent for CI pipelines. A complete cluster creation and test run takes seconds rather than minutes, and requires only a Docker daemon – no cloud credentials, no cloud costs.

Example GitHub Actions workflow:

name: Controller Integration Tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

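    - name: Install kwokctl and kwok
      run: |
        # Download release binaries directly; brew may not be on PATH on Linux runners.
        KWOK_REPO=kubernetes-sigs/kwok
        KWOK_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')
        for bin in kwokctl kwok; do
          curl -fsSL -o "${bin}" "https://github.com/${KWOK_REPO}/releases/download/${KWOK_RELEASE}/${bin}-linux-amd64"
          chmod +x "${bin}" && sudo mv "${bin}" /usr/local/bin/
        done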

    - name: Create KWOK cluster
      run: |
        kwokctl create cluster --name=ci-test

    - name: Deploy simulated nodes
      run: |
        kubectl apply -f ./test/fixtures/nodes.yaml
        # Wait only after nodes exist; kubectl wait fails when nothing matches.
        kubectl wait --for=condition=Ready node --all --timeout=60s

    - name: Run integration tests
      run: |
        go test ./... -v -tags=integration

    - name: Cleanup
      if: always()
      run: kwokctl delete cluster --name=ci-test

The entire cluster lifecycle fits in a single CI job, with no persistent infrastructure.
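CI steps that depend on cluster state tend to be flaky if they run a moment too early. One common pattern, sketched here with a helper name of my own, is to retry a check once per second until it succeeds or a timeout expires.

```shell
# Retry a command once per second until it succeeds or <timeout_s> elapses.
# Usage: retry_until <timeout_s> <command> [args...]
retry_until() {
  timeout_s="$1"
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

In a workflow step this might wrap a readiness check, for example retry_until 60 kubectl get --raw=/readyz, before any fixtures are applied.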

KWOK vs. The Alternatives

This is the question I get most often: “when do I use KWOK instead of kind/k3d/minikube?”

The answer is simple: KWOK and these tools solve different problems. They are not fully interchangeable.

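Feature                   | KWOK      | kind       | k3d        | minikube    | vcluster
--------------------------|-----------|------------|------------|-------------|---------
Real containers           | No        | Yes        | Yes        | Yes         | Yes
Startup time              | ~1 sec    | ~30-60 sec | ~10-20 sec | ~60-90 sec  | ~30 sec
Resource usage            | Minimal   | Medium     | Low        | Medium-High | Low
Max simulated nodes       | 1000+     | ~10-20     | ~10-20     | ~5          | N/A
Pod networking            | Simulated | Real       | Real       | Real        | Real
Metrics (HPA/VPA)         | Simulated | Real       | Real       | Real        | Real
CRD / controller testing  | Excellent | Good       | Good       | Good        | Good
Autoscaling testing       | Excellent | Limited    | Limited    | Limited     | Limited
Actual workload testing   | No        | Yes        | Yes        | Yes         | Yes
Cloud provider simulation | Yes       | Limited    | Limited    | Limited     | No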

graph LR
    A[What do you need?]
    A --> B{Real containers?}
    B -- Yes --> C{Resource-constrained?}
    B -- No --> D[KWOK]

    C -- Yes --> E[k3d / kind]
    C -- No --> F[minikube]

    D --> G[Scale testing, controller dev,\nautoscaling simulation, CI/CD]
    E --> H[CI/CD with real workloads,\nlocal development]
    F --> I[Learning, local development,\nfull dashboard UX]

Use KWOK when:

  • You need to simulate 10, 100, or 1000+ nodes
  • You’re testing controller or operator logic
  • You’re validating autoscaling triggers (HPA, VPA, Karpenter, Cluster Autoscaler)
  • You need fast cluster lifecycle in CI/CD
  • You want to test scheduling decisions and affinity rules at scale

Use kind or k3d when:

  • You need containers to actually start and run
  • You’re testing pod networking, CNI plugins, or network policies
  • You’re testing ingress controllers with real traffic
  • You’re testing volume mounting and persistence
  • You need a local development environment for a real application

The tools complement each other well. I personally use k3d for my homelab (as covered in Part 1 of this series) and KWOK for controller testing and CI pipelines.

Limitations: What KWOK Cannot Do

This is crucial to understand before adopting KWOK. The simulation has real limits.

Pods don’t actually run. This is the most important limitation. If you deploy nginx to a KWOK cluster and try to curl its cluster IP, nothing happens. There is no running container, no listening socket, no real process. The API server shows the pod as Running, but that’s it.

This means you cannot test:

  • Pod networking – no real traffic flows between pods
  • Volume mounting – PVCs bind (if you configure a storage provisioner), but nothing actually mounts
  • Container behavior – init containers, sidecar logic, actual application code
  • Device plugins – GPU allocation, specialized hardware
  • CNI plugin behavior – network policies are purely theoretical
  • Service load balancing – no real iptables rules are written, no real traffic is forwarded
  • Kubelet-specific features – eviction based on actual node pressure, OOM handling, etc.

The divergence risk. Because KWOK simulates kubelet behavior, there may be subtle edge cases where its simulation diverges from how a real kubelet would behave. For critical control-plane logic, always validate on a real cluster before shipping.

No persistent cluster by default. When you kwokctl delete cluster, everything is gone. This is usually what you want, but keep it in mind.

Real-World Adoption

KWOK is not a toy. Here’s who’s using it in production:

  • Kubernetes SIG Scalability – Uses KWOK for the official Kubernetes scalability benchmark tests
  • Karpenter – Official kwok provider for testing node autoscaling logic
  • Apache YuniKorn – Performance evaluation and scalability testing
  • OpenTelemetry Collector – Performance testing of Kubernetes-related components at scale
  • Clusterpedia – E2E testing by importing KWOK simulation clusters
  • NVIDIA (Knavigator) – Virtual node testing for GPU cluster simulation
  • Headlamp / Aptakube – Load testing their Kubernetes dashboard UIs against large clusters
  • AWS – Data plane cost modeling with Karpenter and KWOK
  • IBM – Tutorials on using KWOK for OpenShift simulation

The Karpenter and SIG Scalability integrations are particularly meaningful – these are teams that need to simulate cluster behavior at extreme scale, and they chose KWOK as the foundation.

Summary

KWOK is one of those tools that, once discovered, makes you wonder how you ever lived without it. The table below summarizes where it fits:

Scenario                                           | KWOK verdict
---------------------------------------------------|-----------------------------------
Testing a Kubernetes controller at 1000 nodes      | Perfect fit
Validating HPA/VPA autoscaling triggers            | Perfect fit
Testing Karpenter node provisioning decisions      | Perfect fit (official integration)
CI/CD: fast cluster spin-up for integration tests  | Excellent
Simulating node failures for chaos testing         | Good (with custom Stages)
Testing an actual containerized application        | Not suitable
Testing pod networking / CNI plugins               | Not suitable
Local development of a microservices app           | Not suitable

If you work with Kubernetes at scale – whether building operators, platform tooling, or testing autoscaling logic – KWOK deserves a place in your toolkit. It’s fast, lightweight, free, and backed by the Kubernetes SIG organization.

The entire source is at github.com/kubernetes-sigs/kwok, and the documentation at kwok.sigs.k8s.io is excellent. Start with kwokctl create cluster and go from there.


This post is part of my ongoing Homelab Series. For the cluster foundation this tooling builds on, see Part 1: Building a Production-Grade Kubernetes Homelab on macOS.