
KWOK: Simulate a 1000-Node Kubernetes Cluster on Your Laptop
May 3, 2026
TL;DR: KWOK is a Kubernetes SIG tool that simulates thousands of fake nodes and pods by manipulating API server objects directly – no real containers, no kubelet, no expensive infrastructure. It can spin up a 1000-node cluster in seconds on a laptop. It’s the most underrated tool in the Kubernetes ecosystem for controller developers, platform engineers, and anyone who needs to test at scale without a cloud bill to match.
There’s a very common problem in the Kubernetes world: you want to test something at scale – how your autoscaler behaves at 500 nodes, how your custom controller handles 10,000 pods, whether your scheduling logic holds up under real load – but spinning up that infrastructure costs a fortune and takes forever. You can’t exactly throw 1000 nodes at a GitHub Actions runner.
KWOK solves this problem elegantly. Instead of running real nodes with real kubelets, it pretends to be the kubelet for every node you create. The API server is happy, the scheduler is happy, your controller is happy – and you spend zero on cloud infrastructure.
In this post, I’ll cover:
- What KWOK is and where it came from
- How it works under the hood – the simulation engine
- kwok vs kwokctl – what each piece does
- Installation and a working quick start
- Advanced features: Stages, resource usage simulation, CRD support
- Real-world use cases: controller testing, CI/CD, autoscaling simulation
- Comparison with kind, k3d, minikube, and vcluster
- Limitations – what you simply cannot test with KWOK
What is KWOK?
KWOK stands for Kubernetes WithOut Kubelet (pronounced /kwɔk/). It’s a toolkit from the kubernetes-sigs organization that enables you to set up a cluster of thousands of nodes in seconds, simulating the full lifecycle of nodes, pods, and other Kubernetes API objects – without running a single real container.
The project was created by Shiming Zhang (DaoCloud), Wei Huang (Apple), and Yibo Zhuang (Apple). It was officially announced on the Kubernetes blog on March 1, 2023, though the repository had been active since July 2022. Under the hood, KWOK 0.1.0 was a merger of two earlier projects: fake-kubelet and fake-k8s, which had already been battle-tested internally.
It lives at kwok.sigs.k8s.io and is part of the official Kubernetes SIG ecosystem – not a random side project. It’s used in production by the Kubernetes SIG Scalability team for benchmark testing and has been adopted by projects like Karpenter, Apache YuniKorn, and OpenTelemetry.
How It Works: The Simulation Engine
To understand KWOK, you first need to understand what kubelet actually does. In a real Kubernetes cluster, kubelet is the agent that runs on every node. Its job is to:
- Register the node with the API server
- Watch for pods scheduled to that node
- Start containers and report their status back
- Send regular heartbeats so the node doesn’t appear NotReady
KWOK replaces all of this with a controller that operates purely on Kubernetes API objects.
graph TD
subgraph "Real Kubernetes Node"
A[kubelet] --> B[Container Runtime]
B --> C[Real Containers]
A --> D[API Server: node status / pod status]
end
subgraph "KWOK Simulated Node"
E[kwok-controller] --> F[API Server: node status / pod status]
G[No containers]
end
style G fill:#f9f9f9,stroke:#ccc,stroke-dasharray: 5 5
The kwok-controller watches for node objects that it owns (marked with a specific annotation) and takes over all responsibilities kubelet would normally handle:
- Node Controller – Updates the node’s status field: conditions (Ready, MemoryPressure, etc.), addresses, capacity, allocatable resources, and node info.
- Pod Controller – Simulates pod lifecycle on managed nodes. When the scheduler places a pod on a fake node, KWOK updates its status through the expected phases: Pending → Running → ready.
- Node Lease Controller – Creates and renews Lease objects in the kube-node-lease namespace. This is what tells the API server the node is alive – without it, the node would be marked NotReady after ~40 seconds.
- Stage Controller – The most powerful component. It defines event-driven lifecycle transitions for any Kubernetes resource type, not just nodes and pods.
The key insight: from the API server’s perspective, a node managed by KWOK is indistinguishable from a real node. The scheduler schedules pods to it, the controller manager tracks it, your custom controllers interact with it – all through the standard Kubernetes API.
Stages: The Event-Driven Lifecycle Engine
Stages are KWOK’s most sophisticated feature. They define how resources transition between states over time.
A stage watches for Kubernetes resource events (Create, Update, Delete) and responds with status patches or deletions, with configurable delays and jitter.
apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: pod-ready
spec:
  resourceRef:
    apiGroup: v1
    kind: Pod
  selector:
    matchExpressions:
    # Match pods that are not being deleted and have no IP yet.
    - key: '.metadata.deletionTimestamp'
      operator: DoesNotExist
    - key: '.status.podIP'
      operator: DoesNotExist
  delay:
    durationMilliseconds: 1000
    # Upper bound of the delay window, not a +/- offset.
    jitterDurationMilliseconds: 1200
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        status: "True"
        type: Ready
      phase: Running
      podIP: 10.0.{{ rand 0 255 }}.{{ rand 0 255 }}
This stage fires between 1 and 1.2 seconds after a pod is created on a KWOK node and patches it to Running with a fake pod IP. In KWOK, jitterDurationMilliseconds is the upper bound of the delay window rather than a ± offset, so the actual delay is drawn from the range between the two values. The jitter matters – it keeps thousands of pods from changing state in perfect lockstep, which is closer to how a real cluster behaves.
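To see why jitter matters at scale, here is a small shell sketch that assigns each simulated pod a transition delay. It is an illustration of the idea, not KWOK’s implementation, and the function name spread_delays and its three parameters are made up for this post. With a window of 0 every pod flips at the same millisecond; with a non-zero window the transitions spread out.

```shell
# Illustrative only: print one transition delay (in ms) per simulated pod.
# $1 = pod count, $2 = fixed base delay in ms, $3 = random spread window in ms.
spread_delays() {
  awk -v n="$1" -v base="$2" -v window="$3" 'BEGIN {
    srand()
    for (i = 0; i < n; i++) {
      # rand() is in [0, 1), so the added spread stays in [0, window).
      print base + int(rand() * window)
    }
  }'
}
```

Running spread_delays 5 1000 200 prints five values between 1000 and 1199, while spread_delays 5 1000 0 prints 1000 five times.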
You can chain stages to simulate complex scenarios:
- Pod fails after N seconds, then recovers
- Node becomes NotReady briefly, then returns
- A pod gets stuck in Terminating for a configurable duration
kwok vs kwokctl: Two Tools, One Ecosystem
KWOK ships as two separate binaries with distinct responsibilities.
kwok – The Simulation Engine
kwok is the core controller. It:
- Connects to an existing Kubernetes cluster via kubeconfig or in-cluster ServiceAccount
- Takes ownership of nodes annotated with kwok.x-k8s.io/node=fake
- Runs the simulation logic for those nodes and their pods
- Supports configuration via CRDs (Stage, ClusterResourceUsage, etc.)
You would use kwok standalone when you want to add fake nodes to an existing cluster – for example, to simulate scale in a dev cluster that already has some real nodes.
kwokctl – The Cluster Lifecycle Manager
kwokctl is the management CLI that provisions entire KWOK clusters from scratch. It:
- Spins up a full Kubernetes control plane (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kwok-controller) using Docker, Podman, nerdctl, or Kind as the runtime, or plain host binaries with no containers at all
- Creates and deletes clusters with a single command
- Manages cluster lifecycle (start, stop, list)
- Handles kubeconfig management
Think of kwokctl as analogous to kind or k3d – but instead of a full cluster with real node agents, it gives you a control plane with a KWOK simulation engine attached.
kwokctl → provisions control plane → uses kwok to simulate all worker nodes
k3d → provisions control plane + k3s agents on real containers
kind → provisions full control plane + kubelet on Docker nodes
Installation
Homebrew (recommended for macOS/Linux)
brew install kwok
# This installs both kwok and kwokctl
Binary download
KWOK_REPO=kubernetes-sigs/kwok
KWOK_LATEST_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')
# kwokctl
wget -O kwokctl "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwokctl-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwokctl && sudo mv kwokctl /usr/local/bin/
# kwok
wget -O kwok "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwok-$(go env GOOS)-$(go env GOARCH)"
chmod +x kwok && sudo mv kwok /usr/local/bin/
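The commands above rely on go env to fill in the platform suffix, which assumes a Go toolchain is installed. If it is not, a small helper can derive the same os-arch suffix from uname. This is a sketch: the function name kwok_platform is hypothetical, and it only covers the common Linux and macOS cases.

```shell
# Map `uname -s` / `uname -m` output to the os-arch suffix used in the
# release asset names above (for example kwokctl-linux-amd64).
# $1 = kernel name, $2 = machine name. Only common values are handled.
kwok_platform() {
  case "$1" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    arm64|aarch64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  echo "${os}-${arch}"
}
```

Call it as kwok_platform "$(uname -s)" "$(uname -m)" and substitute the result for the $(go env GOOS)-$(go env GOARCH) part of the download URLs.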
Go
go install sigs.k8s.io/kwok/cmd/kwok@latest
go install sigs.k8s.io/kwok/cmd/kwokctl@latest
Quick Start: From Zero to 1000 Nodes in Under a Minute
Let’s build a real example step by step.
Step 1: Create a KWOK cluster
kwokctl create cluster --name=kwok-demo
This spins up a full Kubernetes control plane in Docker containers. When it’s done, your kubeconfig is automatically updated.
kubectl cluster-info --context kwok-kwok-demo
# Kubernetes control plane is running at https://127.0.0.1:32764
Step 2: Create fake nodes
KWOK doesn’t pre-populate nodes – you create them yourself. This is by design: it lets you control exactly what hardware profile each node presents to the scheduler.
# Create a single fake node
kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: fake-node-0
    kubernetes.io/os: linux
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: fake-node-0
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
The key is the annotation kwok.x-k8s.io/node: fake – this is how kwok-controller identifies which nodes it should manage.
Step 3: Scale to 100 nodes with a script
for i in $(seq 1 100); do
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: fake-node-${i}
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
  name: fake-node-${i}
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
EOF
done
Within seconds, you’ll have 100 Ready nodes:
kubectl get nodes | head -10
# NAME STATUS ROLES AGE VERSION
# fake-node-1 Ready <none> 8s fake
# fake-node-2 Ready <none> 7s fake
# fake-node-3 Ready <none> 7s fake
# ...
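For the full 1000 nodes, one kubectl call per node gets slow. A sketch of an alternative is to generate every manifest as a single multi-document YAML stream and apply it once. The function name gen_fake_nodes, the pool-style name prefix, the type: kwok label, and the pods value of 110 are my own choices for illustration. The status block, which gives each node an explicit hardware profile, assumes the API server keeps status fields supplied when a Node is created.

```shell
# Print COUNT Node manifests as one multi-document YAML stream.
# $1 = name prefix, $2 = count, $3 = CPU per node, $4 = memory per node.
gen_fake_nodes() {
  i=0
  while [ "$i" -lt "$2" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Node
metadata:
  name: $1-$i
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    kubernetes.io/hostname: $1-$i
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    type: kwok
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
status:
  allocatable:
    cpu: "$3"
    memory: $4
    pods: "110"
  capacity:
    cpu: "$3"
    memory: $4
    pods: "110"
EOF
    i=$((i + 1))
  done
}
```

A typical call would be gen_fake_nodes pool-a 1000 32 256Gi | kubectl apply -f -, which creates pool-a-0 through pool-a-999 in one request stream. Redirecting the output to a file also gives you a fixture you can commit and reuse in CI.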
Step 4: Schedule workloads
Pods scheduled to KWOK nodes need a toleration for the kwok.x-k8s.io/node=fake:NoSchedule taint:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fake-workload
spec:
  replicas: 500
  selector:
    matchLabels:
      app: fake-workload
  template:
    metadata:
      labels:
        app: fake-workload
    spec:
      # Tolerate the taint that the fake nodes carry.
      tolerations:
      - key: "kwok.x-k8s.io/node"
        operator: "Exists"
        effect: "NoSchedule"
      # No nodeAffinity here: every node in this kwokctl cluster is fake.
      # Node affinity matches labels, not annotations, so in a mixed cluster
      # you would first label the fake nodes and then select on that label.
      containers:
      - name: app
        image: fake.registry/fake:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
EOF
500 pods will be scheduled and transitioned to Running – virtually instantly, with zero actual container starts.
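To confirm that all replicas really reached Running, it helps to tally pod statuses instead of scrolling through 500 lines. The helper below is a sketch; count_pod_status is a hypothetical name, and it assumes the default column layout of kubectl get pods --no-headers, where STATUS is the third field.

```shell
# Tally the STATUS column (third field) of `kubectl get pods --no-headers`
# output read from stdin, printing "<status> <count>" lines sorted by status.
count_pod_status() {
  awk '{ seen[$3]++ } END { for (s in seen) print s, seen[s] }' | sort
}
```

Use it as kubectl get pods -l app=fake-workload --no-headers | count_pod_status. A line other than Running in the output usually points at a missing toleration or a node selector that matches nothing.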
Step 5: Clean up
kwokctl delete cluster --name=kwok-demo
That’s it. No lingering containers, no orphaned volumes.
Advanced Use Cases
Testing Autoscaling: HPA, VPA, and Karpenter
One of KWOK’s most powerful applications is testing autoscaling logic. To make HPA and VPA work, pods need to report CPU and memory metrics. KWOK supports this via annotations or the ClusterResourceUsage CRD.
Simulating CPU usage via annotations:
apiVersion: v1
kind: Pod
metadata:
  name: high-cpu-pod
  annotations:
    kwok.x-k8s.io/usage-cpu: "850m"
    kwok.x-k8s.io/usage-memory: "256Mi"
spec:
  # ...
Simulating dynamic usage via ClusterResourceUsage:
apiVersion: kwok.x-k8s.io/v1alpha1
kind: ClusterResourceUsage
metadata:
  name: simulate-growing-load
spec:
  target:
    apiGroup: v1
    kind: Pod
  usages:
  - resourceType: cpu
    expression: |
      Quantity("100m") * (pod.SinceSecond() / 60.0)
  - resourceType: memory
    expression: |
      Quantity("64Mi") + Quantity("1Mi") * (pod.SinceSecond() / 30.0)
This expression simulates a pod whose CPU usage grows linearly over time – perfect for testing HPA scale-out triggers without any real workload.
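To get a feel for the numbers, the two expressions can be evaluated by hand at a few pod ages. The snippet below is plain shell arithmetic that mirrors the formulas; it does not talk to KWOK, and the helper name usage_at is made up for this post.

```shell
# Evaluate the two usage expressions above at a given pod age in seconds.
# Integer arithmetic only, so results are rounded down.
usage_at() {
  cpu_m=$((100 * $1 / 60))   # 100m * (age / 60)
  mem_mi=$((64 + $1 / 30))   # 64Mi + 1Mi * (age / 30)
  echo "age=${1}s cpu=${cpu_m}m memory=${mem_mi}Mi"
}

for age in 30 60 300 600; do
  usage_at "$age"
done
```

With the 100m CPU request used by the Deployment in Step 4, reported usage equals the request at 60 seconds. Assuming an HPA with an 80% CPU utilization target, the threshold of 80m is crossed at about 48 seconds, so that is roughly when you would expect the first scale-out.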
Karpenter integration:
KWOK has official support in the Karpenter project via a kwok provider. You can run the full Karpenter control loop, pointing it at a KWOK cluster, and test node provisioning decisions at scale without touching EC2:
kwokctl create cluster --name=karpenter-test \
--config=./karpenter-kwok-config.yaml
The Karpenter project uses its kwok provider for the same purpose: exercising provisioning logic without paying for real instances.
Controller Development and Testing
If you’re writing a Kubernetes operator or controller, KWOK gives you a realistic environment to test reconciliation logic at scale. You can:
- Create 10,000 pods in a specific state and watch your controller process them
- Simulate node failures and verify your controller responds correctly
- Test finalizer logic and deletion workflows at scale
- Validate that your controller doesn’t hammer the API server under load
Simulating a node failure:
Using a custom Stage, you can make a node’s Ready condition flip to False after a delay, simulating a network partition or hardware failure:
apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
  name: node-heartbeat-with-lease
spec:
  resourceRef:
    apiGroup: v1
    kind: Node
  selector:
    matchExpressions:
    - key: '.metadata.annotations["kwok.x-k8s.io/node"]'
      operator: In
      values:
      - fake
  delay:
    durationMilliseconds: 30000
    # Upper bound of the delay window: each node fails 30-35 s after matching.
    jitterDurationMilliseconds: 35000
  next:
    statusTemplate: |
      conditions:
      - lastTransitionTime: {{ Now }}
        message: "Simulated node failure"
        reason: KwokSimulation
        status: "False"
        type: Ready
CI/CD Integration
KWOK is excellent for CI pipelines. A complete cluster creation and test run takes seconds rather than minutes, and requires only a Docker daemon – no cloud credentials, no cloud costs.
Example GitHub Actions workflow:
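name: Controller Integration Tests
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install kwokctl
        run: |
          # Release binary download; same URL pattern as the Installation section.
          KWOK_REPO=kubernetes-sigs/kwok
          KWOK_LATEST_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')
          wget -O kwokctl "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwokctl-linux-amd64"
          chmod +x kwokctl && sudo mv kwokctl /usr/local/bin/
      - name: Create KWOK cluster
        run: |
          kwokctl create cluster --name=ci-test
      - name: Deploy simulated nodes
        run: |
          kubectl apply -f ./test/fixtures/nodes.yaml
          # Wait only after the nodes exist; before that there is nothing to wait on.
          kubectl wait --for=condition=Ready node --all --timeout=60s
      - name: Run integration tests
        run: |
          go test ./... -v -tags=integration
      - name: Cleanup
        if: always()
        run: kwokctl delete cluster --name=ci-test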
The entire cluster lifecycle fits in a single CI job, with no persistent infrastructure.
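One practical detail in CI is that readiness checks can race with cluster startup. A small retry helper makes those steps less flaky. This is a generic sketch, not part of KWOK; the name wait_until and its one-second polling interval are my own choices.

```shell
# Retry a command once per second until it succeeds or the timeout expires.
# $1 = timeout in seconds; the remaining arguments are the command to run.
wait_until() {
  remaining=$1
  shift
  until "$@"; do
    remaining=$((remaining - 1))
    if [ "$remaining" -le 0 ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example, wait_until 60 kubectl get --raw=/readyz blocks until the API server answers, and wait_until 60 kubectl wait --for=condition=Ready node --all --timeout=5s keeps retrying until the fake nodes exist and report Ready.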
KWOK vs. The Alternatives
This is the question I get most often: “when do I use KWOK instead of kind/k3d/minikube?”
The answer is simple: KWOK and these tools solve different problems. They are not fully interchangeable.
| Feature | KWOK | kind | k3d | minikube | vcluster |
|---|---|---|---|---|---|
| Real containers | No | Yes | Yes | Yes | Yes |
| Startup time | ~1 sec | ~30-60 sec | ~10-20 sec | ~60-90 sec | ~30 sec |
| Resource usage | Minimal | Medium | Low | Medium-High | Low |
| Max simulated nodes | 1000+ | ~10-20 | ~10-20 | ~5 | N/A |
| Pod networking | Simulated | Real | Real | Real | Real |
| Metrics (HPA/VPA) | Simulated | Real | Real | Real | Real |
| CRD / controller testing | Excellent | Good | Good | Good | Good |
| Autoscaling testing | Excellent | Limited | Limited | Limited | Limited |
| Actual workload testing | No | Yes | Yes | Yes | Yes |
| Cloud provider simulation | Yes | Limited | Limited | Limited | No |
graph LR
A[What do you need?]
A --> B{Real containers?}
B -- Yes --> C{Resource-constrained?}
B -- No --> D[KWOK]
C -- Yes --> E[k3d / kind]
C -- No --> F[minikube]
D --> G[Scale testing, controller dev,\nautoscaling simulation, CI/CD]
E --> H[CI/CD with real workloads,\nlocal development]
F --> I[Learning, local development,\nfull dashboard UX]
Use KWOK when:
- You need to simulate 10, 100, or 1000+ nodes
- You’re testing controller or operator logic
- You’re validating autoscaling triggers (HPA, VPA, Karpenter, Cluster Autoscaler)
- You need fast cluster lifecycle in CI/CD
- You want to test scheduling decisions and affinity rules at scale
Use kind or k3d when:
- You need containers to actually start and run
- You’re testing pod networking, CNI plugins, or network policies
- You’re testing ingress controllers with real traffic
- You’re testing volume mounting and persistence
- You need a local development environment for a real application
The tools complement each other well. I personally use k3d for my homelab (as covered in Part 1 of this series) and KWOK for controller testing and CI pipelines.
Limitations: What KWOK Cannot Do
This is crucial to understand before adopting KWOK. The simulation has real limits.
Pods don’t actually run. This is the most important limitation. If you deploy nginx to a KWOK cluster and try to curl its cluster IP, nothing happens. There is no running container, no listening socket, no real process. The API server shows the pod as Running, but that’s it.
This means you cannot test:
- Pod networking – no real traffic flows between pods
- Volume mounting – PVCs bind (if you configure a storage provisioner), but nothing actually mounts
- Container behavior – init containers, sidecar logic, actual application code
- Device plugins – GPU allocation, specialized hardware
- CNI plugin behavior – network policies are purely theoretical
- Service load balancing – no real iptables rules are written, no real traffic is forwarded
- Kubelet-specific features – eviction based on actual node pressure, OOM handling, etc.
The divergence risk. Because KWOK simulates kubelet behavior, there may be subtle edge cases where its simulation diverges from how a real kubelet would behave. For critical control-plane logic, always validate on a real cluster before shipping.
No persistent cluster by default. When you run kwokctl delete cluster, everything is gone. This is usually what you want, but keep it in mind.
Real-World Adoption
KWOK is not a toy. Here’s who’s using it in production:
- Kubernetes SIG Scalability – Uses KWOK for the official Kubernetes scalability benchmark tests
- Karpenter – Official kwok provider for testing node autoscaling logic
- Apache YuniKorn – Performance evaluation and scalability testing
- OpenTelemetry Collector – Performance testing of Kubernetes-related components at scale
- Clusterpedia – E2E testing by importing KWOK simulation clusters
- NVIDIA (Knavigator) – Virtual node testing for GPU cluster simulation
- Headlamp / Aptakube – Load testing their Kubernetes dashboard UIs against large clusters
- AWS – Data plane cost modeling with Karpenter and KWOK
- IBM – Tutorials on using KWOK for OpenShift simulation
The Karpenter and SIG Scalability integrations are particularly meaningful – these are teams that need to simulate cluster behavior at extreme scale, and they chose KWOK as the foundation.
Summary
KWOK is one of those tools you discover and then wonder how you ever lived without. The table below summarizes where it fits:
| Scenario | KWOK verdict |
|---|---|
| Testing a Kubernetes controller at 1000 nodes | Perfect fit |
| Validating HPA/VPA autoscaling triggers | Perfect fit |
| Testing Karpenter node provisioning decisions | Perfect fit (official integration) |
| CI/CD: fast cluster spin-up for integration tests | Excellent |
| Simulating node failures for chaos testing | Good (with custom Stages) |
| Testing an actual containerized application | Not suitable |
| Testing pod networking / CNI plugins | Not suitable |
| Local development of a microservices app | Not suitable |
If you work with Kubernetes at scale – whether building operators, platform tooling, or testing autoscaling logic – KWOK deserves a place in your toolkit. It’s fast, lightweight, free, and backed by the Kubernetes SIG organization.
The entire source is at github.com/kubernetes-sigs/kwok, and the documentation at kwok.sigs.k8s.io is excellent. Start with kwokctl create cluster and go from there.
This post is part of my ongoing Homelab Series. For the cluster foundation this tooling builds on, see Part 1: Building a Production-Grade Kubernetes Homelab on macOS.