
Building a Production-Grade Kubernetes Homelab on macOS: The Complete Guide
January 15, 2026
TL;DR: We’re building a 4-node Kubernetes cluster (k3s via k3d) running inside Podman on macOS. The setup includes NFS-based shared storage for RWX volumes, Tailscale for secure remote access, and a foundation ready for GitOps, observability, and hybrid cloud connectivity. All configuration is declarative and version-controlled.
Welcome to the first post in my Homelab Series. Our goal is ambitious: build a personal cloud platform from scratch that mirrors production-grade infrastructure patterns.
This isn’t a toy project. We’re engineering a versatile environment ready for:
- GitOps workflows with ArgoCD
- Full observability stack (Prometheus, Grafana, Loki)
- Stateful workloads with proper persistent storage
- Service mesh for advanced networking
- Home automation integration (Home Assistant, MQTT)
- Hybrid cloud connectivity with AWS/GCP
The guiding principle is environment parity: the architecture we build will run identically on a local machine and in a cloud environment. Multi-cloud support is the long-term goal.
It all begins with a solid foundation—a local Kubernetes cluster on macOS that doesn’t cut corners.
In this post, we’ll cover:
- The complete installation process with copy-paste-ready commands
- Why Podman beats Docker Desktop for this use case
- The storage challenge: why Longhorn fails and NFS wins
- Tailscale networking for secure remote cluster access
- Architecture decisions and their trade-offs
Architecture Overview
Before diving into commands, let’s understand the layered architecture we’re building:
graph TB
subgraph macOS["macOS Host"]
subgraph podman["Podman Machine (Fedora VM)"]
subgraph k3d["k3d Cluster 'homelab'"]
server["Server-0
(control-plane, etcd)"]
agent0["Agent-0"]
agent1["Agent-1"]
agent2["Agent-2"]
end
end
ports["Tailscale IP:6443 (kubeAPI)
Ports 80/443 (Ingress)"]
end
nas[("Synology NAS
RWX Storage")]
k3d --> ports
ports -.->|"NFS: 192.168.55.x:/volume1/k8s-volumes"| nas
This is a Configuration-as-Code project—the entire cluster definition lives in version-controlled files. This approach gives us:
- Reproducibility: Destroy and recreate the cluster in minutes
- Auditability: Every change is tracked in Git history
- Portability: Share the setup across machines or team members
- Documentation: The config files are the documentation
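For orientation, this is roughly how the version-controlled files referenced throughout this post are laid out in the cd-homelab repository (the tree is illustrative and only shows files this guide touches):
cd-homelab/
├── k3d/
│   └── config.yaml                 # declarative cluster definition (Step 3)
└── extras/
    └── nfs/
        └── storageclass-nfs.yaml   # NFS StorageClass manifest (Step 7)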
Why k3s?
k3s is a CNCF-certified Kubernetes distribution optimized for resource-constrained environments. Compared to full Kubernetes (kubeadm, kubespray), k3s offers:
| Feature | k3s | Full Kubernetes |
|---|---|---|
| Binary size | ~70MB | ~1GB+ |
| Memory footprint | ~512MB | ~2GB+ |
| Default datastore | SQLite/etcd | etcd |
| Built-in components | Traefik, CoreDNS, Metrics Server | Manual installation |
| Certificate management | Automatic | Manual/cert-manager |
For a homelab, k3s is the sweet spot: full Kubernetes API compatibility with a fraction of the overhead.
Why k3d?
k3d wraps k3s in Docker/Podman containers, enabling:
- Multi-node clusters on a single machine
- Fast iteration: create/destroy clusters in seconds
- Port mapping: expose services to the host
- Registry integration: local container registries
The alternative would be running k3s directly on the Podman VM, but k3d gives us the flexibility to simulate multi-node topologies.
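As a quick illustration of that flexibility, a throwaway cluster with a different topology can be created and destroyed in seconds, entirely from the CLI (the name and node counts below are arbitrary):
# Hypothetical scratch cluster: 1 server, 2 agents, no config file needed
k3d cluster create scratch --servers 1 --agents 2
kubectl get nodes                # three nodes, up in seconds
k3d cluster delete scratch       # and gone again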
Workshop: Building the Cluster Step-by-Step
Time for the hands-on part. Every command below has been tested and is ready to copy-paste.
Step 1: Prerequisites
Before we start, ensure you have the necessary tools. Homebrew is the easiest way to install them on macOS.
# Install all required packages
brew install podman k3d helm kubectl
# Verify installations
podman --version # Tested with: podman version 5.x
k3d version # Tested with: k3d version v5.x
helm version # Tested with: v3.x
kubectl version --client
Required tools explained:
| Tool | Purpose |
|---|---|
| podman | Container runtime (Docker alternative) |
| k3d | k3s-in-Docker/Podman wrapper |
| helm | Kubernetes package manager |
| kubectl | Kubernetes CLI |
Optional but recommended:
- Tailscale: Secure remote access to the cluster from anywhere. Free for personal use (up to 100 devices).
- NFS Server: For shared storage (RWX volumes). A Synology/QNAP NAS works great, or any Linux box with nfs-kernel-server.
- k9s: Terminal-based Kubernetes dashboard (brew install k9s).
Step 2: Configuring the Podman Machine
On macOS, containers can’t run natively—they need a Linux VM. Podman manages this transparently through “Podman Machine,” a lightweight Fedora-based VM running under Apple’s Virtualization framework (or QEMU on Intel Macs).
# 1. Initialize the virtual machine with appropriate resources
podman machine init \
--cpus 6 \
--memory 8192 \
--disk-size 50 \
--volume /private/nfs/k8s-volumes:/private/nfs/k8s-volumes
Resource allocation guidelines:
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| CPUs | 4 | 6+ | More helps with parallel workloads |
| Memory | 4GB | 8GB+ | k3s needs ~512MB, rest for workloads |
| Disk | 30GB | 50GB+ | Container images add up quickly |
The --volume flag creates a mount point for NFS passthrough (optional, for advanced NFS setups).
# 2. Enable rootful mode (required for privileged port binding)
podman machine set --rootful
Why rootful? By default, Podman runs in rootless mode for security. However, binding ports below 1024 (like 80/443 for HTTP/HTTPS) requires root privileges. Since we want our Ingress controller on standard ports, rootful mode is necessary.
# 3. Start the machine
podman machine start
# Verify it's running
podman machine list
Docker Desktop conflict resolution:
If you have Docker Desktop installed alongside Podman, ensure your shell uses the correct socket:
# Check current context
docker context list
# Switch to Podman (the "default" context uses Podman's socket)
docker context use default
# Verify you're talking to Podman
docker info | grep -i "operating system"
# Should show: Fedora Linux (not Docker Desktop)
Pro tip: Add export DOCKER_HOST="unix://$HOME/.local/share/containers/podman/machine/podman.sock" to your shell profile to ensure Podman is always used.
Step 3: Configuring the k3d Cluster
The k3d/config.yaml file is the declarative definition of our cluster. Let’s examine the key configuration options:
# k3d/config.yaml - Key sections explained
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
name: homelab
servers: 1 # Control plane nodes (1 is enough for homelab)
agents: 3 # Worker nodes (scale based on workload needs)
kubeAPI:
host: "100.115.231.42" # Your Tailscale IP (run: tailscale ip -4)
hostPort: "6443" # Standard Kubernetes API port
ports:
- port: 80:80 # HTTP ingress
nodeFilters: [loadbalancer]
- port: 443:443 # HTTPS ingress
nodeFilters: [loadbalancer]
options:
k3s:
extraArgs:
- arg: --disable=traefik # We'll install our own ingress
nodeFilters: [server:*]
- arg: --disable=servicelb # Using NodePort/Ingress instead
nodeFilters: [server:*]
Configuration breakdown:
| Setting | Value | Rationale |
|---|---|---|
| servers: 1 | Single control plane | HA requires 3+ servers; overkill for homelab |
| agents: 3 | Three workers | Allows testing pod anti-affinity and rolling updates |
| kubeAPI.host | Tailscale IP | Enables remote kubectl access from any device |
| --disable=traefik | No default ingress | We’ll install nginx-ingress or Traefik ourselves for more control |
| --disable=servicelb | No ServiceLB | Using standard Ingress instead of k3s’s Klipper LB |
Before creating the cluster, update the kubeAPI.host:
# Get your Tailscale IP
tailscale ip -4
# Or use your Mac's local IP (for LAN-only access)
ipconfig getifaddr en0
Edit k3d/config.yaml and paste your IP in the kubeAPI.host field.
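If you would rather script that edit, a one-liner can patch the field in place. This sketch assumes yq (v4, brew install yq) is available and that the file matches the layout shown above:
# Patch kubeAPI.host with your current Tailscale IP
export TS_IP="$(tailscale ip -4)"
yq -i '.kubeAPI.host = strenv(TS_IP)' k3d/config.yaml
# Double-check the result
yq '.kubeAPI.host' k3d/config.yaml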
Step 4: Creating the Cluster
With configuration in place, cluster creation is a single command:
k3d cluster create --config k3d/config.yaml
Behind the scenes, k3d:
- Pulls the rancher/k3s image
- Creates a Docker network for inter-node communication
- Starts the server container (control plane)
- Starts agent containers (workers)
- Sets up the load balancer for port forwarding
- Generates TLS certificates and kubeconfig
Expected output:
INFO[0000] Using config file k3d/config.yaml
INFO[0000] Prep: Network
INFO[0001] Created network 'k3d-homelab'
INFO[0001] Created image volume k3d-homelab-images
INFO[0001] Starting new tools node...
INFO[0002] Creating node 'k3d-homelab-server-0'
INFO[0003] Creating node 'k3d-homelab-agent-0'
INFO[0003] Creating node 'k3d-homelab-agent-1'
INFO[0003] Creating node 'k3d-homelab-agent-2'
INFO[0004] Creating LoadBalancer 'k3d-homelab-serverlb'
...
INFO[0025] Cluster 'homelab' created successfully!
The entire process takes 20-40 seconds depending on your machine.
Step 5: Accessing the Cluster (Kubeconfig)
Kubernetes tools need a kubeconfig file to authenticate with the cluster. k3d can merge the new cluster’s credentials with your existing config:
# Merge and switch context in one command
k3d kubeconfig merge homelab --kubeconfig-switch-context
Alternative: Keep k3d config separate
If you manage multiple clusters, you might prefer isolated kubeconfig files:
# Export to a dedicated file
k3d kubeconfig get homelab > ~/.config/k3d/kubeconfig-homelab.yaml
# Use it for this session
export KUBECONFIG=~/.config/k3d/kubeconfig-homelab.yaml
# Or add to your shell profile for persistence
echo 'export KUBECONFIG=~/.config/k3d/kubeconfig-homelab.yaml' >> ~/.zshrc
Verify the cluster:
kubectl get nodes -o wide
Expected output (4 nodes in Ready state):
NAME STATUS ROLES AGE VERSION INTERNAL-IP
k3d-homelab-server-0 Ready control-plane,master 30m v1.33.4+k3s1 172.18.0.3
k3d-homelab-agent-0 Ready <none> 29m v1.33.4+k3s1 172.18.0.4
k3d-homelab-agent-1 Ready <none> 29m v1.33.4+k3s1 172.18.0.5
k3d-homelab-agent-2 Ready <none> 29m v1.33.4+k3s1 172.18.0.6
Quick health check:
# Check system pods are running
kubectl get pods -n kube-system
# Verify storage classes
kubectl get storageclass
You should see local-path as the default StorageClass (installed by k3s automatically).
Step 6: Installing the NFS CSI Driver (for RWX Storage)
Our cluster has local-path storage by default, which is great for single-pod workloads (RWO = ReadWriteOnce). But many real-world applications need shared storage—databases with replicas, content management systems, shared caches.
This is where NFS and ReadWriteMany (RWX) volumes come in.
Understanding Kubernetes storage access modes:
| Access Mode | Abbreviation | Use Case |
|---|---|---|
| ReadWriteOnce | RWO | Single pod can read/write (databases, single-replica apps) |
| ReadOnlyMany | ROX | Many pods can read (static assets, configs) |
| ReadWriteMany | RWX | Many pods can read/write (shared uploads, CMS, collaboration tools) |
Install the NFS CSI driver:
# Add the Helm repository
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
# Install the driver
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
--namespace kube-system \
--set externalSnapshotter.enabled=false \
--set controller.replicas=1
# Verify installation
kubectl get pods -n kube-system -l app.kubernetes.io/name=csi-driver-nfs
The CSI driver provides the interface between Kubernetes and NFS—it handles mounting, provisioning, and lifecycle management.
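To confirm the driver registered with the cluster, you can also list it as a CSIDriver object; the nfs.csi.k8s.io name is the same provisioner we reference in the next step:
# The NFS driver should show up as a registered CSIDriver
kubectl get csidriver nfs.csi.k8s.io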
Step 7: Creating the StorageClass for NFS
The driver is installed, but Kubernetes needs a StorageClass to know where to provision NFS volumes.
Configure the NFS server details:
Edit extras/nfs/storageclass-nfs.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-rwx
provisioner: nfs.csi.k8s.io
parameters:
server: 192.168.55.115 # IP of your NFS server (NAS, Linux box)
share: /volume1/k8s-volumes # NFS export path
reclaimPolicy: Delete # Auto-delete PV when PVC is deleted
volumeBindingMode: Immediate # Provision immediately when PVC is created
mountOptions:
- nfsvers=4.1 # NFS version (4.1 recommended for performance)
- hard # Hard mount (retry indefinitely on failure)
- noatime # Don't update access times (better performance)
Apply the StorageClass:
kubectl apply -f extras/nfs/storageclass-nfs.yaml
# Verify it exists
kubectl get storageclass
Test NFS provisioning:
# Create a test PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-nfs-pvc
spec:
accessModes: [ReadWriteMany]
storageClassName: nfs-rwx
resources:
requests:
storage: 1Gi
EOF
# Check it's bound
kubectl get pvc test-nfs-pvc
# Clean up
kubectl delete pvc test-nfs-pvc
If the PVC shows Bound status, your NFS storage is working correctly.
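To see RWX in action, here is a minimal sketch that runs two replicas writing to the same volume; the names are illustrative, and it reuses the nfs-rwx StorageClass from above:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-demo
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-rwx
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-writer
spec:
  replicas: 2
  selector:
    matchLabels: {app: shared-writer}
  template:
    metadata:
      labels: {app: shared-writer}
    spec:
      containers:
      - name: writer
        image: busybox
        command: ["sh", "-c", "while true; do echo $(hostname) >> /data/hostnames.txt; sleep 5; done"]
        volumeMounts:
        - {name: shared, mountPath: /data}
      volumes:
      - name: shared
        persistentVolumeClaim:
          claimName: shared-demo
EOF
# After a minute, both pod hostnames should appear in the shared file
kubectl exec deploy/shared-writer -- cat /data/hostnames.txt
# Clean up
kubectl delete deployment shared-writer && kubectl delete pvc shared-demo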
Deep Dive: Tailscale for Zero-Config Remote Access
In Step 3, we configured the Kubernetes API to listen on a Tailscale IP. This section explains why Tailscale is a game-changer for homelab infrastructure.
What is Tailscale?
Tailscale is a mesh VPN built on WireGuard—a modern, high-performance VPN protocol that’s now part of the Linux kernel. But unlike traditional VPNs that require server setup, certificate management, and firewall rules, Tailscale handles all the complexity for you.
How it works:
graph LR
laptop["Laptop (anywhere)
100.x.y.z"]
coord["Tailscale
Coordination Servers"]
homelab["Homelab Mac
100.a.b.c"]
laptop <-->|"Direct P2P
(encrypted traffic)"| homelab
laptop -.->|"Key exchange only"| coord
homelab -.->|"Key exchange only"| coord
The coordination servers handle identity, key exchange, and NAT traversal—but your actual traffic flows directly between devices (peer-to-peer) whenever possible. Even through most NATs and firewalls.
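Tailscale’s own CLI makes it easy to check whether two of your devices are actually talking directly (the peer hostname below is an example):
# List peers on your tailnet and how they are currently reachable
tailscale status
# Ping a peer at the tailnet layer; the output tells you whether the
# connection is direct (peer-to-peer) or relayed via a DERP server
tailscale ping homelab-mac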
Why Tailscale for Kubernetes?
| Challenge | Traditional Solution | Tailscale Solution |
|---|---|---|
| Remote kubectl access | Port forwarding, dynamic DNS, certificates | Just works™ via stable IPs |
| Changing home IP | DDNS updates, kubeconfig changes | Tailscale IP never changes |
| Security | Exposing API to internet, firewall rules | Zero exposed ports, E2E encryption |
| Multi-device access | VPN server setup, client configs | Install app, sign in, done |
Specific benefits for our setup:
- Stable API endpoint: The 100.x.y.z IP in kubeAPI.host never changes, regardless of your local network configuration.
- No port forwarding: Your home router doesn’t need any configuration. Tailscale punches through NAT automatically.
- Security by default: The Kubernetes API is never exposed to the public internet—only devices on your tailnet can reach it.
- MagicDNS: Access your homelab by name (homelab-mac.tail-net.ts.net) instead of memorizing IPs.
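A quick sanity check that your kubeconfig really targets the Tailscale address rather than a LAN IP:
# Print the API server endpoint of the current context
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'; echo
# Expected: https://<your-tailscale-ip>:6443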
Future: Hybrid Cloud with Tailscale
The real power of Tailscale emerges when you connect cloud resources:
graph LR
subgraph tailnet["Your Tailnet (100.x.y.z/8)"]
laptop["Laptop"]
homelab["Homelab k8s"]
aws["AWS EC2
Worker"]
gcp["GCP VM
Monitoring"]
end
laptop <--> homelab
homelab <--> aws
aws <--> gcp
laptop <--> gcp
Scenarios this enables:
- Hybrid CI/CD: GitHub Actions runner in the cloud deploys directly to your homelab cluster
- Managed services integration: Homelab apps connect to AWS RDS, CloudSQL, or ElastiCache
- Distributed monitoring: Centralized Grafana in the cloud scrapes metrics from homelab Prometheus
- Disaster recovery: Replicate data from homelab to cloud storage
All without complex site-to-site VPN tunnels, static IPs, or exposing services to the internet.
Deep Dive: Podman vs Docker Desktop
“Docker” and “containers” have become synonymous, but Docker Desktop isn’t the only option—and for homelabs, it might not be the best one.
The Docker Licensing Issue
In 2021, Docker Inc. changed Docker Desktop’s licensing: free for personal use and small businesses (< 250 employees, < $10M revenue), paid for larger organizations. While this likely doesn’t affect homelab users, it created an industry-wide push toward alternatives.
Enter Podman
Podman (Pod Manager) is Red Hat’s OCI-compliant container engine. It’s the default container runtime in RHEL, Fedora, and CentOS Stream.
Architectural differences:
| Aspect | Docker Desktop | Podman |
|---|---|---|
| Architecture | Client-server (dockerd daemon) | Daemonless (fork/exec model) |
| Process model | Daemon manages all containers | Each container is a direct process |
| Default security | Root daemon | Rootless by default |
| macOS implementation | Heavy GUI app + VM | CLI + minimal VM |
| Licensing | Proprietary (free tier) | Apache 2.0 (fully open source) |
| Resource usage | ~2GB+ RAM for Desktop app | ~500MB for VM only |
Why daemonless matters:
graph TB
subgraph docker["Docker Architecture"]
dcli["docker CLI"] -->|"API call"| daemon["dockerd (daemon)
SPOF"]
daemon --> cont1["Container Process"]
end
subgraph podman["Podman Architecture"]
pcli["podman CLI"] -->|"fork/exec"| cont2["Container Process
Direct process"]
end
If dockerd crashes, all containers become unmanageable. With Podman, containers are independent processes—if Podman CLI crashes, containers keep running.
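You can observe the daemonless model directly: each Podman container is supervised by its own small conmon process, not by a central daemon. A quick sketch, run inside the Podman VM since that is where containers actually live on macOS:
# Hop into the VM
podman machine ssh
# Inside the VM (rootful setup, hence sudo): start a test container
sudo podman run -d --name daemonless-demo nginx:alpine
# The only supervisor is a per-container conmon process; there is no dockerd
ps -ef | grep [c]onmon
# Clean up
sudo podman rm -f daemonless-demo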
Why Podman for This Project?
- Lower resource overhead: No heavy GUI app eating RAM in the background
- Full Docker compatibility: Same CLI commands, same image format, compatible socket API
- Rootless security: Better isolation (though we use rootful for port 80/443)
- Open source: No licensing concerns, community-driven development
- Red Hat backing: Enterprise-grade stability and long-term support
Podman Commands Cheat Sheet
# Most Docker commands work identically
podman pull nginx:alpine
podman run -d -p 8080:80 nginx:alpine
podman ps
podman logs <container-id>
podman exec -it <container-id> sh
podman stop <container-id>
podman rm <container-id>
# Podman Machine (macOS only)
podman machine list
podman machine start
podman machine stop
podman machine ssh # SSH into the VM
podman machine inspect # Show VM details
Alias tip: Add
alias docker=podmanto your shell profile for muscle memory compatibility.
Deep Dive: Why Longhorn Won’t Work (And What Will)
When planning Kubernetes storage, Longhorn is often the first choice—it’s a CNCF-incubating project that provides distributed block storage with replication, snapshots, and disaster recovery.
So why aren’t we using it?
The Nested Virtualization Problem
Our architecture creates a “matryoshka doll” situation:
graph TB
subgraph macos["macOS Host"]
subgraph vm["Podman VM (Fedora)"]
subgraph container["k3d Container (Node)"]
longhorn["Longhorn needs:
/dev/longhorn (block device)
iSCSI kernel modules
Direct disk access"]
end
end
end
Longhorn requires:
- Block device access (/dev/longhorn/*) — containers don’t have real block devices
- iSCSI kernel modules — the container shares the host’s kernel (Podman VM), not its own
- Open-iSCSI initiator — requires the iscsid daemon with proper privileges
In cloud environments (AWS, GCP), Kubernetes nodes are full VMs with their own kernels and attached block devices (EBS, Persistent Disks). Longhorn works great there.
In our setup, “nodes” are containers sharing a single VM’s kernel. There’s no way to provide isolated block devices to each container.
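You can verify the shared-kernel situation yourself: every k3d “node” reports exactly the same kernel as the Podman VM, because they are all just containers on it (the default machine name and the node name from earlier are assumed):
# Kernel of the Podman VM
podman machine ssh podman-machine-default uname -r
# Kernel "inside" a k3d node container: identical, because it is the same kernel
podman exec k3d-homelab-server-0 uname -r
# The same information as Kubernetes reports it
kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion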
Storage Options Comparison
| Storage Solution | Works in k3d? | Access Modes | Use Case |
|---|---|---|---|
| local-path (k3s default) | ✅ Yes | RWO | Single-pod workloads |
| NFS CSI driver | ✅ Yes | RWO, ROX, RWX | Shared storage |
| Longhorn | ❌ No | RWO, RWX | Cloud/bare metal only |
| OpenEBS (Jiva) | ⚠️ Complex | RWO | Requires privileged containers |
| Rook-Ceph | ❌ No | RWO, RWX | Full VMs only |
Why NFS is the Right Choice
For our local homelab, NFS provides exactly what we need:
| NFS Advantage | Explanation |
|---|---|
| RWX support | Multiple pods can read/write simultaneously |
| External storage | Data persists even if cluster is destroyed |
| Simple setup | Any NAS or Linux box can serve NFS |
| Performance | NFSv4.1+ is fast enough for most workloads |
| No kernel dependencies | Just needs network connectivity |
When you eventually deploy to cloud, you can replace NFS with Longhorn or cloud-native storage (EBS CSI, GCE PD CSI) while keeping the same PersistentVolumeClaim abstractions. That’s the beauty of Kubernetes storage classes.
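In practice that portability means the only line that changes between environments is the StorageClass name; the claim itself stays identical. A sketch (the cloud class name is illustrative):
# The same PVC manifest works in the homelab and in the cloud
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: nfs-rwx    # homelab: the NFS class from Step 7
  # storageClassName: gp3      # e.g. AWS EBS CSI class (illustrative)
  resources:
    requests:
      storage: 10Gi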
Troubleshooting
Cluster won’t start
Symptom: k3d cluster create hangs or fails
# Check Podman machine is running
podman machine list
# If stopped, start it
podman machine start
# Check for port conflicts (80, 443, 6443)
lsof -i :80
lsof -i :443
lsof -i :6443
kubectl can’t connect
Symptom: Unable to connect to the server: dial tcp: lookup ... no such host
# Verify kubeconfig is set
echo $KUBECONFIG
# Re-merge kubeconfig
k3d kubeconfig merge homelab --kubeconfig-switch-context
# Test connectivity to API server
curl -k https://<your-tailscale-ip>:6443/healthz
NFS PVC stuck in Pending
Symptom: PVC shows Pending status indefinitely
# Check CSI driver pods
kubectl get pods -n kube-system -l app.kubernetes.io/name=csi-driver-nfs
# Check NFS server connectivity from a pod
kubectl run nfs-test --rm -it --image=busybox -- \
ping -c 3 192.168.55.115
# Verify NFS export is accessible
showmount -e 192.168.55.115
Podman vs Docker context issues
Symptom: Commands fail with “Cannot connect to Docker daemon”
# Check active context
docker context list
# Force Podman
docker context use default
# Or set environment variable
export DOCKER_HOST="unix://$HOME/.local/share/containers/podman/machine/podman.sock"
Nodes not Ready
Symptom: kubectl get nodes shows NotReady status
# Check node conditions
kubectl describe node k3d-homelab-server-0
# Check container status
podman ps -a | grep k3d-homelab
# Restart stuck containers
k3d cluster stop homelab && k3d cluster start homelab
Quick Reference: Common Commands
# Cluster lifecycle
k3d cluster create --config k3d/config.yaml # Create
k3d cluster start homelab # Start (after stop)
k3d cluster stop homelab # Stop (preserves data)
k3d cluster delete homelab # Destroy completely
# Podman machine
podman machine start # Start VM
podman machine stop # Stop VM
podman machine ssh # SSH into VM
# Kubeconfig
k3d kubeconfig merge homelab --kubeconfig-switch-context
export KUBECONFIG=~/.config/k3d/kubeconfig-homelab.yaml
# Verification
kubectl get nodes -o wide
kubectl get pods -A
kubectl get storageclass
Summary
We’ve built a production-grade local Kubernetes environment:
| Component | Choice | Rationale |
|---|---|---|
| Container runtime | Podman | Lighter, open source, Docker-compatible |
| Kubernetes distribution | k3s (via k3d) | Lightweight, CNCF-certified, fast |
| Cluster topology | 1 server + 3 agents | Realistic multi-node simulation |
| Storage (RWO) | local-path | Built into k3s, zero config |
| Storage (RWX) | NFS CSI | Works in containers, external persistence |
| Remote access | Tailscale | Zero-config VPN, stable IPs |
What we’ve learned:
- Why Podman’s daemonless architecture matters
- How k3d simulates multi-node clusters in containers
- Why Longhorn doesn’t work in nested container environments
- How Tailscale simplifies secure remote access
Coming up in Part 2:
- Installing an Ingress Controller (nginx-ingress or Traefik)
- Deploying first applications
- Setting up TLS certificates with cert-manager
- Introduction to GitOps with ArgoCD
All code and configuration from this post is available in the cd-homelab repository.