# Kubewekend - The K8s Playground

## Kubewekend CLI

A unified Bash CLI to set up and operate Kind / K3s Kubernetes clusters for workshops, demos, and local experiments.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [CLI Reference](#cli-reference)
- [Workflow Examples](#workflow-examples)
- [Available Utility Tags](#available-utility-tags)
- [Project Structure](#project-structure)
- [Configuration Files](#configuration-files)
- [Contributing](#contributing)
## Prerequisites

| Tool | Required | Purpose |
|------|----------|---------|
| `vagrant` | Yes\* | VM provisioning |
| `VirtualBox` | Yes\* | VM provider |
| `ansible` | Yes | Cluster orchestration |
| `kubectl` | Yes | Kubernetes CLI |
| `helm` | Yes | Helm chart management |
| `docker` | Optional | Required for Kind clusters |
| `kind` | Optional | Kind cluster binary (installed by playbook) |
| `jq` | Optional | JSON processing |

\* Vagrant + VirtualBox are only required for local VM workflows. For a remote VPS, only `ansible` + SSH are needed.

Verify with:

```bash
./scripts/setup.sh env check
```

## Getting Started

```bash
# 1. Make the script executable
chmod +x ./scripts/setup.sh

# 2. Initialize .env from template
./scripts/setup.sh env init
# Edit .env with your SSH_USER and SSH_KEY

# 3. Check tools
./scripts/setup.sh env check

# 4. Follow a quickstart guide
./scripts/setup.sh quickstart kind-vagrant
./scripts/setup.sh quickstart k3s-vagrant
./scripts/setup.sh quickstart k3s-remote
```
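A minimal `.env` might look like this (illustrative values; the authoritative variable list is in `template.env`):

```bash
# .env (example values, adjust to your environment)
SSH_USER=vagrant
SSH_KEY=~/.ssh/id_rsa
```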

## CLI Reference

```bash
./scripts/setup.sh <command> [subcommand] [options]
./scripts/setup.sh help                        # global help
./scripts/setup.sh <command> help              # per-command help
```

### env — Environment Management

| Subcommand | Description |
|------------|-------------|
| `env check` | Verify all required/optional tools are installed |
| `env init` | Create `.env` from `template.env` |
| `env show` | Print current environment variables |

```bash
./scripts/setup.sh env check
./scripts/setup.sh env init
./scripts/setup.sh env show
```

### vagrant — VM Lifecycle

| Subcommand | Description |
|------------|-------------|
| `vagrant up [machines...]` | Provision VMs (default: `k8s-master-machine`) |
| `vagrant halt [machines...]` | Stop VMs |
| `vagrant destroy [machines...]` | Destroy VMs (with confirmation) |
| `vagrant status` | Show VM status |
| `vagrant ssh <machine>` | SSH into a VM |
| `vagrant reload [machines...]` | Reload VMs |

```bash
# Provision master
./scripts/setup.sh vagrant up k8s-master-machine

# Provision master + 2 workers
./scripts/setup.sh vagrant up k8s-master-machine k8s-worker-machine-1 k8s-worker-machine-2

# Provision workers by regex
./scripts/setup.sh vagrant up "/k8s-worker-machine-[1-2]/"

# SSH into master
./scripts/setup.sh vagrant ssh k8s-master-machine

# Halt all
./scripts/setup.sh vagrant halt
```

### inventory — Ansible Inventory

| Subcommand | Description |
|------------|-------------|
| `inventory generate` | Auto-generate inventory from running Vagrant VMs |
| `inventory show` | Display current inventory file |
| `inventory ping [group]` | Test SSH connectivity (defaults to `all`) |
| `inventory set-remote` | Interactive wizard to configure remote VPS inventory |

```bash
# Auto-generate from vagrant
./scripts/setup.sh inventory generate

# Test connectivity to all hosts
./scripts/setup.sh inventory ping

# Test only standalone masters
./scripts/setup.sh inventory ping standalone-masters

# Setup remote VPS inventory (interactive)
./scripts/setup.sh inventory set-remote
```
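For reference, a generated inventory for a single Vagrant master looks roughly like this (a sketch; the exact host names, IPs, and variables come from the generator):

```ini
[standalone-masters]
k8s-master-machine ansible_host=192.168.56.99 ansible_user=vagrant

[standalone-all:children]
standalone-masters
```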

### kind — Kind Cluster Operations

| Subcommand | Description |
|------------|-------------|
| `kind setup` | Install tools + create Kind cluster + configure networking |
| `kind destroy` | Remove Kind cluster and related components |
| `kind utils <tags...>` | Install K8s utilities by tag |

```bash
# Full setup
./scripts/setup.sh kind setup

# Setup targeting a specific host
./scripts/setup.sh kind setup --host k8s-master-machine

# Preview without executing
./scripts/setup.sh kind setup --dry-run

# Destroy
./scripts/setup.sh kind destroy

# Install utilities
./scripts/setup.sh kind utils certmanager dashboard
./scripts/setup.sh kind utils ingress_test apigateway_test
```

### k3s — K3s Cluster Operations

| Subcommand | Description |
|------------|-------------|
| `k3s setup` | Set up a standalone K3s cluster (1 master + N workers) |
| `k3s ha-setup` | Set up an HA K3s cluster (3+ masters + N workers) |
| `k3s destroy` | Uninstall K3s from all inventory nodes |
| `k3s utils <tags...>` | Install K8s utilities by tag |

```bash
# Standalone setup
./scripts/setup.sh k3s setup

# HA setup (requires inventory + master.yaml pre-configured)
./scripts/setup.sh k3s ha-setup

# Destroy
./scripts/setup.sh k3s destroy

# Utilities
./scripts/setup.sh k3s utils certmanager gitops dashboard
```

### network — VirtualBox NAT Network

| Subcommand | Description |
|------------|-------------|
| `network hookup [name] [cidr]` | Create NAT network and attach VMs (default: `KubewekendNet` `10.0.69.0/24`) |
| `network return-nat` | Revert all VMs back to default NAT |
| `network status` | List VirtualBox NAT networks |

```bash
./scripts/setup.sh network hookup
./scripts/setup.sh network hookup MyNet 10.0.100.0/24
./scripts/setup.sh network return-nat
./scripts/setup.sh network status
```
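These subcommands wrap `VBoxManage`; a rough manual equivalent, assuming a VM registered as `k8s-master-machine` (Vagrant's actual VirtualBox VM names usually carry a project prefix):

```bash
# Create the NAT network with the default name/CIDR shown above
VBoxManage natnetwork add --netname KubewekendNet --network "10.0.69.0/24" --enable --dhcp on

# Attach a VM's first NIC to that network (VM must be powered off)
VBoxManage modifyvm "k8s-master-machine" --nic1 natnetwork --nat-network1 KubewekendNet

# List NAT networks (roughly what `network status` reports)
VBoxManage natnetwork list
```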

### config — Cluster Configuration

| Subcommand | Description |
|------------|-------------|
| `config show` | Print `ansible/inventories/host_vars/master.yaml` |
| `config edit` | Open `master.yaml` in `$EDITOR` |
| `config worker-show` | Print `worker.yaml` |
| `config worker-edit` | Open `worker.yaml` in `$EDITOR` |

```bash
./scripts/setup.sh config show
./scripts/setup.sh config edit
```

### status — Project Dashboard

Shows Vagrant VMs, inventory groups, kubectl contexts, and Docker Kind containers.

```bash
./scripts/setup.sh status
```

### quickstart — Guided Workflows

| Subcommand | Description |
|------------|-------------|
| `quickstart kind-local` | Kind on localhost (no Vagrant) |
| `quickstart kind-vagrant` | Kind on Vagrant VMs |
| `quickstart k3s-vagrant` | K3s on Vagrant VMs |
| `quickstart k3s-remote` | K3s on a remote VPS |

```bash
./scripts/setup.sh quickstart kind-vagrant
./scripts/setup.sh quickstart k3s-remote
```

### Global Options (kind / k3s)

These options are available on `kind setup/destroy/utils` and `k3s setup/ha-setup/destroy/utils`:

| Option | Description |
|--------|-------------|
| `--host, -H <name>` | Ansible host target (default: `k8s-master-machine`) |
| `--dry-run` | Print the ansible command without executing |
| `--skip-tags <tags>` | Comma-separated ansible tags to skip |
| `--extra-vars <vars>` | Additional ansible extra-vars (`key=value`) |

```bash
# Dry run
./scripts/setup.sh kind setup --dry-run

# Target a different host
./scripts/setup.sh k3s setup --host my-vps-node

# Skip specific tasks
./scripts/setup.sh kind setup --skip-tags setup_cni

# Pass extra variables
./scripts/setup.sh kind setup --extra-vars "kindCluster_image=kindest/node:v1.30.13"
```
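With `--dry-run` the CLI prints the underlying `ansible-playbook` command instead of executing it. Conceptually, the options above map onto standard `ansible-playbook` flags, roughly like this (a sketch; the script's exact invocation may differ):

```bash
ansible-playbook ansible/kind-playbook.yaml \
  -i ansible/inventories/hosts \
  --limit k8s-master-machine \
  --skip-tags setup_cni \
  --extra-vars "kindCluster_image=kindest/node:v1.30.13"
```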

## Workflow Examples

### 1. Kind on Vagrant (VirtualBox)

```bash
# Provision VM
./scripts/setup.sh env init
./scripts/setup.sh vagrant up k8s-master-machine

# Generate inventory and verify
./scripts/setup.sh inventory generate
./scripts/setup.sh inventory ping

# Create Kind cluster
./scripts/setup.sh kind setup

# Add cert-manager + test ingress
./scripts/setup.sh kind utils certmanager ingress_test

# Teardown
./scripts/setup.sh kind destroy
./scripts/setup.sh vagrant destroy k8s-master-machine
```

### 2. K3s Standalone on Vagrant

```bash
# Provision master + worker
./scripts/setup.sh vagrant up k8s-master-machine k8s-worker-machine-1

# Generate inventory
./scripts/setup.sh inventory generate
./scripts/setup.sh inventory ping

# (Optional) Review / edit cluster config
./scripts/setup.sh config edit

# Setup K3s
./scripts/setup.sh k3s setup

# Get kubeconfig
ssh vagrant@192.168.56.99 'sudo cat /etc/rancher/k3s/k3s.yaml' > ~/.kube/config
# Replace 127.0.0.1 with 192.168.56.99 in the kubeconfig file
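# e.g. in place with sed (adjust the IP to your master):
sed -i 's/127.0.0.1/192.168.56.99/' ~/.kube/config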

# Install utilities
./scripts/setup.sh k3s utils certmanager gitops ingress_test

# Teardown
./scripts/setup.sh k3s destroy
./scripts/setup.sh vagrant destroy
```

### 3. K3s on Remote VPS

```bash
# Interactive inventory wizard
./scripts/setup.sh inventory set-remote
# Enter: VPS IP, SSH port, SSH user, key path, number of workers

# Verify connectivity
./scripts/setup.sh inventory ping

# Edit cluster config (set tlsSANs, loadBalancer IP pool for your network)
./scripts/setup.sh config edit

# Setup K3s
./scripts/setup.sh k3s setup

# Get kubeconfig
ssh root@<vps-ip> 'sudo cat /etc/rancher/k3s/k3s.yaml' > ~/.kube/config
# Replace 127.0.0.1 with your VPS IP
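# e.g. in place with sed (substitute your actual VPS IP):
sed -i 's/127.0.0.1/<vps-ip>/' ~/.kube/config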

# Install GitOps + cert-manager
./scripts/setup.sh k3s utils certmanager gitops

# Teardown
./scripts/setup.sh k3s destroy
```

### 4. K3s High-Availability (HA)

```bash
# 1. Edit inventory with HA groups
#    ansible/inventories/hosts needs:
#    [ha_master_init]    — exactly 1 bootstrap node
#    [ha_master_join]    — additional control-plane nodes
#    [ha_worker]         — agent nodes

# 2. Enable HA in config
./scripts/setup.sh config edit
# Set: k3sCluster.highAvailability.enable: true
# Set: k3sCluster.highAvailability.replicas: 3

# 3. Run HA setup
./scripts/setup.sh k3s ha-setup

# 4. Teardown
./scripts/setup.sh k3s destroy
```
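For reference, the HA groups in `ansible/inventories/hosts` might look like this (host names and IPs are illustrative):

```ini
[ha_master_init]
master-1 ansible_host=10.0.69.11

[ha_master_join]
master-2 ansible_host=10.0.69.12
master-3 ansible_host=10.0.69.13

[ha_worker]
worker-1 ansible_host=10.0.69.21
```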

### 5. Kind on Localhost (No Vagrant)

```bash
# Manually set inventory for localhost
# Note: unquoted EOF so that $USER expands to your real username
cat > ansible/inventories/hosts <<EOF
[standalone-masters]
localhost ansible_host=127.0.0.1 ansible_connection=local

[standalone-all:children]
standalone-masters

[all:vars]
ansible_user=$USER
EOF

# Setup Kind
./scripts/setup.sh kind setup --host localhost

# Verify
kubectl cluster-info --context kind-kubewekend

# Cleanup
./scripts/setup.sh kind destroy
```

### 6. LGTM Observability Stack + Testing Application

Deploy the full LGTM observability stack (Prometheus + Grafana + Loki + Tempo + Pyroscope), then run the bundled demo application to exercise traces, logs, metrics, and profiling together.

```bash
# 1. Spin up a cluster (Kind example)
./scripts/setup.sh vagrant up k8s-master-machine
./scripts/setup.sh inventory generate
./scripts/setup.sh kind setup

# 2. Install cert-manager first (required by kube-prometheus-stack CRDs)
./scripts/setup.sh kind utils certmanager

# 3. Deploy the full monitoring stack
#    Installs: kube-prometheus-stack, Alloy APM, Loki, Tempo, Pyroscope
./scripts/setup.sh kind utils monitoring

# 4. Build and deploy the LGTM demo application
docker build -t lgtm-testing-backend:latest examples/lgtm-testing/backend
docker build -t lgtm-testing-frontend:latest examples/lgtm-testing/frontend
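# With Kind, locally built images must also be loaded into the cluster nodes
# (cluster name assumed from the kind-kubewekend kubectl context):
kind load docker-image lgtm-testing-backend:latest --name kubewekend
kind load docker-image lgtm-testing-frontend:latest --name kubewekend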

kubectl apply -f examples/lgtm-testing/k8s/namespace.yaml
kubectl apply -f examples/lgtm-testing/k8s/postgres.yaml
kubectl apply -f examples/lgtm-testing/k8s/backend.yaml
kubectl apply -f examples/lgtm-testing/k8s/frontend.yaml
kubectl -n lgtm-testing wait --for=condition=ready pod \
  -l app.kubernetes.io/part-of=lgtm-testing --timeout=120s

# 5. Seed test data
kubectl -n lgtm-testing exec -it deploy/lgtm-testing-backend -- \
  curl -s -X POST http://localhost:8000/api/seed/

# 6. Generate traffic for each observability scenario
# Normal traces
kubectl -n lgtm-testing exec -it deploy/lgtm-testing-backend -- \
  curl -s http://localhost:8000/api/todos/?owner_id=1

# Auth failures → error spans in Tempo
kubectl -n lgtm-testing exec -it deploy/lgtm-testing-backend -- \
  curl -s -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"alice","password":"WRONG"}'

# CPU flamegraph → Pyroscope
kubectl -n lgtm-testing exec -it deploy/lgtm-testing-backend -- \
  curl -s "http://localhost:8000/api/bottleneck/cpu-intensive?iterations=500000"

# 7. Open Grafana and explore (correlate Traces ↔ Logs ↔ Profiles)
#    Grafana is exposed via Ingress at grafana.local (configure in master.yaml)
echo "Access Grafana at: http://grafana.local"
echo "Data sources: Prometheus, Loki, Tempo, Pyroscope"

See `examples/lgtm-testing/README.md` for the full test scenario guide and custom metric reference.


## Available Utility Tags

Used with `kind utils <tags...>` or `k3s utils <tags...>`:

| Tag | Description | Playbook |
|-----|-------------|----------|
| `ingress_test` | Deploy test nginx with ingress | `k8s-utilities-playbook.yaml` |
| `apigateway_test` | Deploy API Gateway test with weighted routing | `k8s-utilities-playbook.yaml` |
| `certmanager` | Install cert-manager (v1.19.2) | `k8s-utilities-playbook.yaml` |
| `dashboard` | Install K8s dashboard (kubernetes-dashboard / headlamp / rancher) | `k8s-utilities-playbook.yaml` |
| `storage` | Install Longhorn distributed block storage (v1.11.0) with optional iSCSI and NFS support | `k8s-utilities-playbook.yaml` |
| `secret_management` | Install Vault (v0.32.0) or OpenBao with auto-unseal, Vault Operator, and key persistence | `k8s-utilities-playbook.yaml` |
| `k8s_extensions` | Install Reflector, Reloader, External Secrets Operator | `k8s-utilities-playbook.yaml` |
| `gitops` | Install ArgoCD (v9.1.3) with Image Updater + Extensions, or Flux (v2.7.5) with Weave GitOps UI and Kargo | `k8s-utilities-playbook.yaml` |
| `security` | Install policy engine (Kyverno v3.7.1 or OPA Gatekeeper v3.22.0) and Dex identity provider (v0.24.0) | `k8s-utilities-playbook.yaml` |
| `idp` | Install Backstage Internal Developer Portal (v2.6.3) | `k8s-utilities-playbook.yaml` |
| `monitoring` | Install full LGTM stack: kube-prometheus-stack (v82.16.0), Alloy APM (v1.7.0), Loki (v9.5.1), Tempo (v1.26.7), Pyroscope (v1.19.2) | `k8s-utilities-playbook.yaml` |
| `service_mesh` | Install Istio service mesh (v1.29.1) | `k8s-utilities-playbook.yaml` |

Multiple tags can be combined:

```bash
./scripts/setup.sh kind utils certmanager dashboard gitops
./scripts/setup.sh k3s utils ingress_test apigateway_test k8s_extensions

# Full observability stack
./scripts/setup.sh kind utils monitoring

# Security + IDP
./scripts/setup.sh k3s utils security idp
```

## Project Structure

```text
scripts/
├── setup.sh              # Kubewekend CLI (this script)
├── README.md             # This documentation
└── legacy/               # Legacy v1 scripts (Kind only)
    ├── operate-kind-cluster.sh   # Auto-generate inventory from Vagrant
    ├── hook-up-ip.sh             # VirtualBox NAT network setup
    ├── return-to-nat.sh          # Revert VMs to default NAT
    └── README.md                 # Legacy documentation

ansible/
├── k3s-playbook.yaml             # K3s standalone setup
├── k3s-ha-playbook.yaml          # K3s HA setup (embedded etcd / external postgres)
├── k3s-remove-playbook.yaml      # K3s teardown
├── kind-playbook.yaml            # Kind setup (CNI, LB, ingress, gateway)
├── k8s-utilities-playbook.yaml   # Post-cluster utilities (cert-manager, vault, monitoring, gitops, etc.)
├── inventories/
│   ├── hosts                     # Ansible inventory (auto-generated or manual)
│   └── host_vars/
│       ├── master.yaml           # Master node config (Kind + K3s + utilities)
│       └── worker.yaml           # Worker node config (K3s only)
└── templates/
    ├── k3s-config.yaml.j2
    ├── kind-config.yaml.j2
    ├── kube-prometheus-stack-values.yaml.j2
    ├── alloy-values.yaml.j2
    ├── loki-values.yaml.j2
    ├── tempo-values.yaml.j2
    ├── pyroscope-values.yaml.j2
    ├── longhorn-iscsi-installation.yaml.j2
    ├── longhorn-nfs-installation.yaml.j2
    ├── ingress-test-deployment.yaml.j2
    └── apigateway-test-deployment.yaml.j2

examples/
└── lgtm-testing/                 # Full-stack LGTM observability demo app
    ├── backend/                  # FastAPI + OpenTelemetry + Pyroscope
    ├── frontend/                 # Nginx + HTML test dashboard
    ├── k8s/                      # Kubernetes manifests (namespace, postgres, backend, frontend)
    ├── docker-compose.yaml       # Local development (all services)
    └── docker-compose.k8s.yaml   # K8s-oriented compose variant
```

## Configuration Files

| File | Purpose |
|------|---------|
| `.env` | SSH credentials (from `template.env`) |
| `ansible/inventories/hosts` | Ansible inventory (hosts, groups, SSH config) |
| `ansible/inventories/host_vars/master.yaml` | Primary config — Kind networking, K3s version/CNI/LB/ingress, utility toggles |
| `ansible/inventories/host_vars/worker.yaml` | Worker-specific config (K3s version, labels, taints) |
| `ansible.cfg` | Ansible SSH options (host key checking disabled) |
| `Vagrantfile` | VM definitions (master + 3 workers, VirtualBox) |

Key settings in `master.yaml`:

```yaml
# Kind
kindCluster.image: "kindest/node:v1.28.9"
kindCluster.ingress.class: "traefik"     # nginx | traefik | cilium | kong
kindCluster.loadbalancer.type: "cloud-provider-kind"  # metallb | cloud-provider-kind

# K3s
k3sCluster.version: "v1.34.5+k3s1"
k3sCluster.cni.type: "flannel"           # flannel | calico | cilium
k3sCluster.loadBalancer.type: "servicelb" # servicelb | metallb
k3sCluster.ingress.class: "traefik"
k3sCluster.highAvailability.enable: false # true for HA

# Utilities
utilities.certmanager.enable: true
utilities.storage.enable: true           # Longhorn distributed storage
utilities.storage.type: "longhorn"
utilities.gitops.type: "argocd"          # argocd | flux
utilities.dashboard.type: "headlamp"     # kubernetes-dashboard | headlamp | rancher
utilities.secretManagement.type: "vault" # vault | openbao
utilities.security.policyEngine.type: "kyverno"  # kyverno | opa-gatekeeper
utilities.security.identity.type: "dex"
utilities.idp.portal.type: "backstage"

# Monitoring (LGTM stack)
utilities.monitoring.enable: true
utilities.monitoring.type: "kube-prometheus-stack"
utilities.monitoring.apm.enable: true          # Alloy APM collector
utilities.monitoring.logging.enable: true      # Loki log aggregation
utilities.monitoring.tracing.enable: true      # Tempo distributed tracing
utilities.monitoring.profiling.enable: true    # Pyroscope continuous profiling
utilities.monitoring.apm.config.clusterName: "kubewekend"

# Service Mesh
utilities.serviceMesh.enable: true
utilities.serviceMesh.type: "istio"
```
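In `master.yaml` itself these dotted paths are nested YAML keys. For example, the K3s settings above correspond to (a sketch of the nesting, not the complete file):

```yaml
k3sCluster:
  version: "v1.34.5+k3s1"
  cni:
    type: "flannel"        # flannel | calico | cilium
  loadBalancer:
    type: "servicelb"      # servicelb | metallb
  ingress:
    class: "traefik"
  highAvailability:
    enable: false          # true for HA
```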

## Contributing

When adding new features to the CLI:

1. Add a new command function following the pattern: `cmd_<name>()` for dispatch + `cmd_<name>_help()` for documentation
2. Register it in the `main()` case statement
3. Use `parse_ansible_opts` and `run_ansible_playbook` for any ansible-based operations
4. Add confirmation via `confirm()` for destructive operations
5. Update this README with the new command, subcommands, and examples
6. Test with `--dry-run` before running against real infrastructure
```bash
# Pattern for adding: "monitoring" command
cmd_monitoring_help() { ... }
cmd_monitoring() {
    local sub="${1:-help}"; shift || true
    case "$sub" in
        setup)   monitoring_setup "$@" ;;
        *)       cmd_monitoring_help ;;
    esac
}
# Add to main(): monitoring) cmd_monitoring "$@" ;;
```
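And a hedged sketch of the matching setup function; `parse_ansible_opts` and `run_ansible_playbook` are the script's own helpers, so their exact signatures here are assumptions, not the real API:

```bash
# Hypothetical body: helper signatures are assumed, not copied from setup.sh
monitoring_setup() {
    parse_ansible_opts "$@"   # would consume --host / --dry-run / --skip-tags / --extra-vars
    run_ansible_playbook "ansible/k8s-utilities-playbook.yaml" --tags monitoring
}
```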