Homelab

2026 Update

Full write-up: Homelab 2026: Rebuilding the Stack from Bare Metal Up

Hardware

Two physical sites running as a single Kubernetes cluster.

JD site: Lenovo ThinkSystem SR655, AMD EPYC 7B13 (64 cores / 128 threads), 256GB Samsung DDR4 ECC @ 2933 MT/s, Proxmox VE 9.2.3. Single-socket design with a single NUMA domain eliminates cross-socket memory latency entirely.

Storage (ZFS):

  • NAS-SSD: RAIDZ1 across 5x Samsung 870 EVO 4TB SSDs (18.2TiB raw, roughly 14.5TiB usable after one disk of parity; 11TB used). NFS/iSCSI backing for Kubernetes PVs.
  • VM: RAIDZ1 across 3x mixed SSDs (1.62TB usable). VM disk pool.
  • HDD-20T: Mirrored pair of 20TB Seagate enterprise HDDs (20TB usable, 8TB used). Cold and bulk storage.
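
The pool figures can be sanity-checked with a little arithmetic. The sketch below converts vendor terabytes (10^12 bytes) into the TiB that ZFS reports, then subtracts one disk's worth of parity per RAIDZ1 vdev. It is a rough approximation that assumes single-vdev pools and ignores metadata, slop space, and RAIDZ padding, so real usable space comes out somewhat lower.

```python
# Rough ZFS capacity estimate for single-vdev pools. Ignores metadata,
# slop space and RAIDZ padding, so real usable space is a bit lower.
TB = 10**12   # vendor (decimal) terabyte
TIB = 2**40   # binary tebibyte, the unit ZFS tools report in


def raw_tib(disks: int, size_tb: float) -> float:
    """Raw capacity in TiB, including any parity."""
    return disks * size_tb * TB / TIB


def usable_tib(layout: str, disks: int, size_tb: float) -> float:
    """Approximate usable capacity in TiB for one vdev."""
    if layout == "mirror":
        data_disks = 1  # every disk holds a full copy of the data
    elif layout.startswith("raidz"):
        # "raidz" and "raidz1" mean single parity; "raidz2" and "raidz3" mean more.
        parity = int(layout[-1]) if layout[-1].isdigit() else 1
        data_disks = disks - parity
    else:  # plain stripe, no redundancy
        data_disks = disks
    return data_disks * size_tb * TB / TIB


if __name__ == "__main__":
    print(f"NAS-SSD raw:    {raw_tib(5, 4):.1f} TiB")                 # 18.2
    print(f"NAS-SSD usable: {usable_tib('raidz1', 5, 4):.1f} TiB")    # 14.6
    print(f"HDD-20T usable: {usable_tib('mirror', 2, 20):.1f} TiB")   # 18.2
```

The 18.2 figure for NAS-SSD therefore matches the raw size of five 4TB disks; the mirrored 20TB pair lands on the same number because it stores one 20TB copy.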

LINDS site: Dell PowerEdge T630, 2x Intel Xeon E5-2640 v4 (20 cores / 40 threads, dual-socket NUMA), 128GB RAM, Proxmox VE 9.2.3. Storage via PERC H730 hardware RAID controller.

Both sites run Ubiquiti UniFi switching and APs, with VyOS 1.5 (rolling) handling routing, BGP, RA VPN, DNS/DHCP, and site-to-site IPSec.

Terraform (LINDS-Terraform)

Terraform provisions everything from bare Proxmox hosts to a running Kubernetes cluster.

  • VM provisioning: Uses the bpg/proxmox provider to create VMs across both sites. Packer builds golden images (Ubuntu 24.04, CentOS Stream 9) that Terraform clones per host.
  • Talos cluster bootstrap: Generates Talos machine secrets, applies node configs to each control plane and worker, bootstraps etcd, and writes kubeconfig + talosconfig locally. The cluster is 1 control plane + 3 workers at JD (AMD EPYC), 2 workers at LINDS (Intel Xeon), all running Talos v1.13.2 and Kubernetes v1.36.0.
  • Per-arch kernel tuning: AMD and Intel nodes get separate Talos schematics with architecture-specific flags. Both disable all CPU vulnerability mitigations, set transparent_hugepage=always, run the active P-state driver (amd_pstate=active / intel_pstate=active) with the performance governor, enable BBR congestion control, reduce scheduler ticks and offload RCU callbacks on isolated CPUs (nohz_full, rcu_nocbs), and apply tuned TCP buffer / conntrack sysctls.
  • Cilium via Helm: Cilium is deployed into kube-system directly from Terraform after bootstrap. kube-proxy is disabled; Cilium’s eBPF datapath handles all service routing with O(1) kernel hash map lookups.
  • BGP wiring: Terraform applies CiliumBGPClusterConfig and CiliumBGPPeerConfig resources once Cilium is running. JD nodes peer with VyOS using the ASN pair 64512/64550; LINDS nodes use the pair 64513/64551. Cilium advertises the 172.16.1.0/24 LoadBalancer IP pool and pod CIDRs to VyOS, which redistributes them across both sites, so no MetalLB is needed.
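
A minimal sketch of how the two per-architecture schematics might be assembled, assuming the Talos Image Factory schematic field `customization.extraKernelArgs` is the mechanism used. The isolated CPU range, the `cpufreq.default_governor=performance` argument, and the buffer sizes are hypothetical. BBR and socket buffers are runtime sysctls, so they appear as a `machine.sysctls` fragment of the Talos machine config rather than as kernel arguments.

```python
import json

# Kernel arguments shared by both CPU vendors.
# cpufreq.default_governor=performance is an ASSUMPTION about how the
# governor is pinned; the write-up only names the P-state driver flags.
COMMON_ARGS = [
    "mitigations=off",                      # disable CPU vulnerability mitigations
    "transparent_hugepage=always",
    "cpufreq.default_governor=performance",
]

# Vendor-specific P-state driver flag.
PSTATE = {"amd": "amd_pstate=active", "intel": "intel_pstate=active"}


def schematic(arch: str, isolated_cpus: str) -> dict:
    """Build an Image Factory schematic for one CPU vendor.

    isolated_cpus is a HYPOTHETICAL CPU list such as "2-7"; real ranges
    depend on each VM's vCPU count.
    """
    args = COMMON_ARGS + [
        PSTATE[arch],
        f"nohz_full={isolated_cpus}",  # suppress the periodic tick on these CPUs
        f"rcu_nocbs={isolated_cpus}",  # run their RCU callbacks elsewhere
    ]
    return {"customization": {"extraKernelArgs": args}}


# Runtime sysctls for the Talos machine config (machine.sysctls).
# Buffer sizes are HYPOTHETICAL examples, not the values in use.
SYSCTLS = {
    "net.core.default_qdisc": "fq",
    "net.ipv4.tcp_congestion_control": "bbr",
    "net.core.rmem_max": "67108864",
    "net.core.wmem_max": "67108864",
}

if __name__ == "__main__":
    # JSON is valid YAML, so each document can be used as-is.
    print(json.dumps(schematic("amd", "2-7"), indent=2))
    print(json.dumps(schematic("intel", "2-7"), indent=2))
    print(json.dumps({"machine": {"sysctls": SYSCTLS}}, indent=2))
```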

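The BGP wiring can be sketched as Python that emits the Cilium resources as JSON, which kubectl or Terraform can consume as YAML. This sketch makes three assumptions. In each ASN pair, the first number belongs to the site's VyOS router and the second to that site's Cilium nodes. It uses the `cilium.io/v2alpha1` API version from the Cilium BGP control plane docs; newer releases may serve these resources as `cilium.io/v2`, so check against the installed CRDs. The peer address, node label, and object names are hypothetical placeholders.

```python
import ipaddress
import json

# RFC 6996 private-use range for 16-bit ASNs.
PRIVATE_ASN = range(64512, 65535)
LB_POOL = ipaddress.ip_network("172.16.1.0/24")

# Check the installed CRDs; newer Cilium releases may use "cilium.io/v2".
API = "cilium.io/v2alpha1"

# ASSUMPTION: first ASN of each pair is the site's VyOS router,
# second is the Cilium nodes at that site.
SITES = {
    "jd": {"router_asn": 64512, "node_asn": 64550},
    "linds": {"router_asn": 64513, "node_asn": 64551},
}


def cluster_config(site: str, router_ip: str) -> dict:
    """One CiliumBGPClusterConfig per site; label and names are HYPOTHETICAL."""
    asns = SITES[site]
    for asn in asns.values():
        if asn not in PRIVATE_ASN:
            raise ValueError(f"ASN {asn} is outside the private range")
    return {
        "apiVersion": API,
        "kind": "CiliumBGPClusterConfig",
        "metadata": {"name": f"bgp-{site}"},
        "spec": {
            # Only nodes carrying this (hypothetical) site label run this instance.
            "nodeSelector": {"matchLabels": {"site": site}},
            "bgpInstances": [{
                "name": f"{site}-{asns['node_asn']}",
                "localASN": asns["node_asn"],
                "peers": [{
                    "name": f"vyos-{site}",
                    "peerASN": asns["router_asn"],
                    "peerAddress": router_ip,
                    "peerConfigRef": {"name": "vyos-peer"},
                }],
            }],
        },
    }


# Shared peer config: IPv4 unicast, advertising resources that carry the label.
PEER_CONFIG = {
    "apiVersion": API,
    "kind": "CiliumBGPPeerConfig",
    "metadata": {"name": "vyos-peer"},
    "spec": {"families": [{
        "afi": "ipv4",
        "safi": "unicast",
        "advertisements": {"matchLabels": {"advertise": "bgp"}},
    }]},
}

# Advertise pod CIDRs and Service LoadBalancer IPs (allocated from LB_POOL).
ADVERTISEMENT = {
    "apiVersion": API,
    "kind": "CiliumBGPAdvertisement",
    "metadata": {"name": "vyos-adverts", "labels": {"advertise": "bgp"}},
    "spec": {"advertisements": [
        {"advertisementType": "PodCIDR"},
        {"advertisementType": "Service",
         "service": {"addresses": ["LoadBalancerIP"]},
         # Selects every Service: the label key is a placeholder nothing carries.
         "selector": {"matchExpressions": [
             {"key": "somekey", "operator": "NotIn", "values": ["never-used-value"]}]}},
    ]},
}

if __name__ == "__main__":
    assert LB_POOL.is_private
    # 192.0.2.1 is a documentation address standing in for the JD VyOS peer.
    for doc in (cluster_config("jd", "192.0.2.1"), PEER_CONFIG, ADVERTISEMENT):
        print(json.dumps(doc, indent=2))
```

Splitting the session (ClusterConfig), the per-peer policy (PeerConfig), and what is announced (Advertisement) means both sites can share one peer config and one advertisement set, with only the ASNs differing per site.
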
Ansible (LINDS-Ansible)

Handles all post-Terraform configuration for non-Talos hosts. 14 playbooks and roles:

  • VyOS: Full router config via vyos.vyos collection. BGP peering, IPSec site-to-site VPN, RA VPN, DNS/DHCP, NAT, firewall. VyOS itself gets kernel-level tuning (disable-mitigations, network-throughput mode, TCP buffer sysctls).
  • TrueNAS: NFS/iSCSI configuration for Kubernetes persistent volume backing.
  • General services: Plex, Minecraft, torrent hosts, dev VMs, WSL setup.
  • Common baseline: NTP, auto-updates, logrotate applied uniformly.

Kubernetes (LINDS-Kubernetes)

All workloads are managed via ArgoCD and Helm. The repo consists of ArgoCD Application manifests, and reconciliation is fully automated. To rebuild from scratch, ./app-deployment.sh bootstraps ArgoCD, which then self-heals the cluster to the desired state.
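
In ArgoCD, fully automated reconciliation is normally expressed through an Application's `syncPolicy.automated` block: `prune` removes resources that were deleted from Git, and `selfHeal` reverts changes made directly in the cluster. The following is a minimal sketch of one such Application, generated from Python as JSON. The repository URL, path, and namespace are hypothetical placeholders, and it is an assumption that the repo uses exactly these sync options.

```python
import json

# Hypothetical repo URL; substitute the real LINDS-Kubernetes remote.
REPO = "https://git.example.com/LINDS-Kubernetes.git"


def argocd_app(name: str, path: str, namespace: str) -> dict:
    """Build an ArgoCD Application that syncs `path` into `namespace`."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": REPO, "targetRevision": "HEAD", "path": path},
            # In-cluster API server address, i.e. the cluster ArgoCD runs in.
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": namespace},
            "syncPolicy": {
                "automated": {
                    "prune": True,     # delete resources removed from Git
                    "selfHeal": True,  # undo changes made directly in the cluster
                },
                # Let ArgoCD create the target namespace if it is missing.
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical app name and path, for illustration only.
    print(json.dumps(argocd_app("stirling-pdf", "apps/stirling-pdf", "stirling-pdf"), indent=2))
```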

Cluster: 6 nodes (1 control plane + 5 workers), Talos v1.13.2, Kubernetes v1.36.0, with all nodes Ready for 157 days at the time of writing.

Infrastructure layer

  • Cilium 1.19: CNI, kube-proxy replacement, eBPF datapath, BGP control plane, Hubble flow observability
  • ArgoCD + image-updater: GitOps reconciliation; image-updater auto-bumps tags on new pushes
  • cert-manager: Automatic TLS via Let’s Encrypt
  • Vault + external-secrets: Secrets management; external-secrets syncs Vault secrets into Kubernetes
  • external-dns: Syncs LoadBalancer/Ingress hostnames to internal DNS automatically
  • nginx-ingress: Ingress controller, running as a DaemonSet across all nodes
  • kube-prometheus-stack: Prometheus, Grafana, Alertmanager, and node-exporter on all 6 nodes
  • Loki + Grafana Alloy: Log aggregation, with pod logs collected by Alloy
  • OpenTelemetry (OBI): eBPF-based auto-instrumentation DaemonSet; traces service calls without code changes
  • CloudNativePG: PostgreSQL operator with continuous WAL archiving via barman-cloud
  • csi-nfs + csi-smb: NFS/SMB CSI drivers backed by TrueNAS
  • GitHub Actions runners: Self-hosted Actions Runner Controller (ARC) for homelab CI pipelines
  • kube-descheduler: Periodic pod rebalancing across nodes

Applications

  • Immich: Self-hosted photo management, 3 ML inference replicas with GPU acceleration
  • Home Assistant: Home automation
  • Plex: Media server
  • Factorio: Game server
  • Mumble: Self-hosted voice server
  • Catcrawl: Supermarket price scraper (personal project, runs CI via self-hosted runners)
  • Stirling PDF: Self-hosted PDF tooling
  • Zabbix: Infrastructure monitoring (server, web, Java gateway, agent on all nodes)

Changelog - 2023-2026

Added to JD Site

  • JD-proxmox-01 (LENOVO-SR655 - Proxmox VE 9.1.4)
  • JD-VyOS-01 (VyOS 1.5 rolling)
  • talos-cp-01 (Talos OS)
  • talos-worker-01 (Talos OS)
  • talos-worker-02 (Talos OS)
  • talos-worker-03 (Talos OS)
  • USW-Enterprise-24-PoE (Ubiquiti UniFi Switch Enterprise 24 PoE)
  • USW-Enterprise-8-PoE (Ubiquiti UniFi Switch Enterprise 8 PoE)
  • 2x Unifi-7-Pro-AP (Ubiquiti UniFi 7 Pro Access Point)
  • 3x UniFi G5 Flex Camera
  • 1x UniFi G6 Turret Camera
  • UniFi Cloud Key Gen 2 Plus

Added to LINDS Site

  • LINDS-proxmox-01 (Dell T630 - Proxmox VE 9.1.4)
  • LINDS-VyOS-01 (VyOS 1.5 rolling)
  • talos-linds-worker-01 (Talos OS)
  • talos-linds-worker-02 (Talos OS)
  • 2x Unifi-6-AP (Ubiquiti UniFi 6 Access Point)
  • 3x UniFi G5 Flex Camera
  • UniFi Cloud Key Gen 2 Plus

2022 Half Year Update:

There are a number of changes here: an upgraded server (Dell R710 -> Dell T630) and a new physical server (HPE DL360 G9) at a new location.

Changelog - 2022 H2

Added >

  • LINDS-OPNSense-01 (OPNsense 22.1)
  • HPE OfficeConnect 1920s
  • LINDS-ESXi-02 (Dell T630)
  • JD-ESXi-01 (HPE DL360 G9)
  • > JD-DC-01 (Windows Server 2019)
  • > JD-Dev-01 (CentOS 9 Stream)
  • > JD-Zabbix-01 (CentOS 8 Stream)
  • > JD-Plex-01 (CentOS 9 Stream)
  • > JD-Docker-01 (CentOS 9 Stream)
  • > JD-Torrent-01 (CentOS 8 Stream)
  • > JD-VSCA-01 (vSphere Photon OS)
  • > JD-OPNSense-01 (OPNsense 22.1)
  • > JD-GitLab-01 (CentOS 8 Stream)
  • > JD-GitLab-R01 (CentOS 8 Stream)
  • > KUBE-ADM (CentOS 8 Stream)
  • > KUBE-01 (CentOS 8 Stream)
  • > KUBE-02 (CentOS 8 Stream)

Removed <

  • < LINDS-PiHole
  • < LINDS-ERx (Ubiquiti EdgeRouter X)
  • < LINDS-Plex (Windows Server 2019)
  • < LINDS-Veeam (Windows Server 2019)
  • < LINDS-Web (Windows Server 2019)
  • < LINDS-MineOS (Turnkey MineOS)
  • < Dell PowerConnect 6248
  • < LINDS-VSCA (vSphere Photon OS)

2020 Update:

Virtual Machines

LINDS-DC - Domain Controller, DNS, File Shares, Certificate Authority - Server 2016
LINDS-DC2 - Domain Controller, DNS, Windows Deployment Services - Server 2019
LINDS-PLEX - Plex Server - Server 2019
LINDS-PiHole - DNS, Adblocking - CentOS 7
LINDS-Backup - Backblaze client to back up the 12TB stored on LINDS-DC - Windows 10
LINDS-MineOS - 4 Minecraft servers - Turnkey Linux
LINDS-WEB - IIS (hosting this website) - Server 2019
LINDS-Docker - Docker host that runs around 20 containers, which include UniFi controller, UNMS, Monolithic LanCache, PostgreSQL server - Red Hat Enterprise Linux
LINDS-VEEAM - Veeam server, backs up all servers except LINDS-DC, which is excluded because it uses RDM (Raw Device Mapping)
VCSA - vCenter Server Appliance 6.7