
Home Lab Migration Plan
Current State Assessment
CongenitalOptimist Machine
- Current NixOS Version: 25.05
- Hardware: AMD CPU/GPU, ZFS storage (zpool + stuffpool), NFS mounts
- Desktop Environments: GNOME, Cosmic, Sway
- Virtualization: libvirt, Incus, Podman
- Configuration Style: Traditional NixOS (non-flakes)
- Dotfiles Approach: Prefer Emacs org-mode with literate programming (no Home Manager)
Current Structure
Home-lab/
├── Machines/
│ ├── CongenitalOptimist/ (existing - AMD workstation)
│ │ ├── configuration.nix
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ └── Modules/ (existing modular structure)
│ ├── common/
│ │ ├── base.nix (modern CLI tools & aliases)
│ │ └── tty.nix (console styling)
│ └── virtualization/
│ ├── podman.nix
│ ├── libvirt.nix
│ └── incus.nix
└── Users/
└── geir/
└── user.nix (has typo: progtams → programs)
Target Structure (Post-Migration)
Home-lab/
├── flake.nix
├── flake.lock
├── machines/
│ ├── congenital-optimist/ (AMD workstation)
│ │ ├── default.nix
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ ├── sleeper-service/ (Intel Xeon E3-1230 V2 file server)
│ │ ├── default.nix
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ ├── reverse-proxy/ (edge/gateway server)
│ │ ├── default.nix
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ └── grey-area/ (application server)
│ ├── default.nix
│ ├── hardware-configuration.nix
│ └── About.org
├── modules/
│ ├── common/
│ ├── desktop/
│ ├── development/
│ ├── virtualization/
│ ├── services/
│ │ ├── nfs.nix
│ │ ├── samba.nix
│ │ ├── backup.nix
│ │ └── monitoring.nix
│ └── users/
│ └── common.nix (shared user configurations)
├── users/
│ └── geir/
│ ├── dotfiles/
│ │ ├── README.org (geir's literate config)
│ │ ├── emacs/
│ │ ├── shell/
│ │ └── editors/
│ └── user.nix (geir's system config)
├── overlays/
├── packages/
└── secrets/ (for future secrets management)
Deployment Status & Accomplishments ✅
sleeper-service Deployment (COMPLETED)
Date: Recently completed
Status: ✅ Fully operational
Machine: Intel Xeon E3-1230 V2, 16GB RAM (formerly files.home)
Key Achievements:
- Flake Migration: Successfully deployed NixOS flake configuration on remote machine
- ZFS Stability: Resolved ZFS mounting conflicts causing boot failures
- Data Preservation: All 903GB of media data intact and accessible
- Network Integration: Added Pi-hole DNS (10.0.0.14) for package resolution
- SSH Infrastructure: Implemented centralized SSH key management
- Boot Performance: Clean boot in ~1 minute with ZFS auto-mounting enabled
- Remote Deployment: Established rsync + SSH deployment workflow
- NFS Server: Configured NFS exports for both local (10.0.0.0/24) and Tailscale (100.64.0.0/10) networks
- Network Configuration: Updated to use Tailscale IPs for reliable mesh connectivity
Technical Solutions:
- ZFS Native Mounting: Migrated from legacy mountpoints to ZFS native paths
- Hardware Configuration: Removed conflicting ZFS filesystem entries
- Graphics Compatibility: Added `nomodeset` kernel parameter, disabled NVIDIA drivers
- DNS Configuration: Multi-tier DNS with Pi-hole primary, router and Google fallback
- Deployment Method: Remote deployment via rsync + SSH instead of direct nixos-rebuild
- NFS Exports: Resolved dataset conflicts by commenting out conflicting tmpfiles rules
- Network Access: Added Tailscale interface (tailscale0) as trusted interface in firewall
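For reference, a minimal sketch of what the sleeper-service NFS export configuration could look like, using the standard `services.nfs.server` options; the export flags and exact module layout shown here are illustrative, not necessarily what is deployed:

```nix
# Hypothetical excerpt from a modules/services/nfs.nix-style module (options assumed)
{ ... }:
{
  services.nfs.server = {
    enable = true;
    # Export both datasets to the local LAN and the Tailscale mesh
    exports = ''
      /mnt/storage        10.0.0.0/24(rw,no_subtree_check) 100.64.0.0/10(rw,no_subtree_check)
      /mnt/storage/media  10.0.0.0/24(rw,no_subtree_check) 100.64.0.0/10(rw,no_subtree_check)
    '';
  };

  # NFS ports plus the Tailscale interface trusted in the firewall
  networking.firewall = {
    trustedInterfaces = [ "tailscale0" ];
    allowedTCPPorts = [ 111 2049 ];
    allowedUDPPorts = [ 111 2049 ];
  };
}
```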
Data Verified:
- Storage Pool: 903GB used, 896GB available
- Media Content: Films (184GB), Series (612GB), Audiobooks (94GB), Music (9.1GB), Books (3.5GB)
- Mount Points: `/mnt/storage` and `/mnt/storage/media` with proper ZFS auto-mounting
- NFS Access: Both datasets exported with proper permissions for network access
grey-area Deployment (COMPLETED) ✅ NEW
Date: June 2025
Status: ✅ Fully operational
Machine: Intel Xeon E5-2670 v3 (24 cores) @ 3.10 GHz, 31.24 GiB RAM
Key Achievements:
- Flake Configuration: Successfully deployed NixOS flake-based configuration
- NFS Client: Configured reliable NFS mount to sleeper-service media storage via Tailscale
- Service Stack: Deployed comprehensive application server with multiple services
- Network Integration: Integrated with centralized extraHosts module using Tailscale IPs
- User Management: Resolved UID conflicts and implemented consistent user configuration
- Firewall Configuration: Properly configured ports for all services
Services Deployed:
- Jellyfin: ✅ Media server with access to NFS-mounted content from sleeper-service
- Calibre-web: ✅ E-book management and reading interface
- Forgejo: ✅ Git hosting server (git.geokkjer.eu) with reverse proxy integration
- Audiobook Server: ✅ Audiobook streaming and management
Technical Implementation:
- NFS Mount: `/mnt/remote/media` successfully mounting `sleeper-service:/mnt/storage/media`
- Network Path: Using Tailscale mesh (100.x.x.x) for reliable connectivity
- Mount Options: Configured with automount, soft mount, and appropriate timeouts
- Firewall Ports: 22 (SSH), 3000 (Forgejo), 23231 (other services)
- User Configuration: Fixed UID consistency with centralized sma user module
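A minimal sketch of the NFS client mount on grey-area, assuming the standard `fileSystems` options; the mount options below reflect the description above (automount, soft mount, timeouts) but may not match the deployed configuration exactly:

```nix
# Hypothetical excerpt from the grey-area configuration (options are assumptions)
{ ... }:
{
  fileSystems."/mnt/remote/media" = {
    device = "sleeper-service:/mnt/storage/media";
    fsType = "nfs";
    options = [
      "soft"                       # fail instead of hanging if the server is unreachable
      "timeo=30"                   # retransmission timeout (tenths of a second)
      "x-systemd.automount"        # mount lazily on first access
      "x-systemd.idle-timeout=600" # unmount after 10 minutes idle
      "noauto"                     # don't block boot on the network mount
    ];
  };
}
```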
Data Access Verified:
- Movies: 38 films accessible via NFS
- TV Series: 29 series collections
- Music: 9 music directories
- Audiobooks: 79 audiobook collections
- Books: E-book collection
- Media Services: All content accessible through Jellyfin and other services
reverse-proxy Integration (COMPLETED) ✅ NEW
Date: June 2025
Status: ✅ Fully operational
Machine: External VPS (46.226.104.98)
Key Achievements:
- Nginx Configuration: Successfully configured reverse proxy for Forgejo
- Hostname Resolution: Fixed hostname mapping from incorrect "apps" to correct "grey-area"
- SSL/TLS: Configured ACME Let's Encrypt certificate for git.geokkjer.eu
- SSH Forwarding: Configured SSH proxy on port 1337 for Git operations
- Network Security: Implemented DMZ-style security with Tailscale-only SSH access
Technical Configuration:
- HTTP Proxy: `git.geokkjer.eu` → `http://grey-area:3000` (Forgejo)
- SSH Proxy: Port 1337 → `grey-area:22` for Git SSH operations
- Network Path: External traffic → reverse-proxy → Tailscale → grey-area
- Security: SSH restricted to Tailscale network, fail2ban protection
- DNS: Proper hostname resolution via extraHosts module
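An illustrative sketch of this setup using the stock NixOS nginx and ACME modules; hostnames and ports follow the description above, while the contact email and exact option layout are placeholders:

```nix
# Hedged sketch of the reverse-proxy configuration (details assumed)
{ ... }:
{
  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@geokkjer.eu"; # placeholder contact address
  };

  services.nginx = {
    enable = true;
    virtualHosts."git.geokkjer.eu" = {
      enableACME = true;  # Let's Encrypt certificate
      forceSSL = true;
      locations."/".proxyPass = "http://grey-area:3000"; # Forgejo over Tailscale
    };

    # Plain TCP stream forwarding for Git-over-SSH on port 1337
    streamConfig = ''
      server {
        listen 1337;
        proxy_pass grey-area:22;
      }
    '';
  };

  networking.firewall.allowedTCPPorts = [ 80 443 1337 ];
}
```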
Centralized Network Configuration (COMPLETED) ✅ NEW
Date: June 2025
Status: ✅ Fully operational
Key Achievements:
- extraHosts Module: Created centralized hostname resolution using Tailscale IPs
- Network Consistency: All machines use same IP mappings for reliable mesh connectivity
- SSH Configuration: Updated IP addresses in ssh-keys.nix module
- User Management: Resolved user configuration conflicts between modules
Network Topology:
- Tailscale Mesh IPs:
  - 100.109.28.53 - congenital-optimist (workstation)
  - 100.81.15.84 - sleeper-service (NFS file server)
  - 100.119.86.92 - grey-area (application server)
  - 100.96.189.104 - reverse-proxy (external VPS)
  - 100.103.143.108 - pihole (DNS server)
  - 100.126.202.40 - wordpresserver (legacy)
Module Integration:
- extraHosts: Added to all machine configurations for consistent hostname resolution
- SSH Keys: Updated IP addresses (grey-area: 10.0.0.12, reverse-proxy: 46.226.104.98)
- User Modules: Fixed conflicts between sma user definitions in different modules
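The extraHosts module itself can be as simple as the following sketch (IPs taken from the list above; the actual module path and name may differ):

```nix
# Hypothetical modules/network/extra-hosts.nix, imported by every machine
{ ... }:
{
  networking.extraHosts = ''
    100.109.28.53   congenital-optimist
    100.81.15.84    sleeper-service
    100.119.86.92   grey-area
    100.96.189.104  reverse-proxy
    100.103.143.108 pihole
    100.126.202.40  wordpresserver
  '';
}
```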
Home Lab Deployment Tool (COMPLETED) ✅ NEW
Date: Recently completed
Status: ✅ Fully operational
Tool: `lab` command - Custom deployment management system
Key Achievements:
- Custom Package Creation: Developed `home-lab-tools.nix` package with comprehensive deployment functionality
- System Integration: Added `lab` command to system packages via `modules/system/applications.nix`
- Conflict Resolution: Resolved shell alias conflict by renaming the "lab" alias to "home-lab"
- Multi-Machine Support: Deployment capabilities for sleeper-service, grey-area, and reverse-proxy
- Status Monitoring: Infrastructure connectivity checking with color-coded output
- Deployment Modes: Support for boot, test, and switch deployment modes
Technical Implementation:
- Package Structure: Custom Nix package using `writeShellScriptBin` with proper dependencies
- Color-Coded Logging: Blue info, green success, yellow warnings, red errors for clear output
- SSH Infrastructure: Leverages existing SSH key management for secure remote deployment
- Rsync Deployment: Efficient configuration syncing to target machines
- Error Handling: Comprehensive error checking and validation throughout deployment process
- Service Detection: Proper Tailscale service monitoring with `tailscaled` detection
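A heavily simplified sketch of how a `writeShellScriptBin`-based tool like this can be structured; the rsync target path and machine list are assumptions, and the real `home-lab-tools.nix` contains more functionality:

```nix
# Hypothetical, trimmed-down version of home-lab-tools.nix
{ pkgs, ... }:

pkgs.writeShellScriptBin "lab" ''
  set -euo pipefail

  machines="sleeper-service grey-area reverse-proxy"

  case "''${1:-}" in
    status)
      for m in $machines; do
        if ${pkgs.openssh}/bin/ssh -o ConnectTimeout=5 "$m" true 2>/dev/null; then
          echo -e "\033[0;32m$m: reachable\033[0m"
        else
          echo -e "\033[0;31m$m: unreachable\033[0m"
        fi
      done
      ;;
    deploy)
      machine="''${2:?usage: lab deploy <machine> [boot|test|switch]}"
      mode="''${3:-boot}"
      # Sync the flake to the target, then rebuild remotely (paths are assumptions)
      ${pkgs.rsync}/bin/rsync -az --delete ~/Home-lab/ "$machine:/tmp/home-lab/"
      ${pkgs.openssh}/bin/ssh "$machine" \
        "sudo nixos-rebuild $mode --flake /tmp/home-lab#$machine"
      ;;
    *)
      echo "usage: lab {status|deploy <machine> [mode]}"
      ;;
  esac
''
```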
Available Commands:
- `lab status`: Check connectivity to all infrastructure machines
- `lab deploy <machine> [mode]`: Deploy configuration to a specific machine
  - Machines: sleeper-service, grey-area, reverse-proxy
  - Modes: boot (default), test (temporary), switch (permanent)
- Help System: Built-in usage documentation and examples
Deployment Workflow:
- Configuration Sync: Uses rsync to transfer entire Home-lab directory to target machine
- Remote Execution: SSH into target machine and execute `nixos-rebuild` with the flake
- Validation: Checks deployment success and provides clear feedback
- Status Verification: Can verify deployment results with status command
Infrastructure Status Integration:
- Local Machine: Checks Tailscale service status on congenital-optimist
- Remote Machines: SSH connectivity testing with timeout handling
- Network Topology: Integrates with existing Tailscale mesh network
- Service Monitoring: Foundation for future comprehensive monitoring system
Usage Examples:
lab status # Check all machine connectivity
lab deploy sleeper-service boot # Deploy and set for next boot
lab deploy grey-area switch # Deploy and activate immediately
lab deploy reverse-proxy test # Deploy temporarily for testing
Technical Benefits:
- Centralized Deployment: Single command interface for all home lab machines
- Consistent Process: Standardized deployment workflow across infrastructure
- Error Prevention: Validation and safety checks prevent deployment failures
- Operational Visibility: Clear status reporting for infrastructure state
- Extensibility: Modular design allows easy addition of new machines and features
- Integration: Seamless integration with existing SSH and Tailscale infrastructure
Phase 1: Flakes Migration (Priority: High)
1.1 Create Flake Foundation
- Create `flake.nix` at repository root
- Define nixpkgs input pinned to NixOS 25.05
- Add nixpkgs-unstable for bleeding edge packages
- Structure outputs for multiple machines (no Home Manager)
- Fix inconsistent naming convention (machine directories to lowercase)
- Update flake outputs to use correct lowercase paths
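A minimal flake skeleton along these lines (machine list and paths follow the target structure; the exact inputs and outputs in the real flake.nix may differ):

```nix
# Illustrative flake.nix foundation, not the final version
{
  description = "Home lab NixOS configurations";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
    nixpkgs-unstable.url = "github:NixOS/nixpkgs/nixos-unstable";
  };

  outputs = { self, nixpkgs, nixpkgs-unstable, ... }: {
    nixosConfigurations = {
      congenital-optimist = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./machines/congenital-optimist ];
      };
      sleeper-service = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./machines/sleeper-service ];
      };
      # grey-area and reverse-proxy follow the same pattern
    };
  };
}
```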
1.2 Restructure Configuration
- Convert `configuration.nix` to flake-compatible format
- Keep `system.stateVersion` as "23.11" (maintains data compatibility)
- Update existing module imports for flake structure
- Integrate existing user configuration properly
- Fix nerd-fonts syntax for 25.05 compatibility
- Fix hostname typo (congenial-optimist → congenital-optimist)
1.3 Consolidate User Configuration
- Fix typo in `users/geir/user.nix` (progtams → programs) - already correct
- Merge duplicate user packages between main config and user module
- Decide on package location strategy (system vs user level)
- Ensure all existing functionality is preserved
1.4 Configuration Testing & Validation
- Validate flake syntax with `nix flake check`
- Test build without switching: `nixos-rebuild build --flake`
- Test configuration: `nixos-rebuild test --flake`
- Successfully tested modularized configuration with virtualization
1.5 Desktop Environment Modularization ✅ NEW
- Split monolithic `environments.nix` into modular components:
  - `common.nix` - Shared desktop configuration (XDG portal, dbus)
  - `gnome.nix` - GNOME desktop environment with extensions
  - `cosmic.nix` - System76 Cosmic desktop environment
  - `sway.nix` - Sway window manager with Wayland tools
- Update main configuration to use modular desktop imports
- Test modular desktop configuration successfully
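An illustrative snippet of how a machine configuration might import the modular desktop components (paths assume the `modules/desktop/` layout from this plan):

```nix
# Hypothetical excerpt from machines/congenital-optimist/default.nix
{
  imports = [
    ../../modules/desktop/common.nix
    ../../modules/desktop/gnome.nix
    ../../modules/desktop/cosmic.nix
    ../../modules/desktop/sway.nix
  ];
}
```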
1.6 Virtualization Stack ✅ NEW
- Add comprehensive virtualization support:
- Incus - Modern container and VM management (replaces LXD)
- Libvirt/QEMU - Full KVM virtualization with virt-manager
- Podman - Rootless containers with Docker compatibility
- Configure proper user groups (incus-admin, libvirt, podman)
- Enable UEFI/OVMF support for modern VM guests
- Test all virtualization services running successfully
- Create rollback plan and ZFS snapshots
- Switch to flake configuration permanently
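The corresponding NixOS options, sketched with the standard module names; the real `modules/virtualization/*.nix` files likely add more detail, and the group list here is an assumption:

```nix
# Hedged sketch of the virtualization stack configuration
{ pkgs, ... }:
{
  # Incus for containers and VMs (successor to LXD)
  virtualisation.incus.enable = true;

  # Libvirt/QEMU with UEFI (OVMF) support for modern guests
  virtualisation.libvirtd = {
    enable = true;
    qemu.ovmf.enable = true;
  };
  programs.virt-manager.enable = true;

  # Rootless Podman with Docker compatibility
  virtualisation.podman = {
    enable = true;
    dockerCompat = true;
  };

  # Group membership for the primary user (names assumed)
  users.users.geir.extraGroups = [ "incus-admin" "libvirtd" ];
}
```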
1.7 GitOps Foundation & CI/CD Setup ✅ NEW
- Initialize git repository for infrastructure as code
- Create comprehensive `.gitignore` for NixOS/Nix projects
- Set up initial commit with current modular configuration
- Plan CI/CD pipeline for configuration validation
- Design branch strategy for infrastructure changes
- Create templates for pull request workflows
- Plan automated testing for configuration changes
- Set up secrets management strategy for CI/CD
- Document GitOps workflow for multi-machine deployments
1.8 Additional Migration Tasks
- Update all documentation files to use consistent naming
- Update flake descriptions and comments for clarity
- Verify all module imports work correctly in new structure
- Modularize congenital-optimist configuration into logical modules
- Clean up any remaining references to old PascalCase paths
- Test that existing aliases and CLI tools still work
- Verify desktop environments (GNOME, Cosmic, Sway) all function
- Test virtualization stack (podman, libvirt, incus) functionality
- Validate ZFS and storage configuration compatibility
- Generate and commit flake.lock file
- Create backup of current working configuration before final switch
Phase 2: Configuration Cleanup & Organization
2.1 Optimize Current Modular Structure
- Review and optimize existing `common/base.nix` tools
- Enhance `common/tty.nix` console configuration
- Validate virtualization modules are complete
- Create desktop environment modules (separate GNOME, Cosmic, Sway)
- Separate development tools into dedicated module
2.2 Target Directory Structure
Home-lab/
├── flake.nix
├── flake.lock
├── machines/
│ ├── congenital-optimist/ (AMD workstation)
│ │ ├── default.nix (main machine config)
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ ├── sleeper-service/ (Intel Xeon file server)
│ │ ├── default.nix (file server config)
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ ├── reverse-proxy/ (edge/gateway server)
│ │ ├── default.nix
│ │ ├── hardware-configuration.nix
│ │ └── About.org
│ └── grey-area/ (application server)
│ ├── default.nix
│ ├── hardware-configuration.nix
│ └── About.org
├── modules/
│ ├── common/
│ │ ├── base.nix (existing modern CLI tools)
│ │ ├── tty.nix (existing console config)
│ │ └── nix.nix (flakes + experimental features)
│ ├── desktop/
│ │ ├── gnome.nix
│ │ ├── cosmic.nix
│ │ └── sway.nix
│ ├── development/
│ │ ├── editors.nix (emacs, neovim, vscode, etc.)
│ │ ├── languages.nix (rust, python, LSPs)
│ │ └── tools.nix
│ ├── virtualization/ (existing)
│ │ ├── podman.nix
│ │ ├── libvirt.nix
│ │ └── incus.nix
│ ├── services/ (for SleeperService + grey-area)
│ │ ├── nfs.nix (network file sharing)
│ │ ├── samba.nix (windows compatibility)
│ │ ├── backup.nix (automated backups)
│ │ ├── monitoring.nix (system monitoring)
│ │ ├── storage.nix (ZFS/RAID management)
│ │ ├── reverse-proxy.nix (nginx/traefik configuration)
│ │ ├── forgejo.nix (git hosting and CI/CD)
│ │ ├── media.nix (jellyfin configuration)
│ │ └── applications.nix (containerized services)
│ └── users/
│ └── common.nix (shared user configurations)
├── users/
│ └── geir/
│ ├── dotfiles/
│ │ ├── README.org (main literate config)
│ │ ├── emacs/
│ │ ├── shell/
│ │ └── editors/
│ └── user.nix (consolidated user config)
├── overlays/
├── packages/
└── secrets/ (for future secrets management)
2.3 Network Infrastructure Updates
- Network topology discovery: Used nmap to map actual network layout
  - Network Range: `10.0.0.0/24` (not 192.168.1.x as initially assumed)
  - Gateway: `10.0.0.138` (lan.home - router/firewall)
  - DNS Server: `10.0.0.14` (pi.hole - Pi-hole ad-blocker)
  - Current File Server: `10.0.0.8` (files.home - will be renamed to sleeper-service)
  - Machine Migration: sleeper-service is the existing files.home machine, not a new deployment
- sleeper-service systemd-networkd migration: ✅ COMPLETED and DEPLOYED
- Hostname transition: Successfully renamed from files.home to sleeper-service
- Static IP preserved: Maintained 10.0.0.8/24 with gateway 10.0.0.138
- DNS integration: Pi-hole primary (10.0.0.14), router fallback (10.0.0.138), Google DNS (8.8.8.8)
- Network stack: `networking.useNetworkd = true` with `networking.useDHCP = false`
- Interface configuration: `enp0s25` configured with declarative static IPv4 (a configuration sketch follows this list)
- Service ports: File server ports configured (NFS: 111, 2049; SMB: 139, 445; NetBIOS: 137, 138)
- Production validation: Network configuration tested and operational
- Network standardization: Plan consistent networkd configuration across all server-role machines; the workstation and laptop can keep using NetworkManager
- IP address allocation: Document static IP assignments for each service
- Local Network (10.0.0.0/24):
- 10.0.0.2: arlaptop.home (existing laptop)
- 10.0.0.3: congenital-optimist (AMD workstation - current machine)
- 10.0.0.8: sleeper-service (Intel Xeon file server - rename from files.home)
- 10.0.0.11: grey-area (planned application server)
- 10.0.0.12: reverse-proxy (planned edge server)
- 10.0.0.14: pi.hole (Pi-hole DNS/ad-blocker) - consider migrating to NixOS
- 10.0.0.90: wordpresserver.home (existing WordPress server) - Incus container, to be deleted
- 10.0.0.117: webdev.home (existing web development server) - Incus container, to be deleted
- 10.0.0.138: lan.home (router/gateway/dhcp)
- Tailscale Network (100.x.x.x/10):
- 100.109.28.53: congenital-optimist (current machine)
- 100.119.86.92: apps (active server; rename to grey-area)
- 100.114.185.71: arlaptop (laptop; Arch Linux with plans to migrate to NixOS)
- 100.81.15.84: files (file server; rename to sleeper-service)
- 100.103.143.108: pihole (DNS server)
- 100.96.189.104: vps1 (external VPS; rename to reverse-proxy)
- 100.126.202.40: wordpresserver (WordPress; to be deleted)
- Reminder: update machine names in the Tailscale admin console, or find a way to do this via the CLI
- VLAN planning: Consider network segmentation for different service types
- DNS configuration: Plan local DNS resolution for internal services
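As referenced above, a sketch of the sleeper-service systemd-networkd settings using standard NixOS options; the interface name and addresses come from this plan, but the deployed module may differ in detail:

```nix
# Hedged sketch of the static networkd configuration on sleeper-service
{ ... }:
{
  networking.useDHCP = false;
  networking.useNetworkd = true;

  systemd.network.networks."10-lan" = {
    matchConfig.Name = "enp0s25";
    address = [ "10.0.0.8/24" ];
    gateway = [ "10.0.0.138" ];
    dns = [ "10.0.0.14" "10.0.0.138" "8.8.8.8" ]; # Pi-hole, router, Google fallback
  };
}
```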
Phase 3: System Upgrade & Validation
3.1 Pre-upgrade Preparation
- Backup current system configuration
- Document current package versions
- Create ZFS snapshots of all datasets
- Test flake build without switching
- Verify all existing modules work in flake context
3.2 Upgrade Execution
- Switch to flake-based configuration
- Upgrade to NixOS 25.05
- Validate all services start correctly
- Test desktop environments functionality
- Verify virtualization stack
- Check user environment and packages
3.3 Post-upgrade Validation
- Verify all applications launch
- Test development tools (editors, LSPs, compilers)
- Validate container and VM functionality
- Check ZFS and NFS mount operations
- Verify shell environment and modern CLI tools work
- Test console theming and TTY setup
Phase 4: Dotfiles & Configuration Management
4.1 GNU Stow Infrastructure for Regular Dotfiles ✅ DECIDED
Approach: Use GNU Stow for traditional dotfiles, literate programming for Emacs only
GNU Stow Setup
- Create `~/dotfiles/` directory structure with package-based organization
- Set up core packages: `zsh/`, `git/`, `tmux/`, `starship/`, etc.
- Configure selective deployment per machine (workstation vs servers)
- Create stow deployment scripts for different machine profiles
- Document stow workflow and package management
Package Structure
~/dotfiles/ # Stow directory (target: $HOME)
├── zsh/ # Shell configuration
│ ├── .zshrc
│ ├── .zshenv
│ └── .config/zsh/
├── git/ # Git configuration
│ ├── .gitconfig
│ └── .config/git/
├── starship/ # Prompt configuration
│ └── .config/starship.toml
├── tmux/ # Terminal multiplexer
│ └── .tmux.conf
├── emacs/ # Basic Emacs bootstrap (points to literate config)
│ └── .emacs.d/early-init.el
└── machine-specific/ # Per-machine configurations
├── workstation/
└── server/
4.2 Literate Programming for Emacs Configuration ✅ DECIDED
Approach: Comprehensive org-mode literate configuration for Emacs only
Emacs Literate Setup
- Create `~/dotfiles/emacs/.emacs.d/configuration.org` as master config
- Set up automatic tangling on save (org-babel-tangle-on-save)
- Modular org sections: packages, themes, keybindings, workflows
- Bootstrap early-init.el to load tangled configuration
- Create machine-specific customizations within org structure
Literate Configuration Structure
~/dotfiles/emacs/.emacs.d/
├── early-init.el # Bootstrap (generated by Stow)
├── configuration.org # Master literate config
├── init.el # Tangled from configuration.org
├── modules/ # Tangled module files
│ ├── base.el
│ ├── development.el
│ ├── org-mode.el
│ └── ui.el
└── machine-config/ # Machine-specific overrides
├── workstation.el
└── server.el
4.3 Integration Strategy
- System-level: NixOS modules provide system packages and environment
- User-level: GNU Stow manages dotfiles and application configurations
- Emacs-specific: Org-mode literate programming for comprehensive Emacs setup
- Per-machine: Selective stow packages + machine-specific customizations
- Version control: Git repository for dotfiles with separate org documentation
4.4 Deployment Workflow
- Create deployment scripts for different machine types:
- Workstation: Full package deployment (zsh, git, tmux, starship, emacs)
- Server: Minimal package deployment (zsh, git, basic emacs)
- Development: Additional packages (language-specific tools, IDE configs)
- Integration with existing `lab` deployment tool
- Documentation for new user onboarding across machines
Phase 5: Home Lab Expansion Planning
5.1 Infrastructure Additions
Naming Convention
- Machine Names: Culture ship names in PascalCase (e.g., `CongenitalOptimist`, `SleeperService`)
- Folder Names: lowercase-with-hyphens (e.g., `congenital-optimist/`, `sleeper-service/`)
- Flake Outputs: lowercase-with-hyphens (e.g., `nixosConfigurations.congenital-optimist`)
- Hostnames: lowercase-with-hyphens (e.g., `congenital-optimist`, `sleeper-service`)
- User Names: Culture character names in lowercase (e.g., `sma`, `geir`)
- SleeperService file server (Intel Xeon E3-1230 V2, 16GB RAM): ✅ COMPLETED
- NFS server for network storage (903GB ZFS pool operational)
- ZFS storage with native mounting configuration
- Flake-based NixOS deployment successful
- SSH key management implemented
- Network configuration with Pi-hole DNS integration
- System boots cleanly in ~1 minute with ZFS auto-mounting
- Data preservation verified (Films: 184GB, Series: 612GB, etc.)
- NFS exports configured for both local and Tailscale networks
- Resolved dataset conflicts and tmpfiles rule conflicts
- Automated backup services (future enhancement)
- System monitoring and alerting (future enhancement)
- reverse-proxy edge server: ✅ COMPLETED
- Nginx reverse proxy with proper hostname mapping (grey-area vs apps)
- SSL/TLS termination with Let's Encrypt for git.geokkjer.eu
- External access gateway with DMZ security configuration
- SSH forwarding on port 1337 for Git operations
- Fail2ban protection and Tailscale-only SSH access
- Minimal attack surface, headless operation
- grey-area application server (Culture GCU - versatile, multi-purpose): ✅ COMPLETED
- Primary: Forgejo Git hosting (git.geokkjer.eu) with reverse proxy integration
- Secondary: Jellyfin media server with NFS-mounted content
- Additional: Calibre-web e-book server and audiobook streaming
- Infrastructure: Container-focused (Podman), NFS client for media storage
- Integration: Central Git hosting accessible externally via reverse proxy
- Network: Integrated with Tailscale mesh and centralized hostname resolution
- User Management: Resolved UID conflicts with centralized sma user configuration
- Monitoring: TBD (future enhancement)
- PostgreSQL: Plan database services for applications requiring persistent storage
- Plan for additional users across machines:
- geir - Primary user (development, desktop, daily use)
- sma - Admin user (Diziet Sma, system administration, security oversight)
- Service accounts for automation (forgejo-admin, backup-agent)
- Guest accounts for temporary access
- Culture character naming convention established
- Network infrastructure planning: Started with sleeper-service systemd-networkd migration
- Consider hardware requirements for future expansion
5.2 Services Architecture
- Centralized configuration management
- Per-user secrets management (agenix/sops-nix)
- User-specific service configurations
- Monitoring and logging (Prometheus, Grafana)
- Backup strategy across machines and users
- Container orchestration planning
5.3 Security & Networking
- systemd-networkd migration: Completed for sleeper-service with static IP configuration
- SSH key management centralization: ✅ IMPLEMENTED and DEPLOYED
- Admin key (`geir@geokkjer.eu-admin`): For sma user, server administration access
- Development key (`geir@geokkjer.eu-dev`): For geir user, git services, daily development
- NixOS module: `modules/security/ssh-keys.nix` centralizes key management
- SSH client config: Updated with role-based host patterns and key selection
- Production deployment: Successfully deployed on sleeper-service
- Security benefits: Principle of least privilege, limited blast radius if compromised
- Usage examples:
  - `ssh geir@sleeper-service.home` - Uses dev key automatically
  - `ssh admin-sleeper` - Uses admin key for sma user access
  - `git clone git@github.com:user/repo` - Uses dev key for git operations
- VPN configuration (Tailscale expansion)
- Firewall rules standardization across machines
- Certificate management (Let's Encrypt)
- Network segmentation planning (VLANs for services vs. user devices)
- DNS infrastructure (local DNS server for service discovery)
Phase 6: Advanced Features
6.1 Development Workflow
- Devshells for different projects
- Cachix setup for faster builds
- CI/CD integration
- Literate dotfiles with org-mode tangling automation
6.2 Automation & Maintenance
- AI integration: development of an MCP server for the cluster
- Automated system updates
- Configuration validation tests
- Deployment automation
- Monitoring and alerting
6.3 Advanced Deployment Strategies ✅ RESEARCH COMPLETED
Deploy-rs Migration (Priority: High) 📋 RESEARCHED
- Research deploy-rs capabilities ✅ COMPLETED
- Rust-based deployment tool specifically designed for NixOS flakes
- Features: parallel deployment, automatic rollback, health checks, SSH-based
- Advanced capabilities: atomic deployments, magic rollback on failure
- Profile management: system, user, and custom profiles support
- Integration potential: Works with existing SSH keys and Tailscale network
- Migration Planning: Transition from custom `lab` script to deploy-rs
- Create deploy-rs configuration in flake.nix for all 4 machines
- Configure nodes: sleeper-service, grey-area, reverse-proxy, congenital-optimist
- Set up health checks for critical services (NFS, Forgejo, Jellyfin, nginx)
- Test parallel deployment capabilities across infrastructure
- Implement automatic rollback for failed deployments
- Document migration benefits and new deployment workflow
Deploy-rs Configuration Structure
```nix
# flake.nix additions
deploy.nodes = {
  sleeper-service = {
    hostname = "100.81.15.84"; # Tailscale IP
    profiles.system.path = deploy-rs.lib.x86_64-linux.activate.nixos
      self.nixosConfigurations.sleeper-service;
    profiles.system.user = "root";
  };
  grey-area = {
    hostname = "100.119.86.92";
    profiles.system.path = deploy-rs.lib.x86_64-linux.activate.nixos
      self.nixosConfigurations.grey-area;
    # Health checks for Forgejo, Jellyfin services
  };
  reverse-proxy = {
    hostname = "100.96.189.104";
    profiles.system.path = deploy-rs.lib.x86_64-linux.activate.nixos
      self.nixosConfigurations.reverse-proxy;
    # Health checks for nginx, SSL certificates
  };
};
```
Migration Benefits
- Atomic deployments: Complete success or automatic rollback
- Parallel deployment: Deploy to multiple machines simultaneously
- Health checks: Validate services after deployment
- Connection resilience: Better handling of SSH/network issues
- Flake-native: Designed specifically for NixOS flake workflows
- Safety: Magic rollback prevents broken deployments
Alternative: Guile Scheme Exploration (Priority: Low)
- Research Guile Scheme for system administration
- Evaluate functional deployment scripting patterns
- Compare with current shell script and deploy-rs approaches
- Consider integration with GNU Guix deployment patterns
- Assess learning curve vs. practical benefits for home lab use case
6.4 Writeup
- Take all the knowledge we have amassed and make a blog post or a series of blog posts
Phase 7: Going Pro
- A plan to generalise this project so it is usable by other people
- A plan to build a dashboard and web interface for the project
Timeline Estimates
- Phase 1: 1-2 weeks (flakes migration)
- Phase 2: 1 week (cleanup and organization)
- Phase 3: 2-3 days (upgrade and validation)
- Phase 4: 1 week (literate dotfiles setup)
- Phase 5: 2-4 weeks (expansion planning and implementation)
- Phase 6: Ongoing (advanced features as needed)
Risk Mitigation
Critical Risks
- Boot failure after upgrade: ZFS snapshots for quick rollback
- Desktop environment issues: Keep multiple DEs as fallback
- Virtualization breakage: Document current VM configurations
- Data loss: Multiple backup layers (ZFS, external)
- User environment regression: Backup existing dotfiles
Rollback Strategy
- ZFS snapshot rollback capability
- Keep old configuration.nix as reference
- Maintain emergency boot media
- Document manual recovery procedures
- Preserve current user configuration during migration
Current Status Overview (Updated December 2024)
Infrastructure Deployment Status ✅ MAJOR MILESTONE ACHIEVED
✅ PHASE 1: Flakes Migration - COMPLETED
✅ PHASE 2: Configuration Cleanup - COMPLETED
✅ PHASE 3: System Upgrade & Validation - COMPLETED
✅ PHASE 5: Home Lab Expansion - 4/4 MACHINES FULLY OPERATIONAL 🎉
Machine Status
- ✅ congenital-optimist: Development workstation (fully operational)
- ✅ sleeper-service: NFS file server with 903GB media library (fully operational)
- ✅ grey-area: Application server with Forgejo, Jellyfin, Calibre-web, audiobook server (fully operational)
- ✅ reverse-proxy: External gateway with nginx, SSL termination, SSH forwarding (fully operational)
Network Architecture Status
- ✅ Tailscale Mesh: All machines connected via secure mesh network (100.x.x.x addresses)
- ✅ Hostname Resolution: Centralized extraHosts module deployed across all machines
- ✅ NFS Storage: Reliable media storage access via Tailscale network (sleeper-service → grey-area)
- ✅ External Access: Public services accessible via git.geokkjer.eu with SSL
- ✅ SSH Infrastructure: Centralized key management with role-based access patterns
- ✅ Firewall Configuration: Service ports properly configured across all machines
Services Status - FULLY OPERATIONAL STACK 🚀
- ✅ Git Hosting: Forgejo operational at git.geokkjer.eu with SSH access on port 1337
- ✅ Media Streaming: Jellyfin with NFS-mounted content library (38 movies, 29 TV series)
- ✅ E-book Management: Calibre-web for book collections
- ✅ Audiobook Streaming: Audiobook server with 79 audiobook collections
- ✅ File Storage: NFS server with 903GB media library accessible across network
- ✅ Web Gateway: Nginx reverse proxy with Let's Encrypt SSL and proper hostname mapping
- ✅ User Management: Consistent UID/GID configuration across machines (sma user: 1001/992)
Infrastructure Achievements - COMPREHENSIVE DEPLOYMENT ✅
- ✅ NFS Mount Resolution: Fixed grey-area `/mnt/storage` → `/mnt/storage/media` dataset access
- ✅ Network Exports: Updated sleeper-service NFS exports for Tailscale network (100.64.0.0/10)
- ✅ Service Discovery: Corrected reverse-proxy hostname mapping from "apps" to "grey-area"
- ✅ Firewall Management: Added port 3000 for Forgejo service accessibility
- ✅ SSH Forwarding: Configured SSH proxy on port 1337 for Git operations
- ✅ SSL Termination: Let's Encrypt certificates working for git.geokkjer.eu
- ✅ Data Verification: All media content accessible (movies, TV, music, audiobooks, books)
- ✅ Deployment Tools: Custom `lab` command operational for infrastructure management
Current Operational Status
🟢 ALL CORE INFRASTRUCTURE DEPLOYED AND OPERATIONAL
- 4/4 machines deployed with full service stack
- External access verified: `curl -I https://git.geokkjer.eu` returns HTTP/2 200
- NFS connectivity confirmed: Media files accessible across network via Tailscale
- Service integration complete: Forgejo, Jellyfin, Calibre-web, audiobook server running
- Network mesh stable: All machines connected via Tailscale with centralized hostname resolution
Next Phase Priorities
- PHASE 4: GNU Stow + Literate Emacs Setup
- Set up GNU Stow infrastructure for regular dotfiles (zsh, git, tmux, starship)
- Create comprehensive Emacs literate configuration with org-mode
- Implement selective deployment per machine type (workstation vs server)
- Integration with existing NixOS system-level configuration
- PHASE 6: Advanced Features & Deploy-rs Migration
- Migrate from custom `lab` script to deploy-rs for improved deployment
- Implement system monitoring and alerting infrastructure
- Set up automated backup services for critical data
- Create health checks and deployment validation
- Documentation & Knowledge Sharing
- Comprehensive blog post series documenting the full home lab journey
- User guides for GNU Stow + literate Emacs configuration workflow
- Deploy-rs migration guide and lessons learned
- SSH & Network Infrastructure Improvements: Combined priority for related infrastructure upgrades
- SSH connection testing with original ed25519 key (already approved in Forgejo)
- Consider testing direct connection to forgejo@grey-area first to bypass proxy
- SSH debugging and key management refinement
- Migration from nginx streams to HAProxy for better SSH forwarding and load balancing
- Gradual re-hardening of SSH security (Tailscale-only access) after Git verification
- Deploy-rs migration for improved deployment automation and health checks
- Future Enhancements
- User ID consistency cleanup (sma user UID alignment across machines)
- CI/CD integration with Forgejo for automated testing and deployment
Success Criteria
Core Infrastructure ✅ FULLY ACHIEVED 🎉
- System boots reliably with flake configuration
- All current functionality preserved
- NixOS 25.05 running stable across all machines
- Configuration is modular and maintainable
- User environment fully functional with all packages
- Modern CLI tools and aliases working
- Console theming preserved
- Virtualization stack operational
- Multi-machine expansion completed (4/4 machines deployed)
- Development workflow improved with Git hosting
Service Architecture ✅ FULLY ACHIEVED 🚀
- NFS file server operational with reliable network access via Tailscale
- Git hosting with external access via reverse proxy (git.geokkjer.eu)
- Media services with shared storage backend (Jellyfin + 903GB library)
- E-book and audiobook management services operational
- Secure external access with SSL termination and SSH forwarding
- Network mesh connectivity with centralized hostname resolution
- All services verified operational and accessible externally
Network Integration ✅ FULLY ACHIEVED 🌐
- Tailscale mesh network connecting all infrastructure machines
- Centralized hostname resolution via extraHosts module
- NFS file sharing working reliably over network
- SSH key management with role-based access patterns
- Firewall configuration properly securing all services
- External domain (git.geokkjer.eu) with SSL certificates working
Outstanding Enhancement Goals 🔄
- Literate dotfiles workflow established with org-mode
- Documentation complete for future reference and blog writeup
- System monitoring and alerting infrastructure (Prometheus/Grafana)
- Automated deployment and maintenance improvements
- Automated backup services for critical data
- User ID consistency cleanup across machines
Infrastructure Notes
CongenitalOptimist (AMD Workstation)
- Already has excellent modular structure
- Modern CLI tools (eza, bat, ripgrep, etc.) already configured in base.nix
- Console theming with Joker palette already implemented
- User configuration needs cleanup (fix typo, consolidate packages)
- ZFS configuration is solid and shouldn't need changes
- Keep Tailscale configuration as network foundation
- The AMD GPU setup should carry over cleanly to 25.05
- Consider renaming hostname from "work" to "congenital-optimist" for consistency
SleeperService (Intel Xeon File Server)
- Intel Xeon E3-1230 V2 @ 3.70GHz (4 cores, 8 threads)
- 16GB RAM - adequate for file server operations
- Perfect for reliable, background file serving tasks
- Culture name fits: "massive GSV with reputation for taking unusual tasks"
- Will handle NFS mounts currently served by external "files" server
- Plan for ZFS or software RAID for data redundancy
- Headless operation - no desktop environments needed
- SSH-only access with robust monitoring
reverse-proxy (Edge Server)
- Lightweight hardware requirements (can be modest specs)
- Primary role: SSL/TLS termination and traffic routing
- External-facing server with minimal attack surface
- Nginx or Traefik for reverse proxy functionality
- Let's Encrypt integration for automated certificate management
- Fail2ban and security hardening
- Routes traffic to internal services (grey-area, sleeper-service)
grey-area (Application Server - Culture GCU)
- Hardware: Intel Xeon E5-2670 v3 (24 cores) @ 3.10 GHz, 31.24 GiB RAM
- Primary Mission: Forgejo Git hosting and project management
- Performance: Excellent specs for heavy containerized workloads and CI/CD
- Container-focused architecture using Podman
- PostgreSQL database for Forgejo
- Concurrent multi-service deployment capability
- Secondary services: Jellyfin (with transcoding), Nextcloud, Grafana
- Integration hub for all home lab development projects
- Culture name fits: "versatile ship handling varied, ambiguous tasks"
- Central point for CI/CD pipelines and automation
Home Lab Philosophy
- Emacs org-mode literate programming approach provides better control than Home Manager
- Culture ship names create memorable, characterful infrastructure
- Modular NixOS configuration allows easy machine additions
- Per-user dotfiles structure scales across multiple machines
- Tailscale provides secure network foundation for multi-machine setup
Recent Critical Issue Resolution (December 2024) 🔧
NFS Mount and Service Integration Issues - RESOLVED
-
NFS Dataset Structure Resolution:
- Problem: grey-area couldn't access media files via NFS mount
- Root Cause: ZFS dataset structure confusion - mounting `/mnt/storage` vs `/mnt/storage/media`
- Solution: Updated grey-area NFS mount from `sleeper-service:/mnt/storage` to `sleeper-service:/mnt/storage/media`
- Result: All media content now accessible (38 movies, 29 TV series, 9 music albums, 79 audiobooks)
-
NFS Network Export Configuration:
- Problem: NFS exports only configured for local network (10.0.0.0/24)
- Root Cause: Missing Tailscale network access in NFS exports
- Solution: Updated sleeper-service NFS exports to include Tailscale network (100.64.0.0/10)
- Result: Reliable NFS connectivity over Tailscale mesh network
-
Conflicting tmpfiles Rules:
- Problem: systemd tmpfiles creating conflicting directory structures for NFS exports
- Root Cause: tmpfiles.d rules interfering with ZFS dataset mounting
- Solution: Commented out conflicting tmpfiles rules in sleeper-service configuration
- Result: Clean NFS export structure without mounting conflicts
-
Forgejo Service Accessibility:
- Problem: git.geokkjer.eu returning connection refused errors
- Root Cause: Multiple issues - firewall ports, hostname mapping, SSH forwarding
- Solutions Applied:
- Added port 3000 to grey-area firewall configuration
- Fixed reverse-proxy nginx configuration: `http://apps:3000` → `http://grey-area:3000`
- Updated SSH forwarding: `apps:22` → `grey-area:22` for port 1337
- Result: External access verified - `curl -I https://git.geokkjer.eu` returns HTTP/2 200
-
Hostname Resolution Consistency:
- Problem: Inconsistent hostname references across configurations ("apps" vs "grey-area")
- Root Cause: Legacy hostname references in reverse-proxy configuration
- Solution: Updated all configurations to use consistent "grey-area" hostname
- Result: Proper service discovery and reverse proxy routing
-
User ID Consistency Challenge:
- Current State: sma user has UID 1003 on grey-area vs 1001 on sleeper-service
- Workaround: NFS access working via group permissions (users group: GID 100)
- Future Fix: Implement centralized UID management across all machines
Recent Troubleshooting & Solutions (June 2025):
- NFS Dataset Structure: Proper understanding of ZFS dataset hierarchy crucial for NFS exports
- `/mnt/storage` vs `/mnt/storage/media` dataset mounting differences
- Client mount paths must align with server export paths for data access
- Network Transition Management: Tailscale vs local network connectivity during deployment
- NFS exports need both local (10.0.0.0/24) and Tailscale (100.64.0.0/10) network access
- extraHosts module provides consistent hostname resolution across network changes
- Firewall configuration must accommodate service ports for external access
- Reverse Proxy Configuration: Hostname consistency critical for proxy functionality
- nginx upstream configuration must use correct hostnames (grey-area not apps)
- Service discovery relies on centralized hostname resolution modules
- SSL certificate management works seamlessly with proper nginx configuration
- Service Integration: Multi-machine service architecture requires coordinated configuration
- Forgejo deployment spans grey-area (service) + reverse-proxy (gateway) + DNS (domain)
- NFS client/server coordination requires matching export/mount configurations
- User ID consistency across machines essential for NFS file access permissions
- Firewall Management: Service-specific port configuration essential for functionality
- Application servers need service ports opened (3000 for Forgejo, etc.)
- Reverse proxy needs external ports (80, 443, 1337) and internal connectivity
- SSH access coordination between local and Tailscale networks for security
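To summarize the firewall pattern, a hedged per-machine sketch using the standard firewall options; the ports come from the notes above, and the real configurations may open more:

```nix
# Illustrative firewall excerpts, not the deployed modules
{ ... }:
{
  # grey-area (application server): SSH plus Forgejo's HTTP port
  networking.firewall.allowedTCPPorts = [ 22 3000 ];

  # reverse-proxy (edge VPS) would instead open the external ports
  # and keep SSH restricted to the Tailscale interface:
  # networking.firewall.allowedTCPPorts = [ 80 443 1337 ];
  # networking.firewall.trustedInterfaces = [ "tailscale0" ];
}
```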