.. | ||
config | ||
core | ||
deployment | ||
lab | ||
mcp | ||
research | ||
utils | ||
default.nix | ||
flake.lock | ||
flake.nix | ||
main.scm | ||
README.md | ||
REFACTORING_SUMMARY.md | ||
test-functionality.scm | ||
test-implementation.scm | ||
test-modular.scm | ||
TODO.md |
Lab Package - Home Lab Infrastructure Management in Guile Scheme
A comprehensive home lab management tool implemented in GNU Guile Scheme, providing infrastructure monitoring, deployment automation, and health checking capabilities for NixOS-based systems.
Table of Contents
- Overview
- Background on GNU Guile
- Code Architecture
- Core Modules
- Installation & Setup
- Usage Examples
- API Reference
- Development
- Integration
Overview
The Lab package is a sophisticated infrastructure management tool designed for home lab environments running NixOS. It provides:
- Infrastructure Status Monitoring: Real-time status checking across multiple machines
- Deployment Automation: Safe NixOS configuration deployment with multiple strategies
- Health Checking: Comprehensive system health validation
- SSH-based Operations: Secure remote command execution and file operations
- Error Handling: Robust error reporting and recovery mechanisms
- MCP Integration: Model Context Protocol server for IDE integration
Background on GNU Guile
What is GNU Guile?
GNU Guile is the official extension language for the GNU Project and is an implementation of the Scheme programming language. Scheme is a minimalist dialect of Lisp, designed for clarity and elegance.
Why Guile for Infrastructure Management?
Advantages over traditional shell scripting:
- Rich Data Structures: Native support for complex data manipulation with lists, association lists, and records
- Error Handling: Sophisticated condition system for graceful error recovery
- Modularity: Built-in module system for organizing large codebases
- Functional Programming: Immutable data structures and pure functions reduce bugs
- REPL-Driven Development: Interactive development and debugging capabilities
- Extensibility: Easy integration with C libraries and external tools
Scheme Language Features:
- S-expressions: Code as data, enabling powerful metaprogramming
- First-class functions: Functions can be passed as arguments and returned as values
- Pattern matching: Elegant control flow with the
match
macro - Tail call optimization: Efficient recursive algorithms
- Hygienic macros: Safe code generation and transformation
Example of Scheme expressiveness:
;; Traditional imperative approach (pseudo-code)
machines = get_machines()
results = []
for machine in machines:
status = check_machine(machine)
results.append((machine, status))
;; Functional Scheme approach
(map (lambda (machine)
`(,machine . ,(check-machine machine)))
(get-machines))
Code Architecture
The lab package follows a modular architecture with clear separation of concerns:
packages/lab/
├── core.scm # Core infrastructure operations
├── deployment.scm # Deployment strategies and execution
├── machines.scm # Machine management and discovery
├── monitoring.scm # Health checking and status monitoring
└── utils/
├── config.scm # Configuration management
├── logging.scm # Structured logging system
└── ssh.scm # SSH operations and connectivity
Design Principles
- Functional Core, Imperative Shell: Pure functions for business logic, side effects at boundaries
- Data-Driven Design: Configuration and machine definitions as data structures
- Composable Operations: Small, focused functions that can be combined
- Error Boundaries: Comprehensive error handling with informative messages
- Testable Components: Functions designed for easy unit testing
Core Modules
core.scm - Main Infrastructure Operations
The core module provides the primary interface for infrastructure management:
(define-module (lab core)
#:export (get-infrastructure-status
check-system-health
update-flake
validate-environment
execute-nixos-rebuild))
Key Functions
get-infrastructure-status
- Retrieves comprehensive status of all machines or a specific machine
- Returns structured data including connectivity, services, and system metrics
- Supports both local and remote machine checking
check-system-health
- Performs comprehensive health checks including:
- SSH connectivity testing
- Disk space validation (< 90% usage)
- System load monitoring (< 5.0 load average)
- Critical service status (sshd, etc.)
- Network connectivity verification
execute-nixos-rebuild
- Safe NixOS deployment with comprehensive error handling
- Supports multiple deployment modes (switch, boot, test)
- Handles both local and remote deployments
- Includes dry-run capabilities for testing
Code Analysis Example
;; Infrastructure status checking with error handling
(define (get-infrastructure-status . args)
"Get status of all machines or specific machine if provided"
(let ((target-machine (if (null? args) #f (car args)))
(machines (if (null? args)
(get-all-machines)
(list (car args)))))
(log-info "Checking infrastructure status...")
(map (lambda (machine-name)
(let ((start-time (current-time)))
(log-debug "Checking ~a..." machine-name)
;; Gather machine information with error isolation
(let* ((ssh-config (get-ssh-config machine-name))
(is-local (and ssh-config (assoc-ref ssh-config 'is-local)))
(connection-status (test-ssh-connection machine-name))
(services-status (if connection-status
(get-machine-services-status machine-name)
'()))
(system-info (if connection-status
(get-machine-system-info machine-name)
#f))
(elapsed (- (current-time) start-time)))
;; Return structured status data
`((machine . ,machine-name)
(type . ,(if is-local 'local 'remote))
(connection . ,(if connection-status 'online 'offline))
(services . ,services-status)
(system . ,system-info)
(check-time . ,elapsed)))))
machines)))
This code demonstrates several Scheme/Guile idioms:
- Optional Arguments: Using
(args)
withnull?
check for flexible function signatures - Let Bindings: Structured variable binding with
let*
for dependent calculations - Conditional Expressions: Using
if
expressions for control flow - Association Lists: Structured data representation with
assoc-ref
- Quasiquote/Unquote: Building structured data with ``
((key . ,value))
syntax
utils/ssh.scm - SSH Operations
Handles all SSH-related operations with comprehensive error handling:
(define (test-ssh-connection machine-name)
(let ((ssh-config (get-ssh-config machine-name)))
(if (not ssh-config)
(begin
(log-error "No SSH configuration found for ~a" machine-name)
#f)
(if (assoc-ref ssh-config 'is-local)
(begin
(log-debug "Machine ~a is local, skipping SSH test" machine-name)
#t)
;; Remote SSH testing logic
(let ((hostname (assoc-ref ssh-config 'hostname))
(ssh-alias (assoc-ref ssh-config 'ssh-alias)))
(catch #t
(lambda ()
(let* ((test-cmd (if ssh-alias
(format #f "ssh -o ConnectTimeout=5 -o BatchMode=yes ~a echo OK" ssh-alias)
(format #f "ssh -o ConnectTimeout=5 -o BatchMode=yes ~a echo OK" hostname)))
(port (open-pipe* OPEN_READ "/bin/sh" "-c" test-cmd))
(output (get-string-all port))
(status (close-pipe port)))
(zero? status)))
(lambda (key . args)
(log-warn "SSH connection test failed for ~a: ~a" machine-name key)
#f)))))))
utils/config.scm - Configuration Management
Provides structured configuration handling:
(define default-config
`((homelab-root . "/home/geir/Home-lab")
(machines . ((congenital-optimist
(type . local)
(hostname . "localhost")
(services . (workstation development)))
(sleeper-service
(type . remote)
(hostname . "sleeper-service.tail807ea.ts.net")
(ssh-alias . "admin-sleeper")
(services . (nfs zfs storage)))
;; Additional machines...
))))
utils/logging.scm - Structured Logging
Comprehensive logging system with color coding and log levels:
;; ANSI color codes for terminal output
(define color-codes
'((reset . "\x1b[0m")
(bold . "\x1b[1m")
(red . "\x1b[31m")
(green . "\x1b[32m")
(yellow . "\x1b[33m")
(blue . "\x1b[34m")
(magenta . "\x1b[35m")
(cyan . "\x1b[36m")))
;; Log level hierarchy
(define log-levels
'((debug . 0)
(info . 1)
(warn . 2)
(error . 3)))
Installation & Setup
Prerequisites
- GNU Guile: Version 3.0 or later
- NixOS System: Target machines running NixOS
- SSH Access: Configured SSH keys for remote machines
- Required Guile Libraries:
guile-ssh
- SSH operationsguile-json
- JSON data handlingsrfi-1
,srfi-19
- Standard Scheme libraries
NixOS Integration
Add to your NixOS configuration:
environment.systemPackages = with pkgs; [
guile
guile-ssh
guile-json-4
];
Environment Setup
# Set environment variables
export HOMELAB_ROOT="/path/to/your/home-lab"
export GUILE_LOAD_PATH="/path/to/packages:$GUILE_LOAD_PATH"
# Test installation
guile -c "(use-modules (lab core)) (display \"Lab package loaded successfully\\n\")"
Usage Examples
Basic Infrastructure Status
#!/usr/bin/env guile
!#
(use-modules (lab core))
;; Check all machines
(define all-status (get-infrastructure-status))
(display all-status)
;; Check specific machine
(define machine-status (get-infrastructure-status "sleeper-service"))
(display machine-status)
Health Checking
(use-modules (lab core))
;; Comprehensive health check
(define health-report (check-system-health "grey-area"))
(for-each (lambda (check)
(let ((name (car check))
(result (cdr check)))
(format #t "~a: ~a~%" name
(assoc-ref result 'status))))
health-report)
Deployment Operations
(use-modules (lab core))
;; Validate environment before deployment
(if (validate-environment)
(begin
;; Update flake inputs
(update-flake '((dry-run . #f)))
;; Deploy to machine
(execute-nixos-rebuild "sleeper-service" "switch"
'((dry-run . #f))))
(display "Environment validation failed\n"))
Interactive REPL Usage
;; Start Guile REPL
$ guile -L packages
;; Load modules interactively
scheme@(guile-user)> (use-modules (lab core) (utils logging))
;; Set debug logging
scheme@(guile-user)> (set-log-level! 'debug)
;; Check machine status interactively
scheme@(guile-user)> (get-infrastructure-status "congenital-optimist")
;; Test health checks
scheme@(guile-user)> (check-system-health "grey-area")
API Reference
Infrastructure Management
(get-infrastructure-status [machine-name])
Returns structured status information for all machines or specific machine.
Parameters:
machine-name
(optional): Specific machine to check
Returns: Association list with machine status including:
machine
: Machine nametype
: 'local or 'remoteconnection
: 'online or 'offlineservices
: List of service statusessystem
: System information (uptime, load, memory, disk)check-time
: Time taken for status check
(check-system-health machine-name)
Performs comprehensive health validation.
Parameters:
machine-name
: Target machine name
Returns: List of health check results with status ('pass, 'fail, 'error) and details.
(execute-nixos-rebuild machine-name mode options)
Executes NixOS rebuild with error handling.
Parameters:
machine-name
: Target machinemode
: Rebuild mode ("switch", "boot", "test")options
: Configuration options (dry-run, etc.)
Returns: Boolean indicating success/failure.
Utility Functions
(validate-environment)
Validates home lab environment configuration.
(update-flake options)
Updates Nix flake inputs with optional dry-run.
Configuration Access
(get-machine-config machine-name)
Retrieves machine-specific configuration.
(get-all-machines)
Returns list of all configured machines.
(get-ssh-config machine-name)
Gets SSH configuration for machine.
Development
Code Style Guidelines
-
Naming Conventions:
- Use kebab-case for functions and variables
- Predicates end with
?
(e.g.,machine-online?
) - Mutating procedures end with
!
(e.g.,set-log-level!
)
-
Module Organization:
- One module per file
- Clear export lists
- Minimal external dependencies
-
Error Handling:
- Use
catch
for exception handling - Provide informative error messages
- Log errors with appropriate levels
- Use
-
Documentation:
- Docstrings for all public functions
- Inline comments for complex logic
- Example usage in comments
Testing
;; Example test structure
(use-modules (srfi srfi-64) ; Testing framework
(lab core))
(test-begin "infrastructure-tests")
(test-assert "validate-environment"
(validate-environment))
(test-assert "machine-connectivity"
(test-ssh-connection "congenital-optimist"))
(test-end "infrastructure-tests")
REPL-Driven Development
Guile's strength lies in interactive development:
;; Reload modules during development
scheme@(guile-user)> (reload-module (resolve-module '(lab core)))
;; Test functions interactively
scheme@(guile-user)> (get-machine-config "sleeper-service")
;; Debug with modified log levels
scheme@(guile-user)> (set-log-level! 'debug)
scheme@(guile-user)> (test-ssh-connection "grey-area")
Integration
Model Context Protocol (MCP) Server
The lab package includes MCP server capabilities for IDE integration, providing:
- Tools: Infrastructure management operations
- Resources: Machine configurations and status
- Prompts: Common operational workflows
VS Code Integration
Through the MCP server, the lab package integrates with VS Code to provide:
- Real-time infrastructure status
- Contextual deployment suggestions
- Configuration validation
- Automated documentation generation
Command Line Interface
The package provides a CLI wrapper for common operations:
# Using the Guile-based CLI
./home-lab-tool.scm status
./home-lab-tool.scm deploy sleeper-service
./home-lab-tool.scm health-check grey-area
Future Enhancements
- Web Dashboard: Real-time monitoring interface
- Metrics Collection: Prometheus integration
- Automated Recovery: Self-healing capabilities
- Configuration Validation: Pre-deployment checks
- Rollback Automation: Automatic failure recovery
Contributing
- Follow the established code style
- Add tests for new functionality
- Update documentation
- Test with multiple machine configurations
License
This project follows the same license as the Home Lab repository.
Note: This tool represents a migration from traditional Bash scripting to a more robust, functional programming approach using GNU Guile Scheme. The benefits include better error handling, structured data management, and more maintainable code for complex infrastructure management tasks.