home-lab/packages/lab-tool
Geir O. Jerstad 6eac143f57 feat: Add little-rascal laptop config and lab-tool auto-update system
## New Machine: little-rascal
- Add Lenovo Yoga Slim 7 14ARE05 configuration (AMD Ryzen 7 4700U)
- Niri desktop with CLI login (greetd + tuigreet)
- zram swap configuration (25% of RAM with zstd)
- AMD-optimized hardware support and power management
- Based on congenital-optimist structure with laptop-specific additions

## Lab Tool Auto-Update System
- Implement Guile Scheme auto-update module (lab/auto-update.scm)
- Add health checks, logging, and safety features
- Integrate with existing deployment and machine management
- Update main CLI with auto-update and auto-update-status commands
- Create NixOS service module for automated updates
- Document complete implementation in simple-auto-update-plan.md

## MCP Integration
- Configure Task Master AI and Context7 MCP servers
- Set up local Ollama integration for AI processing
- Add proper environment configuration for existing models

## Infrastructure Updates
- Add little-rascal to flake.nix with deploy-rs support
- Fix common user configuration issues
- Create missing emacs.nix module
- Update package integrations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-27 22:03:54 +02:00
..
archive cleaned up and maybe finished the guile lab tool 2025-06-16 21:09:41 +02:00
config grokking simplicity and refactoring 2025-06-16 13:43:21 +02:00
lab feat: Add little-rascal laptop config and lab-tool auto-update system 2025-06-27 22:03:54 +02:00
mcp feat: implement modular lab tool structure with working CLI 2025-06-16 14:29:00 +02:00
research grokking simplicity and refactoring 2025-06-16 13:43:21 +02:00
testing cleaned up and maybe finished the guile lab tool 2025-06-16 21:09:41 +02:00
utils feat: Complete Ollama CPU optimization for TaskMaster AI 2025-06-18 13:08:24 +02:00
default.nix feat: Complete Ollama CPU optimization for TaskMaster AI 2025-06-18 13:08:24 +02:00
flake.lock grokking simplicity and refactoring 2025-06-16 13:43:21 +02:00
main.scm feat: Add little-rascal laptop config and lab-tool auto-update system 2025-06-27 22:03:54 +02:00
PROJECT_STATUS.md cleaned up and maybe finished the guile lab tool 2025-06-16 21:09:41 +02:00
README.md grokking simplicity and refactoring 2025-06-16 13:43:21 +02:00
REFACTORING_SUMMARY.md feat: implement modular lab tool structure with working CLI 2025-06-16 14:29:00 +02:00
TODO.md cleaned up and maybe finished the guile lab tool 2025-06-16 21:09:41 +02:00

Lab Package - Home Lab Infrastructure Management in Guile Scheme

A comprehensive home lab management tool implemented in GNU Guile Scheme, providing infrastructure monitoring, deployment automation, and health checking capabilities for NixOS-based systems.

Table of Contents

Overview

The Lab package is a sophisticated infrastructure management tool designed for home lab environments running NixOS. It provides:

  • Infrastructure Status Monitoring: Real-time status checking across multiple machines
  • Deployment Automation: Safe NixOS configuration deployment with multiple strategies
  • Health Checking: Comprehensive system health validation
  • SSH-based Operations: Secure remote command execution and file operations
  • Error Handling: Robust error reporting and recovery mechanisms
  • MCP Integration: Model Context Protocol server for IDE integration

Background on GNU Guile

What is GNU Guile?

GNU Guile is the official extension language for the GNU Project and is an implementation of the Scheme programming language. Scheme is a minimalist dialect of Lisp, designed for clarity and elegance.

Why Guile for Infrastructure Management?

Advantages over traditional shell scripting:

  1. Rich Data Structures: Native support for complex data manipulation with lists, association lists, and records
  2. Error Handling: Sophisticated condition system for graceful error recovery
  3. Modularity: Built-in module system for organizing large codebases
  4. Functional Programming: Immutable data structures and pure functions reduce bugs
  5. REPL-Driven Development: Interactive development and debugging capabilities
  6. Extensibility: Easy integration with C libraries and external tools

Scheme Language Features:

  • S-expressions: Code as data, enabling powerful metaprogramming
  • First-class functions: Functions can be passed as arguments and returned as values
  • Pattern matching: Elegant control flow with the match macro
  • Tail call optimization: Efficient recursive algorithms
  • Hygienic macros: Safe code generation and transformation

Example of Scheme expressiveness:

;; Traditional imperative approach (pseudo-code)
machines = get_machines()
results = []
for machine in machines:
    status = check_machine(machine)
    results.append((machine, status))

;; Functional Scheme approach
(map (lambda (machine)
       `(,machine . ,(check-machine machine)))
     (get-machines))

Code Architecture

The lab package follows a modular architecture with clear separation of concerns:

packages/lab/
├── core.scm              # Core infrastructure operations
├── deployment.scm        # Deployment strategies and execution
├── machines.scm          # Machine management and discovery
├── monitoring.scm        # Health checking and status monitoring
└── utils/
    ├── config.scm        # Configuration management
    ├── logging.scm       # Structured logging system
    └── ssh.scm           # SSH operations and connectivity

Design Principles

  1. Functional Core, Imperative Shell: Pure functions for business logic, side effects at boundaries
  2. Data-Driven Design: Configuration and machine definitions as data structures
  3. Composable Operations: Small, focused functions that can be combined
  4. Error Boundaries: Comprehensive error handling with informative messages
  5. Testable Components: Functions designed for easy unit testing

Core Modules

core.scm - Main Infrastructure Operations

The core module provides the primary interface for infrastructure management:

(define-module (lab core)
  #:export (get-infrastructure-status
            check-system-health
            update-flake
            validate-environment
            execute-nixos-rebuild))

Key Functions

get-infrastructure-status

  • Retrieves comprehensive status of all machines or a specific machine
  • Returns structured data including connectivity, services, and system metrics
  • Supports both local and remote machine checking

check-system-health

  • Performs comprehensive health checks including:
    • SSH connectivity testing
    • Disk space validation (< 90% usage)
    • System load monitoring (< 5.0 load average)
    • Critical service status (sshd, etc.)
    • Network connectivity verification

execute-nixos-rebuild

  • Safe NixOS deployment with comprehensive error handling
  • Supports multiple deployment modes (switch, boot, test)
  • Handles both local and remote deployments
  • Includes dry-run capabilities for testing

Code Analysis Example

;; Infrastructure status checking with error handling
(define (get-infrastructure-status . args)
  "Get status of all machines or specific machine if provided"
  (let ((target-machine (if (null? args) #f (car args)))
        (machines (if (null? args) 
                     (get-all-machines) 
                     (list (car args)))))
    
    (log-info "Checking infrastructure status...")
    
    (map (lambda (machine-name)
           (let ((start-time (current-time)))
             (log-debug "Checking ~a..." machine-name)
             
             ;; Gather machine information with error isolation
             (let* ((ssh-config (get-ssh-config machine-name))
                    (is-local (and ssh-config (assoc-ref ssh-config 'is-local)))
                    (connection-status (test-ssh-connection machine-name))
                    (services-status (if connection-status
                                       (get-machine-services-status machine-name)
                                       '()))
                    (system-info (if connection-status
                                   (get-machine-system-info machine-name)
                                   #f))
                    (elapsed (- (current-time) start-time)))
               
               ;; Return structured status data
               `((machine . ,machine-name)
                 (type . ,(if is-local 'local 'remote))
                 (connection . ,(if connection-status 'online 'offline))
                 (services . ,services-status)
                 (system . ,system-info)
                 (check-time . ,elapsed)))))
         machines)))

This code demonstrates several Scheme/Guile idioms:

  1. Optional Arguments: Using (args) with null? check for flexible function signatures
  2. Let Bindings: Structured variable binding with let* for dependent calculations
  3. Conditional Expressions: Using if expressions for control flow
  4. Association Lists: Structured data representation with assoc-ref
  5. Quasiquote/Unquote: Building structured data with `` ((key . ,value)) syntax

utils/ssh.scm - SSH Operations

Handles all SSH-related operations with comprehensive error handling:

(define (test-ssh-connection machine-name)
  (let ((ssh-config (get-ssh-config machine-name)))
    (if (not ssh-config)
        (begin
          (log-error "No SSH configuration found for ~a" machine-name)
          #f)
        (if (assoc-ref ssh-config 'is-local)
            (begin
              (log-debug "Machine ~a is local, skipping SSH test" machine-name)
              #t)
            ;; Remote SSH testing logic
            (let ((hostname (assoc-ref ssh-config 'hostname))
                  (ssh-alias (assoc-ref ssh-config 'ssh-alias)))
              (catch #t
                (lambda ()
                  (let* ((test-cmd (if ssh-alias
                                      (format #f "ssh -o ConnectTimeout=5 -o BatchMode=yes ~a echo OK" ssh-alias)
                                      (format #f "ssh -o ConnectTimeout=5 -o BatchMode=yes ~a echo OK" hostname)))
                         (port (open-pipe* OPEN_READ "/bin/sh" "-c" test-cmd))
                         (output (get-string-all port))
                         (status (close-pipe port)))
                    (zero? status)))
                (lambda (key . args)
                  (log-warn "SSH connection test failed for ~a: ~a" machine-name key)
                  #f)))))))

utils/config.scm - Configuration Management

Provides structured configuration handling:

(define default-config
  `((homelab-root . "/home/geir/Home-lab")
    (machines . ((congenital-optimist 
                  (type . local)
                  (hostname . "localhost")
                  (services . (workstation development)))
                 (sleeper-service
                  (type . remote)
                  (hostname . "sleeper-service.tail807ea.ts.net")
                  (ssh-alias . "admin-sleeper")
                  (services . (nfs zfs storage)))
                 ;; Additional machines...
                 ))))

utils/logging.scm - Structured Logging

Comprehensive logging system with color coding and log levels:

;; ANSI color codes for terminal output
(define color-codes
  '((reset . "\x1b[0m")
    (bold . "\x1b[1m")
    (red . "\x1b[31m")
    (green . "\x1b[32m")
    (yellow . "\x1b[33m")
    (blue . "\x1b[34m")
    (magenta . "\x1b[35m")
    (cyan . "\x1b[36m")))

;; Log level hierarchy
(define log-levels
  '((debug . 0)
    (info . 1)
    (warn . 2)
    (error . 3)))

Installation & Setup

Prerequisites

  1. GNU Guile: Version 3.0 or later
  2. NixOS System: Target machines running NixOS
  3. SSH Access: Configured SSH keys for remote machines
  4. Required Guile Libraries:
    • guile-ssh - SSH operations
    • guile-json - JSON data handling
    • srfi-1, srfi-19 - Standard Scheme libraries

NixOS Integration

Add to your NixOS configuration:

environment.systemPackages = with pkgs; [
  guile
  guile-ssh
  guile-json-4
];

Environment Setup

# Set environment variables
export HOMELAB_ROOT="/path/to/your/home-lab"
export GUILE_LOAD_PATH="/path/to/packages:$GUILE_LOAD_PATH"

# Test installation
guile -c "(use-modules (lab core)) (display \"Lab package loaded successfully\\n\")"

Usage Examples

Basic Infrastructure Status

#!/usr/bin/env guile
!#

(use-modules (lab core))

;; Check all machines
(define all-status (get-infrastructure-status))
(display all-status)

;; Check specific machine
(define machine-status (get-infrastructure-status "sleeper-service"))
(display machine-status)

Health Checking

(use-modules (lab core))

;; Comprehensive health check
(define health-report (check-system-health "grey-area"))
(for-each (lambda (check)
            (let ((name (car check))
                  (result (cdr check)))
              (format #t "~a: ~a~%" name 
                      (assoc-ref result 'status))))
          health-report)

Deployment Operations

(use-modules (lab core))

;; Validate environment before deployment
(if (validate-environment)
    (begin
      ;; Update flake inputs
      (update-flake '((dry-run . #f)))
      
      ;; Deploy to machine
      (execute-nixos-rebuild "sleeper-service" "switch" 
                           '((dry-run . #f))))
    (display "Environment validation failed\n"))

Interactive REPL Usage

;; Start Guile REPL
$ guile -L packages

;; Load modules interactively
scheme@(guile-user)> (use-modules (lab core) (utils logging))

;; Set debug logging
scheme@(guile-user)> (set-log-level! 'debug)

;; Check machine status interactively
scheme@(guile-user)> (get-infrastructure-status "congenital-optimist")

;; Test health checks
scheme@(guile-user)> (check-system-health "grey-area")

API Reference

Infrastructure Management

(get-infrastructure-status [machine-name])

Returns structured status information for all machines or specific machine.

Parameters:

  • machine-name (optional): Specific machine to check

Returns: Association list with machine status including:

  • machine: Machine name
  • type: 'local or 'remote
  • connection: 'online or 'offline
  • services: List of service statuses
  • system: System information (uptime, load, memory, disk)
  • check-time: Time taken for status check

(check-system-health machine-name)

Performs comprehensive health validation.

Parameters:

  • machine-name: Target machine name

Returns: List of health check results with status ('pass, 'fail, 'error) and details.

(execute-nixos-rebuild machine-name mode options)

Executes NixOS rebuild with error handling.

Parameters:

  • machine-name: Target machine
  • mode: Rebuild mode ("switch", "boot", "test")
  • options: Configuration options (dry-run, etc.)

Returns: Boolean indicating success/failure.

Utility Functions

(validate-environment)

Validates home lab environment configuration.

(update-flake options)

Updates Nix flake inputs with optional dry-run.

Configuration Access

(get-machine-config machine-name)

Retrieves machine-specific configuration.

(get-all-machines)

Returns list of all configured machines.

(get-ssh-config machine-name)

Gets SSH configuration for machine.

Development

Code Style Guidelines

  1. Naming Conventions:

    • Use kebab-case for functions and variables
    • Predicates end with ? (e.g., machine-online?)
    • Mutating procedures end with ! (e.g., set-log-level!)
  2. Module Organization:

    • One module per file
    • Clear export lists
    • Minimal external dependencies
  3. Error Handling:

    • Use catch for exception handling
    • Provide informative error messages
    • Log errors with appropriate levels
  4. Documentation:

    • Docstrings for all public functions
    • Inline comments for complex logic
    • Example usage in comments

Testing

;; Example test structure
(use-modules (srfi srfi-64)    ; Testing framework
             (lab core))

(test-begin "infrastructure-tests")

(test-assert "validate-environment"
  (validate-environment))

(test-assert "machine-connectivity"
  (test-ssh-connection "congenital-optimist"))

(test-end "infrastructure-tests")

REPL-Driven Development

Guile's strength lies in interactive development:

;; Reload modules during development
scheme@(guile-user)> (reload-module (resolve-module '(lab core)))

;; Test functions interactively
scheme@(guile-user)> (get-machine-config "sleeper-service")

;; Debug with modified log levels
scheme@(guile-user)> (set-log-level! 'debug)
scheme@(guile-user)> (test-ssh-connection "grey-area")

Integration

Model Context Protocol (MCP) Server

The lab package includes MCP server capabilities for IDE integration, providing:

  • Tools: Infrastructure management operations
  • Resources: Machine configurations and status
  • Prompts: Common operational workflows

VS Code Integration

Through the MCP server, the lab package integrates with VS Code to provide:

  • Real-time infrastructure status
  • Contextual deployment suggestions
  • Configuration validation
  • Automated documentation generation

Command Line Interface

The package provides a CLI wrapper for common operations:

# Using the Guile-based CLI
./home-lab-tool.scm status
./home-lab-tool.scm deploy sleeper-service
./home-lab-tool.scm health-check grey-area

Future Enhancements

  1. Web Dashboard: Real-time monitoring interface
  2. Metrics Collection: Prometheus integration
  3. Automated Recovery: Self-healing capabilities
  4. Configuration Validation: Pre-deployment checks
  5. Rollback Automation: Automatic failure recovery

Contributing

  1. Follow the established code style
  2. Add tests for new functionality
  3. Update documentation
  4. Test with multiple machine configurations

License

This project follows the same license as the Home Lab repository.


Note: This tool represents a migration from traditional Bash scripting to a more robust, functional programming approach using GNU Guile Scheme. The benefits include better error handling, structured data management, and more maintainable code for complex infrastructure management tasks.