Geir Okkenhaug Jerstad 8029d93a84 added niri

2025-06-10 20:33:54 +02:00

8.1 KiB

Raw Blame History

deploy-rs Research Summary

Overview

deploy-rs is a Rust-based deployment tool specifically designed for NixOS flakes. It provides declarative, reliable, and efficient deployment of NixOS configurations to remote machines with advanced features like rollback capabilities, health checks, and parallel deployments.

Repository: https://github.com/serokell/deploy-rs Status: Actively maintained by Serokell Language: Rust (fast, reliable, memory-safe)

Key Features

🚀 Core Capabilities

Flake-native: Built specifically for Nix flakes (no legacy nix-env/channels)
Multi-target deployment: Deploy to multiple machines simultaneously
Automatic rollback: Failed deployments automatically revert to previous generation
Health checks: Configurable post-deployment validation
SSH-based: Uses SSH for secure remote deployment (like our current lab tool)
Profile management: Supports system, user, and custom profiles

🔧 Advanced Features

Parallel deployment: Deploy to multiple machines concurrently
Interactive confirmation: Can prompt before applying changes
Dry-run mode: Preview changes without applying them
Magic rollback: Automatic rollback on deployment failures or health check failures
Custom activation: Define custom activation scripts and checks
Sudo handling: Intelligent sudo privilege escalation

📊 Reliability Features

Atomic deployments: Either succeeds completely or rolls back
Connection resilience: Handles SSH connection issues gracefully
Generation tracking: Keeps track of deployment history
Activation timeout: Prevents hanging deployments
Health check timeout: Configurable validation windows

Configuration Structure

Deploy-rs uses a declarative configuration format in your flake:

# flake.nix
{
  # ... existing flake configuration

  deploy.nodes = {
    sleeper-service = {
      hostname = "sleeper-service.tail807ea.ts.net";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos 
          self.nixosConfigurations.sleeper-service;
        sshUser = "sma";
        sudo = "sudo -u";
      };
    };
    
    grey-area = {
      hostname = "grey-area.tail807ea.ts.net";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos 
          self.nixosConfigurations.grey-area;
        sshUser = "sma";
        sudo = "sudo -u";
      };
    };
  };

  # Health checks
  deploy.nodes.sleeper-service.profiles.system.activationTimeout = 240;
  deploy.nodes.sleeper-service.profiles.system.confirmTimeout = 30;
}

  # This is highly advised by deploy-rs
    checks = builtins.mapAttrs (
      system: deployLib: deployLib.deployChecks inputs.self.deploy
    ) inputs.deploy-rs.lib;

Command Examples

# Deploy to single machine
deploy '.#sleeper-service'

# Deploy to all machines
deploy '.#'

# Dry run (check what would be deployed)
deploy '.#sleeper-service' -- --dry-activate

# Skip health checks
deploy '.#sleeper-service' -- --skip-checks

# Interactive confirmation
deploy '.#sleeper-service' -- --confirm-timeout 60

# Deploy specific profile
deploy '.#sleeper-service.system'

Comparison with Current `lab` Tool

Feature	Current `lab` Tool	deploy-rs
Language	Shell script	Rust (compiled)
Performance	Good	Excellent
Parallel deployment	❌	✅
Automatic rollback	❌	✅
Health checks	❌	✅
Flake-native	✅	✅
SSH-based	✅	✅
Status monitoring	✅	Limited
Custom workflows	✅	Limited
Learning curve	Low	Medium
Configuration	Shell script	Nix flake

Advantages of deploy-rs

✅ Production-Ready Features

Reliability: Automatic rollback prevents broken deployments
Speed: Rust performance + parallel deployment
Safety: Health checks ensure successful activation
Consistency: Declarative configuration in flake

✅ Operational Benefits

Reduced downtime: Atomic deployments with quick rollback
Error handling: Sophisticated error recovery mechanisms
Audit trail: Built-in deployment history tracking
Validation: Pre and post-deployment checks

✅ Scale Benefits

Multi-machine: Deploy entire infrastructure simultaneously
Efficiency: Parallel operations reduce total deployment time
Resource management: Better handling of resource conflicts

Disadvantages & Limitations

❌ Current Limitations

Status monitoring: No equivalent to lab status for infrastructure overview
Custom workflows: Less flexible than shell scripts for custom operations
Learning curve: Requires understanding deploy-rs configuration syntax
Debugging: Rust binary vs readable shell script
Community size: Smaller ecosystem compared to traditional tools

❌ Home Lab Specific Concerns

Overkill factor: May be complex for 3-4 machine home lab
Customization: Our lab tool has home lab specific features
Integration: Would need to replicate status monitoring capabilities
Development workflow: Less hackable than shell scripts

Implementation Recommendations

🎯 Hybrid Approach (Recommended)

Keep the best of both tools:

Use deploy-rs for deployments:

lab deploy-rs sleeper-service  # Use deploy-rs backend
lab deploy grey-area          # Current shell script method

Keep lab tool for status and management:

lab status                    # Infrastructure overview
lab check sleeper-service     # Health monitoring
lab logs grey-area            # Log access

🔧 Migration Strategy

Phase 1: Evaluation (Current)

Add deploy-rs configuration to flake
Test deployment on non-critical machine
Compare reliability and performance

Phase 2: Gradual Adoption

Migrate stable machines to deploy-rs
Keep custom lab commands for monitoring
Maintain shell script fallback

Phase 3: Integration

Enhance lab tool to use deploy-rs as backend
Add deploy-rs specific features to lab status
Maintain unified interface

📝 Flake Integration Example

# Add to flake.nix inputs
inputs.deploy-rs.url = "github:serokell/deploy-rs";

# Add to flake outputs
deploy = {
  nodes = {
    sleeper-service = {
      hostname = "10.0.0.8";  # Or Tailscale hostname
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos 
          self.nixosConfigurations.sleeper-service;
        sshUser = "sma";
        sshOpts = ["-p" "22"];
        fastConnection = false;  # For home network
        autoRollback = true;
        magicRollback = true;
        activationTimeout = 180;
        confirmTimeout = 30;
      };
    };
  };
};

# Health checks can reference systemd services
deploy.nodes.sleeper-service.profiles.system.activationTimeout = 240;

Conclusion & Recommendation

🎯 For Our Home Lab

deploy-rs is valuable for:

Production-quality deployments with rollback safety
Parallel deployment when infrastructure grows
Reduced risk of broken remote systems
Professional deployment practices

Our current lab tool excels at:

Home lab specific status monitoring
Custom workflows and debugging
Simple, hackable shell script approach
Tailored for our specific infrastructure

📋 Action Plan

Immediate: Add deploy-rs configuration to flake (low effort, high learning)
Short-term: Test deploy-rs on sleeper-service alongside current method
Medium-term: Consider hybrid approach - deploy-rs for deployment, lab for monitoring
Long-term: Evaluate full migration based on home lab growth and complexity needs

Verdict: deploy-rs is a professional-grade tool that would enhance our deployment reliability. The hybrid approach allows us to benefit from deploy-rs's deployment safety while keeping our custom infrastructure monitoring capabilities.

8.1 KiB Raw Blame History