# Ollama Deployment Guide

## Overview

This guide covers the deployment and management of Ollama on the grey-area server in your home lab. Ollama provides local Large Language Model (LLM) hosting with an OpenAI-compatible API.

## Quick Start

### 1. Deploy the Service

The Ollama service is already configured in your NixOS configuration. To deploy:

```bash
# Navigate to your home lab directory
cd /home/geir/Home-lab

# Build and switch to the new configuration
sudo nixos-rebuild switch --flake .#grey-area
```

### 2. Verify Installation

After deployment, verify the service is running:

```bash
# Check service status
systemctl status ollama

# Check if API is responding
curl http://localhost:11434/api/tags

# Run the test script
sudo /etc/ollama-test.sh
```

### 3. Monitor Model Downloads

The service will automatically download the configured models on first start:

```bash
# Monitor the model download process
journalctl -u ollama-model-download -f

# Check downloaded models
ollama list
```

## Configuration Details

### Current Configuration

- **Host**: `127.0.0.1` (localhost only for security)
- **Port**: `11434` (standard Ollama port)
- **Models**: llama3.3:8b, codellama:7b, mistral:7b
- **Memory Limit**: 12GB
- **CPU Limit**: 75%
- **Data Directory**: `/var/lib/ollama`

### Included Models

1. **llama3.3:8b** (~4.7GB)
   - General purpose model
   - Excellent reasoning capabilities
   - Good for general questions and tasks

2. **codellama:7b** (~3.8GB)
   - Code-focused model
   - Great for code review, generation, and explanation
   - Supports multiple programming languages

3. **mistral:7b** (~4.1GB)
   - Fast inference
   - Good balance of speed and quality
   - Efficient for quick queries

## Usage Examples

### Basic API Usage

```bash
# Generate text
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:8b",
    "prompt": "Explain the benefits of NixOS",
    "stream": false
  }'

# Chat completion (OpenAI compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:8b",
    "messages": [
      {"role": "user", "content": "Help me debug this NixOS configuration"}
    ]
  }'
```

### Interactive Usage

```bash
# Start interactive chat with a model
ollama run llama3.3:8b

# Code assistance
ollama run codellama:7b "Review this function for security issues: $(cat myfile.py)"

# Quick questions
ollama run mistral:7b "What's the difference between systemd services and timers?"
```
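### Scripting the API

The curl calls above return raw JSON. For shell scripting it is often easier to pull out just the generated text; the following is a minimal sketch that assumes `jq` is installed on the host (it is not part of the Ollama deployment itself). The field names (`.response` for `/api/generate` with streaming disabled, `.choices[0].message.content` for the OpenAI-compatible endpoint) are standard Ollama API responses.

```bash
# Extract only the generated text from a non-streaming generate call
curl -s http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral:7b", "prompt": "Summarize NixOS in one sentence", "stream": false}' \
  | jq -r '.response'

# Same idea for the OpenAI-compatible chat endpoint
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3:8b", "messages": [{"role": "user", "content": "Name one benefit of flakes"}]}' \
  | jq -r '.choices[0].message.content'
```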
### Development Integration

```bash
# Code review in git hooks: write a proper two-line hook and make it executable
cat > .git/hooks/post-commit <<'EOF'
#!/bin/bash
git diff HEAD~1 | ollama run codellama:7b "Review this code diff for issues:"
EOF
chmod +x .git/hooks/post-commit

# Documentation generation
ollama run llama3.3:8b "Generate documentation for this NixOS module: $(cat module.nix)"
```

## Management Commands

### Service Management

```bash
# Start/stop/restart service
sudo systemctl start ollama
sudo systemctl stop ollama
sudo systemctl restart ollama

# View logs
journalctl -u ollama -f

# Check health
systemctl status ollama-health-check
```

### Model Management

```bash
# List installed models
ollama list

# Download additional models
ollama pull qwen2.5:7b

# Remove models
ollama rm model-name

# Show model information
ollama show llama3.3:8b
```

### Monitoring

```bash
# Check resource usage
systemctl show ollama --property=MemoryCurrent,CPUUsageNSec

# View health check logs
journalctl -u ollama-health-check

# Monitor API requests
tail -f /var/log/ollama.log
```

## Troubleshooting

### Common Issues

#### Service Won't Start

```bash
# Check for configuration errors
journalctl -u ollama --no-pager

# Verify disk space (models are large)
df -h /var/lib/ollama

# Check memory availability
free -h
```

#### Models Not Downloading

```bash
# Check model download service
systemctl status ollama-model-download
journalctl -u ollama-model-download

# Manually download models
sudo -u ollama ollama pull llama3.3:8b
```

#### API Not Responding

```bash
# Check if service is listening
ss -tlnp | grep 11434

# Test API manually
curl -v http://localhost:11434/api/tags

# Check firewall (if accessing externally)
sudo iptables -L | grep 11434
```

#### Out of Memory Errors

```bash
# Check current memory usage
cat /sys/fs/cgroup/system.slice/ollama.service/memory.current

# Reduce resource limits in configuration:
# edit grey-area/services/ollama.nix and reduce maxMemory
```

### Performance Optimization

#### For Better Performance

1. **Add more RAM**: Models perform better with more available memory
2. **Use SSD storage**: Faster model loading from NVMe/SSD
3. **Enable GPU acceleration**: If you have compatible GPU hardware
4. **Adjust context length**: Reduce OLLAMA_CONTEXT_LENGTH for faster responses

#### For Lower Resource Usage

1. **Use smaller models**: Consider 2B or 3B parameter models
2. **Reduce parallel requests**: Set OLLAMA_NUM_PARALLEL to 1
3. **Limit memory**: Reduce the maxMemory setting
4. **Use quantized models**: Many models have Q4_0 and Q5_0 variants
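#### Checking and Tuning Resource Settings

The parallelism, context-length, and memory knobs above are ordinary environment variables on the Ollama process. Below is a minimal sketch for inspecting what the running service currently uses and what a lower-resource profile might look like; the values shown are illustrative assumptions, not the deployed defaults, and they should be wired in through `grey-area/services/ollama.nix` rather than set by hand.

```bash
# Inspect the environment the running service was started with
systemctl show ollama --property=Environment

# Illustrative lower-resource profile (assumed values; set them in
# grey-area/services/ollama.nix, then: sudo nixos-rebuild switch --flake .#grey-area)
#   OLLAMA_NUM_PARALLEL=1        # handle one request at a time
#   OLLAMA_MAX_LOADED_MODELS=1   # keep a single model resident in memory
#   OLLAMA_CONTEXT_LENGTH=2048   # shorter context, less memory, faster replies

# After the rebuild, verify the new values are in effect
systemctl show ollama --property=Environment
```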
## Security Considerations

### Current Security Posture

- Service runs as dedicated `ollama` user
- Bound to localhost only (no external access)
- Systemd security hardening enabled
- No authentication (intended for local use)

### Enabling External Access

If you need external access, use a reverse proxy instead of opening the port directly:

```nix
# Add to grey-area configuration
services.nginx = {
  enable = true;
  virtualHosts."ollama.grey-area.lan" = {
    listen = [{ addr = "0.0.0.0"; port = 8080; }];
    locations."/" = {
      proxyPass = "http://127.0.0.1:11434";
      extraConfig = ''
        # Add authentication here if needed
        # auth_basic "Ollama API";
        # auth_basic_user_file /etc/nginx/ollama.htpasswd;
      '';
    };
  };
};
```

## Integration Examples

### With Forgejo

Create a webhook or git hook to review code:

```bash
#!/bin/bash
# .git/hooks/pre-commit
git diff --cached | ollama run codellama:7b "Review this code for issues:"
```

### With Development Workflow

```bash
# Add to shell aliases
alias code-review='git diff | ollama run codellama:7b "Review this code:"'
alias explain-code='ollama run codellama:7b "Explain this code:"'
alias write-docs='ollama run llama3.3:8b "Write documentation for:"'
```

### With Other Services

```bash
# Generate descriptions for Jellyfin media
find /media -name "*.mkv" | while read file; do
  echo "Generating description for $(basename "$file")"
  echo "$(basename "$file" .mkv)" | ollama run llama3.3:8b "Create a brief description for this movie/show:"
done
```

## Backup and Maintenance

### Automatic Backups

- Configuration backup: Included in NixOS configuration
- Model manifests: Backed up weekly to `/var/backup/ollama`
- Model files: Not backed up (re-downloadable)

### Manual Backup

```bash
# Back up custom or fine-tuned models
sudo tar -czf ollama-custom-$(date +%Y%m%d).tar.gz /var/lib/ollama/

# Back up to a remote location
sudo rsync -av /var/lib/ollama/ backup-server:/backups/ollama/
```

### Updates

```bash
# Update the Ollama package
sudo nixos-rebuild switch --flake .#grey-area

# Update models (if new versions are available)
ollama pull llama3.3:8b
ollama pull codellama:7b
ollama pull mistral:7b
```

## Future Enhancements

### Potential Additions

1. **Web UI**: Deploy Open WebUI for browser-based interaction
2. **Model Management**: Automated model updates and cleanup
3. **Multi-GPU**: Support for acceleration across multiple GPUs
4. **Custom Models**: Fine-tuning setup for domain-specific models
5. **Metrics**: Prometheus metrics export for monitoring
6. **Load Balancing**: Multiple Ollama instances for high availability

### Scaling Considerations

- **Dedicated Hardware**: Move to a dedicated AI server if resources become constrained
- **Model Optimization**: Implement model quantization and optimization
- **Caching**: Add Redis caching for frequently requested responses
- **Rate Limiting**: Implement rate limiting for external access

## Support and Resources

### Documentation

- [Ollama Documentation](https://github.com/ollama/ollama)
- [Model Library](https://ollama.ai/library)
- [API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)

### Community

- [Ollama Discord](https://discord.gg/ollama)
- [GitHub Discussions](https://github.com/ollama/ollama/discussions)

### Local Resources

- Research document: `/home/geir/Home-lab/research/ollama.md`
- Configuration: `/home/geir/Home-lab/machines/grey-area/services/ollama.nix`
- Module: `/home/geir/Home-lab/modules/services/ollama.nix`