- Optimize Ollama service configuration for maximum CPU performance (see the drop-in sketch below)
  - Increase OLLAMA_NUM_PARALLEL from 2 to 4 workers
  - Increase OLLAMA_CONTEXT_LENGTH from 4096 to 8192 tokens
  - Add OLLAMA_KV_CACHE_TYPE=q8_0 for memory efficiency
  - Set OLLAMA_LLM_LIBRARY=cpu_avx2 for optimal CPU performance
  - Configure OpenMP threading with 8 threads and core binding
  - Add comprehensive systemd resource limits and CPU quotas
  - Remove the incompatible NUMA policy setting
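As a rough illustration, the settings above map onto a systemd drop-in like the one below. The CPUQuota and MemoryMax values are assumptions for illustration, not the deployed numbers, and on the NixOS host these settings live in the module rather than a hand-written override:

```bash
# Sketch of a systemd drop-in equivalent of the settings above.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_LLM_LIBRARY=cpu_avx2"
Environment="OMP_NUM_THREADS=8"   # 8 OpenMP threads
Environment="OMP_PROC_BIND=close" # bind threads to cores
CPUQuota=800%                     # assumed quota: 8 of 24 cores
MemoryMax=24G                     # assumed ceiling below the 31GB total
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```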
- Upgrade TaskMaster AI model ecosystem (pull commands sketched below)
  - Main model: qwen3:4b → qwen2.5-coder:7b (specialized coding model)
  - Research model: deepseek-r1:1.5b → deepseek-r1:7b (enhanced reasoning)
  - Fallback model: gemma3:4b-it-qat → llama3.3:8b (reliable general purpose)
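With the standard Ollama CLI, the upgrade amounts to pulling the new tags and, optionally, removing the superseded ones:

```bash
# Pull the upgraded models (tags taken from the list above).
ollama pull qwen2.5-coder:7b
ollama pull deepseek-r1:7b
ollama pull llama3.3:8b

# Optionally reclaim disk space from the old models.
ollama rm qwen3:4b
ollama rm deepseek-r1:1.5b
ollama rm gemma3:4b-it-qat
```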
- Create comprehensive optimization and management scripts (illustrative benchmark helper below)
  - Add ollama-optimize.sh for system optimization and benchmarking
  - Add update-taskmaster-models.sh for TaskMaster configuration management
  - Include model installation, performance testing, and system info functions
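The benchmarking piece can be as simple as timing a non-streaming generation against the Ollama API. A minimal sketch, assuming jq is available — the function name and prompt are illustrative, not the script's actual contents:

```bash
#!/usr/bin/env bash
# Illustrative benchmark helper; the real ollama-optimize.sh may differ.
benchmark_model() {
  local model="$1"
  # /api/generate with stream=false returns eval_count and eval_duration (ns).
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$model\", \"prompt\": \"Write hello world in Python\", \"stream\": false}" |
    jq -r '"\(.eval_count) tokens in \(.eval_duration / 1e9)s = \(.eval_count / (.eval_duration / 1e9)) tok/s"'
}

benchmark_model "qwen2.5-coder:7b"
```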
- Update TaskMaster AI configuration (endpoint sanity check below)
  - Point the optimized models at the grey-area:11434 endpoint
  - Set performance parameters for the 8192-token context window
  - Add connection timeout and retry settings
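A quick way to verify that the endpoint is reachable and the new models are present (host name taken from the configuration above):

```bash
# List the models available on the configured endpoint.
curl -s http://grey-area:11434/api/tags | jq -r '.models[].name'
```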
- Fix flake configuration issues (validation commands below)
  - Remove the nested packages attribute in packages/default.nix
  - Fix package references in modules/users/geir.nix
  - Clean up obsolete package files
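To confirm the flake evaluates cleanly after these fixes (the default output name is an assumption):

```bash
nix flake check      # evaluate and run flake checks
nix build .#default  # assumed default package output
```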
- Add comprehensive documentation
  - Document the complete optimization process and results
  - Include performance benchmarking results
  - Provide deployment instructions and a troubleshooting guide
Deployed via deploy-rs, with an estimated 3-4x performance improvement.
All optimizations were tested and verified on the grey-area server (24-core Xeon, 31GB RAM).
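For reference, a typical deploy-rs invocation for this host (the node name is assumed to match the flake's deploy configuration):

```bash
# Deploy the grey-area node from the repository root.
deploy .#grey-area
```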