Ollama Setup Guide
This guide walks you through installing Ollama and setting up local LLM features in CommandLane.
Why Ollama?
Ollama enables you to run AI models locally on your machine, providing:
- Privacy: Your data never leaves your computer
- No API costs: Free local inference (no OpenAI/Anthropic subscription needed)
- Offline capability: Works without internet connection
- Fast inference: Optimized for local hardware (CPU/GPU)
Installation
Windows
- Download the Ollama installer:
  # Visit https://ollama.com/download and download the Windows installer
  # Or use winget:
  winget install Ollama.Ollama
- Run the installer and follow the prompts
- Verify installation:
  ollama --version
macOS
# Using Homebrew (recommended)
brew install ollama
# Or download from https://ollama.com/download
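If you installed with Homebrew, you can optionally run Ollama as a background service so it starts automatically. A minimal sketch, assuming the standard Homebrew services setup:
# Start Ollama as a Homebrew-managed background service
brew services start ollama
# Confirm it responds on the default port
curl http://localhost:11434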
Linux
# Install via curl
curl -fsSL https://ollama.ai/install.sh | sh
# Verify installation
ollama --version
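On most distributions the install script also registers a systemd service, so you can check and manage Ollama with systemctl. A quick sketch, assuming the default service name ollama:
# Check whether the Ollama service is running
systemctl status ollama
# Start it now and enable it at boot if it is not
sudo systemctl enable --now ollama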
Download the Recommended Model
CommandLane recommends the Phi-3 Mini 4K Instruct model (Q4_K_M quantization) for optimal performance:
ollama pull phi3:3.8b-mini-4k-instruct-q4_K_M
Why this model?
- Size: ~2.4 GB (fits on most systems)
- Performance: Excellent balance of speed and quality
- Optimized: Quantized for efficient CPU inference
- Context: 4096 token context window
- Accuracy: Performs well on classification and planning tasks
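Once the pull finishes, you can smoke-test the model from the CLI before pointing CommandLane at it. A minimal check using the recommended tag (the prompt text is just an example):
# Run a one-off prompt against the downloaded model
ollama run phi3:3.8b-mini-4k-instruct-q4_K_M "Reply with the single word: ready"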
Alternative Models
If you prefer different tradeoffs:
Faster (smaller, less capable):
ollama pull phi3:mini # 2.2 GB, faster inference
More capable (larger, slower):
ollama pull phi3:3.8b-mini-4k-instruct-q6_K # 3.1 GB, higher quality
Verify Ollama is Running
# Check Ollama status
curl http://localhost:11434
# List installed models
ollama list
Expected output from the curl command:
Ollama is running
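You can also query the HTTP API directly, which is what CommandLane talks to. A quick sketch, assuming the default port:
# List installed models via the API (returns JSON)
curl http://localhost:11434/api/tags
# Report the server version
curl http://localhost:11434/api/version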
Configure CommandLane
Option 1: Automatic Detection (Default)
CommandLane automatically detects Ollama if it's running on http://localhost:11434. No configuration needed!
Option 2: Custom Base URL
If you're running Ollama on a different port or remote server:
- Open the CommandLane dashboard
- Navigate to Settings → AI Settings → Advanced
- Set Ollama Base URL (e.g., http://192.168.1.100:11434)
- Click Save
Option 3: Configuration File
Edit pkb.config.json:
{
  "integrations": {
    "ollama": {
      "config": {
        "base_url": "http://localhost:11434"
      },
      "connected": true
    }
  }
}
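After editing the file, it helps to confirm that the configured endpoint actually responds. A small sketch, assuming pkb.config.json is in the current directory and jq is installed:
# Read base_url from pkb.config.json and query the Ollama version endpoint
curl -s "$(jq -r '.integrations.ollama.config.base_url' pkb.config.json)/api/version"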
Select Ollama as Provider
For Classification
- Go to Settings → AI Settings
- Under Classification, select Provider: Ollama (Local)
- Model will be auto-selected (uses your installed phi3 model)
For Planning
- Go to Settings → AI Settings
- Under Planning, select Provider: Ollama (Local)
- Model will be auto-selected
For Chat
- Go to Settings → AI Settings
- Under Ask Feature, select Provider: Ollama (Local)
- Choose your model from the dropdown (shows all installed models)
Troubleshooting
"Ollama integration not available"
Cause: Ollama server is not running or not reachable.
Solutions:
- Verify Ollama is running:
  curl http://localhost:11434
- Start the Ollama service:
  # Windows: Ollama runs as a service (check the system tray)
  # macOS/Linux:
  ollama serve
- Check firewall settings (ensure port 11434 is not blocked; see the port check below)
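If the curl check fails, it can help to confirm whether anything is listening on port 11434 at all. Hedged examples per platform:
# macOS/Linux: show the process bound to port 11434 (no output means nothing is listening)
lsof -i :11434
# Windows (Command Prompt or PowerShell): look for a listener on 11434
netstat -ano | findstr 11434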
"No phi3 models found"
Cause: Recommended model not installed.
Solution:
ollama pull phi3:3.8b-mini-4k-instruct-q4_K_M
Model inference is slow
Solutions:
- Use the GPU: Ollama automatically detects NVIDIA GPUs
  - Verify GPU usage: ollama ps (check the PROCESSOR column)
  - Update GPU drivers if the GPU is not detected
- Use a smaller model:
  ollama pull phi3:mini
- Reduce the context window: in advanced settings, lower num_ctx (see the sketch below)
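One way to lower num_ctx outside of CommandLane's settings is to build a derived model with a smaller context window via a Modelfile. A sketch, using the hypothetical name phi3-lowctx:
# Write a Modelfile that derives a lower-context variant of the recommended model
cat > Modelfile.lowctx <<'EOF'
FROM phi3:3.8b-mini-4k-instruct-q4_K_M
PARAMETER num_ctx 2048
EOF
# Build it, then select "phi3-lowctx" as the model in CommandLane
ollama create phi3-lowctx -f Modelfile.lowctx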
"Connection failed" with custom base URL
Checklist:
- Ensure Ollama is running on the target machine
- Verify network connectivity: curl http://<ip>:11434 (see the example below)
- Check that firewall rules allow incoming connections on port 11434
- Update CommandLane settings with the correct URL (no trailing slash)
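From the machine running CommandLane, a version query against the remote server is a quick end-to-end test. Replace the placeholder with your server's address:
# Should return a small JSON payload such as {"version":"..."}
curl http://<ip>:11434/api/version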
Performance Tips
Optimal Settings for Phi3
For fast classification:
- Model: phi3:mini or phi3:3.8b-mini-4k-instruct-q4_K_M
- Temperature: 0.05 (deterministic)
- Tokens: 150 (short responses)
For planning:
- Model: phi3:3.8b-mini-4k-instruct-q4_K_M
- Temperature: 0.1
- Tokens: 300 (structured JSON output)
For chat:
- Model: Any phi3 variant or larger models
- Temperature: 0.7 (balanced creativity)
- Tokens: 2000 (longer conversations)
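These values are configured in CommandLane's AI settings, but for reference they map roughly onto Ollama's generate options: temperature, num_predict for the token limit, and num_ctx for the context window. A hedged example of the equivalent raw API call, using the classification settings above and an illustrative prompt:
# Low-temperature, short-response request against the Ollama generate API
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:3.8b-mini-4k-instruct-q4_K_M",
  "prompt": "Classify this note: buy milk tomorrow",
  "stream": false,
  "options": { "temperature": 0.05, "num_predict": 150 }
}'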
GPU Acceleration
If you have an NVIDIA GPU:
- Verify the GPU is detected:
  ollama ps  # Check that the PROCESSOR column shows "gpu"
- If CPU-only, install/update CUDA drivers:
  - Windows/Linux: NVIDIA CUDA Toolkit
  - GPU memory should be ≥4 GB for phi3 models
Managing Disk Space
Models can consume significant disk space:
# List installed models with sizes
ollama list
# Remove unused models
ollama rm <model-name>
# Example: Remove old model
ollama rm gemma2:2b
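To see how much space the model store uses overall, you can inspect the models directory directly. Default locations are shown below; yours may differ, for example if OLLAMA_MODELS is set or Ollama runs as a Linux service user:
# macOS / Linux (per-user install)
du -sh ~/.ollama/models
# Windows: models live under %USERPROFILE%\.ollama\models by default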
Remote Ollama Setup (Advanced)
Server Setup
On your server machine:
# Allow external connections
export OLLAMA_HOST=0.0.0.0:11434
# Start Ollama
ollama serve
Only expose Ollama on trusted networks. It has no authentication.
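On Linux, where the installer sets up a systemd service, exporting the variable in a shell only affects processes started from that shell. A common way to persist the setting is a systemd override; a sketch, assuming the default service name:
# Open an override for the Ollama service and add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl edit ollama.service
# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama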
Client Configuration
On CommandLane machine:
- Settings → AI Settings → Advanced
- Set Ollama Base URL: http://<server-ip>:11434
- Save and verify the connection
Next Steps
- Explore the Dashboard → Integrations page for connection status
- Test classification by capturing some text
- Try the planning feature with Ctrl+Shift+P (customize tasks)
- Use the Ask feature for chat interactions