Ollama Setup Guide

This guide walks you through installing Ollama and setting up local LLM features in CommandLane.

Why Ollama?

Ollama enables you to run AI models locally on your machine, providing:

  • Privacy: Your data never leaves your computer
  • No API costs: Free local inference (no OpenAI/Anthropic subscription needed)
  • Offline capability: Works without internet connection
  • Fast inference: Optimized for local hardware (CPU/GPU)

Installation

Windows

  1. Download the Ollama installer:

    # Visit https://ollama.com/download and download the Windows installer
    # Or use winget:
    winget install Ollama.Ollama
  2. Run the installer and follow the prompts

  3. Verify installation:

    ollama --version

macOS

# Using Homebrew (recommended)
brew install ollama

# Or download from https://ollama.com/download

Linux

# Install via curl
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

Download the Recommended Model

CommandLane recommends the Phi-3 Mini 4K Instruct model (Q4_K_M quantization) for optimal performance:

ollama pull phi3:3.8b-mini-4k-instruct-q4_K_M

Why this model?

  • Size: ~2.4 GB (fits on most systems)
  • Performance: Excellent balance of speed and quality
  • Optimized: Quantized for efficient CPU inference
  • Context: 4096 token context window
  • Accuracy: Performs well on classification and planning tasks
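
If you want to sanity-check the model before wiring it into CommandLane, you can prompt it directly from the command line (the prompt below is only an illustration):

# One-off prompt to confirm the model loads and responds
ollama run phi3:3.8b-mini-4k-instruct-q4_K_M "Reply with the single word OK"

# Or start an interactive session (type /bye to exit)
ollama run phi3:3.8b-mini-4k-instruct-q4_K_M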

Alternative Models

If you prefer different tradeoffs:

Faster (smaller, less capable):

ollama pull phi3:mini  # 2.2 GB, faster inference

More capable (larger, slower):

ollama pull phi3:3.8b-mini-4k-instruct-q6_K  # 3.1 GB, higher quality

Verify Ollama is Running

# Check Ollama status
curl http://localhost:11434

# List installed models
ollama list

Expected output from the curl command:

Ollama is running
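
If you prefer a machine-readable check, the same port also serves Ollama's REST API:

# Report the server version as JSON
curl http://localhost:11434/api/version

# List installed models as JSON (same information as `ollama list`)
curl http://localhost:11434/api/tags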

Configure CommandLane

Option 1: Automatic Detection (Default)

CommandLane automatically detects Ollama if it's running on http://localhost:11434. No configuration needed!

Option 2: Custom Base URL

If you're running Ollama on a different port or remote server:

  1. Open the CommandLane dashboard
  2. Navigate to Settings → AI Settings → Advanced
  3. Set Ollama Base URL (e.g., http://192.168.1.100:11434)
  4. Click Save
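
Before saving, it helps to confirm the URL is reachable from the machine running CommandLane. A quick check against Ollama's API, substituting your own host:

# A JSON list of models means the base URL is correct and reachable
curl http://192.168.1.100:11434/api/tags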

Option 3: Configuration File

Edit pkb.config.json:

{
  "integrations": {
    "ollama": {
      "config": {
        "base_url": "http://localhost:11434"
      },
      "connected": true
    }
  }
}
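
After editing, you can confirm the file is still valid JSON (one option, assuming Python is installed):

# Pretty-prints the file if it parses, or reports the location of the syntax error
python -m json.tool pkb.config.json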

Select Ollama as Provider

For Classification

  1. Go to Settings → AI Settings
  2. Under Classification, select Provider: Ollama (Local)
  3. Model will be auto-selected (uses your installed phi3 model)

For Planning

  1. Go to Settings → AI Settings
  2. Under Planning, select Provider: Ollama (Local)
  3. Model will be auto-selected

For Chat

  1. Go to Settings → AI Settings
  2. Under Ask Feature, select Provider: Ollama (Local)
  3. Choose your model from the dropdown (shows all installed models)

Troubleshooting

"Ollama integration not available"

Cause: Ollama server is not running or not reachable.

Solutions:

  1. Verify Ollama is running:

    curl http://localhost:11434
  2. Start Ollama service:

    # Windows: Ollama runs as a service (check system tray)
    # macOS/Linux:
    ollama serve
  3. Check firewall settings (ensure port 11434 is not blocked)
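
If curl fails even though Ollama appears to be running, check whether anything is actually listening on port 11434 (commands vary by OS; these are common options):

# Linux
ss -ltn | grep 11434

# macOS (also works on Linux)
lsof -i :11434

# Windows
netstat -ano | findstr 11434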

"No phi3 models found"

Cause: Recommended model not installed.

Solution:

ollama pull phi3:3.8b-mini-4k-instruct-q4_K_M

Model inference is slow

Solutions:

  1. Use GPU: Ollama automatically detects NVIDIA GPUs

    • Verify GPU usage: ollama ps (check PROCESSOR column)
    • Update GPU drivers if not detected
  2. Use smaller model:

    ollama pull phi3:mini
  3. Reduce context window: In advanced settings, lower num_ctx
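
If you would rather bake a smaller context window into the model itself (instead of, or in addition to, the CommandLane setting), one option is a custom Modelfile; the name phi3-fast below is just an example:

# Create a variant of the recommended model with a 2048-token context window
cat > Modelfile <<'EOF'
FROM phi3:3.8b-mini-4k-instruct-q4_K_M
PARAMETER num_ctx 2048
EOF
ollama create phi3-fast -f Modelfile

# The new variant then appears in `ollama list` and in CommandLane's model dropdown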

"Connection failed" with custom base URL

Checklist:

  • Ensure Ollama is running on the target machine
  • Verify network connectivity: curl http://<ip>:11434
  • Check firewall rules allow incoming connections
  • Update CommandLane settings with correct URL (no trailing slash)
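
Two quick checks can narrow down whether the problem is reachability or a firewall rule (the ufw example assumes a Linux server using ufw):

# From the CommandLane machine: fail fast if the host is unreachable
curl --connect-timeout 5 http://<server-ip>:11434/api/tags

# On the Ollama server (Linux with ufw): allow inbound connections on port 11434
sudo ufw allow 11434/tcp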

Performance Tips

Optimal Settings for Phi3

For fast classification:

  • Model: phi3:mini or phi3:3.8b-mini-4k-instruct-q4_K_M
  • Temperature: 0.05 (near-deterministic output)
  • Tokens: 150 (short responses)

For planning:

  • Model: phi3:3.8b-mini-4k-instruct-q4_K_M
  • Temperature: 0.1
  • Tokens: 300 (structured JSON output)

For chat:

  • Model: Any phi3 variant or larger models
  • Temperature: 0.7 (balanced creativity)
  • Tokens: 2000 (longer conversations)
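
These settings map onto standard options in Ollama's generate API (num_predict is Ollama's name for the maximum number of output tokens), so you can experiment with them outside CommandLane as well; the prompt and values below are illustrative:

# Planning-style request: low temperature, capped output length
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:3.8b-mini-4k-instruct-q4_K_M",
  "prompt": "Return a JSON plan with three steps for organizing notes.",
  "stream": false,
  "options": { "temperature": 0.1, "num_predict": 300 }
}'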

GPU Acceleration

If you have an NVIDIA GPU:

  1. Verify GPU is detected:

    ollama ps  # Check PROCESSOR column shows "gpu"
  2. If the PROCESSOR column shows CPU, install or update your NVIDIA drivers, then restart Ollama (see the check below)
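
On systems with an NVIDIA card, nvidia-smi is a quick way to confirm the driver is installed and the GPU is visible:

# Should print the driver/CUDA version and list your GPU
nvidia-smi

# After restarting Ollama, confirm the model runs on the GPU
ollama ps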

Managing Disk Space

Models can consume significant disk space:

# List installed models with sizes
ollama list

# Remove unused models
ollama rm <model-name>

# Example: Remove old model
ollama rm gemma2:2b

Remote Ollama Setup (Advanced)

Server Setup

On your server machine:

# Allow external connections
export OLLAMA_HOST=0.0.0.0:11434

# Start Ollama
ollama serve

Security Note

Only expose Ollama on trusted networks. It has no authentication.
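
On Linux installs that use the bundled systemd service, exporting OLLAMA_HOST in a shell only affects that shell. To make the setting persistent, add it to the service instead (a sketch based on Ollama's systemd setup):

# Open an override file for the Ollama service
sudo systemctl edit ollama.service

# Add these lines in the editor that opens:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart so the new environment takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama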

Client Configuration

On CommandLane machine:

  1. Settings → AI Settings → Advanced
  2. Set Ollama Base URL: http://<server-ip>:11434
  3. Save and verify connection

Next Steps

  • Explore the Dashboard → Integrations page for connection status
  • Test classification by capturing some text
  • Try the planning feature with Ctrl+Shift+P (customize tasks)
  • Use the Ask feature for chat interactions

Additional Resources