3 changes: 3 additions & 0 deletions examples/localcowork/.env.example
@@ -6,6 +6,9 @@
# ─── Model Configuration ─────────────────────────────────────────────────────

# Directory containing GGUF model files (downloaded from HuggingFace)
# IMPORTANT: Store models OUTSIDE the project repo (e.g., ~/Projects/_models).
# The huggingface_hub library creates .cache/huggingface for tracking downloads,
# which should not be in the project directory.
# LOCALCOWORK_MODELS_DIR=~/Projects/_models

# Text model API endpoint (OpenAI-compatible). Set by start-model.sh.
1 change: 1 addition & 0 deletions examples/localcowork/.gitignore
@@ -17,6 +17,7 @@ src-tauri/gen/
_models/*.gguf
_models/*.bin
_models/*.safetensors
_models/.cache/

# ─── IDE ──────────────────────────────────────────────────────────────────
.vscode/
84 changes: 61 additions & 23 deletions examples/localcowork/_models/config.yaml
@@ -7,10 +7,20 @@
# Set LOCALCOWORK_MODELS_DIR env var to override.
# Ollama-managed models use Ollama's own storage (~/.ollama/models/).
#
# IMPORTANT: When downloading models via huggingface_hub, use a directory
# outside the project repo (or keep the default HF cache in your home
# directory) rather than this directory. The huggingface_hub library
# creates a .cache/huggingface folder inside local_dir for tracking
# downloads, which can grow large and should not be in the project repo.
#
# Example to download to your models directory (note: Python does not
# expand ${HOME} inside strings — use os.path.expanduser instead):
#   import os
#   from huggingface_hub import hf_hub_download
#   hf_hub_download('LiquidAI/LFM2-24B-A2B', 'LFM2-24B-A2B-Q4_K_M.gguf',
#                   local_dir=os.path.expanduser('~/Projects/_models'))
#
# Model paths below use ${LOCALCOWORK_MODELS_DIR} for interpolation.
# The config-loader resolves environment variables at load time.
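The `${VAR:-default}` interpolation described above can be sketched in a few lines. This is an illustrative Python approximation of what such a config-loader might do, not the project's actual loader; the function name `resolve_env` is an assumption:

```python
import os
import re

# Matches ${VAR} or ${VAR:-default}, as used in the model paths below.
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR:-default} placeholders with environment values."""
    def substitute(match: re.Match) -> str:
        var, default = match.group(1), match.group(2) or ""
        return os.environ.get(var, default)
    # Expand ~ after substitution so defaults like ~/Projects/_models work.
    return os.path.expanduser(_ENV_PATTERN.sub(substitute, value))

# Falls back to the default when the variable is unset.
path = resolve_env("${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/model.gguf")
```

When `LOCALCOWORK_MODELS_DIR` is set, its value wins; otherwise the `:-` default is used and `~` is expanded.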

- active_model: lfm2-24b-a2b # Sparse MoE: 24B total, 2.3B active, 64 experts top-4 — 80% tool accuracy
+ active_model: lfm2-24b-a2b  # Sparse MoE: 24B total, 2.3B active, 64 experts top-4 — 80% tool accuracy

# Default model directory for non-Ollama model files (GGUF, MLX, etc.)
models_dir: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}"
Comment on lines +23 to 26 (Copilot AI, Mar 6, 2026):
This PR introduces a large number of whitespace-only edits (e.g., changing inline-comment spacing) throughout the config. This makes the diff noisier and increases the chance of merge conflicts without affecting behavior. Consider reverting purely-formatting changes and keeping this PR focused on the HuggingFace cache/gitignore fix.
@@ -76,11 +86,11 @@ models:
model_name: "gpt-oss:20b"
base_url: "http://localhost:11434/v1"
context_window: 32768
- tool_call_format: native_json # Native function calling + structured outputs
+ tool_call_format: native_json  # Native function calling + structured outputs
temperature: 0.7
max_tokens: 4096
estimated_vram_gb: 14
- force_json_response: false # Enable after live testing — triggers GBNF grammar enforcement
+ force_json_response: false  # Enable after live testing — triggers GBNF grammar enforcement
capabilities:
- text
- tool_calling
@@ -92,7 +102,7 @@ models:
model_path: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/lfm25-24b-q4_k_m.gguf"
base_url: "http://localhost:8080/v1"
context_window: 32768
- tool_call_format: pythonic # LFM2.5 uses Pythonic calls; normalizer converts to JSON
+ tool_call_format: pythonic  # LFM2.5 uses Pythonic calls; normalizer converts to JSON
temperature: 0.7
max_tokens: 4096
estimated_vram_gb: 14
@@ -102,20 +112,20 @@ models:

# LFM2-24B-A2B — Liquid AI's MoE hybrid model (private preview)
# Architecture: 24B total, 2.3B active per token, 64 experts top-4, 40 layers (1:3 attn:conv ratio)
- # Download GGUF from: https://huggingface.co/LiquidAI/LFM2-24B-A2B-Preview (gated — request access)
+ # Download GGUF from: https://huggingface.co/LiquidAI/LFM2-24B-A2B (gated — request access)
# Benchmark plan: docs/model-analysis/lfm2-24b-a2b-benchmark.md
# Run: llama-server --model <path> --port 8080 --ctx-size 32768 --n-gpu-layers 99 --flash-attn
Comment on lines 113 to 117 (Copilot AI, Mar 6, 2026):
The LFM2-24B-A2B comments and download URL were changed from the ...-Preview repo to LiquidAI/LFM2-24B-A2B, but the rest of this repo still references the ...-Preview location/filename (e.g., scripts/start-model.sh, scripts/setup-dev.sh, and README.md). As-is, contributors following the scripts/docs will likely download a different filename/repo than this config implies. Please either update the related scripts/docs in the same PR, or keep the config pointing at the ...-Preview repo to stay consistent.
lfm2-24b-a2b:
- display_name: "LFM2-24B-A2B-Preview"
+ display_name: "LFM2-24B-A2B"
runtime: llama_cpp
- model_path: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/LFM2-24B-A2B-Preview-Q4_K_M.gguf"
+ model_path: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/LFM2-24B-A2B-Q4_K_M.gguf"
Comment on lines 118 to +121 (Copilot AI, Mar 6, 2026):
Within the same model entry, the display_name/model_path were updated to drop -Preview, but other tooling in the repo still expects LFM2-24B-A2B-Preview-Q4_K_M.gguf (scripts + docs). If this filename change is intentional, the surrounding ecosystem needs to be updated together; otherwise, users will end up with a downloaded file that doesn't match what the config suggests.
base_url: "http://localhost:8080/v1"
context_window: 32768
- tool_call_format: bracket # LFM2 bracket format: [server.tool(args)] parsed by tool_call_parser.rs
+ tool_call_format: bracket  # LFM2 bracket format: [server.tool(args)] parsed by tool_call_parser.rs
temperature: 0.7
- tool_temperature: 0.1 # Lower temperature for tool-calling turns (ADR-008 Layer 3)
+ tool_temperature: 0.1  # Lower temperature for tool-calling turns (ADR-008 Layer 3)
max_tokens: 4096
- estimated_vram_gb: 16 # Q4_K_M quantization estimate for 24B MoE
+ estimated_vram_gb: 16  # Q4_K_M quantization estimate for 24B MoE
capabilities:
- text
- tool_calling
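The `bracket` tool-call format above (`[server.tool(args)]`) is parsed by `tool_call_parser.rs` in the real codebase. As a rough illustration of the idea only, a simplified parser might look like this in Python; the function name, the dict shape, and the decision to leave arguments unparsed are all assumptions:

```python
import re

# Matches calls like [filesystem.read_file(path="notes.txt")]
_BRACKET_CALL = re.compile(r"\[(\w+)\.(\w+)\((.*?)\)\]", re.DOTALL)

def parse_bracket_calls(text: str) -> list:
    """Extract (server, tool, raw args) triples from bracket-format output."""
    calls = []
    for server, tool, raw_args in _BRACKET_CALL.findall(text):
        # Argument parsing is left as a raw string here; the real parser
        # would convert it to JSON for the normalizer.
        calls.append({"server": server, "tool": tool, "raw_args": raw_args})
    return calls

reply = 'Sure. [filesystem.read_file(path="notes.txt")]'
calls = parse_bracket_calls(reply)
```

A real implementation also has to handle nested brackets and quoting inside arguments, which this regex sketch deliberately ignores.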
@@ -130,7 +140,7 @@ models:
tool_call_format: native_json
temperature: 0.7
max_tokens: 4096
- estimated_vram_gb: 4 # Only ~3B active params
+ estimated_vram_gb: 4  # Only ~3B active params
capabilities:
- text
- tool_calling
@@ -152,7 +162,7 @@ models:
tool_call_format: native_json
temperature: 0.1
max_tokens: 4096
- estimated_vram_gb: 1.8 # Q8_0 model (1.25 GB) + mmproj (583 MB)
+ estimated_vram_gb: 1.8  # Q8_0 model (1.25 GB) + mmproj (583 MB)
capabilities:
- text
- vision
@@ -170,10 +180,10 @@ models:
model_path: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/LFM2.5-1.2B-Router-FT-v2-Q8_0.gguf"
base_url: "http://localhost:8082/v1"
context_window: 32768
- tool_call_format: bracket # LFM2.5 bracket format: [server.tool(args)]
+ tool_call_format: bracket  # LFM2.5 bracket format: [server.tool(args)]
temperature: 0.1
max_tokens: 512
- estimated_vram_gb: 1.5 # Q8_0 quantization (1.2 GB)
+ estimated_vram_gb: 1.5  # Q8_0 quantization (1.2 GB)
role: tool_router
fine_tuned:
method: lora
@@ -198,7 +208,7 @@ models:
model_path: "${LOCALCOWORK_MODELS_DIR:-~/Projects/_models}/LFM2.5-1.2B-Instruct-F16.gguf"
base_url: "http://localhost:8084/v1"
context_window: 32768
- tool_call_format: bracket # LFM2.5 bracket format: [server.tool(args)]
+ tool_call_format: bracket  # LFM2.5 bracket format: [server.tool(args)]
temperature: 0.1
max_tokens: 512
estimated_vram_gb: 2.3
@@ -238,6 +248,24 @@ models:
- text
- tool_calling

# LM Studio headless server — any model loaded in LM Studio
# Run `lms server start` or enable "Run LLM server on login" in app settings.
# Default port is 1234. Uses OpenAI-compatible API.
# Note: The model_name here is informational; update it to match your loaded model.
lmstudio-default:
display_name: "LM Studio (Default)"
runtime: lmstudio
model_name: "liquid/lfm2-24b-a2b" # Replace with your loaded model ID
Comment on lines +251 to +258 (Copilot AI, Mar 6, 2026):
The new lmstudio-default model and runtime: lmstudio addition is a functional/config expansion that isn't mentioned in the PR description (which focuses on HuggingFace cache + gitignore). If this is intended, please call it out in the PR summary; otherwise consider moving it to a separate PR to keep scope tight and reduce review/rollback risk.
base_url: "http://localhost:1234/v1"
context_window: 32768
tool_call_format: native_json
temperature: 0.7
max_tokens: 4096
estimated_vram_gb: null # Varies by loaded model
capabilities:
- text
- tool_calling
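Since the `model_name` above is only informational, you can query LM Studio's OpenAI-compatible `/v1/models` endpoint to see what is actually loaded. A minimal sketch, assuming the default port 1234 (the helper function name is made up here; only the endpoint path comes from the config above):

```python
import json

def loaded_model_ids(models_json: str) -> list:
    """Extract model IDs from an OpenAI-compatible /v1/models response."""
    payload = json.loads(models_json)
    return [entry["id"] for entry in payload.get("data", [])]

# Against a running LM Studio server (default port 1234):
#   from urllib.request import urlopen
#   with urlopen("http://localhost:1234/v1/models") as resp:
#       print(loaded_model_ids(resp.read().decode()))

# Offline example with a response in the OpenAI list format:
sample = '{"object": "list", "data": [{"id": "liquid/lfm2-24b-a2b"}]}'
ids = loaded_model_ids(sample)
```

Whatever ID this returns is what belongs in the `model_name` field of `lmstudio-default`.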

# ─── Benchmark comparison models ────────────────────────────────────────
# These models are benchmarked against LFM2-24B-A2B to demonstrate
# scaling efficiency of hybrid MoE conv+attn vs dense and standard MoE.
@@ -290,7 +318,7 @@ models:
tool_temperature: 0.1
max_tokens: 4096
estimated_vram_gb: 20
- deprecated: true # Partial run only (40/100), dropped from active benchmarks
+ deprecated: true  # Partial run only (40/100), dropped from active benchmarks
capabilities:
- text
- tool_calling
@@ -358,7 +386,9 @@ models:
- text
- tool_calling

- # Runtime configurations
+ # Runtime configurations (informational only — not used by the app)
+ # These describe how to start each runtime for reference. The app
+ # expects the runtime to already be running when it starts.
runtimes:
ollama:
command: "ollama serve"
@@ -371,6 +401,14 @@ runtimes:
health_check: "http://localhost:8080/health"
startup_timeout_seconds: 60

lmstudio:
# Use `lms server start` CLI to start headless, or enable "Run LLM server
# on login" in app settings (Cmd/Ctrl+,). Default port is 1234.
command: "lms"
args: ["server", "start"]
health_check: "http://localhost:1234/v1/models"
startup_timeout_seconds: 30

mlx:
command: "mlx_lm.server"
args: ["--model", "{model_path}", "--port", "8080"]
@@ -380,9 +418,9 @@ runtimes:

# Fallback chain — used when the active model is unavailable
fallback_chain:
-   - lfm2-24b-a2b # Primary — 78% single-step, 24% chain completion
-   - qwen3-30b-moe # Fallback 1 — Ollama-hosted Qwen3 MoE
-   - static_response # Fallback 2 — hardcoded "model unavailable" message
+   - lfm2-24b-a2b    # Primary — 78% single-step, 24% chain completion
+   - qwen3-30b-moe   # Fallback 1 — Ollama-hosted Qwen3 MoE
+   - static_response # Fallback 2 — hardcoded "model unavailable" message
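The fallback chain above resolves top-down: the first available entry wins, and `static_response` is the terminal catch-all. A minimal sketch of that selection logic (the function name and the availability callback are assumptions, not the app's actual code):

```python
def pick_model(chain, is_available):
    """Return the first available entry in the fallback chain.

    `static_response` is always considered available, matching the
    hardcoded "model unavailable" fallback described above.
    """
    for model in chain:
        if model == "static_response" or is_available(model):
            return model
    return None

chain = ["lfm2-24b-a2b", "qwen3-30b-moe", "static_response"]

# Suppose only the Ollama-hosted Qwen3 MoE is reachable:
choice = pick_model(chain, is_available=lambda m: m == "qwen3-30b-moe")
```

Because `static_response` is unconditionally available, the chain can never return `None` as long as it is the last entry.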

# Dual-model orchestrator (ADR-009)
# When enabled, the planner model decomposes multi-step workflows and
@@ -402,9 +440,9 @@ fallback_chain:
# to skip the orchestrator entirely and avoid the ~2-3s wasted planner call.
# See ADR-009 for full details.
orchestrator:
- enabled: false # With 20 curated tools, single-model loop is faster. Enable for 67+ tools.
+ enabled: false  # With 20 curated tools, single-model loop is faster. Enable for 67+ tools.
planner_model: lfm2-24b-a2b
- router_model: lfm25-1.2b-router-ft # Fine-tuned V2: 93.0% eval accuracy, 83.7% live (83 tools)
+ router_model: lfm25-1.2b-router-ft  # Fine-tuned V2: 93.0% eval accuracy, 83.7% live (83 tools)
router_top_k: 15
max_plan_steps: 10
step_retries: 3
@@ -414,4 +452,4 @@ orchestrator:
# ~15 category meta-tools (~1,500 tokens) instead of all 67 tools (~8,670 tokens).
# The model selects 2-3 categories, then subsequent turns use only those tools.
# Saves ~7,170 tokens per turn and eliminates cross-server confusion.
- two_pass_tool_selection: true # Active only when >30 tools registered; 21 curated tools use flat mode
+ two_pass_tool_selection: true  # Active only when >30 tools registered; 21 curated tools use flat mode