Self-hosted model with Ollama
Ollama runs large language models on your own machine. Use it when you want Curiosity's AI features to run against a local model so that prompts and documents never leave your network — air-gapped deployments, strict data-residency requirements, or development without an API key.
This page covers installing Ollama, downloading a model, and wiring Ollama up as an LLM provider in Curiosity. It maps to the Ollama quickstart; when in doubt, the upstream docs are the source of truth.
How it fits with the rest of AI Settings
Ollama is one of several providers you can configure under Settings → AI Settings. For the full list of providers and the policy decisions (token caps, timeouts, fallback), see LLM Configuration. This page is the deep-dive on the Ollama-specific setup.
Before you start
Ollama runs the model on the CPU or GPU of whatever machine it is installed on. Size that machine to the model you want to run:
| Model size | Minimum RAM | Notes |
|---|---|---|
| 1B–3B | 8 GB | Fast on CPU. Good for tool-call-heavy chat and low-latency tasks. |
| 7B–8B | 8–16 GB | The common default. Runs on CPU; much faster with a GPU. |
| 13B–14B | 16 GB | Noticeably slower on CPU-only machines. |
| 33B–34B | 32 GB | A GPU is strongly recommended. |
| 70B+ | 64 GB | Needs a GPU box (or a lot of patience). |
A discrete GPU with enough VRAM to hold the model is the single biggest factor in response speed. CPU-only works but is slow for anything above the 7B class.
Step 1 — Install Ollama
Pick your operating system.
Download and run the installer from ollama.com/download, or install from the terminal:
curl -fsSL https://ollama.com/install.sh | sh
The macOS app starts Ollama in the background and adds it to your menu bar.
Step 2 — Confirm Ollama is running
Ollama exposes a local HTTP API on port 11434. Verify it responds:
curl http://localhost:11434
You should get back Ollama is running. This URL — http://localhost:11434 — is the Ollama Host you will give Curiosity in Step 4.
If curl fails, start the server with ollama serve. If the port is already in use, another Ollama instance is likely already running.
Step 3 — Install a model
Ollama does not ship with a model — you download the ones you need from the Ollama model library. Download a model with ollama pull:
ollama pull llama3.1
To download and immediately chat with a model in the terminal (this also pulls it if it is missing), use ollama run:
ollama run llama3.1
Type /bye to leave the chat. A few commonly used models:
| Model | Pull command | Good for |
|---|---|---|
| Llama 3.1 (8B) | ollama pull llama3.1 |
General-purpose chat, tool calling. |
| Llama 3.2 (1B / 3B) | ollama pull llama3.2:1b |
Low-latency, low-memory tasks. |
| Gemma 3 | ollama pull gemma3 |
General-purpose, range of sizes. |
| Qwen 2.5 | ollama pull qwen2.5 |
Strong multilingual and coding. |
| Mistral (7B) | ollama pull mistral |
Compact general-purpose model. |
| Phi-4 | ollama pull phi4 |
Small, capable reasoning model. |
Model names can carry a size or quantization tag after a colon, e.g. llama3.1:70b or llama3.2:1b. Without a tag you get the default variant. Browse ollama.com/library for the current catalog and exact tags.
Tool-calling support
Curiosity's AI assistant uses tools to query your graph. Pick a model that supports tool/function calling (most current Llama, Qwen, and Mistral models do) — older or very small models may chat fine but fail to invoke tools. The model library page lists a Tools capability tag for models that support it.
Managing downloaded models
| Command | What it does |
|---|---|
ollama list |
List the models you have downloaded. |
ollama ps |
Show models currently loaded in memory. |
ollama show llama3.1 |
Show a model's parameters, context length, and license. |
ollama rm llama3.1 |
Delete a downloaded model to reclaim disk space. |
Step 4 — Add Ollama as a provider in Curiosity
With Ollama running and at least one model pulled, connect it to Curiosity:
Open AI Settings
In your workspace, go to Settings → AI Settings.
Add the Ollama provider
Click Add provider and choose Ollama.
Set the Ollama Host
In the Ollama Host field, enter the address of your Ollama server. If Ollama runs on the same machine as the workspace, this is the default:
http://localhost:11434
If Ollama runs on a different machine, point this at that host instead — see Running Ollama on a separate machine below.
Pick a model
Open the Model dropdown. Curiosity queries your Ollama server and lists every model you have downloaded. Select the one you want the workspace to use.
If the dropdown shows "No Models Downloaded", go back to Step 3 and pull a model — Curiosity can only list models that already exist on the Ollama host.
Save and set as default
Save the provider. To make it the workspace default chat model, mark it as the default provider. Use the provider's Test/Check button to confirm the workspace can reach Ollama and run the selected model.
Because the model runs locally, no workspace data is sent to a cloud provider when you use this model. Expect higher CPU/RAM (or GPU) usage on the Ollama host while AI features are in use.
Running Ollama on a separate machine
By default Ollama only listens on localhost, so it is reachable only from the same machine. To run the model on a dedicated server (for example a GPU box) and point the workspace at it over the network, bind Ollama to all interfaces with the OLLAMA_HOST environment variable:
Edit the service with systemctl edit ollama and add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then reload and restart:
systemctl daemon-reload
systemctl restart ollama
Then set the Ollama Host in Curiosity to that machine's address, e.g. http://ollama.internal:11434 or http://192.168.1.50:11434.
Don't expose Ollama to the open internet
Binding to 0.0.0.0 makes Ollama reachable from the network, and it has no authentication of its own. Keep it on a private network and restrict access with a firewall, reverse proxy, or VPN. Only the Curiosity workspace should be able to reach it.
Embeddings
Ollama serves chat models. Curiosity's vector search needs an embedding model, which is configured separately. You can either run a local embedding model in Ollama (e.g. ollama pull nomic-embed-text) and use it through an embedding provider, or use Curiosity's built-in local embeddings. See the embeddings options in LLM Configuration.
Troubleshooting
| Symptom | Likely cause and fix |
|---|---|
| "No Models Downloaded" in the dropdown | No model has been pulled on that host. Run ollama pull <model> on the Ollama machine, then reopen the dropdown. |
| Curiosity can't reach Ollama | Confirm curl http://<host>:11434 returns Ollama is running. Check the Ollama Host value, the firewall, and (for remote hosts) that OLLAMA_HOST is set to 0.0.0.0. |
| Model loads but the assistant can't use tools | The model doesn't support tool calling. Switch to a model with the Tools capability (see Step 3). |
| Responses are very slow | The machine is CPU-only or short on RAM/VRAM for the model size. Use a smaller model or add a GPU (see Before you start). |
| Out-of-memory when loading | The model is too large for the available RAM/VRAM. Pull a smaller variant, e.g. llama3.2:3b. |
Next steps
- Provider policies, token caps, and fallback: LLM Configuration
- What to use the model for: AI & LLMs Overview
- Writing effective prompts: Prompting Patterns