Ollama: Local Model Deployment Guide
Ollama
Ollama is an open-source platform for running and managing large language models (LLMs) locally. It provides a unified interface to download, install, and serve a variety of models without requiring cloud access. Under the hood, Ollama handles model caching, quantization formats, and hardware optimizations so you can focus on building applications rather than wrestling with deployment details.
It’s commonly used to power offline or on-premises AI services where data privacy, latency, or cost constraints make cloud APIs impractical. Developers interact with Ollama through a simple HTTP endpoint, sending prompt payloads and receiving streaming or batch completions. By abstracting away the complexities of model hosting, Ollama lets you swap between different model families, experiment with quantized variants, and integrate LLM inference into Python scripts, microservices, or containerized pipelines.
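As a minimal illustration of that HTTP interaction, the sketch below sends a single non-streaming prompt from Python using the `requests` library. It assumes the server is already running on the default port 11434 and that the `qwen2.5:3b` model has been pulled, both of which are covered in the steps that follow; the prompt text itself is only a placeholder.
```
import requests

# Send one non-streaming completion request to the local Ollama HTTP API.
# Assumes `ollama serve` is running on the default port and qwen2.5:3b is pulled.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:3b",
        "prompt": "Explain what a vector embedding is in one sentence.",
        "stream": False,  # return a single JSON object instead of streamed chunks
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```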
Step 1: Install Ollama
Windows
Go to Ollama Downloads
Download the Windows installer .exe
Run the installer and follow prompts
Open PowerShell and verify:
```
ollama --version
```
Linux
Run this in your terminal:
```
curl -fsSL https://ollama.com/install.sh | sh
```
Then verify:
```
ollama --version
```
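If you script your environment setup, the same check can be run from Python as a quick sanity test. This is only a convenience sketch that shells out to the CLI; the reported version string depends on your installation.
```
import subprocess

# Run `ollama --version` and print the result; raises CalledProcessError
# if the binary is missing from PATH or exits with a non-zero status.
result = subprocess.run(
    ["ollama", "--version"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```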
Download Models Using Ollama
Download qwen2.5:3b (Chat Model)
```
ollama pull qwen2.5:3b
```
This fetches the model weights and configuration into your local Ollama cache.
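Once the server is running (see the next section), you can talk to this model through the `/api/chat` endpoint. The sketch below is a minimal non-streaming example in Python using `requests`; the messages are placeholders and the default address `localhost:11434` is assumed.
```
import requests

# Send a chat-style request to the pulled qwen2.5:3b model.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:3b",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize what Ollama does in two sentences."},
        ],
        "stream": False,  # one JSON reply instead of streamed chunks
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["message"]["content"])
```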
Download mxbai-embed-large:335m (Embedding Model)
```
ollama pull mxbai-embed-large:335m
```
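This pulls the embedding model, which maps text to dense vectors for search and retrieval tasks. As with the chat model, embeddings are requested over HTTP once the server is up; the sketch below uses the `/api/embeddings` endpoint and assumes the default address, with a placeholder input sentence.
```
import requests

# Request a dense vector for a piece of text from the embedding model.
response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={
        "model": "mxbai-embed-large:335m",
        "prompt": "Ollama runs large language models locally.",
    },
    timeout=60,
)
response.raise_for_status()
vector = response.json()["embedding"]
print(f"Embedding dimension: {len(vector)}")  # typically 1024 for this model
```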
Start the Ollama Server Locally
Once installed, start the server:
```
ollama serve
```
This launches a local API (http://localhost:11434 by default) that handles model loading, chat, and embedding requests.
You can keep this terminal open or run Ollama as a background service.
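Before wiring Ollama into an application, it is worth confirming the server is reachable from code. The sketch below lists the models currently available in the local cache via the `/api/tags` endpoint, again assuming the default address.
```
import requests

# List the models currently available in the local Ollama cache.
response = requests.get("http://localhost:11434/api/tags", timeout=10)
response.raise_for_status()
for model in response.json().get("models", []):
    print(model["name"])
```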