Tutorial April 25, 2026 · 10 min read

DeepSeek Complete Guide 2026: Official Links, API & Local Deployment

Everything in one place: where to use DeepSeek online, how to set up the API, where to download models, and how to run DeepSeek locally with Ollama or LM Studio. Updated for V4.

1. Official Access

💬

Chat (Free)

chat.deepseek.com

Web interface, no sign-in required for basic use. Supports V4-Flash and V4-Pro with thinking mode toggle.

🔌

API Platform

platform.deepseek.com

API key management, usage dashboard, billing. Free tier available.

📂

GitHub (Source)

github.com/deepseek-ai

Open-source model weights, inference code, and research papers.

🤗

Hugging Face

huggingface.co/deepseek-ai

Model weights download (GGUF, safetensors). Used by Ollama and LM Studio.

2. Model Lineup (V4, April 2026)

| Model               | Params (Active)       | Context | Best For                              | License |
|---------------------|-----------------------|---------|---------------------------------------|---------|
| deepseek-v4-pro     | 1.6T (49B MoE)        | 1M      | Agents, coding, complex reasoning     | MIT     |
| deepseek-v4-flash   | 284B (13B MoE)        | 1M      | High-volume, low-latency tasks        | MIT     |
| deepseek-r1         | 671B (37B MoE)        | 128K    | Math, science, step-by-step reasoning | MIT     |
| deepseek-r1-distill | 1.5B / 7B / 14B / 32B | 128K    | Local deployment (consumer GPU)       | MIT     |
| deepseek-v3         | 671B (37B MoE)        | 128K    | Retiring, migrate to V4               | MIT     |

⚠️ Migration Reminder: Legacy model names deepseek-chat and deepseek-reasoner retire July 24, 2026. Migrate to deepseek-v4-flash or deepseek-v4-pro.

3. API Setup (5 Minutes)

DeepSeek uses an OpenAI-compatible API: same request format, different base URL.

Step 1: Get an API Key

  1. Go to platform.deepseek.com
  2. Sign up / log in
  3. Navigate to API Keys → Create API Key
  4. Copy the key (it is shown only once, so save it now)

Step 2: Python (openai library)

pip install openai

from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Hello, DeepSeek!"}
    ]
)

print(response.choices[0].message.content)

Step 3: Enable Thinking Mode (V4-Pro)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Solve step by step..."}],
    # Enable extended thinking
    extra_body={"thinking": {"type": "enabled"}}
)

# Access the thinking content
thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
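The reasoning_content field is only populated when thinking is enabled, so production code should guard the attribute access. A small defensive helper (a sketch of mine, assuming the response shape shown above):

```python
def split_thinking(response):
    """Return (thinking, answer) from a chat completion response;
    thinking is None when no reasoning_content was emitted."""
    message = response.choices[0].message
    return getattr(message, "reasoning_content", None), message.content
```

This lets the same code path handle V4-Pro with thinking on, and any model that returns only plain content.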

4. Model Downloads

DeepSeek models are open source under the MIT license. Three main sources:

🤗 Hugging Face (Recommended)

huggingface.co/deepseek-ai

Full model weights in safetensors format. Use for vLLM, Transformers, custom inference.

pip install huggingface_hub
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir ./models/deepseek-v4-flash
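If you prefer to script the download, the huggingface_hub library exposes the same operation as snapshot_download. A sketch (the repo id comes from the CLI example above; local_dir_for is a helper of mine mirroring the ./models/<name> convention):

```python
# Python equivalent of the huggingface-cli download above.
from pathlib import Path

def local_dir_for(repo_id: str, root: str = "./models") -> Path:
    """Map 'deepseek-ai/DeepSeek-V4-Flash' -> models/deepseek-v4-flash."""
    return Path(root) / repo_id.split("/")[-1].lower()

if __name__ == "__main__":
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    repo = "deepseek-ai/DeepSeek-V4-Flash"
    snapshot_download(repo_id=repo, local_dir=local_dir_for(repo))
```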

🦙 Ollama Library (Easiest for Local)

ollama.com/library/deepseek-v3

Pre-quantized GGUF format. One command to download and run. Best for consumer hardware.

ollama pull deepseek-v3:latest
# R1 distill variants (smaller)
ollama pull deepseek-r1:7b
ollama pull deepseek-r1:14b

🧊 ModelScope (China Mirror)

modelscope.cn/deepseek-ai

Faster downloads from mainland China. Same weights as HuggingFace.

5. Local Deployment: Ollama

Ollama is the easiest way to run DeepSeek locally. Works on macOS, Linux, and Windows.

Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download installer from ollama.com

Pull and Run DeepSeek

# DeepSeek R1 distill - best for consumer GPUs
ollama run deepseek-r1:7b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b

# DeepSeek V3 (full, needs high VRAM)
ollama run deepseek-v3:latest

Use via OpenAI-Compatible API

Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # any string
)

response = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Hello!"}]
)
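Streaming works through the same endpoint via the client's standard stream=True flag; each chunk carries a partial delta. A sketch (collect_stream is my helper; in a real UI you would print each delta as it arrives):

```python
def collect_stream(chunks) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    return "".join(
        chunk.choices[0].delta.content or ""
        for chunk in chunks
        if chunk.choices
    )

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model="deepseek-r1:14b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    print(collect_stream(stream))
```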

Use with LangChain

from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:14b")
result = llm.invoke("Explain MCP protocol in 3 sentences")

6. Local Deployment: LM Studio

LM Studio provides a GUI for downloading and running models. Best for non-technical users or Windows users who prefer a desktop app.

  1. Download LM Studio from lmstudio.ai
  2. Open the app → search for "deepseek" in the Discover tab
  3. Select a model variant (e.g., DeepSeek-R1-Distill-Qwen-14B-GGUF)
  4. Click Download and wait for completion (several GB)
  5. Load the model → chat directly in the UI
  6. Or enable the Local Server (port 1234) for API access
# LM Studio local server - OpenAI compatible
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-14b",
    messages=[{"role": "user", "content": "..."}]
)
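The exact model id LM Studio serves depends on which file you downloaded, so instead of hard-coding it you can query the standard /v1/models endpoint. A sketch (pick_deepseek is a helper of mine):

```python
def pick_deepseek(model_ids):
    """Return the first served model id mentioning 'deepseek', else None."""
    return next((m for m in model_ids if "deepseek" in m.lower()), None)

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    served = [m.id for m in client.models.list()]
    print(pick_deepseek(served))
```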

7. Local Deployment: Docker + vLLM

For production-grade local deployment with GPU, vLLM gives you the best performance.

# Pull the vLLM Docker image
docker pull vllm/vllm-openai:latest

# Run DeepSeek R1 (14B distill) - requires ~30GB VRAM
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --tensor-parallel-size 1

# API available at http://localhost:8000/v1

# Or use docker-compose for a persistent setup:
version: '3.8'
services:
  deepseek:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    environment:
      - HUGGING_FACE_HUB_TOKEN=your_hf_token
    volumes:
      - hf_cache:/root/.cache/huggingface
    ports:
      - "8000:8000"
    command: ["--model", "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"]

volumes:
  hf_cache:

8. Hardware Requirements

| Model          | VRAM (Full) | VRAM (Q4 Quant) | Recommended GPU            |
|----------------|-------------|-----------------|----------------------------|
| R1 Distill 7B  | 14 GB       | 5 GB            | RTX 3060 / M2 Pro          |
| R1 Distill 14B | 28 GB       | 10 GB           | RTX 3090 / M2 Max          |
| R1 Distill 32B | 64 GB       | 22 GB           | RTX 4090 / A100 40G        |
| V3 / R1 671B   | ~1.3 TB     | 400+ GB         | Multi-GPU server (H100 ×8) |

💡 Practical tip: for most developers, R1 Distill 14B (Q4) is the sweet spot. It runs on a single RTX 3090 or M2 Max, offers competitive reasoning quality, and is fast enough for development work.
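The Q4 column follows roughly from parameter count × bits per weight, plus runtime overhead. A back-of-the-envelope estimator (the ~4.5 bits/weight average for Q4 quants and the 20% overhead for KV cache and activations are my working assumptions, not official figures):

```python
def vram_gb(params_billion: float, bits_per_weight: float = 4.5,
            overhead: float = 1.2) -> float:
    """Rough VRAM needed (GB) to serve a model of the given size.
    bits_per_weight ~4.5 approximates a Q4 quant; overhead covers
    KV cache and activations (assumed 20%)."""
    return params_billion * (bits_per_weight / 8) * overhead

# Sanity check against the table: 7B -> ~4.7 GB, 14B -> ~9.5 GB, 32B -> ~21.6 GB
```

Longer contexts grow the KV cache, so treat the overhead factor as a floor rather than a guarantee.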

9. FAQ

Is DeepSeek free to use?

Chat at chat.deepseek.com is free with rate limits. The API has a free tier with monthly credits; paid pricing is among the lowest of any frontier model ($0.07/M input tokens for Flash, $0.27/M for Pro as of April 2026). Model weights are MIT-licensed: free to download and self-host.

Is DeepSeek V4 open source?

Yes. Both V4-Pro and V4-Flash are released under the MIT license, with weights available on Hugging Face and ModelScope. You can fine-tune, deploy commercially, and modify without restriction.

What's the difference between V4 and R1?

V4 is the general-purpose conversational and coding model. R1 is a reasoning-specialized model trained with reinforcement learning; it is better at math, logic, and step-by-step problem solving, but slower. For most agent use cases, V4 is the right choice.

Can I use DeepSeek with LangChain / CrewAI / AutoGen?

Yes. All major frameworks support DeepSeek via the OpenAI-compatible API: just set base_url="https://api.deepseek.com" and your DeepSeek API key. For local Ollama deployments, point to http://localhost:11434/v1.

What happens to deepseek-chat / deepseek-reasoner after July 24?

They will be fully deprecated. Requests using these model names will return an error. Migrate to deepseek-v4-flash (equivalent to old deepseek-chat) or deepseek-v4-pro (upgraded from deepseek-reasoner).

Find DeepSeek and 400+ AI agent tools, LLM APIs, and frameworks at AgDex.ai.