1. What is DeepSeek V4?
DeepSeek V4 (model name: deepseek-chat) is the fourth-generation flagship model from DeepSeek AI, a Chinese research lab that shocked the AI world in early 2025 by releasing GPT-4-class models at a fraction of the training cost.
V4 continues that tradition: it delivers performance comparable to GPT-4o and Claude Sonnet 4 on most developer benchmarks, while pricing starts at $0.27 per million input tokens — roughly 10x cheaper than GPT-4o.
Key facts:
- Architecture: Mixture-of-Experts (MoE) — activates only a fraction of parameters per forward pass (toy routing sketch below)
- Context window: 128K tokens
- API compatibility: OpenAI-compatible (drop-in replacement)
- Modalities: Text only (no native vision in deepseek-chat)
- Availability: Cloud API + open weights for self-hosting
deepseek-chat always maps to the latest stable release. V4 is what's behind it as of April 2026.
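To make the MoE idea concrete, here is a toy sketch of top-k routing. It is purely illustrative: the expert count, gating function, and math are invented for this example and are not DeepSeek's actual implementation.

```python
# Toy top-k Mixture-of-Experts routing (illustrative only)
import random

NUM_EXPERTS = 8
TOP_K = 2  # only 2 of 8 experts run per token

def expert(i, x):
    # Stand-in for a feed-forward expert network
    return [v * (i + 1) for v in x]

def router_scores(x):
    # Stand-in for a learned gating network
    return [random.random() for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    scores = router_scores(x)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    total = sum(scores[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = scores[i] / total  # normalize gates over the selected experts
        out = [o + weight * v for o, v in zip(out, expert(i, x))]
    return out  # most experts never ran, so most parameters stayed idle

print(moe_layer([1.0, 2.0, 3.0]))
```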
2. Benchmarks: How Good Is It?
| Benchmark | DeepSeek V4 | GPT-4o | Claude Sonnet 4 | Gemini 2.5 Pro |
|---|---|---|---|---|
| MMLU (knowledge) | 88.5% | 88.7% | 88.3% | 89.1% |
| HumanEval (coding) | 90.2% | 90.2% | 92.0% | 87.8% |
| MATH (math reasoning) | 84.1% | 76.6% | 78.3% | 86.5% |
| GPQA (science) | 59.1% | 53.6% | 65.0% | 62.2% |
| SWE-bench Verified | 42.0% | 38.0% | 49.0% | 35.0% |
Takeaway: DeepSeek V4 matches or beats GPT-4o on coding and math tasks. Claude Sonnet leads on complex reasoning and agentic tasks. Gemini excels on long-context. For pure price-performance, DeepSeek wins decisively.
3. Pricing — The Real Story
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Caching |
|---|---|---|---|
| DeepSeek V4 (deepseek-chat) | $0.27 | $1.10 | $0.07/1M cached input |
| DeepSeek R1 (reasoning) | $0.55 | $2.19 | $0.14/1M cached input |
| GPT-4o | $2.50 | $10.00 | 50% discount via Batch API |
| Claude Sonnet 4 | $3.00 | $15.00 | 90% prompt cache discount |
| Gemini 2.5 Pro | $1.25 | $10.00 | 75% context cache discount |
At $0.27/1M input tokens, a typical agent workflow processing 500K input tokens per day costs about $4/month, vs $37.50/month with GPT-4o. The savings compound fast at scale.
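The arithmetic behind that claim, counting input tokens only (output tokens and cache discounts are ignored, so treat it as a rough lower bound):

```python
# Rough monthly input-token cost at 500K tokens/day over 30 days
PRICES = {"deepseek-chat": 0.27, "gpt-4o": 2.50}  # $ per 1M input tokens

monthly_millions = 500_000 * 30 / 1_000_000  # 15M tokens/month

for model, price in PRICES.items():
    print(f"{model}: ${monthly_millions * price:.2f}/month")
# deepseek-chat: $4.05/month
# gpt-4o: $37.50/month
```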
4. API Quickstart (5 Minutes)
Get your API key
Sign up at platform.deepseek.com → API Keys → Create Key. New accounts get free credits.
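The snippets below hardcode the key for brevity; in real projects, keep it in an environment variable instead:

```bash
export DEEPSEEK_API_KEY="your-deepseek-api-key"
```

Then pass `os.environ["DEEPSEEK_API_KEY"]` as the `api_key` argument.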
Python — basic chat
```bash
pip install openai  # DeepSeek uses the OpenAI SDK
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain MoE architecture in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=512
)
print(response.choices[0].message.content)
```
Streaming response
```python
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a Python quicksort."}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Function calling (tool use)
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # get_weather
print(tool_call.function.arguments)  # {"city": "Tokyo"}
```
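The model only requests the call; your code has to execute it and send the result back. A sketch of that second round trip, where `get_weather_impl` is a hypothetical stand-in for a real lookup:

```python
import json

def get_weather_impl(city: str) -> str:
    return f"Sunny, 21°C in {city}"  # hypothetical stub; call a real API here

args = json.loads(tool_call.function.arguments)
result = get_weather_impl(**args)

followup = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": result},
    ],
)
print(followup.choices[0].message.content)
```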
If you already use the OpenAI SDK, the only changes needed are `api_key` and `base_url`. Everything else stays the same.
5. Migration: Deprecated Endpoints (⚠️ July 24 Deadline)
Which endpoints are being deprecated?
| Deprecated (sunset July 24) | Replacement |
|---|---|
| deepseek-chat-v3 | deepseek-chat |
| deepseek-chat-v3-0324 | deepseek-chat |
| deepseek-reasoner-r1 | deepseek-reasoner |
| deepseek-reasoner-r1-0528 | deepseek-reasoner |
Migration script (scan your codebase)
```python
# Scan all .py files for deprecated model names
import os

DEPRECATED = [
    "deepseek-chat-v3",
    "deepseek-chat-v3-0324",
    "deepseek-reasoner-r1",
    "deepseek-reasoner-r1-0528",
]

def scan_dir(path):
    for root, _, files in os.walk(path):
        for f in files:
            if f.endswith(".py"):
                fpath = os.path.join(root, f)
                with open(fpath, encoding="utf-8", errors="ignore") as fh:
                    content = fh.read()
                for dep in DEPRECATED:
                    if dep in content:
                        print(f"FOUND: {dep} in {fpath}")

scan_dir(".")  # run from your project root
```
Bulk replace with sed (Linux/macOS)
```bash
# Replace deprecated chat model (GNU sed; on macOS use `sed -i ''`)
find . -name "*.py" -exec sed -i \
  's/deepseek-chat-v3[^"'"'"']*/deepseek-chat/g' {} \;

# Replace deprecated reasoner model
find . -name "*.py" -exec sed -i \
  's/deepseek-reasoner-r1[^"'"'"']*/deepseek-reasoner/g' {} \;
```
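If you'd rather avoid sed's BSD/GNU differences, a cross-platform Python equivalent of the two commands above:

```python
# Cross-platform alternative to the sed commands above
import pathlib
import re

REPLACEMENTS = [
    (re.compile(r'''deepseek-chat-v3[^"']*'''), "deepseek-chat"),
    (re.compile(r'''deepseek-reasoner-r1[^"']*'''), "deepseek-reasoner"),
]

for path in pathlib.Path(".").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    new_text = text
    for pattern, repl in REPLACEMENTS:
        new_text = pattern.sub(repl, new_text)
    if new_text != text:
        path.write_text(new_text, encoding="utf-8")
        print(f"UPDATED: {path}")
```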
If you call DeepSeek through LiteLLM, use deepseek/deepseek-chat (not deepseek/deepseek-chat-v3); LiteLLM passes the model name directly to the DeepSeek API.
6. Run DeepSeek Locally with Ollama
For privacy-sensitive workloads, offline use, or zero API cost experimentation, you can run DeepSeek models locally. The distilled variants are small enough for consumer hardware.
Hardware requirements
| Model | Size | Min VRAM | Notes |
|---|---|---|---|
| deepseek-r1:1.5b | ~1.1 GB | 4 GB | Fast, basic reasoning |
| deepseek-r1:7b | ~4.7 GB | 8 GB | Good balance |
| deepseek-r1:14b | ~9 GB | 16 GB | Near cloud quality |
| deepseek-r1:32b | ~20 GB | 24 GB | Strongest local option |
| deepseek-r1:70b | ~43 GB | 48 GB | Workstation / multi-GPU |
Setup with Ollama
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull and run DeepSeek R1 7B
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b
```

Ollama also exposes an OpenAI-compatible API on port 11434, so the same SDK works locally:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # any non-empty string
)

response = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
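Before pointing an app at it, you can sanity-check that the server is running and the model is present; Ollama's REST API lists local models:

```bash
curl http://localhost:11434/api/tags
```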
7. Building AI Agents with DeepSeek V4
DeepSeek V4 supports all the primitives needed for agentic applications: tool calling, structured output, long context, and streaming. Here's a minimal agent loop:
```python
import json
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                },
                "required": ["query"]
            }
        }
    }
]

def run_agent(user_input: str, max_turns: int = 5):
    messages = [{"role": "user", "content": user_input}]
    for turn in range(max_turns):
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)
        # No tool call = final answer
        if not msg.tool_calls:
            return msg.content
        # Execute each tool call
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            # Replace with your actual tool implementation
            result = f"[Search result for: {args['query']}]"
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result
            })
    return "Max turns reached"

answer = run_agent("What are the top AI agent frameworks in 2026?")
print(answer)
```
Using DeepSeek V4 with LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com",
    temperature=0
)

# Works with all LangChain agents, chains, and tools
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
    ("placeholder", "{agent_scratchpad}")
])

agent = create_tool_calling_agent(llm, tools=[], prompt=prompt)
executor = AgentExecutor(agent=agent, tools=[], verbose=True)
result = executor.invoke({"input": "Summarize DeepSeek V4 features"})
```
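The agent above is wired with `tools=[]` just to show the plumbing. A hypothetical tool plugs in like this (`search_docs` is a stub invented for this example, not a real library function):

```python
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search internal documentation for a query."""
    return f"[Docs result for: {query}]"  # hypothetical stub

agent = create_tool_calling_agent(llm, tools=[search_docs], prompt=prompt)
executor = AgentExecutor(agent=agent, tools=[search_docs], verbose=True)
result = executor.invoke({"input": "How do I rotate API keys?"})
```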
8. Limitations to Know Before You Commit
- No vision support in deepseek-chat. If you need image understanding, use GPT-4o or Gemini.
- No audio/video modalities; text only.
- No official SLA for uptime. For mission-critical apps, consider a fallback provider via LiteLLM or Portkey.
- Rate limits are lower than OpenAI's commercial tiers; check platform.deepseek.com for current limits (see the retry sketch below).
- Latency can be higher than GPT-4o on complex requests due to MoE routing overhead.
- Data residency: requests are processed on DeepSeek's infrastructure in China. For GDPR-sensitive EU data, consider Mistral or on-premise deployment instead.
For example, LiteLLM can fail over automatically (the fallback model names here are illustrative):

```python
import litellm

response = litellm.completion(
    model="deepseek/deepseek-chat",
    messages=messages,
    fallbacks=["gpt-4o-mini", "claude-haiku-3-5"]
)
```
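And when you stay on a single provider, the lower rate limits make simple retry logic worthwhile. A minimal exponential-backoff sketch using the OpenAI SDK's `RateLimitError`:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key="your-deepseek-api-key", base_url="https://api.deepseek.com")

def chat_with_retry(messages, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek-chat",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2  # exponential backoff
```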
9. Verdict: When to Use DeepSeek V4
| Use Case | DeepSeek V4? | Notes |
|---|---|---|
| Text-only agents, chatbots | ✅ Best choice | 10x cheaper than GPT-4o, same quality |
| Coding assistants | ✅ Excellent | HumanEval 90%+, great function calling |
| High-volume production | ✅ Yes | Set up a fallback for reliability |
| Math / reasoning tasks | ✅ Strong | Use deepseek-reasoner for hard math |
| Vision / multimodal | ❌ No | Use GPT-4o or Gemini Flash |
| 1M+ token context | ❌ No | Capped at 128K; use Gemini 2.5 Pro |
| GDPR / EU data sovereignty | ⚠️ Caution | Use Mistral or on-premise DeepSeek |
| Enterprise SLA required | ⚠️ Caution | Pair with LiteLLM fallback routing |
Bottom line: DeepSeek V4 is the best-value LLM API available in 2026 for text-based tasks. If your workload doesn't need vision, a 1M+ context window, or EU data residency, there's almost no reason to pay 10x more for GPT-4o on equivalent quality.
Find DeepSeek, LiteLLM, Portkey, and 420+ other AI agent tools at AgDex.ai — the most comprehensive AI tools directory for developers.