Introduction to DeepSeek Models - Xie Xianbin's Blog

Published: 2025-01-29 Updated: 2025-10-19 Word count: 1146 Reading time: 3 min Author: Xie Xianbin IP location: Shanghai

DeepSeek is a Chinese artificial intelligence and large language model company.

Introduction

DeepSeek-V3

General-purpose model: efficient and convenient, suited to the vast majority of routine, well-specified tasks

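DeepSeek-V3-0324 was released on 2025-03-25. The new V3 model borrows the reinforcement learning techniques used in training DeepSeek-R1, which substantially improves its performance on reasoning tasks; on math and code benchmarks it scores higher than GPT-4.5.
- DeepSeek-V3-0324 uses the same base model as the earlier DeepSeek-V3; only the post-training method was improved. For private deployments, only the checkpoint and tokenizer_config.json need to be updated (the latter because of tool-call related changes). The model has about 660B parameters, and the open-source version has a 128K context length (the web app, mobile app, and API provide 64K of context)
- The new V3 model also improves to some degree in tool calling (function calling is built into DeepSeek-V3, so no separate Agent development tool or the MCP protocol is required), role play, and conversational Q&A
- DeepSeek-V3-0324 = DeepSeek-V3 + training on high-quality synthetic data produced by DeepSeek-R1
- MIT License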
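The built-in function calling mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming the OpenAI-compatible `tools` request format and the `deepseek-chat` model name; `get_weather`, `run_tool_call`, and `ask_with_tools` are hypothetical names introduced only for illustration, and the forecast is dummy data.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return json.dumps({"city": city, "forecast": "sunny"})


# Tool schema in the OpenAI-compatible "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a tool call requested by the model to the matching local function."""
    registry = {"get_weather": get_weather}
    return registry[name](**json.loads(arguments_json))


def ask_with_tools(question: str, api_key: str) -> str:
    """One tool-calling round trip against deepseek-chat (needs network access)."""
    from openai import OpenAI  # imported lazily so the helpers above work offline

    client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(
        model="deepseek-chat", messages=messages, tools=TOOLS
    )
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered directly
    messages.append(reply)  # keep the assistant turn that requested the tools
    for call in reply.tool_calls:
        result = run_tool_call(call.function.name, call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(
        model="deepseek-chat", messages=messages, tools=TOOLS
    )
    return final.choices[0].message.content
```

The dispatch step is kept separate from the network call so that the tool logic can be checked without an API key.
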
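DeepSeek-V3 is a self-developed MoE model with 671B total parameters, of which 37B are activated per token. It was pre-trained on 14.8T tokens and performs well on long text, code, math, encyclopedic knowledge, and Chinese.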
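
To put the MoE figures in perspective, here is a small back-of-the-envelope calculation using only the numbers quoted above. It suggests that roughly 5.5% of the parameters are active for any one token; the variable names are illustrative.

```python
TOTAL_PARAMS = 671e9       # total parameters, from the text
ACTIVE_PARAMS = 37e9       # parameters activated per token, from the text
PRETRAIN_TOKENS = 14.8e12  # pre-training tokens, from the text

# Share of the model that participates in computing each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
# Pre-training tokens seen per parameter of the full model.
tokens_per_param = PRETRAIN_TOKENS / TOTAL_PARAMS

print(f"active per token: {active_fraction:.1%}")
print(f"training tokens per parameter: {tokens_per_param:.1f}")
```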

DeepSeek-R1

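Reasoning model for complex reasoning and in-depth analysis tasks, such as mathematical and logical reasoning and programming, as well as open-ended tasks
https://github.com/deepseek-ai/DeepSeek-R1 (the repository includes the paper as a PDF)
DeepSeek-R1 makes large-scale use of reinforcement learning in the post-training stage, greatly improving reasoning ability with only a very small amount of labeled data. On math, code, and natural-language reasoning tasks, its performance is on par with the official release of OpenAI o1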
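- Unsuitable for Agent development (a common weakness of reasoning models)
  - Function calling is not supported (see the reference)
    - Context length: the API supports at most 64K of context; the length of the output reasoning_content does not count toward the 64K limit
    - Unsupported features: Function Call, JSON Output, FIM completion (Beta)
  - Because multiple responses are needed, agents built on R1 run very inefficiently
  - R1 hallucinates, which severely affects accuracy
- In each turn of a conversation, the model outputs chain-of-thought content (reasoning_content) and the final answer (content)
- MIT License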
Open-source models; for details see:
- HuggingFace link: https://huggingface.co/deepseek-ai

Model	#Total Params	#Activated Params	Context Length	Download
DeepSeek-R1-Zero	671B	37B	128K	🤗 HuggingFace
DeepSeek-R1	671B	37B	128K	🤗 HuggingFace

DeepSeek-R1-Distill Models

Model	Base Model	Download
DeepSeek-R1-Distill-Qwen-1.5B	Qwen2.5-Math-1.5B	🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-7B	Qwen2.5-Math-7B	🤗 HuggingFace
DeepSeek-R1-Distill-Llama-8B	Llama-3.1-8B	🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-14B	Qwen2.5-14B	🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-32B	Qwen2.5-32B	🤗 HuggingFace
DeepSeek-R1-Distill-Llama-70B	Llama-3.3-70B-Instruct	🤗 HuggingFace
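
As an illustration of how one of the distilled checkpoints in the table might be run locally, here is a minimal sketch using Hugging Face `transformers`. It assumes the repositories live under the `deepseek-ai` organization linked above and that `transformers` and PyTorch are installed; `repo_id` and `generate` are hypothetical helper names.

```python
# Distilled model -> base model, copied from the table above.
DISTILL_BASES = {
    "DeepSeek-R1-Distill-Qwen-1.5B": "Qwen2.5-Math-1.5B",
    "DeepSeek-R1-Distill-Qwen-7B": "Qwen2.5-Math-7B",
    "DeepSeek-R1-Distill-Llama-8B": "Llama-3.1-8B",
    "DeepSeek-R1-Distill-Qwen-14B": "Qwen2.5-14B",
    "DeepSeek-R1-Distill-Qwen-32B": "Qwen2.5-32B",
    "DeepSeek-R1-Distill-Llama-70B": "Llama-3.3-70B-Instruct",
}


def repo_id(model_name: str) -> str:
    """Build the Hugging Face repository id (assumes the deepseek-ai organization)."""
    if model_name not in DISTILL_BASES:
        raise KeyError(f"unknown distilled model: {model_name}")
    return f"deepseek-ai/{model_name}"


def generate(model_name: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint and answer a single prompt (large download)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(repo_id(model_name))
    model = AutoModelForCausalLM.from_pretrained(repo_id(model_name), torch_dtype="auto")
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

The 1.5B variant is the most practical one to try first, for example `generate("DeepSeek-R1-Distill-Qwen-1.5B", "What is 17 * 23?")`.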

Quantized models:

https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF Characteristics: fast, with lower memory usage (about 60%)
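
To give a rough sense of why quantization lowers memory usage, the sketch below estimates the memory needed for the weights alone at different bit widths. The 32e9 parameter count is a nominal figure assumed from the model name, and the bit widths are illustrative. Real GGUF files also store scales and metadata, and inference needs extra memory for the KV cache, so actual numbers will differ.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


PARAMS_32B = 32e9  # nominal parameter count, assumed from the model name

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: about {weight_memory_gib(PARAMS_32B, bits):.1f} GiB")
```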

DeepSeek API

The deepseek-chat model has been fully upgraded to DeepSeek-V3; the interface is unchanged
deepseek-reasoner is DeepSeek-R1, the reasoning model most recently launched by DeepSeek
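
Given the two model names above and the R1 limitations listed earlier (no function calling), one possible way to route requests is sketched below. `choose_model` and `ask` are hypothetical helpers, not part of any SDK, and the routing rule is only an assumption based on this article.

```python
def choose_model(needs_tools: bool = False, needs_deep_reasoning: bool = False) -> str:
    """Pick an API model name based on what the request needs."""
    if needs_tools:
        # Per the limitations above, deepseek-reasoner has no function calling.
        return "deepseek-chat"
    return "deepseek-reasoner" if needs_deep_reasoning else "deepseek-chat"


def ask(prompt: str, api_key: str, needs_deep_reasoning: bool = False) -> str:
    """Send a single-turn request to the chosen model (needs network access)."""
    from openai import OpenAI  # lazy import; only needed for real calls

    client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model=choose_model(needs_deep_reasoning=needs_deep_reasoning),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```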

Local deployment

DeepSeek model local deployment resources

FAQ

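Workarounds for the common weakness of reasoning models, namely that they are unsuitable for Agent development
- Use newer kinds of models that combine reasoning with Agent-development capability, such as
  - Anthropic's Claude 3.7 (released February 2025), the first hybrid reasoning model, which can be switched back and forth between reasoning mode and chat mode through a parameter
  - OpenAI plans to merge these capabilities in GPT-5
Context concatenation: in each turn of a conversation, the model outputs chain-of-thought content (reasoning_content) and the final answer (content). In the next turn, the chain-of-thought content from earlier turns is not concatenated into the context. Reposted example: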

from openai import OpenAI
client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

# Round 2: append only the final answer; reasoning_content is left out of the context
messages.append({'role': 'assistant', 'content': content})
messages.append({'role': 'user', 'content': "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages
)
# ...
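
The context-concatenation rule shown above can be wrapped in a small helper so that reasoning_content is never carried into the next turn by accident. This is a minimal sketch: `append_assistant_turn` is a hypothetical name, and the `SimpleNamespace` object only stands in for a real API response message.

```python
from types import SimpleNamespace


def append_assistant_turn(messages: list, message) -> list:
    """Append only the final answer; the chain of thought is deliberately dropped."""
    messages.append({"role": "assistant", "content": message.content})
    return messages


# Stand-in for response.choices[0].message from a deepseek-reasoner call.
fake_message = SimpleNamespace(
    reasoning_content="Compare 9.11 with 9.80 digit by digit ...",
    content="9.8 is greater.",
)

history = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
append_assistant_turn(history, fake_message)
print(history[-1])
```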

References
