# Modelfile

A Modelfile is the file used to create and share models with Ollama, serving the same role as the Dockerfile used to build Docker images.
## Format

The format of a Modelfile:

```
# comment
INSTRUCTION arguments
```
| Instruction | Description |
| --- | --- |
| `FROM` (required) | Defines the base model to use. |
| `PARAMETER` | Sets the parameters that control how Ollama runs the model. |
| `TEMPLATE` | The full prompt template to be sent to the model. |
| `SYSTEM` | Specifies the system message that will be set in the template. |
| `ADAPTER` | Defines the (Q)LoRA adapters to apply to the model. |
| `MESSAGE` | Specifies message history. |
| `LICENSE` | Specifies the legal license. |
## Example Modelfile for a custom model

```
FROM qwen:7b

# set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature 0.6

# set the context window size in tokens
PARAMETER num_ctx 8192

# set the system message
SYSTEM """
You are an AI assistant named x, developed and provided by 谢先斌.
You are good at speaking Chinese and telling jokes.
"""
```
Create the model from the Modelfile:

```
$ ollama create qwenhi -f ./Modelfile
transferring model data
using existing layer sha256:87f26aae09c7f052de93ff98a2282f05822cc6de4af1a2a159c5bd1acbd10ec4
using existing layer sha256:7c7b8e244f6aa1ac8c32b74f56d42c41a0364dd2dabed8d9c6030a862e805b54
using existing layer sha256:1da0581fd4ce92dcf5a66b1da737cf215d8dcf25aa1b98b44443aaf7173155f5
using existing layer sha256:d9735bf21cb7479889ae27f1b34f43a0173fa97286f36c808a9439be88657e83
using existing layer sha256:59eda4b87a1b3455735f4c59d45d86eb71556568f0b3d748c92bff9a7720e3d7
using existing layer sha256:b742e5414ad161e36e4731e5dfd125733810cc6a8d9f58a343f663a42612533b
writing manifest
success
```
Then run it:

```
$ ollama run qwenhi
>>> 你是谁
我是谢先斌研发的人工智能助手,你可以称呼我为x。我主要擅长中文交流以及讲笑话。有什么问题或者需要帮助的吗?
>>> /bye
```

(The prompt asks "Who are you?"; the model answers that it is an AI assistant developed by 谢先斌, may be called x, and is mainly good at conversing in Chinese and telling jokes.)

To inspect the Modelfile of the resulting model:

```
ollama show qwenhi --modelfile
```
## Using a Modelfile

### Basics

An example of a Modelfile creating a Mario blueprint:
```
FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
```
To use this:

1. Save it as a file (e.g. `Modelfile`).
2. `ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>`
3. `ollama run choose-a-model-name`
4. Start using the model!
To view the Modelfile of a given model, use the `ollama show --modelfile` command.

```
ollama show --modelfile llama3.2
```
Output:

```
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
```
## Instructions

### FROM (Required)

The `FROM` instruction defines the base model to use when creating a model.

```
FROM <model name>:<tag>
```

#### Build from an existing model

```
FROM llama3.2
```

- A list of available base models: https://github.com/ollama/ollama#model-library
- More models can be found at: https://ollama.com/library
#### Build from a Safetensors model

```
FROM <model directory>
```

The model directory should contain the Safetensors weights for a supported architecture. Currently supported model architectures:

- Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2)
- Mistral (including Mistral 1, Mistral 2, and Mixtral)
- Gemma (including Gemma 1 and Gemma 2)
- Phi3
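For example, a minimal sketch, assuming `./Mistral-7B-Instruct-v0.2` is a local directory (the name is illustrative) containing Safetensors weights along with the model's config and tokenizer files:

```
# build from a local Safetensors model directory
FROM ./Mistral-7B-Instruct-v0.2
```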
#### Build from a GGUF file

```
FROM ./ollama-model.gguf
```

The GGUF file location should be specified as an absolute path or relative to the Modelfile's location.
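As a sketch, a Modelfile that imports a local quantized file and tweaks one parameter (the file name `qwen2-7b-instruct-q4_0.gguf` is hypothetical):

```
# build from a GGUF file sitting next to this Modelfile
FROM ./qwen2-7b-instruct-q4_0.gguf
PARAMETER temperature 0.7
```

It can then be built as usual with `ollama create my-gguf-model -f ./Modelfile`.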
### PARAMETER

The `PARAMETER` instruction defines a parameter that can be set when the model is run.

```
PARAMETER <parameter> <parametervalue>
```
#### Valid Parameters and Values

| Parameter | Description | Value Type | Example Usage |
| --- | --- | --- | --- |
| mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
| num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
| repeat_last_n | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | string | stop "AI assistant:" |
| num_predict | Maximum number of tokens to predict when generating text. (Default: -1, infinite generation) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
| min_p | Alternative to top_p, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with *p* = 0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0) | float | min_p 0.05 |
Parameter notes:

- **temperature**: the temperature of the model; increasing it makes the model answer more creatively (default: 0.8).
- **num_ctx**: sets the size of the context window used to generate the next token (default: 2048).
- **top_k**: reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative (default: 40). It is an integer, typically set between 0 and 100; a lower top_k reduces the probability that the LLM produces nonsensical content.
- **top_p**: works together with top_k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text (default: 0.9). It is a float between 0 and 1; a higher value (i.e. 1.0) allows the LLM to consider a wider range of possible next tokens, permitting more creativity.
- **seed**: sets the random number seed used for generation; setting it to a specific number makes the model generate the same text for the same prompt.
- **num_predict**: the maximum number of tokens to predict when generating text.
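As a sketch of how several of these combine in practice, the following Modelfile (the values are illustrative, not recommendations) configures a fairly deterministic, length-capped assistant:

```
FROM llama3.2
# low temperature and tight top_k/top_p sampling for predictable answers
PARAMETER temperature 0.2
PARAMETER top_k 20
PARAMETER top_p 0.5
# fixed seed: the same prompt reproduces the same output
PARAMETER seed 42
# larger context window and a cap on reply length
PARAMETER num_ctx 4096
PARAMETER num_predict 256
```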
### TEMPLATE

`TEMPLATE` is the full prompt template to be passed to the model. It may include (optionally) a system message, the user's message, and the response from the model. Note: the syntax may be model-specific. Templates use Go template syntax.
#### Template Variables

| Variable | Description |
| --- | --- |
| `{{ .System }}` | The system message used to specify custom behavior. |
| `{{ .Prompt }}` | The user prompt message. |
| `{{ .Response }}` | The response from the model. When generating a response, text after this variable is omitted. |
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
### SYSTEM

The `SYSTEM` instruction specifies the system message to be used in the template.

```
SYSTEM """<system message>"""
```
### ADAPTER

The `ADAPTER` instruction specifies a fine-tuned LoRA adapter to apply to the base model. The adapter value should be an absolute path or a path relative to the Modelfile. The base model should be specified with a `FROM` instruction. If the base model is not the same as the base model that the adapter was tuned from, the behavior will be erratic.
#### Safetensor adapter

```
ADAPTER <path to safetensor adapter>
```

Currently supported Safetensor adapters:

- Llama (including Llama 2, Llama 3, and Llama 3.1)
- Mistral (including Mistral 1, Mistral 2, and Mixtral)
- Gemma (including Gemma 1 and Gemma 2)
#### GGUF adapter

```
ADAPTER ./ollama-lora.gguf
```
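Putting it together, a sketch that layers a LoRA adapter on its base model (the adapter file name is hypothetical, and the adapter must have been trained against the exact model named in `FROM`):

```
# the adapter must match the base model it was fine-tuned from
FROM llama3.2
ADAPTER ./ollama-lora.gguf
```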
### LICENSE

The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.

```
LICENSE """
<license text>
"""
```
### MESSAGE

The `MESSAGE` instruction allows you to specify a message history for the model to use when responding. Use multiple iterations of the `MESSAGE` command to build up a conversation which will guide the model to answer in a similar way.

```
MESSAGE <role> <message>
```
#### Valid roles

| Role | Description |
| --- | --- |
| system | Alternate way of providing the SYSTEM message for the model. |
| user | An example message of what the user could have asked. |
| assistant | An example message of how the model should respond. |
Example conversation:

```
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
```
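A sketch of a complete Modelfile built around such a history; the seeded yes/no exchanges prime the model to keep its answers to a single word (the model choice and system wording are illustrative):

```
FROM llama3.2
SYSTEM You are a geography quiz bot. Answer only "yes" or "no".
# seeded history: examples of the question style and the expected replies
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
```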
## Notes

- The `Modelfile` is not case sensitive. In the examples, uppercase instructions are used to make it easier to distinguish them from arguments.
- Instructions can be in any order. In the examples, the `FROM` instruction is first to keep it easily readable.