ollama REST API 使用介绍,ollama 提供 ollama 格式的 API 和 openai 格式的 API,本文介绍 ollama 格式的 API
REST API 介绍
ollama serve
时,启动 REST API 默认监听在 11434
,兼容 OpenAI API 格式
生成响应 Generate a response
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt":"Why is the sky blue?"
}'
与模型交流 Chat with a model
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
Ollama 支持的交互方式
- 命令行
ollama run ollama run qwen:7b
python 封装:
import subprocess
cmd = ["ollama", "run", "ollama run qwen:7b"]
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
- ollama-python 库交互
# pip3 install ollama
import ollama
response = ollama.chat(model='qwen:7b', stream=False,
messages=[{'role': 'user', 'content': 'who are you?'}]
)
print(response['message']['content'])
# I am a large language model created by Alibaba Cloud. I go by the name Qwen.
- openai 接口
Ollama 提供与 OpenAI API 功能的兼容性 API
(1). OpenAI Python库from openai import OpenAI
client = OpenAI(
base_url='http://localhost:11434/v1/',
# 必需但被忽略
api_key='ollama',
)
chat_completion = client.chat.completions.create(
messages=[
{
'role': 'user',
'content': 'Say this is a test',
}
],
model='llama2',
)
约定
模型名
模型名称遵循 model:tag
格式,其中 model
可以有一个可选的命名空间,如 example/model
。例如 orca-mini:3b-q4_1
和 llama3:70b
。标签是可选的,如果不提供,则默认为 latest
。标签用于标识特定版本。
Durations
所有持续时间(Durations)都以纳秒为单位。
流式响应 Streaming responses
某些 API(endpoints)会将响应作为 JSON 对象进行流式处理。可以通过为这些端点提供 {"stream": false}
来禁用流。
Generate a completion
POST /api/generate
使用提供的模型为给定提示生成响应。这是一个流式 API(endpoints),因此会有一系列响应。最终的响应对象将包括来自请求的统计数据和附加数据。
参数 Parameters
model
: (required) 模型名
prompt
: 生产内容的 prompt
suffix
: the text after the model response
images
: (optional) a list of base64-encoded images (for multimodal models such as llava
)
高级可选参数:
format
: 返回响应的格式。格式可以是 json
格式
options
: 模型文件文档中列出的其他模型参数,如 temperature
system
: system message (覆盖在 Modelfile
中的定义)
template
: 要使用的提示模板 (覆盖在 Modelfile
中的定义)
stream
: 如果为 false
,响应将以单个响应对象的形式返回,而不是以对象流的形式返回
raw
: 如果为 true
,则不会对提示语应用格式化。如果您在向应用程序接口发出的请求中指定了一个完整的提示模板,您可以选择使用 raw
参数
keep_alive
: 制模型在请求后加载到内存的时间 (default: 5m
)
context
(废弃): 上一次请求/generate
时返回的上下文参数,可用于保留简短的对话记忆
结构化输出
通过在 format
参数中提供 JSON 模式,可支持结构化输出。模型将生成与模式匹配的响应。
JSON mode
通过将 format
参数设置为 json
,启用 JSON 模式
。这将使响应结构成为有效的 JSON 对象。
在 prompt
中指示模型使用 JSON 很重要。否则,模型可能会产生大量空白。
示例
Generate request (Streaming)
Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
Response
返回一个 JSON 对象流:
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
数据流中的最终响应还包括有关生成的其他数据:
total_duration
: 生成响应所用的时间(纳秒)
load_duration
: 加载模型所用的时间(纳秒)
prompt_eval_count
: prompt 中的 tokens 数
prompt_eval_duration
: 评估 prompt 所花费的时间(纳秒)
eval_count
: 响应中的 tokens 数
eval_duration
: 生成响应所用的时间(纳秒)
context
: 此响应中使用的对话编码,可在下一次请求中发送,以保留对话记忆
response
: 如果响应是流式的,则为空;如果不是流式的,则包含完整的响应
要计算每秒生成 token(token/s)的响应速度,请将eval_count
/ eval_duration
* 10^9
.
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": 10706818083,
"load_duration": 6338219291,
"prompt_eval_count": 26,
"prompt_eval_duration": 130079000,
"eval_count": 259,
"eval_duration": 4232710000
}
Request (No streaming)
Request
No streaming 时,一次回复即可收到一个响应。
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
Response
如果将 stream
设置为 false
,响应将是一个单独的 JSON 对象:
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 5043500667,
"load_duration": 5025959,
"prompt_eval_count": 26,
"prompt_eval_duration": 325953000,
"eval_count": 290,
"eval_duration": 4709213000
}
Request (with suffix)
Request
curl http://localhost:11434/api/generate -d '{
"model": "codellama:code",
"prompt": "def compute_gcd(a, b):",
"suffix": " return result",
"options": {
"temperature": 0
},
"stream": false
}'
Response
{
"model": "codellama:code",
"created_at": "2024-07-22T20:47:51.147561Z",
"response": "\n if a == 0:\n return b\n else:\n return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n result = (a * b) / compute_gcd(a, b)\n",
"done": true,
"done_reason": "stop",
"context": [...],
"total_duration": 1162761250,
"load_duration": 6683708,
"prompt_eval_count": 17,
"prompt_eval_duration": 201222000,
"eval_count": 63,
"eval_duration": 953997000
}
Request (结构化输出)
Request
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
"model": "llama3.1:8b",
"prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
}
}'
Response
{
"model": "llama3.1:8b",
"created_at": "2024-12-06T00:48:09.983619Z",
"response": "{\n \"age\": 22,\n \"available\": true\n}",
"done": true,
"done_reason": "stop",
"context": [1, 2, 3],
"total_duration": 1075509083,
"load_duration": 567678166,
"prompt_eval_count": 28,
"prompt_eval_duration": 236000000,
"eval_count": 16,
"eval_duration": 269000000
}
Request (JSON mode)
当 format
设置为 json
时,输出将始终是格式良好的 JSON 对象。重要的是,还要指示模型以 JSON 格式响应。
Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
Response
{
"model": "llama3.2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
response
的值将是一个包含类似 JSON 格式的字符串:
{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}
Request (with images)
要向 llava
或 bakllava
等多模态模型提交图像,请提供一个 base64 编码的 images
列表:
Request
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"What is in this picture?",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'
Response
{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": "A happy cartoon character, which is cute and cheerful.",
"done": true,
"context": [1, 2, 3],
"total_duration": 2938432250,
"load_duration": 2559292,
"prompt_eval_count": 1,
"prompt_eval_duration": 2195557000,
"eval_count": 44,
"eval_duration": 736432000
}
Request (Raw Mode)
在某些情况下,您可能希望绕过模板系统,提供完整的提示。在这种情况下,可以使用 raw
参数禁用模板化。另外请注意,原始模式不会返回上下文。
Request
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'
Request (可重复产出)
为获得可重复的输出结果,请将 seed
设为一个数字:
Request
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "Why is the sky blue?",
"options": {
"seed": 123
}
}'
Response
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 8493852375,
"load_duration": 6589624375,
"prompt_eval_count": 14,
"prompt_eval_duration": 119039000,
"eval_count": 110,
"eval_duration": 1779061000
}
Generate request (With options)
如果你想在运行时而不是在 Modelfile 中为模型设置自定义选项,可以使用 options
参数。本示例设置了所有可用选项,但你也可以单独设置任何选项,并省略你不想覆盖的选项。
Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"min_p": 0.0,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"num_thread": 8
}
}'
Response
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 4935886791,
"load_duration": 534986708,
"prompt_eval_count": 26,
"prompt_eval_duration": 107345000,
"eval_count": 237,
"eval_duration": 4289432000
}
Load a model
如果提示符为空,模型将被加载到内存中。
Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2"
}'
Response
返回一个 JSON 对象:
{
"model": "llama3.2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
}
Unload a model
如果提供的提示为空,且 keep_alive
参数设置为 0
,则将从内存中卸载模型。
Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"keep_alive": 0
}'
Response
返回一个 JSON 对象:
{
"model": "llama3.2",
"created_at": "2024-09-12T03:54:03.516566Z",
"response": "",
"done": true,
"done_reason": "unload"
}
Generate a chat completion
POST /api/chat
使用提供的模型生成聊天中的下一条消息。这是一个流式端点(streaming endpoint),因此会有一系列响应。可以使用 "stream":false
禁用流。最终的响应对象将包括来自请求的统计数据和附加数据。
Parameters
model
: (required) 模型名
messages
: 聊天信息,可用于保存聊天记忆
tools
: JSON 格式的工具列表,如果模型支持,可使用这些工具
The message
object has the following fields:
role
: the role of the message, either system
, user
, assistant
, or tool
content
: the content of the message
images
(optional): a list of images to include in the message (for multimodal models such as llava
)
tool_calls
(optional): a list of tools in JSON that the model wants to use
Advanced parameters (optional):
format
: the format to return a response in. Format can be json
or a JSON schema.
options
: additional model parameters listed in the documentation for the Modelfile such as temperature
stream
: if false
the response will be returned as a single response object, rather than a stream of objects
keep_alive
: controls how long the model will stay loaded into memory following the request (default: 5m
)
结构化输出
结构化输出 are supported by providing a JSON schema in the format
parameter. The model will generate a response that matches the schema. See the Chat request (结构化输出) example below.
Examples
Chat Request (Streaming)
Request
Send a chat message with a streaming response.
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
Response
A stream of JSON objects is returned:
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The",
"images": null
},
"done": false
}
Final response:
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
Chat request (No streaming)
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
],
"stream": false
}'
Response
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
Chat request (结构化输出)
Request
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
},
"options": {
"temperature": 0
}
}'
Response
{
"model": "llama3.1",
"created_at": "2024-12-06T00:46:58.265747Z",
"message": { "role": "assistant", "content": "{\"age\": 22, \"available\": false}" },
"done_reason": "stop",
"done": true,
"total_duration": 2254970291,
"load_duration": 574751416,
"prompt_eval_count": 34,
"prompt_eval_duration": 1502000000,
"eval_count": 12,
"eval_duration": 175000000
}
Chat request (With History)
Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
Response
A stream of JSON objects is returned:
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The"
},
"done": false
}
Final response:
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
"load_duration": 6396458,
"prompt_eval_count": 61,
"prompt_eval_duration": 398801000,
"eval_count": 468,
"eval_duration": 7701267000
}
Chat request (with images)
Request
Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.
curl http://localhost:11434/api/chat -d '{
"model": "llava",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}
]
}'
Response
{
"model": "llava",
"created_at": "2023-12-13T22:42:50.203334Z",
"message": {
"role": "assistant",
"content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
"images": null
},
"done": true,
"total_duration": 1668506709,
"load_duration": 1986209,
"prompt_eval_count": 26,
"prompt_eval_duration": 359682000,
"eval_count": 83,
"eval_duration": 1303285000
}
Chat request (Reproducible outputs)
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"options": {
"seed": 101,
"temperature": 0
}
}'
Response
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What is the weather today in Paris?"
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The location to get the weather for, e.g. San Francisco, CA"
},
"format": {
"type": "string",
"description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location", "format"]
}
}
}
]
}'
Response
{
"model": "llama3.2",
"created_at": "2024-07-22T20:33:28.123648Z",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_current_weather",
"arguments": {
"format": "celsius",
"location": "Paris, FR"
}
}
}
]
},
"done_reason": "stop",
"done": true,
"total_duration": 885095291,
"load_duration": 3753500,
"prompt_eval_count": 122,
"prompt_eval_duration": 328493000,
"eval_count": 33,
"eval_duration": 552222000
}
Load a model
If the messages array is empty, the model will be loaded into memory.
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": []
}'
Response
{
"model": "llama3.2",
"created_at": "2024-09-12T21:17:29.110811Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "load",
"done": true
}
Unload a model
If the messages array is empty and the keep_alive
parameter is set to 0
, a model will be unloaded from memory.
Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [],
"keep_alive": 0
}'
Response
返回一个 JSON 对象:
{
"model": "llama3.2",
"created_at": "2024-09-12T21:33:17.547535Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "unload",
"done": true
}
Create a Model
POST /api/create
Create a model from:
- another model;
- a safetensors directory; or
- a GGUF file.
If you are creating a model from a safetensors directory or from a GGUF file, you must create a blob for each of the files and then use the file name and SHA256 digest associated with each blob in the files
field.
Parameters
model
: name of the model to create
from
: (optional) name of an existing model to create the new model from
files
: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
adapters
: (optional) a dictionary of file names to SHA256 digests of blobs for LORA adapters
template
: (optional) the prompt template for the model
license
: (optional) a string or list of strings containing the license or licenses for the model
system
: (optional) a string containing the system prompt for the model
parameters
: (optional) a dictionary of parameters for the model (see Modelfile for a list of parameters)
messages
: (optional) a list of message objects used to create a conversation
stream
: (optional) if false
the response will be returned as a single response object, rather than a stream of objects
quantize
(optional): quantize a non-quantized (e.g. float16) model
Quantization types
Type |
Recommended |
q2_K |
|
q3_K_L |
|
q3_K_M |
|
q3_K_S |
|
q4_0 |
|
q4_1 |
|
q4_K_M |
* |
q4_K_S |
|
q5_0 |
|
q5_1 |
|
q5_K_M |
|
q5_K_S |
|
q6_K |
|
q8_0 |
* |
Examples
Create a new model
Create a new model from an existing model.
Request
curl http://localhost:11434/api/create -d '{
"model": "mario",
"from": "llama3.2",
"system": "You are Mario from Super Mario Bros."
}'
Response
A stream of JSON objects is returned:
{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}
Quantize a model
Quantize a non-quantized model.
Request
curl http://localhost:11434/api/create -d '{
"model": "llama3.1:quantized",
"from": "llama3.1:8b-instruct-fp16",
"quantize": "q4_K_M"
}'
Response
A stream of JSON objects is returned:
{"status":"quantizing F16 model to Q4_K_M"}
{"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
{"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
{"status":"using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"creating new layer sha256:455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a"}
{"status":"writing manifest"}
{"status":"success"}
Create a model from GGUF
Create a model from a GGUF file. The files
parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use /api/blobs/:digest to push the GGUF file to the server before calling this API.
Request
curl http://localhost:11434/api/create -d '{
"model": "my-gguf-model",
"files": {
"test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
}
}'
Response
A stream of JSON objects is returned:
{"status":"parsing GGUF"}
{"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
{"status":"writing manifest"}
{"status":"success"}
Create a model from a Safetensors directory
The files
parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use /api/blobs/:digest to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.
Request
curl http://localhost:11434/api/create -d '{
"model": "fred",
"files": {
"config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
"generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
"special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
"tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
"tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
"model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
}
}'
Response
A stream of JSON objects is returned:
{"status":"converting model"}
{"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
{"status":"using autodetected template llama3-instruct"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"writing manifest"}
{"status":"success"}
Check if a Blob Exists
Ensures that the file blob (Binary Large Object) used with create a model exists on the server. This checks your Ollama server and not ollama.com.
Query Parameters
digest
: the SHA256 digest of the blob
Examples
Request
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Response
Return 200 OK if the blob exists, 404 Not Found if it does not.
Push a Blob
POST /api/blobs/:digest
Push a file to the Ollama server to create a “blob” (Binary Large Object).
Query Parameters
digest
: the expected SHA256 digest of the file
Examples
Request
curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
Response
Return 201 Created if the blob was successfully created, 400 Bad Request if the digest used is not expected.
List Local Models
GET /api/tags
List models that are available locally.
Examples
Request
curl http://localhost:11434/api/tags
Response
A single JSON object will be returned.
{
"models": [
{
"name": "codellama:13b",
"modified_at": "2023-11-04T14:56:49.277302595-07:00",
"size": 7365960935,
"digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
"details": {
"format": "gguf",
"family": "llama",
"families": null,
"parameter_size": "13B",
"quantization_level": "Q4_0"
}
},
{
"name": "llama3:latest",
"modified_at": "2023-12-07T09:32:18.757212583-08:00",
"size": 3825819519,
"digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
"details": {
"format": "gguf",
"family": "llama",
"families": null,
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
POST /api/show
Show information about a model including details, modelfile, template, parameters, license, system prompt.
Parameters
model
: name of the model to show
verbose
: (optional) if set to true
, returns full data for verbose response fields
Examples
Request
curl http://localhost:11434/api/show -d '{
"model": "llama3.2"
}'
Response
{
"modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
"parameters": "num_keep 24\nstop \"<|start_header_id|>\"\nstop \"<|end_header_id|>\"\nstop \"<|eot_id|>\"",
"template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": ["llama"],
"parameter_size": "8.0B",
"quantization_level": "Q4_0"
},
"model_info": {
"general.architecture": "llama",
"general.file_type": 2,
"general.parameter_count": 8030261248,
"general.quantization_version": 2,
"llama.attention.head_count": 32,
"llama.attention.head_count_kv": 8,
"llama.attention.layer_norm_rms_epsilon": 0.00001,
"llama.block_count": 32,
"llama.context_length": 8192,
"llama.embedding_length": 4096,
"llama.feed_forward_length": 14336,
"llama.rope.dimension_count": 128,
"llama.rope.freq_base": 500000,
"llama.vocab_size": 128256,
"tokenizer.ggml.bos_token_id": 128000,
"tokenizer.ggml.eos_token_id": 128009,
"tokenizer.ggml.merges": [], // populates if `verbose=true`
"tokenizer.ggml.model": "gpt2",
"tokenizer.ggml.pre": "llama-bpe",
"tokenizer.ggml.token_type": [], // populates if `verbose=true`
"tokenizer.ggml.tokens": [] // populates if `verbose=true`
}
}
Copy a Model
POST /api/copy
Copy a model. Creates a model with another name from an existing model.
Examples
Request
curl http://localhost:11434/api/copy -d '{
"source": "llama3.2",
"destination": "llama3-backup"
}'
Response
Returns a 200 OK if successful, or a 404 Not Found if the source model doesn’t exist.
Delete a Model
DELETE /api/delete
Delete a model and its data.
Parameters
model
: model name to delete
Examples
Request
curl -X DELETE http://localhost:11434/api/delete -d '{
"model": "llama3:13b"
}'
Response
Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn’t exist.
Pull a Model
POST /api/pull
Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
Parameters
model
: name of the model to pull
insecure
: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
stream
: (optional) if false
the response will be returned as a single response object, rather than a stream of objects
Examples
Request
curl http://localhost:11434/api/pull -d '{
"model": "llama3.2"
}'
Response
If stream
is not specified, or set to true
, a stream of JSON objects is returned:
The first object is the manifest:
{
"status": "pulling manifest"
}
Then there is a series of downloading responses. Until any of the download is completed, the completed
key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}
After all the files are downloaded, the final responses are:
{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}
if stream
is set to false, then the response is a single JSON object:
Push a Model
POST /api/push
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.
Parameters
model
: name of the model to push in the form of <namespace>/<model>:<tag>
insecure
: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
stream
: (optional) if false
the response will be returned as a single response object, rather than a stream of objects
Examples
Request
curl http://localhost:11434/api/push -d '{
"model": "mattw/pygmalion:latest"
}'
Response
If stream
is not specified, or set to true
, a stream of JSON objects is returned:
{ "status": "retrieving manifest" }
and then:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
Then there is a series of uploading responses:
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
Finally, when the upload is complete:
{"status":"pushing manifest"}
{"status":"success"}
If stream
is set to false
, then the response is a single JSON object:
Generate Embeddings
POST /api/embed
Generate embeddings from a model
Parameters
model
: name of model to generate embeddings from
input
: text or list of text to generate embeddings for
Advanced parameters:
truncate
: truncates the end of each input to fit within context length. Returns error if false
and context length is exceeded. Defaults to true
options
: additional model parameters listed in the documentation for the Modelfile such as temperature
keep_alive
: controls how long the model will stay loaded into memory following the request (default: 5m
)
Examples
Request
curl http://localhost:11434/api/embed -d '{
"model": "all-minilm",
"input": "Why is the sky blue?"
}'
Response
{
"model": "all-minilm",
"embeddings": [
[
0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814, 0.008599704, 0.105441414, -0.025878139,
0.12958129, 0.031952348
]
],
"total_duration": 14143917,
"load_duration": 1019500,
"prompt_eval_count": 8
}
curl http://localhost:11434/api/embed -d '{
"model": "all-minilm",
"input": ["Why is the sky blue?", "Why is the grass green?"]
}'
Response
{
"model": "all-minilm",
"embeddings": [
[
0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814, 0.008599704, 0.105441414, -0.025878139,
0.12958129, 0.031952348
],
[
-0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725, 0.017194884, 0.09032035, -0.051705178,
0.09951512, 0.09072481
]
]
}
List Running Models
GET /api/ps
List models that are currently loaded into memory.
Examples
Request
curl http://localhost:11434/api/ps
Response
A single JSON object will be returned.
{
"models": [
{
"name": "mistral:latest",
"model": "mistral:latest",
"size": 5137025024,
"digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": ["llama"],
"parameter_size": "7.2B",
"quantization_level": "Q4_0"
},
"expires_at": "2024-06-04T14:38:31.83753-07:00",
"size_vram": 5137025024
}
]
}
Generate Embedding
Note: this endpoint has been superseded by /api/embed
POST /api/embeddings
Generate embeddings from a model
Parameters
model
: name of model to generate embeddings from
prompt
: text to generate embeddings for
Advanced parameters:
options
: additional model parameters listed in the documentation for the Modelfile such as temperature
keep_alive
: controls how long the model will stay loaded into memory following the request (default: 5m
)
Examples
Request
curl http://localhost:11434/api/embeddings -d '{
"model": "all-minilm",
"prompt": "Here is an article about llamas..."
}'
Response
{
"embedding": [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]
}
Version
GET /api/version
Retrieve the Ollama version
Examples
Request
curl http://localhost:11434/api/version
Response
lib