In any case, you should check the official documentation, as this note may become outdated.
- Mistral and Claude don’t accept `null` in the request body. OpenAI allows that (but not with `AzureOpenAI`; I tried with `stream_options`). In an OpenAI request, if you pass `null`, the default value for that property is used. However, there will be an error if you put `null` in the request body of Mistral.
- Only OpenAI has the `user` and `n` parameters.
- Stronger models give better results.
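Regarding the `null` restriction noted above, one way to keep a single request builder for all services is to strip `None` values before sending. This is only a minimal sketch: `drop_none` is a hypothetical helper (not part of any SDK), and the model name is a placeholder.

```python
from typing import Any


def drop_none(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of payload without None values (nested dicts included)."""
    cleaned: dict[str, Any] = {}
    for key, value in payload.items():
        if value is None:
            continue  # skip the key instead of sending null
        if isinstance(value, dict):
            value = drop_none(value)
        cleaned[key] = value
    return cleaned


# Placeholder request body; None marks "use the service default".
body = drop_none({
    "model": "mistral-small-latest",
    "messages": [{"role": "user", "content": "hello"}],
    "temperature": None,
    "stream_options": None,
})
print(body)
```

With this, the same dictionary can be sent to a service that rejects `null`, because the optional keys are omitted rather than set to `null`.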
No service gives a very clear description of the max tokens. You have to read the documentation carefully.
- Claude limits the max output to 4096 tokens for all models.
- Mistral (they write 32k, 8k,…)
- OpenAI sometimes gives 4096 tokens for the output.
- Prompt caching (auto enabled) → best practices
There are some remarks (tried with version `1.40.8` and below):
- It doesn’t accept `null` (when you use the REST API) or `None` (when using the Python SDK) with some specific endpoints (whereas some don’t have this kind of issue). I tried with `stream_options`. However, the class `OpenAI` (also from that package) allows that!
- Even if we add `stream_options=True` in the request, no usage is returned, unlike with the `OpenAI` class!
- Note that changes in Azure OpenAI often come after the official OpenAI changes. For example, the property `max_completion_tokens` replaces `max_tokens`, but it isn’t supported yet in Azure OpenAI.
- Although OpenAI says that `max_completion_tokens` is the new replacement for the deprecated `max_tokens`, they aren’t the same in my tests. For example, say we set `max_tokens=200` for model `gpt-4o` and `max_completion_tokens=200` for model `o1-mini`. When I ask “hello”, the latter stops without content and `finish_reason` is “length” (it seems the limit is too short to show anything). The former shows the answer fine.
- Remark: if you use the REST API, it’s `tool_calls` in the returned object, whereas it’s `toolCalls` if you use the SDK. The same goes for others like `finish_reason` or `prompt_tokens`.
- Differences with streaming enabled between the SDK and the REST API:

    // rest api
    chunk?.choices?.[0]?.delta?.content

    // sdk
    chunk?.data?.choices?.[0]?.delta?.content
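The two differences above (camelCase keys and the extra `data` wrapper) can be smoothed over with a small normalization layer. The sketch below assumes chunks have already been converted to plain dictionaries; `to_snake_case`, `normalize_keys` and `delta_content` are hypothetical helper names, not functions of the Mistral SDK.

```python
import re
from typing import Any, Optional


def to_snake_case(name: str) -> str:
    """Convert a camelCase key such as 'toolCalls' to 'tool_calls'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize_keys(obj: Any) -> Any:
    """Recursively rename dictionary keys to snake_case."""
    if isinstance(obj, dict):
        return {to_snake_case(k): normalize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_keys(v) for v in obj]
    return obj


def delta_content(chunk: dict) -> Optional[str]:
    """Read the streamed text from either shape of chunk."""
    payload = chunk.get("data", chunk)  # SDK-style chunks wrap the payload in "data"
    choices = payload.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


rest_chunk = {"choices": [{"delta": {"content": "Hi"}}]}
sdk_chunk = {"data": {"choices": [{"delta": {"content": "Hi"}}]}}
print(delta_content(rest_chunk), delta_content(sdk_chunk))
print(normalize_keys({"toolCalls": [], "finishReason": "stop"}))
```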
- `max_tokens` is required!
- It has a very different format in comparison with OpenAI and Mistral (both in input and output).
- Claude doesn’t have an endpoint for getting the list of models like OpenAI/Mistral.
- If you use the REST API, you have to indicate the version of the API; if you use the SDK, you don’t have to.
- The first message must use the “user” role. OpenAI/Mistral don’t enforce this rule.
- If you use the Python SDK, Python v3.9 is required!
- Get the API key in Google AI Studio.
- Once you have the API key, check the Quickstart.
- List of values for safety configs.
- System instruction (prompt)
- Unlike other services, the role in the content message is either “user” or “model” (not “assistant”).
- Unlike other services, Gemini doesn’t allow the name of a tool to start with special characters like `-`.
- “system_instruction” isn’t enabled for model `gemini-1.0-pro`. If you use it, there will be an error: "Developer instruction is not enabled for models/gemini-1.0-pro". In this case, you can use `"role": "model"` for the instruction.
    {
      "contents": [
        {
          "role": "model",
          "parts": { "text": "You are a very helpful assistant!" }
        },
        {
          "role": "user",
          "parts": { "text": "Hello, who are you?" }
        }
      ]
    }
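If you already have OpenAI-style messages, the role differences described above can be handled by a small converter. This is a rough sketch under stated assumptions: `to_gemini_contents` is a hypothetical helper, "system" messages are mapped to "model" following the workaround in this note, and `parts` is written as a list, which is the form the API reference uses.

```python
def to_gemini_contents(messages: list[dict]) -> dict:
    """Convert OpenAI-style messages into a Gemini-style request body."""
    contents = []
    for message in messages:
        # Gemini only knows "user" and "model"; "assistant" and "system"
        # are both mapped to "model" here (workaround from this note).
        role = "user" if message["role"] == "user" else "model"
        contents.append({"role": role, "parts": [{"text": message["content"]}]})
    return {"contents": contents}


body = to_gemini_contents([
    {"role": "system", "content": "You are a very helpful assistant!"},
    {"role": "user", "content": "Hello, who are you?"},
])
print(body)
```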
- Python SDK example doc: `generation_config`
- When using embeddings, note that the model name in the API request must start with `models/` or `tunedModels/`. For instance, while the documentation refers to `text-embedding-004`, you must use `models/text-embedding-004` in your actual request.
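The prefix rule for embedding model names is easy to forget, so it can be applied automatically. A minimal sketch, where `qualify_model_name` is a hypothetical helper name:

```python
def qualify_model_name(name: str) -> str:
    """Add the 'models/' prefix unless the name is already qualified."""
    if name.startswith(("models/", "tunedModels/")):
        return name
    return f"models/{name}"


print(qualify_model_name("text-embedding-004"))
print(qualify_model_name("models/text-embedding-004"))
```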
Most services provide their own SDK to get the streaming response. These SDKs work out of the box. However, there are cases where we cannot use the SDKs (for example, I encountered the error `ERR_REQUIRE_ESM` when working with the Mistral SDK). In such cases, it's better to work directly with their REST APIs. Check this code.
    import json

    import requests

    # `url` and `api_key` are assumed to be defined elsewhere
    # (an OpenAI-compatible chat completions endpoint and its key).
    request_body = {
        'model': 'casperhansen/llama-3-70b-instruct-awq',
        'messages': [
            {'role': 'user', 'content': 'count from 1 to 2, separated by commas'}
        ],
        'stream': True,
        'stream_options': {"include_usage": True}
    }

    headers = {
        'Authorization': 'Bearer ' + api_key,
        'Content-Type': 'application/json'
    }

    stream = True
    with requests.post(url, data=json.dumps(request_body), stream=stream, headers=headers) as response:
        if response.status_code == 200:
            # Raw byte chunks; their boundaries are not guaranteed to match SSE events.
            for chunk in response.iter_content(chunk_size=1024):
                if chunk and chunk.startswith(b'data: '):
                    print(chunk.decode('utf-8'))
        else:
            print(f"Failed to connect: {response.status_code}")
Remark: The chunk depends heavily and “strangely” on the `chunk_size`. Even if we set `chunk_size=None` to get the full chunk, the returned chunks look like this:

    index: 0
    data: xxxxx
    data: xxxxxx
    data: xxxx

    index: 1
    data: xxxx

    index: 2
    data: xxxx
    data: [DONE]
I don't understand (yet) why the chunks in the for loop are returned like that. That's why we cannot handle them normally by parsing the JSON of each chunk. Instead, one idea is to use `.search` with a regex to get the format and value we want.

If we count the `data: ` occurrences found in each chunk, we get 3, 1, 1, …, 1, 2. I still don’t know why!
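One likely explanation is that `iter_content` yields raw bytes as they arrive from the network, so a chunk can hold several `data:` lines or only part of one. An alternative to the regex idea is to buffer the bytes and split them on newlines before parsing. The sketch below assumes that approach; `iter_sse_data` is a hypothetical helper, and the fake chunks stand in for what `response.iter_content(chunk_size=None)` might return.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield the parsed JSON of each complete `data:` line, whatever the chunk boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only consume complete lines; a partial line stays in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # blank separators or other fields
            data = line[len(b"data:"):].strip()
            if data == b"[DONE]":
                return
            yield json.loads(data)


# Simulated chunks: the second event is split across two chunks.
fake_chunks = [
    b'data: {"choices": [{"delta": {"content": "1"}}]}\n\ndata: {"cho',
    b'ices": [{"delta": {"content": ", 2"}}]}\n\n',
    b'data: [DONE]\n\n',
]
text = "".join(
    event["choices"][0]["delta"].get("content") or ""
    for event in iter_sse_data(fake_chunks)
)
print(text)  # prints: 1, 2
```

With `requests`, the same generator could be fed with `response.iter_content(chunk_size=None)`. `response.iter_lines()` is another option, since it does the line splitting itself.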