Using APIs from AI Services

⚠️

In any case, you should check the official document, this note may be outdated in the future.

This note is only for working with APIs from AI Services. If you use the AI services as their tools and chat apps, check this note instead.

Remarks (for all services)

Mistral and Claude don’t accept null in the request body. OpenAI allows that (not with AzureOpenAI, I tried with stream_options). In OpenAI request, if you input null, the default value for that property will be used. However, there will be an error if you put null in the request body of Mistral.

Only OpenAI has user and n parameters.

Temperature

ㅤ	OpenAI	Mistral	Claude	Gemini
Range	[0, 2]	[0, 1]	[0, 1]	[0, 2] (`gemini-1.5-pro`) [0, 1] (`gemini-1.0-pro-vision`) [0, 2] (`gemini-1.0-pro-002`) [0, 1] (`gemini-1.0-pro-001`)
Default	1	0.7	1	1 (`gemini-1.5-pro`) 0.4 (`gemini-1.0-pro-vision`) 1 (`gemini-1.0-pro-002`) 0.9 (`gemini-1.0-pro-001`)

Max tokens

No services give a very clear discription about the max tokens. You have to read the documentation carefully.

Claude gives the max output to 4096 tokens to all models.

Mistral (they write 32k, 8k,…)

OpenAI sometimes give 4096 tokens for the output.

Gemini

Vercel’s AI SDK

Official documentation.

AI SDK for Azure provider, it’s a little special.

ModelMessage

OpenAI

Mistral and Claude don’t accept null in the request body. OpenAI allows that (not with AzureOpenAI, I tried with stream_options). In OpenAI request, if you input null, the default value for that property will be used. However, there will be an error if you put null in the request body of Mistral.

Prompt caching (auto enabled) → best practices

Response API

Official doc.

AzureOpenAI

There are some remarks (tried with version 1.40.8 and below):

It doesn’t accept null (when you use REST API) or None (when using Python SDK) with some specific endpoints (whereas some don’t have this kind of issue). I tried with stream_options. However, the class OpenAI (also from that package) allows that!

Even if we add stream_options=True in the request, there is no usage return as in OpenAI class!

Note that, the changes in Azure OpenAI often come after the official OpenAI’s changes. For example, property max_completion_tokens replaces max_tokens but it isn’t supported yet in Azure OpenAI.

Although OpenAI says that max_completion_tokens is a new way of the dedicated max_tokens but they aren’t the same in my tests. For example, if we set max_tokens=200 for model gpt-4o and max_completion_tokens=200 for model o1-mini. The latter will stops without content and finish_reson is “length” (it seems it’s too short to show) when I ask “hello”. The former is good to show the answer.

Response API

Official doc

v1 format

⚠️ Be careful: previous messages can be input to the input property, but it differs from the messages property in older APIs. For example, tools in the message list must be handled differently. Check more here.

Response format + no stream

Response format + stream (doc)

Mistral

Mistral and Claude don’t accept null in the request body. OpenAI allows that (not with AzureOpenAI, I tried with stream_options). In OpenAI request, if you input null, the default value for that property will be used. However, there will be an error if you put null in the request body of Mistral.

Remark: if you use rest api, i’ts tool_calls in the returned object where it’s toolCalls if you use sdk. The same for others like finish_reason or prompt_tokens.

Differences with streaming enabled between sdk and rest api:

Be carefule, if you use Mistral REST API, the returned property is tool_calls but if you use Mistral SDK API (@mistralai/mistralai), the returned property is toolCalls instead. If you use tool_calls in SDK, there will be an error!

Claude

Mistral and Claude don’t accept null in the request body. OpenAI allows that (not with AzureOpenAI, I tried with stream_options). In OpenAI request, if you input null, the default value for that property will be used. However, there will be an error if you put null in the request body of Mistral.

max_tokens is required!

It has a very different format in comparison with OpenAI and Mistral (both in input and output).

Claude doesn’t have an endpoint for getting the list of models like OpenAI/Mistral.

If you use REST API, you have to indicate version of the API, if you use SDK, you don’t have to.

First message must use the “user” role. OpenAI/Mistral don’t restrict this rule.

There are 2 ways to provide the text

Gemini

If you use Python SDK, python v3.9 is required!

Get the API key in Google AI Studio.

Sau khi có API key rồi thì check Quickstart.

List of value for safety configs.

System instruction (prompt)

Different from other services, the role in the content message is either “user” or “model” (not “assistant”)

Different from other services, Gemini doesn’t allow the name of a tool starting with special characters like -.

“system_instruction” isn’t enabled for model gemini-1.0-pro, if you use it, there will be an error Developer instruction is not enabled for models/gemini-1.0-pro". In this case, you can use "role": "model" for the instruction.

Python SDK example doc: generation_config

When using embeddings, note that the model name in the API request must start with models/ or tunedModels/. For instance, while the documentation refers to text-embedding-004, you must use models/text-embedding-004 in your actual request.

models/text-embedding-004 has a fixed output dimensionality of 768 - this is hardcoded into the model's architecture and cannot be changed. The output_dimensionality parameter in your config is simply ignored by this model.

Điều này có thể dẫn đến lỗi nếu như column chỉ chấp nhận 1 chiều nào đó ko khớp. Cách check dimension:

xAI (Grok)

xai-cookbook repo

Streaming response with REST API

Most services provide their own SDK to get the streaming response. These SDKs work out of the box. However, there are cases where we cannot use the SDKs (for example, I encountered the error ERR_REQUIRE_ESM when working with the Mistral SDK). In such cases, it's better to work directly with their REST APIs.

Using `fetch`

Check this code.

Using `axios`

Check this code.

Python

⚠️

Remark: The chunk depends heavily and “strangely” on the chunk_size. Even if we set chunk_size=None to get the full chunk, the return chunk is shown below

If we count the data : found in each chunk, it will be 3, 1, 1, …, 1, 2. I still don’t know why!

Troubleshooting

1# response model (no stream) 2from openai import OpenAI 3 4client = OpenAI( 5 api_key=azure_api_key, 6 # full: https://your_resource_name.openai.azure.com/openai/v1/responses 7 base_url="https://your_resource_name.openai.azure.com/openai/v1" 8) 9 10response = client.responses.create( 11 model="gpt-4.1", # Replace with your model deployment name 12 // input="This is a test." 13 "input": [ 14 { 15 "role": "system", 16 "content": "Your name is XXXXThi." 17 }, 18 { 19 "role": "user", 20 "content": "Just say \"Hello Thi\" and your name. No more." 21 } 22 ], 23) 24 25print(f"text: {response.output[0].content[0].text}") 26print(f"input_tokens: {response.usage.input_tokens}") 27print(f"output_tokens: {response.usage.output_tokens}") 28# print(response.model_dump_json(indent=2))

1from openai import OpenAI 2 3client = OpenAI( 4 api_key=azure_api_key, 5 # full: https://your_resource_name.openai.azure.com/openai/v1/responses 6 base_url="https://your_resource_name.openai.azure.com/openai/v1" 7) 8 9response = client.responses.create( 10 model="gpt-4.1", # Replace with your model deployment name 11 input="This is a test.", 12 stream=True 13) 14 15for event in response: 16 # print(f"event: {event}") 17 if event.type == 'response.output_text.delta': 18 print(f"event.delta: {event.delta}") 19 if event.type == 'response.completed': 20 print(event.response.usage)

1"messages": [ 2 { 3 "role": "user", 4 "content": "some text" 5 }, 6 { 7 "type": "tool_result", 8 "tool_use_id": "toolu_01Q4E1SvgCuAB4KvHMsHFVFD", 9 "content": "Based on your information, an username is automatically generated as dieu_dinh_vietnam" 10 }, 11] 12 13// or 14"messages": [ 15 { 16 "role": "user", 17 "content": [ { "type": "text", "text": "some text" } ] 18 }, 19 { 20 "type": "tool_result", 21 "tool_use_id": "toolu_01DSntXabEn5gNmnnPCYDvsa", 22 "content": [ 23 { 24 "type": "text", 25 "text": "Your password is the last 3 digits of your phone number: 678" 26 } 27 ] 28 } 29]

1{ 2 "contents": [ 3 { 4 "role": "model", 5 "parts": { "text": "You are a very helpful assistant!" } 6 }, 7 { 8 "role": "user", 9 "parts": { "text": "Hello, who are you?" } 10 } 11 ] 12}

1request_body = { 2 'model': 'casperhansen/llama-3-70b-instruct-awq', 3 'messages': [ 4 {'role': 'user', 'content': 'count from 1 to 2, separated by commas'} 5 ], 6 'stream': True, 7 'stream_options': {"include_usage": True} 8} 9 10headers = { 11 'Authorization': 'Bearer ' + api_key, 12 'Content-Type': 'application/json' 13} 14 15stream = True 16with requests.post(url, data=json.dumps(request_body), stream=stream, headers=headers) as response: 17 if response.status_code == 200: 18 for chunk in response.iter_content(chunk_size=1024): 19 if chunk and chunk.startswith(b'data: '): 20 print(chunk.decode('utf-8')) 21 else: 22 print(f"Failed to connect: {response.status_code}")

Using APIs from AI Services

Using APIs from AI Services

Remarks (for all services)

Temperature

Max tokens

Vercel’s AI SDK

OpenAI

Response API

AzureOpenAI

Response API

v1 format

Mistral

Claude

Gemini

xAI (Grok)

Streaming response with REST API

Using fetch

Using axios

Python

Troubleshooting

Using APIs from AI Services

Using APIs from AI Services

Remarks (for all services)

Temperature

Max tokens

Vercel’s AI SDK

OpenAI

Response API

AzureOpenAI

Response API

v1 format

Mistral

Claude

Gemini

xAI (Grok)

Streaming response with REST API

Using fetch

Using axios

Python

Troubleshooting

Using `fetch`

Using `axios`

Using `fetch`

Using `axios`