In any case, you should check the official documentation; this note may become outdated in the future.
🐙 GitHub repo: any-api-playground
- Cursor Directory/MCPs
- Install MCP for Claude Desktop: Settings → Developer → Edit Config
- Install MCP for Cursor:
- Global: Cursor → Settings… → Cursor Settings → Tools & Integrations
- Project: `.cursor/mcp.json`
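For reference, a minimal `.cursor/mcp.json` might look like this (the server name and package are placeholders; copy the exact command from the MCP server's README):

```json
{
  "mcpServers": {
    "my-server": {
      "command": "npx",
      "args": ["-y", "@example/some-mcp-server"]
    }
  }
}
```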
- In your IDE (VSCode or Cursor), install the Continue extension.
- In LM Studio, navigate to the Developer tab, select your downloaded model → Settings → enable "Serve on Local Network" → enable the server.
- In your IDE, select the "Continue" tab on the left sidebar → Choose "Or, configure your own model" → "Click here to view more providers" (or select the Ollama icon tab if you're using Ollama) → in the provider list, select LM Studio → Set Model to "Autodetect" → Connect → a config file will open at `~/.continue/config.yaml`; keep the default settings and save.
- That's it!
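For reference, the relevant part of `~/.continue/config.yaml` might look roughly like this (a sketch; exact fields depend on your Continue version, and `http://localhost:1234/v1` is LM Studio's default server address):

```yaml
models:
  - name: LM Studio
    provider: lmstudio
    model: AUTODETECT
    apiBase: http://localhost:1234/v1
```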
- As another option, you can use Granite.code (from IBM).
- Note: it's for VSCode.
- Different from VSCode, all `cmd+k` shortcuts are replaced by `cmd+r`!
I'm using Claude Code; if you use another coding CLI service, modify the snippets accordingly. Insert the snippets below into `.bashrc` or `.zshrc` and then `source ~/.zshrc`:
- ❤️ Alternative to (and compatible with) Claude Code: Z.ai's GLM Coding, which is much cheaper.
- Instruction prompt: add instructions to `~/.claude/CLAUDE.md` for global guides (for any project). For a single project, add `CLAUDE.md` in the root of the project! (A tiny example is sketched below.)
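For example, a tiny `CLAUDE.md` (the contents are entirely up to you):

```markdown
# Project guide
- Use Python 3.12 with type hints everywhere.
- Run `pytest` before committing.
- Response in Vietnamese every question.
```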
- You can copy text and even screenshots to the clipboard and then "paste" them into the current Claude Code command.
- Or use Puppeteer to automate feedback (no need to take screenshots yourself). ← Use chrome-devtools-mcp or the Playwright MCP server.
- You can drag and drop images directly into the prompt input.
- When coding with Claude in the CLI, if you want to add something to `CLAUDE.md`, just add `#` before what you want to say, e.g. `# Response in Vietnamese every question`.
- Tab to autocomplete filenames.
- ESC to interrupt a running process.
- Context management: look at the phrase "Context left until auto-compact: 16%" at the bottom-right of the command input.
- `claude --dangerously-skip-permissions` = bypass all permission checks and let Claude work uninterrupted until completion.
- Use `/clear` to reset the context window.
- Use `\` and then `enter` to write multiple lines in the input. ← Use `/terminal-setup` to have Claude install the Shift+Enter key binding for the terminal.
- `shift+tab` to toggle between plan mode (ask for confirmation) and auto-accept mode (no confirmation).
- Check the official guide, but it's not enough.
- Modify your `~/.zshrc` or `~/.bashrc`, then run `source ~/.zshrc` or `source ~/.bashrc` in the current terminal to make the changes take effect!
- To switch between services, run `vb_glm` to use GLM or `vb_claude` to use the default Claude Code (a sketch of such functions is shown below).
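A sketch of what such functions can look like (the Z.ai base URL and the `ZAI_API_KEY` variable are assumptions; check Z.ai's docs for the exact values):

```bash
# Point Claude Code at Z.ai's GLM endpoint (URL/key name are assumptions)
vb_glm() {
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
  export ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY"
  echo "Claude Code -> GLM"
}

# Back to the default Anthropic endpoint
vb_claude() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
  echo "Claude Code -> default Claude"
}
```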
- Verify the configuration by typing `claude` in the terminal, then running `/status`.
- You can also simply ask "Who r u? Which model r u?"
- To make GLM work with the latest VSCode extension: open the terminal, switch to GLM with `vb_glm`, then open the current folder using `cursor .` or `code .`.
- Test the Claude Code extension by asking: "Who r u? Which model r u?" (you may need to ask several times until you see an answer containing "glm-4.6").
- ⭐ Another way: open IDE Settings → search for "Claude Code" → click to open the `settings.json` file and add the following:
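Something like the following (the exact key name is my assumption, based on the "selectedModel" setting mentioned below; verify against what your extension version shows):

```json
{
  "claude-code.selectedModel": "opus"
}
```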
Then reload the current IDE window.
⚠️ Note that "default" for "selectedModel" will not work! You can also type `/model` and then select "opus".
- Benefit: a single source of truth for all.
- Create `ai-instructions.md` in the root (or somewhere else).
- For Claude Code: in `.claude/settings.json` (not `.claude/settings.local.json`).
- For Cursor: in `.cursor/rules/general.mdc`.
- For GitHub Copilot: in `.github/copilot-instructions.md` or in `.vscode/settings.json` (see the sketch below).
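For example, for Copilot you might point `.vscode/settings.json` at the shared file (a sketch; this setting has evolved across Copilot versions, so verify against the current docs):

```json
{
  "github.copilot.chat.codeGeneration.instructions": [
    { "file": "ai-instructions.md" }
  ]
}
```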
- Mistral and Claude don't accept `null` in the request body; OpenAI allows that (but not `AzureOpenAI`; I tried with `stream_options`). In an OpenAI request, if you input `null`, the default value for that property will be used. However, there will be an error if you put `null` in the request body of Mistral (see the sketches after this list).
- Only OpenAI has the `user` and `n` parameters.
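To make the `null` remark concrete, a minimal sketch with raw REST calls (model names and keys are placeholders):

```python
import requests

# OpenAI: a JSON null (Python None) is accepted; the property simply
# falls back to its default value.
requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENAI_KEY"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "hello"}],
        "stream_options": None,  # serialized as null -> accepted
    },
)

# Mistral: a null-valued property makes the request fail with a
# validation error (per the remark above).
requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_MISTRAL_KEY"},
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": "hello"}],
        "temperature": None,  # serialized as null -> error
    },
)
```

And the OpenAI-only `user`/`n` parameters via the Python SDK (both are real parameters; the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `n` asks for several completions at once; `user` is an end-user ID
# used for abuse monitoring. Neither exists in Mistral/Claude.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Suggest a project name."}],
    n=3,
    user="user-123",
)
for choice in resp.choices:
    print(choice.message.content)
```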
No service gives a very clear description of the max tokens; you have to read the documentation carefully.
- Claude caps the max output at 4096 tokens for all models.
- Mistral (they write 32k, 8k, …).
- OpenAI sometimes gives 4096 tokens for the output.
- Prompt caching (auto enabled) → best practices
There are some remarks (tried with version `1.40.8` and below):
- It doesn't accept `null` (when you use the REST API) or `None` (when using the Python SDK) with some specific endpoints (whereas others don't have this kind of issue); I tried with `stream_options`. However, the class `OpenAI` (also from that package) allows that!
- Even if we add `stream_options=True` in the request, there is no usage returned as with the `OpenAI` class!
- Note that changes in Azure OpenAI often come after the official OpenAI changes. For example, the property `max_completion_tokens` replaces `max_tokens`, but it isn't supported yet in Azure OpenAI.
- Although OpenAI says that `max_completion_tokens` is the new replacement for the deprecated `max_tokens`, they aren't the same in my tests. For example, set `max_tokens=200` for model `gpt-4o` and `max_completion_tokens=200` for model `o1-mini`, then ask "hello". The latter stops without content and `finish_reason` is "length" (likely because reasoning tokens also count toward the limit, so 200 is too short to show anything), while the former shows the answer fine. See the sketch below.
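A sketch of that comparison (model names as in the test above):

```python
from openai import OpenAI

client = OpenAI()

# gpt-4o uses the legacy `max_tokens`: 200 tokens is plenty for "hello".
legacy = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=200,
)
print(legacy.choices[0].finish_reason, legacy.choices[0].message.content)

# o1-mini requires `max_completion_tokens`; hidden reasoning tokens count
# toward this limit, so the answer may come back empty with "length".
reasoning = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "hello"}],
    max_completion_tokens=200,
)
print(reasoning.choices[0].finish_reason, reasoning.choices[0].message.content)
```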
- Remark: if you use the REST API, it's `tool_calls` in the returned object, whereas it's `toolCalls` if you use the SDK. The same for others like `finish_reason` or `prompt_tokens`.
- Differences between the SDK and the REST API with streaming enabled:
- Be careful: if you use the Mistral REST API, the returned property is `tool_calls`, but if you use the Mistral SDK (`@mistralai/mistralai`), the returned property is `toolCalls` instead. If you use `tool_calls` in the SDK, there will be an error! (An illustrative response excerpt follows.)
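An illustrative excerpt of the REST response (the values are made up); the JS SDK exposes the same fields camelCased, e.g. `toolCalls`, `finishReason`, `promptTokens`:

```json
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "tool_calls": [
          {
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Paris\"}"
            }
          }
        ]
      }
    }
  ],
  "usage": { "prompt_tokens": 42 }
}
```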
- `max_tokens` is required!
- It has a very different format in comparison with OpenAI and Mistral (both in input and output).
- Claude doesn't have an endpoint for getting the list of models like OpenAI/Mistral.
- If you use the REST API, you have to indicate the version of the API; if you use the SDK, you don't have to.
- The first message must use the "user" role; OpenAI/Mistral don't enforce this rule.
- There are 2 ways to provide the text (one of them is sketched below).
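For instance, a minimal REST sketch using a plain string as `content` (one of the two ways); note the required `anthropic-version` header, the required `max_tokens`, and the user-first message (the model name is just an example):

```python
import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",  # required for the REST API
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,  # required, unlike OpenAI/Mistral
        "messages": [{"role": "user", "content": "Hello!"}],  # user first
    },
)
print(resp.json()["content"][0]["text"])
```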
- If you use the Python SDK, Python v3.9 is required!
- Get the API key in Google AI Studio.
- After you have the API key, check the Quickstart.
- List of values for safety configs.
- System instruction (prompt)
- Different from other services, the role in the content message is either "user" or "model" (not "assistant").
- Different from other services, Gemini doesn't allow a tool name that starts with special characters like `-`.
- "system_instruction" isn't enabled for model `gemini-1.0-pro`; if you use it, there will be an error: `Developer instruction is not enabled for models/gemini-1.0-pro`. In this case, you can use `"role": "model"` for the instruction.
- Python SDK example doc: `generation_config` (a sketch follows).
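A hedged sketch of the Python SDK with a `generation_config` and the "model" role (API surface as of the `google-generativeai` package; adapt to your version):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    system_instruction="Answer concisely.",  # not enabled on gemini-1.0-pro
    generation_config={"temperature": 0.2, "max_output_tokens": 256},
)

# Note the "model" role instead of "assistant".
chat = model.start_chat(history=[
    {"role": "user", "parts": ["Hi!"]},
    {"role": "model", "parts": ["Hello! How can I help?"]},
])
print(chat.send_message("What can you do?").text)
```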
- When using embeddings, note that the model name in the API request must start with `models/` or `tunedModels/`. For instance, while the documentation refers to `text-embedding-004`, you must use `models/text-embedding-004` in your actual request.
- `models/text-embedding-004` has a fixed output dimensionality of 768; this is hardcoded into the model's architecture and cannot be changed. The `output_dimensionality` parameter in your config is simply ignored by this model. This can lead to errors if your vector column only accepts some dimensionality that doesn't match. How to check the dimension:
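A sketch (using the `google-generativeai` package):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

result = genai.embed_content(
    model="models/text-embedding-004",  # the models/ prefix is required
    content="hello world",
)
print(len(result["embedding"]))  # -> 768, whatever output_dimensionality says
```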
Most services provide their own SDK to get the streaming response. These SDKs work out of the box. However, there are cases where we cannot use the SDKs (for example, I encountered the error `ERR_REQUIRE_ESM` when working with the Mistral SDK). In such cases, it's better to work directly with their REST APIs. Check this code.
Check this code.
Remark: the chunks depend heavily and "strangely" on the `chunk_size`. Even if we set `chunk_size=None` to get the full chunk, the returned chunks are as shown below. If we count the `data:` occurrences in each chunk, we get 3, 1, 1, …, 1, 2. I still don't know why!
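Because a raw chunk can contain several `data:` events (or a partial one), it's safer to iterate over lines instead of chunks when parsing the SSE stream yourself. A minimal sketch assuming an OpenAI-style streaming endpoint:

```python
import json
import requests

with requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "hello"}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():  # one complete SSE line at a time
        if not line.startswith(b"data: "):
            continue  # also skips empty keep-alive lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
```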