In any case, you should check the official documentation; this note may become outdated.
🐙 GitHub repo: any-api-playground
Note: How to use LLMs?
- Cursor Directory/MCPs
- Install MCP for Claude Desktop: Settings → Developer → Edit Config
- Install MCP for Cursor:
- Global: Cursor → Settings… → Cursor Settings → Tools & Integrations
- Project: `.cursor/mcp.json`
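For reference, a project-level `.cursor/mcp.json` follows the standard `mcpServers` format. A minimal sketch (the filesystem server and the path are only illustrative, not part of my setup):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```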
- In your IDE (VSCode or Cursor), install the Continue extension.
- In LM Studio, navigate to the Developer tab, select your downloaded model → Settings → enable "Serve on Local Network" → enable the server.
- In your IDE, select the "Continue" tab in the left sidebar → choose "Or, configure your own model" → "Click here to view more providers" (or select the Ollama icon tab if you're using Ollama) → in the provider list, select LM Studio → set Model to "Autodetect" → Connect → a config file will open at `~/.continue/config.yaml`; keep the default settings and save.
- That's it!
- As another option, you can use Granite.code (from IBM)
- Note: VSCode.
- Different from VSCode, all `cmd+k` shortcuts are replaced by `cmd+r`!
- If you prefer a vertical activity bar like VSCode’s (for search, extensions, and other icons) instead of the horizontal layout, navigate to Settings → Workbench → Activity Bar → Orientation and change it there.
I’m using Claude Code; if you use another coding CLI service, adjust the snippets. Insert the snippets below into `.bashrc` or `.zshrc`, then run `source ~/.zshrc`.
- Working with IDE:
- Set a keyboard shortcut for "Claude Code: Open in Side Bar" (I use `ctrl+ESC`) to quickly open Claude Code in the sidebar.
- When Claude Code is open in the sidebar, an icon will appear. Drag and drop this icon to the other sidebar (for example, into the same area as Cursor Chat or GitHub Chat).
- 🎉 Once configured, Claude Code will automatically open in the chat panel whenever you access it from the sidebar.
- ❤️ Alternative to (and compatible with) Claude Code: Z.ai's GLM Coding, which is much cheaper.
- Instruction prompt: Add instructions to `~/.claude/CLAUDE.md` for global guides (for any project). For a single project, add `CLAUDE.md` in the root of the project!
- You can copy text and even screenshots to the clipboard and then paste them into the current Claude Code prompt. Or use `Puppeteer` to automate feedback (no need to take screenshots yourself). ← Nowadays, use chrome-devtools-mcp or the Playwright MCP server instead (a hedged setup sketch follows this item).
- You can drag and drop images directly into the prompt input.
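If you want to try the Playwright MCP server mentioned above with Claude Code, a one-liner sketch; the exact `claude mcp add` syntax may differ by version, so check `claude mcp --help`:

```bash
# Hypothetical example: register the Playwright MCP server with Claude Code (user scope).
claude mcp add playwright -s user -- npx @playwright/mcp@latest
```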
- When coding in the CLI with Claude, if you want to add something to `CLAUDE.md`, just type `#` before what you want to say, e.g. `# Respond in Vietnamese to every question.`
- Tab to autocomplete filenames.
- ESC to interrupt a running process.
- Context management: watch the phrase "Context left until auto-compact: 16%" at the bottom-right of the command input.
- `claude --dangerously-skip-permissions`: bypass all permission checks and let Claude work uninterrupted until completion.
- Use `/clear` to reset the context window.
- Use `\` and then `enter` to write multiple lines in the input. ← `/terminal-setup` asks Claude to install the terminal's Shift+Enter key binding.
- `shift+tab` to toggle between plan mode (asks for confirmation) and auto-accept mode (no confirmation).
Best practice: use the Claude Code login in the VSCode extension (its usage resets every 5 hours), and use the Claude Code CLI with GLM in a separate Terminal tab (next to the extension tab). That way, whenever the Claude usage runs out, you can easily switch to GLM in the Terminal tab.
- Check the official guide, but it's not enough on its own.
- Modify your `~/.zshrc` or `~/.bashrc`, then run `source ~/.zshrc` or `source ~/.bashrc` in the current terminal to make the changes take effect.
- To switch between services, run `vb_glm` to use GLM or `vb_claude` to use the default Claude Code (a sketch of such functions is shown after the verification step below).
- Verify the configuration by typing `claude` in the terminal, then running `/status`.
- You can also simply ask "Who r u? Which model r u?"
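A minimal sketch of what such switcher functions could look like, assuming GLM is exposed through an Anthropic-compatible endpoint (the base URL and token handling are illustrative; check Z.ai's docs for the exact values):

```bash
# Hypothetical switcher functions for ~/.zshrc or ~/.bashrc.
# Claude Code reads ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN when they are set.
vb_glm() {
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # illustrative URL, verify in Z.ai docs
  export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"              # replace with your real key
  echo "Claude Code now points to GLM."
}

vb_claude() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN               # back to the default Anthropic login
  echo "Claude Code now uses the default Anthropic account."
}
```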
- To make GLM work with the latest VSCode extension: open the terminal, switch to GLM with `vb_glm`, then open the current folder using `cursor .` or `code .`.
- Test the Claude Code extension by asking: "Who r u? Which model r u?" (you may need to ask several times until you see an answer containing "glm-4.6").
- ⭐ Another way: open IDE Settings → search for "Claude Code" → click to open the `settings.json` file and add the following:
- Then reload the current IDE window.
- ⚠️ Note that "default" for "selectedModel" will not work! You can also type `/model` and then select "opus". This method works with both the Claude Code CLI and the latest Claude Code extension in the IDE.
- Create or update `~/.claude/settings.json` with a hook along these lines:
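A minimal sketch, assuming you want to run the script from the next step whenever Claude finishes responding (the `Stop` hook event):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/scripts/notify-end.sh"
          }
        ]
      }
    ]
  }
}
```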
- Create a script `~/.claude/scripts/notify-end.sh` with content along these lines:
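A macOS-flavored sketch (the notification text and sound are arbitrary; on Linux you could use `notify-send` instead):

```bash
#!/bin/bash
# Hypothetical example: show a desktop notification and play a sound when Claude Code finishes.
osascript -e 'display notification "Claude Code has finished" with title "Claude Code"'
afplay /System/Library/Sounds/Glass.aiff
```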
- Then run `chmod +x ~/.claude/scripts/notify-end.sh`.
- Restart Claude Code (CLI or extension) to see the result!
- Benefit: a single source of truth for all tools.
- Create `ai-instructions.md` in the root (or somewhere else).
- For Claude Code: in `.claude/settings.json` (not `.claude/settings.local.json`)
- For Cursor: in `.cursor/rules/general.mdc`
- For GitHub Copilot: in `.github/copilot-instructions.md`
- In `.vscode/settings.json`:
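A minimal sketch, assuming you point Copilot Chat at the shared file via the `github.copilot.chat.codeGeneration.instructions` setting (verify the setting name against your Copilot version; it has changed across releases):

```json
{
  "github.copilot.chat.codeGeneration.instructions": [
    { "file": "ai-instructions.md" }
  ]
}
```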
- Mistral and Claude don’t accept `null` in the request body. OpenAI allows that (though not `AzureOpenAI`; I tried with `stream_options`). In an OpenAI request, if you pass `null`, the default value for that property is used. However, there will be an error if you put `null` in the request body of Mistral.
- Only OpenAI has the `user` and `n` parameters.
No service gives a very clear description of the max tokens. You have to read the documentation carefully.
- Claude limits the max output to 4096 tokens for all models.
- Mistral (they write 32k, 8k, …)
- OpenAI sometimes gives only 4096 tokens for the output.
- Prompt caching (auto enabled) → best practices
There are some remarks (tried with version `1.40.8` and below):
- It doesn’t accept `null` (when you use the REST API) or `None` (when using the Python SDK) with some specific endpoints (whereas others don’t have this kind of issue). I tried with `stream_options`. However, the `OpenAI` class (also from that package) allows that!
- Even if we add `stream_options` (e.g. `{"include_usage": True}`) to the request, no usage is returned the way it is with the `OpenAI` class!
- Note that changes in Azure OpenAI often lag behind the official OpenAI changes. For example, the `max_completion_tokens` property replaces `max_tokens`, but it isn’t supported yet in Azure OpenAI.
- Although OpenAI says that `max_completion_tokens` is the new replacement for the dedicated `max_tokens`, they don’t behave the same in my tests. For example, set `max_tokens=200` for model `gpt-4o` and `max_completion_tokens=200` for model `o1-mini`, then ask "hello": the latter stops without content and `finish_reason` is "length" (likely because o1-mini's reasoning tokens count toward the limit, leaving nothing for the visible answer), while the former shows the answer just fine.
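A minimal sketch of that comparison with the `openai` Python SDK (the model names and the 200-token limit are just the values from the test above):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# gpt-4o with the legacy max_tokens: returns a visible answer.
legacy = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=200,
)
print(legacy.choices[0].finish_reason, legacy.choices[0].message.content)

# o1-mini with max_completion_tokens: reasoning tokens eat the budget,
# so the content may be empty and finish_reason becomes "length".
reasoning = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "hello"}],
    max_completion_tokens=200,
)
print(reasoning.choices[0].finish_reason, reasoning.choices[0].message.content)
```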
- Remark: if you use the REST API, it’s `tool_calls` in the returned object, whereas it’s `toolCalls` if you use the SDK. The same goes for others like `finish_reason` or `prompt_tokens`.
- Differences between the SDK and the REST API with streaming enabled:
- Be careful: if you use the Mistral REST API, the returned property is `tool_calls`, but if you use the Mistral SDK (`@mistralai/mistralai`), the returned property is `toolCalls` instead. If you use `tool_calls` with the SDK, there will be an error!
- `max_tokens` is required!
- It has a very different format compared with OpenAI and Mistral (both in input and output).
- Claude doesn’t have an endpoint for getting the list of models like OpenAI/Mistral.
- If you use the REST API, you have to indicate the API version (the `anthropic-version` header); if you use the SDK, you don’t have to.
- The first message must use the “user” role. OpenAI/Mistral don’t have this restriction.
- There are 2 ways to provide the text (see the sketch below).
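Presumably this means passing the content as a plain string vs. as a list of content blocks. A minimal sketch with the `anthropic` Python SDK (model name is just an example):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Way 1 (assumed): content as a plain string.
msg1 = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,                       # required, unlike OpenAI/Mistral
    system="You are a concise assistant.", # system prompt is a top-level field, not a message
    messages=[{"role": "user", "content": "Hello"}],  # first message must be "user"
)

# Way 2 (assumed): content as a list of content blocks.
msg2 = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": [{"type": "text", "text": "Hello"}]}],
)

print(msg1.content[0].text)
print(msg2.content[0].text)
```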
- If you use the Python SDK, Python 3.9+ is required!
- Get the API key in Google AI Studio.
- Once you have the API key, check the Quickstart.
- List of values for safety configs.
- System instruction (prompt)
- Different from other services, the role in the content message is either “user” or “model” (not “assistant”)
- Different from other services, Gemini doesn’t allow a tool name to start with special characters like `-`.
- `system_instruction` isn’t enabled for model `gemini-1.0-pro`; if you use it, you get the error “Developer instruction is not enabled for models/gemini-1.0-pro”. In this case, you can use `"role": "model"` for the instruction.
- Python SDK example doc: `generation_config`
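A minimal sketch with the `google-generativeai` package, tying the points above together (the model name and config values are just examples):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # from Google AI Studio

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction="Answer concisely.",      # not supported on gemini-1.0-pro
    generation_config=genai.GenerationConfig(
        max_output_tokens=256,
        temperature=0.7,
    ),
)

# Note the roles: "user" and "model" (not "assistant").
response = model.generate_content(
    [
        {"role": "user", "parts": ["Hello"]},
        {"role": "model", "parts": ["Hi! How can I help?"]},
        {"role": "user", "parts": ["What is an API key?"]},
    ]
)
print(response.text)
```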
- When using embeddings, note that the model name in the API request must start with `models/` or `tunedModels/`. For instance, while the documentation refers to `text-embedding-004`, you must use `models/text-embedding-004` in your actual request.
- `models/text-embedding-004` has a fixed output dimensionality of 768; this is hardcoded into the model's architecture and cannot be changed. The `output_dimensionality` parameter in your config is simply ignored by this model.
This can lead to errors if your column only accepts a specific dimension that doesn’t match. How to check the dimension:
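A minimal sketch for checking the returned embedding dimension with the `google-generativeai` package (assumes the API key is configured as above):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# The model name must include the "models/" prefix.
result = genai.embed_content(
    model="models/text-embedding-004",
    content="hello world",
)
print(len(result["embedding"]))  # 768 for this model
```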
Most services provide their own SDK for getting the streaming response, and these SDKs work out of the box. However, there are cases where we cannot use the SDKs (for example, I encountered the error `ERR_REQUIRE_ESM` when working with the Mistral SDK). In such cases, it's better to work directly with their REST APIs. Check this code.
Check this code.
Remark: the chunks depend heavily and “strangely” on the `chunk_size`. Even if we set `chunk_size=None` to get the full chunk, the returned chunk is as shown below. If we count the `data:` occurrences in each chunk, we get 3, 1, 1, …, 1, 2. I still don’t know why! (One likely reason: SSE events don’t align with network chunk boundaries, so a single read can contain several `data:` lines; see the sketch below.)
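A minimal sketch of parsing a streaming REST response robustly, buffering by lines so it doesn't matter how many `data:` events arrive per network chunk. It targets Mistral's chat completions endpoint; the URL, model name, and payload are assumptions based on their public docs:

```python
import json
import os

import requests

url = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint, check the docs
headers = {
    "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "mistral-small-latest",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    # iter_lines() re-splits the stream on newlines, so each iteration is one
    # SSE line regardless of how the bytes were chunked on the wire.
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {}).get("content") or ""
        print(delta, end="", flush=True)
```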