
Using AI Services / Tools

Anh-Thi Dinh
API & Services | Generative AI
⚠️ In any case, check the official documentation; this note may become outdated.
🐙 Github repo: any-api-playground

MCP

  • Cursor Directory/MCPs
  • Github Model Context Protocol Servers
  • Install MCP for Claude Desktop: Settings → Developer → Edit Config
  • Install MCP for Cursor:
    • Global: Cursor → Settings… → Cursor Settings → Tools & Integrations
    • Project: .cursor/mcp.json

Using Local Models with IDEs

  • Download LM Studio and grab one of the coder models. Alternatively, you can use Ollama.
  • In your IDE (VSCode or Cursor), install the Continue extension.
  • In LM Studio, navigate to the Developer tab, select your downloaded model → Settings → enable "Serve on Local Network" → enable the server.
  • In your IDE, open the "Continue" tab in the left sidebar → choose "Or, configure your own model" → "Click here to view more providers" (or select the Ollama icon tab if you're using Ollama) → in the provider list, select LM Studio → set Model to "Autodetect" → Connect → a config file opens at ~/.continue/config.yaml; keep the default settings and save.
  • That's it!
  • As another option, you can use Granite.code (from IBM)

Cursor IDE

  • Note: VSCode.
  • Official docs.
  • Cursor Dashboard.
  • Unlike VSCode, all cmd+k shortcuts are replaced by cmd+r!
  • Using the CLI commands.

No need to remember commands using AI

I’m using Claude Code; if you use another coding CLI service, adapt the code accordingly. Add the claude_execute snippet (in the code snippets at the end of this note) to your .bashrc or .zshrc, then run source ~/.zshrc.

Claude Code & GLM

General

  • ❤️ Alternative to (and compatible with) Claude Code: Z.ai's GLM Coding, which is much cheaper.
  • Instruction prompt: Add instructions to ~/.claude/CLAUDE.md for global guides (for any project). For a single project, add CLAUDE.md in the root of the project!
  • You can copy text and even screenshots to the clipboard and then “paste” them into the current Claude Code prompt.
    • Or use Puppeteer to automate feedback (no need to take the screenshots yourself) ← via chrome-devtools-mcp or the Playwright MCP server.
    • You can drag and drop images directly into the prompt input.
  • In a Claude Code session, if you want to add something to CLAUDE.md, just prefix your message with #, e.g. # Respond in Vietnamese to every question.
  • Tab to autocomplete filenames.
  • ESC to interrupt a running process.
  • Context management: look at the phrase “Context left until auto-compact: 16%” at the bottom-right of the command input.
  • claude --dangerously-skip-permissions = bypass all permission checks and let Claude work uninterrupted until completion
  • Using /clear to reset the context window.
  • Use \ followed by Enter to write multiple lines in the input ← run /terminal-setup to have Claude install the Shift+Enter key binding for your terminal.
  • shift+tab to toggle between plan mode (ask for confirmation) and auto-accept mode (no confirmation)

Setting Up GLM with Claude Code

  • Check the official guide but it’s not enough.
  • The official Claude Code Extension
  • Modify your ~/.zshrc or ~/.bashrc with the vb_glm / vb_claude snippet at the end of this note.
    • Then source ~/.zshrc or source ~/.bashrc in the current terminal to make them work!
  • To switch between services, run vb_glm to use GLM or vb_claude to use default Claude Code.
  • Verify the configuration by typing claude in the terminal, then running /status.
    • You can also simply ask "Who r u? Which model r u?"
  • To make GLM work with the latest VSCode extension: open the terminal, switch to GLM with vb_glm, then open the current folder using cursor . or code ..
    • Test the Claude Code extension by asking: "Who r u? Which model r u?" (you may need to ask several times until you see an answer containing "glm-4.6")
    • ⭐ Another way: Open IDE Settings → search for "Claude Code" → click to open the settings.json file and add the claude-code.environmentVariables entries shown in the code snippets at the end of this note.
      • Then reload the current IDE window.
        ⚠️ Note that "default" for "selectedModel" will not work! You can also type /model and then select "opus".

Useful rules

Single instruction for all Code Assistant services

  • Benefit: a single source of truth for all services.
  • Create ai-instructions.md in the project root (or somewhere else). The wiring snippets for each service below are in the code snippets at the end of this note.
  • For Claude Code: in .claude/settings.json (not .claude/settings.local.json)
  • For Cursor: in .cursor/rules/general.mdc
  • For Github Copilot: in .github/copilot-instructions.md
    • In .vscode/settings.json

Remarks (for all services)

  • Mistral and Claude don’t accept null in the request body, whereas OpenAI allows it (but not AzureOpenAI; I tried with stream_options). In an OpenAI request, if you pass null, the default value for that property is used; passing null to Mistral raises an error. See the sketch after this list.
  • Only OpenAI has user and n parameters.
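
For illustration, here is a minimal sketch of stripping None values from a request body before calling a service that rejects null. It uses the requests library; the endpoint, model name, and API key are examples, not tested values:

```python
import requests

MISTRAL_API_KEY = "xxx"  # hypothetical key

body = {
    "model": "mistral-small-latest",       # example model name
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": None,                   # "use the default" in OpenAI style
    "random_seed": None,
}

# Mistral (and Claude) reject null, so drop the keys instead of sending null
body = {k: v for k, v in body.items() if v is not None}

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {MISTRAL_API_KEY}"},
    json=body,
)
print(resp.status_code)
```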

Temperature

|  | OpenAI | Mistral | Claude | Gemini |
| --- | --- | --- | --- | --- |
| Range | [0, 2] | [0, 1] | [0, 1] | [0, 2] (gemini-1.5-pro), [0, 1] (gemini-1.0-pro-vision), [0, 2] (gemini-1.0-pro-002), [0, 1] (gemini-1.0-pro-001) |
| Default | 1 | 0.7 | 1 | 1 (gemini-1.5-pro), 0.4 (gemini-1.0-pro-vision), 1 (gemini-1.0-pro-002), 0.9 (gemini-1.0-pro-001) |
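
As a quick illustration, a sketch of passing temperature with the official openai and anthropic Python SDKs; the model names are just examples:

```python
from openai import OpenAI
from anthropic import Anthropic

# OpenAI: temperature in [0, 2], default 1
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
openai_resp = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Hello"}],
    temperature=1.2,      # valid for OpenAI, out of range for Mistral/Claude
)

# Claude: temperature in [0, 1], and max_tokens is required
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
claude_resp = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # example model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
```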

Max tokens

None of the services gives a very clear description of the max tokens; you have to read the documentation carefully.
  • Claude caps the output at 4096 tokens for all models.
  • Mistral (they write 32k, 8k, …)
  • OpenAI sometimes gives 4096 tokens for the output.
  • Gemini

OpenAI

  • Accepts null in the request body, unlike Mistral and Claude (but not with AzureOpenAI); see “Remarks (for all services)” above.
  • Prompt caching (auto enabled) → best practices
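
A sketch of checking whether caching kicked in, assuming a recent openai Python SDK where the usage object exposes prompt_tokens_details.cached_tokens (caching only applies to sufficiently long, repeated prompt prefixes):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "a long, repeated prompt prefix..."}],
)

usage = resp.usage
print("prompt tokens:", usage.prompt_tokens)
if usage.prompt_tokens_details is not None:
    # number of prompt tokens served from the cache
    print("cached tokens:", usage.prompt_tokens_details.cached_tokens)
```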

AzureOpenAI

There are some remarks (tried with version 1.40.8 and below):
  • It doesn’t accept null (REST API) or None (Python SDK) for some specific endpoints, while others don’t have this issue; I tried with stream_options. However, the OpenAI class (from the same package) allows it!
  • Even if we add stream_options to the request, no usage is returned, unlike with the OpenAI class!
  • Note that changes in Azure OpenAI often come after the official OpenAI ones. For example, max_completion_tokens replaces max_tokens but isn’t supported yet in Azure OpenAI.
  • Although OpenAI presents max_completion_tokens as the replacement for the deprecated max_tokens, they don’t behave the same in my tests. For example, with max_tokens=200 on gpt-4o and max_completion_tokens=200 on o1-mini, the latter stops without content and finish_reason is “length” (it seems the budget is too short to show anything) when I ask “hello”, while the former shows the answer just fine.
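
A sketch of the comparison described above, using the plain OpenAI Python client (model names and token budgets as in the text; with reasoning models, the reasoning tokens also count against max_completion_tokens, which is why the budget can be exhausted before any content appears):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# gpt-4o with max_tokens=200: the answer shows up fine
r1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=200,
)
print(r1.choices[0].finish_reason, r1.choices[0].message.content)

# o1-mini with max_completion_tokens=200: may stop with finish_reason "length"
# and empty content because the reasoning tokens ate the whole budget
r2 = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "hello"}],
    max_completion_tokens=200,
)
print(r2.choices[0].finish_reason, r2.choices[0].message.content)
```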

Mistral

  • Doesn’t accept null in the request body (see “Remarks (for all services)” above).
  • Remark: with the REST API the returned object uses tool_calls, whereas the SDK (@mistralai/mistralai) uses toolCalls; using tool_calls with the SDK gives an error! The same goes for other properties like finish_reason or prompt_tokens.
  • Differences between the SDK and the REST API with streaming enabled: see the chunk-access snippet at the end of this note.

Claude

  • Doesn’t accept null in the request body (see “Remarks (for all services)” above).
  • max_tokens is required!
  • It has a very different format in comparison with OpenAI and Mistral (both in input and output).
  • Claude doesn’t have an endpoint for getting the list of models like OpenAI/Mistral.
  • If you use the REST API, you have to indicate the API version; with the SDK you don’t.
  • The first message must use the “user” role; OpenAI/Mistral don’t have this restriction.
  • There are 2 ways to provide the text content: see the messages snippet at the end of this note. A REST sketch combining these remarks follows this list.
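
Putting a few of these remarks together, a minimal sketch of a REST call (the endpoint and version header follow Anthropic's Messages API; the model name and key are examples):

```python
import requests

ANTHROPIC_API_KEY = "xxx"  # hypothetical key

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",   # the REST API requires the version header
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-latest",  # example model name
        "max_tokens": 1024,                   # required, unlike OpenAI/Mistral
        "messages": [
            {"role": "user", "content": "Hello"}  # first message must use the "user" role
        ],
    },
)
print(resp.json()["content"][0]["text"])
```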

Gemini

  • If you use the Python SDK, Python ≥ 3.9 is required!
  • Get the API key in Google AI Studio.
  • Once you have the API key, check the Quickstart.
  • List of values for safety configs.
  • System instruction (prompt)
  • Unlike other services, the role in the content message is either “user” or “model” (not “assistant”).
  • Unlike other services, Gemini doesn’t allow a tool name to start with special characters like -.
  • “system_instruction” isn’t enabled for gemini-1.0-pro; using it gives the error "Developer instruction is not enabled for models/gemini-1.0-pro". In that case, you can use "role": "model" for the instruction (see the contents snippet at the end of this note).
  • Python SDK example doc: generation_config
  • When using embeddings, note that the model name in the API request must start with models/ or tunedModels/. For instance, while the documentation refers to text-embedding-004, you must use models/text-embedding-004 in your actual request.
  • models/text-embedding-004 has a fixed output dimensionality of 768 - this is hardcoded into the model's architecture and cannot be changed. The output_dimensionality parameter in your config is simply ignored by this model.
    • This can lead to an error if the database column only accepts a specific dimension that doesn’t match. To check the dimension, see the SQL snippet at the end of this note; an embedding sketch follows this list.
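
A minimal sketch of the embedding call, assuming the google-generativeai package and a hypothetical API key:

```python
import google.generativeai as genai

genai.configure(api_key="xxx")  # hypothetical key

# the model name must be prefixed with "models/" (or "tunedModels/")
result = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello from the embedding example",
)

print(len(result["embedding"]))  # 768 for text-embedding-004
```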

xAI (Grok)

  • xai-cookbook repo

Streaming response with REST API

Most services provide their own SDK to get the streaming response. These SDKs work out of the box. However, there are cases where we cannot use the SDKs (for example, I encountered the error ERR_REQUIRE_ESM when working with the Mistral SDK). In such cases, it's better to work directly with their REST APIs.

Using fetch

Check this code.

Using axios

Check this code.

Python

⚠️ Remark: the chunks depend heavily (and “strangely”) on chunk_size. Even if we set chunk_size=None to get full chunks, the returned chunks look like the sample shown in the code snippets at the end of this note.
If we count the data: occurrences in each chunk, we get 3, 1, 1, …, 1, 2. I still don’t know why!

Troubleshooting

 
Code snippets

The claude_execute helper for .bashrc / .zshrc (from “No need to remember commands using AI”):

```bash
claude_execute() {
  emulate -L zsh
  setopt NO_GLOB
  local query="$*"
  local prompt="You are a command line expert. The user wants to run a command but they don't know how. Here is what they asked: ${query}. Return ONLY the exact shell command needed. Do not prepend with an explanation, no markdown, no code blocks - just return the raw command you think will solve their query."
  local cmd
  # use Claude Code
  cmd=$(claude --dangerously-skip-permissions --disallowedTools "Bash(*)" --model default -p "$prompt" --output-format text | tr -d '\000-\037' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
  if [[ -z "$cmd" ]]; then
    echo "claude_execute: No command found"
    return 1
  fi
  echo -e "$ \033[0;36m$cmd\033[0m"
  eval "$cmd"
}
alias ask="noglob claude_execute"
```

```bash
# Usage
ask "List all conda env in this computer"
```
The vb_glm / vb_claude helpers (from “Setting Up GLM with Claude Code”):

```bash
export GCL_API_KEY="xxx"

# GLM Setup
vb_glm() {
  echo "🚀 Using GLM for claude code"
  export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
  export ANTHROPIC_AUTH_TOKEN=$GCL_API_KEY
  export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-4.6"
  export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-4.6"
  export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-4.5-Air"
  echo "✅ Done"
}

# Claude Setup (unset the variables)
vb_claude() {
  echo "🚀 Using default claude for claude code"
  unset ANTHROPIC_BASE_URL
  unset ANTHROPIC_AUTH_TOKEN
  unset ANTHROPIC_DEFAULT_OPUS_MODEL
  unset ANTHROPIC_DEFAULT_SONNET_MODEL
  unset ANTHROPIC_DEFAULT_HAIKU_MODEL
  echo "✅ Done"
}
```
1"claude-code.environmentVariables": [
2    {
3        "name": "ANTHROPIC_AUTH_TOKEN",
4        "value": "xxx"
5    },
6    {
7        "name": "ANTHROPIC_BASE_URL",
8        "value": "https://api.z.ai/api/anthropic"
9    },
10    {
11        "name": "ANTHROPIC_DEFAULT_OPUS_MODEL",
12        "value": "glm-4.6"
13    },
14    {
15      "name": "ANTHROPIC_DEFAULT_SONNET_MODEL",
16      "value": "glm-4.6"
17  },
18    {
19        "name": "ANTHROPIC_DEFAULT_HAIKU_MODEL",
20        "value": "glm-4.5-air"
21    }
22],
23"claude-code.selectedModel": "opus"
An example ai-instructions.md (from “Single instruction for all Code Assistant services”):

```markdown
- **[IMPORTANT]** Do not just simulate the implementation or mocking them, always implement the real code.
- Use file system (in markdown format) to hand over reports in `./plans/reports` directory from agent to agent with this file name format: `NNN-from-agent-name-to-agent-name-task-name-report.md`.

**Task Completeness Verification**
- Verify all tasks in the TODO list of the given plan are completed
- Check for any remaining TODO comments
- Update the given plan file with task status and next steps
```
For Claude Code, .claude/settings.json:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "cat ./docs/ai-instructions.md"
          }
        ]
      }
    ]
  }
}
```
For Cursor, .cursor/rules/general.mdc:

```markdown
---
description: General rules for the project
globs:
alwaysApply: true
---

- Always follow the guidelines in our AI instructions file @./docs/ai-instructions.md
```
For GitHub Copilot, .github/copilot-instructions.md:

```markdown
Always follow the guidelines in our AI instructions file @./ai-instructions.md
```
And .vscode/settings.json:

```json
{
  "chat.instructionsFilesLocations": {
    "docs/ai-instructions.md": true
  }
}
```
Mistral: accessing streamed content with the REST API vs. the SDK:

```js
// rest api
chunk?.choices?.[0]?.delta?.content

// sdk
chunk?.data?.choices?.[0]?.delta?.content
```
1"messages": [
2	{
3		"role": "user",
4		"content": "some text"
5	},
6	{
7      "type": "tool_result",
8      "tool_use_id": "toolu_01Q4E1SvgCuAB4KvHMsHFVFD",
9      "content": "Based on your information, an username is automatically generated as dieu_dinh_vietnam"
10  },
11]
12
13// or
14"messages": [
15	{
16		"role": "user",
17		"content": [ { "type": "text", "text": "some text" } ]
18	},
19	{
20      "type": "tool_result",
21      "tool_use_id": "toolu_01DSntXabEn5gNmnnPCYDvsa",
22      "content": [
23          {
24              "type": "text",
25              "text": "Your password is the last 3 digits of your phone number: 678"
26          }
27      ]
28  }
29]
Gemini: using "role": "model" for the instruction:

```json
{
  "contents": [
    {
      "role": "model",
      "parts": { "text": "You are a very helpful assistant!" }
    },
    {
      "role": "user",
      "parts": { "text": "Hello, who are you?" }
    }
  ]
}
```
Checking the embedding dimension stored in the database column:

```sql
SELECT DISTINCT vector_dims(embedding) AS dimension_count
FROM docembeddings
WHERE embedding IS NOT NULL;
```
Python: streaming with the REST API (from “Streaming response with REST API”; api_key and url are your own service’s values):

```python
import json
import requests

request_body = {
    'model': 'casperhansen/llama-3-70b-instruct-awq',
    'messages': [
        {'role': 'user', 'content': 'count from 1 to 2, separated by commas'}
    ],
    'stream': True,
    'stream_options': {"include_usage": True}
}

headers = {
    'Authorization': 'Bearer ' + api_key,
    'Content-Type': 'application/json'
}

stream = True
with requests.post(url, data=json.dumps(request_body), stream=stream, headers=headers) as response:
    if response.status_code == 200:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk and chunk.startswith(b'data: '):
                print(chunk.decode('utf-8'))
    else:
        print(f"Failed to connect: {response.status_code}")
```
Sample chunks returned by the loop above:

```text
index: 0
data: xxxxx
data: xxxxxx
data: xxxx

index: 1
data: xxxx

index: 2
data: xxxx
data: [DONE]
```
I don't understand (yet) why the chunks from the for loop come back like that. That's why we can't handle them the normal way by parsing each chunk as JSON. Instead, one idea is to use a regex search to extract the format and values we want, as sketched below.
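
A minimal sketch of that idea, assuming each streamed chunk contains one or more complete data: lines (as in the sample above) whose payload is JSON:

```python
import json
import re

# matches every "data: ..." line inside a raw SSE chunk
DATA_RE = re.compile(r"^data: (.*)$", re.MULTILINE)

def extract_payloads(raw_chunk: bytes):
    """Yield the parsed JSON payloads found in one streamed chunk."""
    for match in DATA_RE.finditer(raw_chunk.decode("utf-8")):
        payload = match.group(1).strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Usage with the streaming loop above:
# for chunk in response.iter_content(chunk_size=1024):
#     for data in extract_payloads(chunk):
#         delta = data.get("choices", [{}])[0].get("delta", {})
#         print(delta.get("content", ""), end="")
```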