Beginner Guide | Local Deployment of the gpt-oss-120b Model - System Flashing

2026-05-21

gpt-oss-120b is an open-weight large language model released by OpenAI in August 2025. It is designed for complex reasoning tasks and supports local inference on edge devices.

If you would like a Thor device with this model preinstalled, contact sales. The device can ship from the factory with the model already installed, so it works out of the box once powered on.

The rest of this guide walks step by step through deploying and running gpt-oss-120b locally on our in-house Jetson Thor series embodied AI computing platform (Y-C28-DEV / Thor-28F1).

01 Deploying the gpt-oss-120b Model

The model is deployed locally with Ollama running inside a Docker container.

Step 1: Power on the device

Connect a display, input devices, and a network cable to the Y-C28-DEV / Thor-28F1, then power it on. Unless otherwise specified, the device ships with a clean system image, so software must be installed manually. Install JetPack 7 with the following commands:

sudo apt update
sudo apt install nvidia-jetpack
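
One way to confirm that the install finished is to ask dpkg for the package status. The snippet below is a minimal sketch: the helper name pkg_installed is made up for this example, and it only checks that the nvidia-jetpack package is marked as installed, not that every component works.

```shell
# pkg_installed is a hypothetical helper, not part of JetPack.
# It returns 0 when dpkg reports the named package as fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

if pkg_installed nvidia-jetpack; then
    echo "nvidia-jetpack is installed"
else
    echo "nvidia-jetpack is not installed yet"
fi
```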

Step 2: Install Docker

sudo apt update
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
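
Before moving on, it may be worth checking that Docker actually lists the nvidia runtime. The sketch below assumes the runtime is registered under the name "nvidia"; both function names are hypothetical helpers written for this example.

```shell
# runtimes_include_nvidia is a hypothetical helper for this example.
# It succeeds when the runtime listing passed as $1 mentions "nvidia".
runtimes_include_nvidia() {
    printf '%s\n' "$1" | grep -qw nvidia
}

# check_nvidia_runtime (also hypothetical) queries Docker and reports the result.
# `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as JSON.
check_nvidia_runtime() {
    runtimes=$(sudo docker info --format '{{json .Runtimes}}' 2>/dev/null)
    if runtimes_include_nvidia "$runtimes"; then
        echo "nvidia runtime is registered with Docker"
    else
        echo "nvidia runtime not found; re-run the nvidia-ctk step and restart Docker"
    fi
}
```

Run check_nvidia_runtime on the device after completing this step.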

Step 3: Install Ollama in Docker

mkdir -p ~/ollama-data
sudo docker run -it -p 11434:11434 --runtime=nvidia --name ollama -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04

Command explanation:

--runtime=nvidia enables GPU acceleration inside the container.

-it starts the container with an interactive shell, and --name ollama gives the container a name that later commands can refer to.

-p 11434:11434 maps port 11434 inside the container (Ollama's default port) to port 11434 on the host. This is required if you want Cherry Studio to call Ollama; otherwise you can omit it.

-v ${HOME}/ollama-data:/data mounts the host directory ${HOME}/ollama-data at /data inside the container, so Ollama models and configuration persist even after the container is removed. Change the host path as needed.
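
With the port mapped, the Ollama HTTP API should answer on the host. The following is a rough sketch that probes the /api/version endpoint and, if it responds, lists the locally stored models through /api/tags. The helper name ollama_reachable is hypothetical, and the URL assumes the default -p 11434:11434 mapping shown above.

```shell
# ollama_reachable is a hypothetical helper for this example.
# $1 is the base URL; it defaults to the port mapped with -p 11434:11434.
ollama_reachable() {
    curl -sf --max-time 3 "${1:-http://localhost:11434}/api/version" >/dev/null 2>&1
}

if ollama_reachable; then
    echo "Ollama API is reachable on the host"
    # List the models Ollama has stored locally.
    curl -s http://localhost:11434/api/tags
else
    echo "Ollama API is not reachable; check the container and the -p mapping"
fi
```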

Step 4: Pull and run the model

Run the following commands in the container shell opened in Step 3:

ollama pull gpt-oss:120b
ollama run --verbose gpt-oss:120b

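With --verbose, Ollama prints timing statistics after each response. Generation speed on this device is about 25 tokens/s, which is normal for GPU-accelerated inference. Runtime parameters can be tuned if you need higher performance.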
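
The tokens/s figure is the number of generated tokens divided by the generation time. Ollama's REST API reports these quantities as eval_count (tokens generated) and eval_duration (nanoseconds) in a /api/generate response. The sketch below shows only the arithmetic: tokens_per_second is a hypothetical helper, and the sample numbers are illustrative, not measurements from the device.

```shell
# tokens_per_second is a hypothetical helper for this example.
# $1 = eval_count (tokens generated), $2 = eval_duration in nanoseconds.
tokens_per_second() {
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}

# Illustrative values only: 250 tokens generated in 10 seconds.
tokens_per_second 250 10000000000   # prints 25.00

# On the device, the two fields can be read from a non-streaming request:
#   curl -s http://localhost:11434/api/generate \
#     -d '{"model": "gpt-oss:120b", "prompt": "Hello", "stream": false}'
```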

Exit conversation:

/bye

Exit container:

exit

Restart container:

sudo docker restart ollama

Re-enter container:

sudo docker exec -it ollama /bin/bash
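
Which of these commands applies depends on the container's current state. As a rough sketch, the state can be read with docker inspect and mapped to the next command. The helper container_action is hypothetical, and only the running and exited states are handled explicitly.

```shell
# container_action is a hypothetical helper for this example.
# $1 is a container state as reported by `docker inspect`.
container_action() {
    case "$1" in
        running) echo "sudo docker exec -it ollama /bin/bash" ;;
        exited)  echo "sudo docker restart ollama" ;;
        *)       echo "inspect manually: sudo docker ps -a" ;;
    esac
}

# On the device, read the current state like this:
#   state=$(sudo docker inspect -f '{{.State.Status}}' ollama)
# A fixed value is used here so the example runs anywhere.
container_action exited
```

Re-running the docker run command from Step 3 fails while a container named ollama still exists, which is why restart and exec are used instead.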

02 Install Cherry Studio Software

After completing the above steps, the gpt-oss-120b model is deployed and can be used via command line.

For easier use, you can install Cherry Studio, a graphical desktop client. It aggregates multiple models behind one interface and works with locally deployed models, which makes complex tasks more convenient than working from the command line. The installation steps follow.

Download the ARM64 .deb package from the official Cherry Studio website, copy it to the device, and install it:

chmod +x Cherry-Studio-1.5.9-arm64.deb
sudo dpkg -i Cherry-Studio-1.5.9-arm64.deb
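
Before installing, it can help to confirm that the downloaded package targets the device's architecture. The sketch below compares the architecture tag in the file name with what dpkg reports for the host. The helper deb_name_arch is hypothetical and assumes the file name ends in -<arch>.deb, as the one above does.

```shell
# deb_name_arch is a hypothetical helper for this example.
# It extracts the trailing architecture tag from a file name such as
# Cherry-Studio-1.5.9-arm64.deb (assumes the name ends in -<arch>.deb).
deb_name_arch() {
    basename "$1" .deb | sed 's/.*-//'
}

pkg_arch=$(deb_name_arch Cherry-Studio-1.5.9-arm64.deb)
host_arch=$(dpkg --print-architecture 2>/dev/null)

if [ "$pkg_arch" = "$host_arch" ]; then
    echo "package architecture matches the host ($host_arch)"
else
    echo "package is $pkg_arch but host reports '${host_arch:-unknown}'"
fi
```

If dpkg -i stops with missing dependencies, running sudo apt-get install -f afterwards usually resolves them.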

Open Cherry Studio from the application menu in the bottom-left corner and go to Settings → Ollama, where the installed gpt-oss-120b model should be listed. Select it, return to the home screen, and start using it.

03 Common Issues

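Insufficient memory:

If the device runs short of memory when loading or running the model, clear the system cache with jtop (part of the jetson-stats package):

1. Run sudo jtop

2. Press 4 to open the memory (MEM) page

3. Press c to clear the cache

4. Press q to exit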
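
To see whether clearing the cache helped, available memory can be read from /proc/meminfo before and after. This is a minimal sketch: mem_available_gib is a hypothetical helper, and the drop_caches command in the comment is the standard Linux mechanism, which is assumed here to be comparable to what jtop does.

```shell
# mem_available_gib is a hypothetical helper for this example.
# It prints MemAvailable in GiB, reading /proc/meminfo unless a file is given.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}" 2>/dev/null
}

echo "Available memory: $(mem_available_gib) GiB"

# Assumed manual equivalent of clearing the cache (standard Linux mechanism):
#   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
```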
