Beginner Guide | Local Deployment of the gpt-oss-120b Model


gpt-oss-120b is a high-performance AI open-source large model released by OpenAI in August this year. It is designed for complex reasoning scenarios and supports local inference on edge devices.

If you need to run this model on Thor, you can contact sales for clarification. The device will come pre-installed with the large model at the factory, allowing you to use it out of the box after power-on.

1926171be30c9f181a5c382813dba42c.jpg

Next, we will guide you step-by-step on how to deploy and run the gpt-oss-120b model locally on the self-developed Jetson Thor series embodied AI computing platform (Y-C28-DEV / Thor-28F1).

01 Deploying the gpt-oss-120b Model

Use Docker (Ollama) for local deployment of the gpt-oss-120b model.

Step 1: Power on the device

Connect display, input devices, and network cable to the Y-C28-DEV / Thor-28F1, then power it on. Unless otherwise specified, the system is a clean installation, and software needs to be installed manually. Use the following commands to install JetPack 7:

sudo apt update
sudo apt install nvidia-jetpack

Step 2: Install Docker

sudo apt update
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo nvidia-ctk runtime configure --runtime=docker

Step 3: Install Ollama in Docker

mkdir ~/ollama-data/
sudo docker run -it -p 11434:11434 --runtime=nvidia --name ollama -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04

3065e75bcc2998fd63ad997370bbf8c2.png

Command explanation:

--runtime=nvidia  Enables GPU acceleration inside the container

-p 11434:11434 maps port 11434 inside the container (Ollama default port) to port 11434 on the host. This is required if you want to use Cherry Studio to call Ollama. If not needed, you can remove this parameter.

-v ${HOME}/ollama-data:/data mounts the host directory ${HOME}/ollama-data into /data inside the container, for persistent storage of Ollama models and configuration. Data will not be lost after the container is removed. Adjust as needed.

Step 4: Pull and run the model

ollama pull gpt-oss:120b
ollama run --verbose gpt-oss:120b

f2d64164e9db055c3226f92ac440a758.png

Speed: 25 tokens/s. This is a normal GPU-accelerated speed. You can adjust parameters to increase performance if needed.

Exit conversation:

/bye

Exit container:

exit

Restart container:

sudo docker restart ollama

Re-enter container:

sudo docker exec -it ollama /bin/bash

02 Install Cherry Studio Software

After completing the above steps, the gpt-oss-120b model is deployed and can be used via command line.

For easier usage, we provide a graphical tool—Cherry Studio. It uses multi-model aggregation technology, local deployment solutions, and end-to-end automation to reshape human-computer interaction and significantly improve complex task efficiency. Below is the installation guide.

Download the ARM version from the official website, transfer it to the device, and install it using:

chmod +x Cherry-Studio-1.5.9-arm64.deb
sudo dpkg -i Cherry-Studio-1.5.9-arm64.deb

Open Cherry Studio from the bottom-left menu, go to Settings → Ollama, and you will see the installed gpt-oss-120b model. Select it, return to the home screen, and start using it.

03 Common Issues

Insufficient Memory:

1. Run sudo jtop

2. Press 4

3. Press c to clear cache

4. Press q to exit

Product Recommendations

301-DEV
301-DEV

176 TOPS INT8, 48GB LPDDR4X, 128GB eMMC

AIV3X
AIV3X

176 TOPS INT8, 48GB LPDDR4X, 128GB EMMC

28F2E4
28F2E4

Jetson™ AGX Orin

18F1E1
18F1E1

Jetson™ Orin NX丨Orin Nano

Download Documentation
Custom Solutions
Pre-Sales Inquiry
After-Sales Support
Contact Us