Project: Offline Qwen 3.5 on Android
Turn your old Android phone into a private, offline AI companion.
This guide will show you how to run Qwen 3.5, a cutting-edge open-source model series from Alibaba, directly on your Android device using Termux. Unlike older small models, the Qwen 3.5 small series supports native tool calling, thinking, and multimodal capabilities out of the box. No internet required. No API fees. 100% Private.
Prerequisites
- Android Phone: Any decent Android phone (4GB RAM recommended for the 2B model, but older phones can run the 0.8B version).
- Termux App: A terminal emulator for Android.
- Recommended: download it from F-Droid (the Play Store version is outdated).
Step-by-Step Installation
1. Install & Update Termux
Open Termux and run the following command to ensure your package lists are up to date.
pkg update && pkg upgrade
2. Install Ollama
Termux now has an official package for Ollama, making installation very easy. This is the engine that runs the AI locally.
pkg install ollama
Note: If you see an error saying the package is missing or "no such file," try running
pkg reinstall ollama
to fix a broken install.
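To confirm the install worked before moving on, you can check that the binary is on your PATH. This is a minimal sketch; the fallback message is just a pointer back to the reinstall tip above.

```shell
# Verify the ollama binary is installed and reachable on PATH.
if command -v ollama >/dev/null 2>&1; then
  OLLAMA_STATUS=$(ollama --version 2>&1)
else
  OLLAMA_STATUS="ollama not on PATH - try: pkg reinstall ollama"
fi
echo "$OLLAMA_STATUS"
```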
3. Start the AI Server
Ollama needs a background server to handle the AI logic. Run this command to start it in the background:
ollama serve &
- Tip: If you see logs appear, just press Enter once to get your command prompt back.
- Important: You must run this command every time you open Termux before using the AI.
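You can check whether the server is actually up before trying to chat. This sketch assumes Ollama's default listen address of localhost:11434 (its standard port); the commented line shows one way to auto-start the server on every Termux launch.

```shell
# Health check against Ollama's default endpoint (localhost:11434).
if curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; then
  SERVER_STATUS="up"
else
  SERVER_STATUS="down - start it with: ollama serve &"
fi
echo "Ollama server is $SERVER_STATUS"

# Optional: auto-start the server on every Termux launch (appends to ~/.bashrc):
# echo 'pgrep -f "ollama serve" >/dev/null || ollama serve >/dev/null 2>&1 &' >> ~/.bashrc
```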
Running Qwen 3.5
We will use the Qwen 3.5 small model series. Alibaba released four ultra-efficient sizes: 0.8B, 2B, 4B, and 9B.
- View Qwen tags: ollama.com/library/qwen
- Explore all models: ollama.com/library
For Standard Phones (Recommended)
Use the 2 Billion parameter version. It is the perfect balance of intelligence and speed, running comfortably on devices with just 4GB of RAM.
ollama run qwen3.5:2b
For Older / Low-End Phones
If you have an older device with less RAM, you can use the incredibly tiny 0.8 Billion parameter model.
ollama run qwen3.5:0.8b
For High-End Phones (8GB+ RAM)
If you have a flagship device, you can upgrade to the heavier models for much stronger reasoning and multimodal base capabilities.
# For lightweight agent tasks
ollama run qwen3.5:4b
# For near-desktop performance
ollama run qwen3.5:9b
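Besides the interactive chat, ollama run also accepts a prompt as an argument, which is handy for scripting. A minimal sketch, assuming the server is running and the model tag has already been pulled; swap MODEL for whichever size your phone handles:

```shell
MODEL="qwen3.5:2b"   # pick the tag that fits your phone's RAM
PROMPT="Summarize in one sentence: Termux runs a Linux userland on Android."

# One-shot, non-interactive invocation (no REPL).
if command -v ollama >/dev/null 2>&1; then
  REPLY=$(ollama run "$MODEL" "$PROMPT" 2>/dev/null) \
    || REPLY="(run failed - is the server up and the model pulled?)"
else
  REPLY="(ollama not installed - see the installation steps above)"
fi
echo "$REPLY"
```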
Troubleshooting
Error: "exec: "serve": executable file not found in $PATH"
- Fix: This is a common Termux bug where the system cannot find the internal server command. Run this one-time fix to create a symlink:
ln -s $(which ollama) $PREFIX/bin/serve
Alternative fix if the above does not work:
ln -s /data/data/com.termux/files/usr/bin/ollama /data/data/com.termux/files/usr/bin/serve
Error: "manifest not found"
- Fix: This means the specific model tag isn't found. Double-check your spelling or search ollama.com for the latest Qwen tags.
Error: "Connection refused" or "Address already in use"
- Fix: The background server isn't running or is stuck. Kill the old process and restart it:
pkill ollama
ollama serve &
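The two commands above can be wrapped into a small helper so a stuck server is one function call away. A sketch, assuming pkill is available (it ships with Termux's procps package) and writing server logs to a file in your home directory:

```shell
# Kill any stuck Ollama process and relaunch the server in the background.
restart_ollama() {
  pkill ollama 2>/dev/null
  sleep 1
  if command -v ollama >/dev/null 2>&1; then
    ollama serve > "$HOME/ollama.log" 2>&1 &
    echo "ollama restarted (logs: $HOME/ollama.log)"
  else
    echo "ollama binary not found"
  fi
}
RESTART_MSG=$(restart_ollama)
echo "$RESTART_MSG"
```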
The model's responses are too brief or lack reasoning!
- Fix: The 0.8B model is heavily quantized to fit on tiny devices. If you need it to solve complex logic puzzles or execute advanced tool calls, step up to the 2B or 4B models:
ollama run qwen3.5:4b
Why do this?
- Native Tool Calling: Unlike older models that just chat, Qwen 3.5 is built to execute commands and act as a lightweight agent.
- Zero Latency: Get answers instantly without waiting for cloud API calls. You can run this entirely in Airplane mode.
- Accessibility: The 0.8B and 2B models prove that you no longer need thousands of dollars of server hardware to host a highly capable, multimodal AI.
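As a sketch of what native tool calling looks like in practice, here is an example request body for Ollama's /api/chat endpoint, which accepts a tools array describing functions the model may call. The get_battery_level tool is a made-up example to illustrate the schema, not a built-in:

```shell
# Write a tool-calling request body; get_battery_level is hypothetical.
cat > qwen_tools_request.json <<'EOF'
{
  "model": "qwen3.5:2b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "How charged is my phone?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_battery_level",
        "description": "Return the current battery percentage",
        "parameters": {"type": "object", "properties": {}}
      }
    }
  ]
}
EOF
# With the server running, send it and look for "tool_calls" in the reply:
# curl -s http://127.0.0.1:11434/api/chat -d @qwen_tools_request.json
echo "wrote qwen_tools_request.json"
```

When the model decides a tool is needed, the response contains a tool_calls entry instead of plain text; your script runs the tool and sends the result back in a follow-up message.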