Reasoning AI Fits in 900MB RAM for Smartphones
Liquid AI's LFM2.5-1.2B-Thinking brings chain-of-thought reasoning to smartphones with just 900MB RAM, enabling step-by-step problem-solving on edge devices
What It Is
Liquid AI released LFM2.5-1.2B-Thinking, a reasoning model that performs internal chain-of-thought processing while consuming just 900MB of RAM. Unlike standard language models that generate immediate responses, this model works through problems step-by-step internally before producing an answer - the same approach OpenAI’s o1 uses, but compressed into a package small enough for smartphones and edge devices.
The model belongs to Liquid AI’s LFM (Liquid Foundation Model) family and represents a significant compression achievement. At 1.2 billion parameters, it delivers reasoning capabilities that previously required models several times larger. The “Thinking” designation refers to its ability to generate internal reasoning traces that guide problem-solving, rather than relying purely on pattern matching from training data.
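Reasoning models of this kind typically emit the internal trace inside delimiter tokens before the final answer, and application code needs to separate the two. A minimal sketch of that parsing step follows; the `<think>`/`</think>` delimiters are an assumption here, so check the model card for the format this model actually uses:

```python
def split_reasoning(completion: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>"):
    """Separate an internal reasoning trace from the final answer.

    Assumes the model wraps its chain of thought in delimiter tokens;
    the actual tags are model-specific (check the model card).
    """
    start = completion.find(open_tag)
    end = completion.find(close_tag)
    if start == -1 or end == -1:
        return "", completion.strip()  # no trace found, whole output is the answer
    trace = completion[start + len(open_tag):end].strip()
    answer = completion[end + len(close_tag):].strip()
    return trace, answer

raw = "<think>300 miles over 4 hours is 75 mph.</think>The average speed is 75 mph."
trace, answer = split_reasoning(raw)
```

Keeping the trace around is useful for debugging, but most apps will show users only the final answer.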
Developers can access the model through Hugging Face at https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking, test it via Liquid AI’s playground at https://playground.liquid.ai/login?callbackUrl=%2F, or deploy it through their LEAP platform at https://leap.liquid.ai/models?model=lfm2.5-1.2b-thinking.
Why It Matters
This release fundamentally changes where reasoning AI can operate. Applications that previously required cloud API calls or powerful workstations can now run entirely on-device. Privacy-sensitive use cases - medical diagnostics, legal analysis, financial planning - gain a viable local processing option that keeps data off external servers.
The performance metrics tell an interesting story. Despite having roughly 30% fewer parameters than Qwen3-1.7B (1.2B vs. 1.7B), LFM2.5-1.2B-Thinking outperforms it across most benchmarks. The model shows particular strength in mathematical reasoning and tool use, domains where chain-of-thought processing provides clear advantages over direct response generation.
Mobile developers gain access to sophisticated reasoning without the latency, cost, or connectivity requirements of cloud-based solutions. Edge computing scenarios - robotics, IoT devices, offline applications - become viable targets for AI features that previously demanded constant internet access. The 900MB footprint means the model fits comfortably alongside other applications on modern smartphones without monopolizing resources.
Getting Started
The simplest path involves testing through Liquid AI’s web playground. After creating an account, developers can experiment with different prompts to observe how the model handles multi-step reasoning tasks.
For local deployment, the Hugging Face model repository provides standard Transformers integration:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Thinking")
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Thinking")

# A multi-step arithmetic problem that benefits from chain-of-thought reasoning
prompt = "If a train travels 120 miles in 2 hours, then speeds up to travel 180 miles in the next 2 hours, what was its average speed?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
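The train prompt makes a convenient smoke test because it has a single verifiable answer: average speed is total distance divided by total time.

```python
# Sanity check for the train prompt: average speed = total distance / total time
distance = 120 + 180   # miles across the two legs
time = 2 + 2           # hours
average_speed = distance / time
print(average_speed)   # 75.0 mph
```

If the model's final answer doesn't land on 75 mph, something in the setup (prompt formatting, decoding, trace parsing) is worth a second look.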
The LEAP platform offers managed deployment for production applications, handling scaling and optimization automatically. Teams building mobile apps can integrate the model using standard ONNX runtime or platform-specific inference engines.
Context
LFM2.5-1.2B-Thinking competes in a crowded field of small language models. Microsoft’s Phi-3-mini (3.8B parameters) and Google’s Gemma 2B occupy similar territory, but neither implements internal reasoning chains at this size. The closest comparison comes from Qwen’s QwQ-32B-Preview, which offers reasoning capabilities but requires substantially more resources.
The tradeoff involves capability versus efficiency. While larger models like Claude or GPT-4 handle more complex reasoning tasks, they demand cloud infrastructure. LFM2.5-1.2B-Thinking sacrifices some sophistication for deployment flexibility. Complex multi-hop reasoning or extensive context windows remain challenging at this scale.
Limitations include reduced performance on tasks requiring broad world knowledge or nuanced language understanding. The model excels at structured problems - mathematics, logic puzzles, code generation - but struggles with open-ended creative tasks or subtle contextual interpretation. Developers should benchmark against their specific use cases rather than assuming universal applicability.
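One lightweight way to follow that advice is a small task-specific harness that scores the model's final answers against known-good ones. The sketch below is illustrative only: `generate` stands in for whatever inference call you actually use, and exact-match scoring is an assumption (structured tasks often need answer normalization or numeric tolerance):

```python
def exact_match_accuracy(cases, generate):
    """Score a model on (prompt, expected_answer) pairs with exact matching.

    `generate` is any callable mapping a prompt string to the model's
    final answer string -- a stand-in for your real inference call.
    """
    hits = sum(1 for prompt, expected in cases
               if generate(prompt).strip() == expected.strip())
    return hits / len(cases)

# Illustrative run with a stub "model" that always answers "75 mph"
cases = [
    ("Average speed for 300 miles in 4 hours?", "75 mph"),
    ("Average speed for 100 miles in 2 hours?", "50 mph"),
]
accuracy = exact_match_accuracy(cases, lambda p: "75 mph")
print(accuracy)  # 0.5 with this stub
```

Even a few dozen such cases drawn from your own domain will say more about fitness for purpose than any published benchmark.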
What seemed impossible two years ago - running reasoning models on consumer hardware - now represents a practical deployment option. The gap between research capabilities and edge-device reality continues narrowing.