GLM 4.5: Open-Weights Model Aimed Squarely at Agents and Long Context | by Fariha Batool | Medium
Sign up
Get app
Sign up

GLM 4.5: Open-Weights Model Aimed Squarely at Agents and Long Context
Follow
2 min read
·
Jul 31, 2025
6
1
Share
On 28 July 2025 Z.ai (the company formerly known as Zhipu AI) released two fully open Mixture-of-Experts (MoE) checkpoints under an MIT licence:
GLM-4.5–355B-A32B — 355 B total parameters, 32 B active per token
GLM-4.5-Air-106B-A12B — cost-sensitive sibling, 106 B total / 12 B active
Both appear on Hugging Face and ModelScope in BF16, FP8 and INT formats; all 52 internal evaluation traces are public.
Why this release is notable
1 .Frontier-level, benchmarked quality
Internal runs across 12 public suites place the flagship model third overall (trailing only OpenAI o3 and Grok 4) and the Air variant sixth. Highlights:
- Tool use: matches or beats Claude 4 Sonnet on TAU-Bench Retail & Airline, beats Claude 4 Opus on BrowseComp
- Reasoning: MMLU-Pro 84.6 %, AIME-24 91 %, MATH-500 98 %
- Coding: SWE-bench Verified 64.2 %, Terminal-Bench 37.5 % — without external plugins
2 Agent-centric architecture
- 128 k context window
- Native function calling
- Dual thinking / non-thinking modes to balance deep chain-of-thought with fast chat
3 Built-in speculative decoding
A new Multi-Token Prediction (MTP) layer drafts several tokens inside a single forward pass, no external “draft” model. The same pass also verifies the draft, yielding lower latency on mixed CPU-GPU rigs.
4 Transparent release artefacts
Weights, evaluation trajectories, and vLLM / SGLang guides ship together. GLM 4.5 is also only the second frontier model to validate the Muon optimiser at massive scale.
A deliberately broad prompt test
I ran the same wide but imprecise prompt against GLM 4.5 and OpenAI o3 to gauge coverage:
Let’s think step by step. Explain Azure Bot Framework and Custom Engine Agents—include App Registration, Redis, Application Insights, Key Vault, Web App, Tunnel Relay… Be very detailed.
GLM 4.5
- Returned a sectioned, exhaustive explanation
- Embedded code snippets (C#, Python) for Key Vault, Redis, App Insights
- Auto-generated a Mermaid graph of the end-to-end architecture; the Graph TD rendered cleanly, refreshable SVG download included
o3
- Produced a shorter outline
- Gave a Graph TD block that failed to render in the Mermaid sandbox
- No embedded code
For documentation or diagram-heavy workflows, that’s a practical win for GLM 4.5. You can reproduce the test here:
Chat with Z.ai — Free AI for Presentations, Writing & Coding
What’s still missing
- Third-party replication:all scores are vendor-reported; no peer-reviewed paper yet
- Hardware footprint:the 355 B model still needs multi-GPU servers in BF16, lightweight GGUF/GGML quantisations are not yet published
- Dual-use debate:MIT-licensed frontier weights re-ignite questions around bio-risk and misinformation controls
Bottom line
GLM 4.5 blends large-scale MoE, agent-friendly features and built-in speculative decoding while keeping everything under an open license. If the internal numbers hold up, it could become a go-to base for multilingual coding agents and long-context RAG pipelines. Another push in the East-West race toward truly general-purpose, tool-using LLMs.
Get Fariha Batool’s stories in your inbox
Join Medium for free to get updates from this writer.
Subscribe
Subscribe
- [x]
Remember me for faster sign in
Have you benchmarked it yet? Comments are most welcome!
6
6
1
Follow
Written by Fariha Batool
AI enthusiast turning generative AI ideas into product features. I write bite‑sized notes on AI tools and breakthroughs, usually with a cup of coffee in hand.
Follow
Responses (1)

Write a response
Cancel
Respond
Interesting article. GLM 4.5’s agentic architecture and speculative decoding really stand out.
Reply
More from Fariha Batool

I tried Gemini 2.5 Flash Image ### I spent some time putting Gemini 2.5 Flash Image Preview through real work. As a straight image generator, it feels excellent. Scenes come…
Aug 27, 2025

The AI Agents Are Here, and They’re Already Building Their Own Society ### Something extraordinary happened in the last week of January 2026. An open-source AI assistant called OpenClaw garnered over 100,000 GitHub…
Feb 1
How Mixture‑of‑Experts (MoE) supercharges Today’s Frontier Language Models ### Recent public releases such as Kimi K2 (≈ 1 T total parameters, 32 B active) and GLM 4.5 (355 B total, 32 B active) highlight a sharp…
Jul 29, 2025
From Idea to Research Pack in Under 15 Minutes ### ChatGPT’s new Agent mode doesn’t just answer questions — it opens its own browser and terminal, clicks around, writes files, and then…
Jul 27, 2025
Recommended from Medium

In
by
What Claude Design actually changes for designers ### The handoff problem is finally getting solved — and it’s happening faster than most of us expected.
Apr 20

In
by
As a Neuroscientist, I Quit These 5 Morning Habits That Destroy Your Brain ### Most people do #1 within 10 minutes of waking (and it sabotages your entire day)
Jan 14

I Failed Uber’s System Design Interview Last Month. Here’s Every Question They Asked. ### It was much harder and the rejection email taught me more than any LeetCode grind ever could.
Feb 20

Vibe Coding is OVER. ### Here’s What Comes Next.
Mar 24

In
by
If You Understand These 5 AI Terms, You’re Ahead of 90% of People ### Master the core ideas behind AI without getting lost
Mar 29

In
by
AI Agents: Complete Course ### From beginner to intermediate to production.
Dec 6, 2025









