Qwen Rolls Out New Vision‑Language Model To Advance Coding, Reasoning, And Multimodal AI Performance

16-Feb-2026 mpost.io

Alibaba Cloud’s Qwen team has introduced the first model in its new Qwen3.5 series: the open‑weight Qwen3.5‑397B‑A17B.

Positioned as a native vision‑language system, the model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, reflecting a significant advance in the company’s large‑scale AI development efforts. 

The model is built on a hybrid architecture that combines linear attention through Gated Delta Networks with a sparse mixture‑of‑experts design, enabling high efficiency during inference. Although the full system contains 397 billion parameters, only 17 billion are activated for each forward pass, allowing it to maintain high capability while reducing computational cost. The release also expands language and dialect coverage from 119 to 201, broadening accessibility for users and developers worldwide.
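To make the sparse‑activation idea concrete, the toy sketch below shows top‑k mixture‑of‑experts routing, in which each token only ever passes through a small subset of the expert networks. The expert count, dimensions, and top‑k value are invented for illustration and do not correspond to Qwen3.5’s actual 397B‑total / 17B‑active configuration; the Gated Delta Network attention is not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sparse mixture-of-experts layer. All sizes below are placeholders for
# illustration only, NOT the real Qwen3.5 configuration.
class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # each token touched only top_k of n_experts experts

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The point of the sketch is the ratio, not the scale: because the router selects only a few experts per token, the parameters actually exercised in a forward pass are a small fraction of the total, which is how a 397B‑parameter model can run with roughly 17B active parameters.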

Qwen3.5 Marks A Major Leap In Reinforcement Learning And Pretraining Efficiency

The Qwen3.5 series introduces substantial gains over Qwen3, driven largely by extensive reinforcement learning scaling across a wide range of environments. Rather than optimizing for narrow benchmarks, the team focused on increasing task difficulty and generalizability, resulting in improved agent performance across evaluations such as BFCL‑V4, VITA‑Bench, DeepPlanning, Tool‑Decathlon, and MCP‑Mark. Additional results will be detailed in an upcoming technical report.

Pretraining improvements span power, efficiency, and versatility. Qwen3.5 is trained on a significantly larger volume of visual‑text data with strengthened multilingual, STEM, and reasoning content, enabling it to match the performance of earlier trillion‑parameter models. Architectural upgrades—including higher‑sparsity MoE, hybrid attention, stability refinements, and multi‑token prediction—deliver major throughput gains, particularly at extended context lengths of 32k and 256k tokens. The model’s multimodal capabilities are strengthened through early text‑vision fusion and expanded datasets covering images, STEM materials, and video, while a larger 250k vocabulary improves encoding and decoding efficiency across most languages.
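As a small illustration of the multi‑token prediction idea mentioned above, the hedged sketch below adds extra output heads that each predict a token several positions ahead of the current one. The dimensions, vocabulary size, and number of heads are placeholders and do not reflect how Qwen3.5 actually implements the technique.

```python
import torch
import torch.nn as nn

# Toy multi-token prediction: one shared hidden state, one small head per
# future offset. Sizes are invented for the example.
class ToyMTPHeads(nn.Module):
    def __init__(self, d_model=64, vocab=1000, n_future=2):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_future))

    def forward(self, hidden):  # hidden: (batch, seq, d_model)
        # Head k produces logits for the token k+1 steps ahead of each position.
        return [head(hidden) for head in self.heads]

hidden = torch.randn(2, 16, 64)
logits_per_offset = ToyMTPHeads()(hidden)
print([l.shape for l in logits_per_offset])  # two tensors of shape (2, 16, 1000)
```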

The infrastructure behind Qwen3.5 is designed for efficient multimodal training. A heterogeneous parallelism strategy separates vision and language components to avoid bottlenecks, while sparse activation enables near‑full throughput even on mixed text‑image‑video workloads. A native FP8 pipeline reduces activation memory by roughly half and increases training speed by more than 10 percent, maintaining stability at massive token scales. 
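The claimed memory saving follows directly from the storage formats: FP8 holds one byte per activation value versus two bytes for BF16. The back‑of‑the‑envelope sketch below uses invented layer dimensions purely to show the roughly 2× reduction.

```python
# Rough check of the "about half the activation memory" claim. The batch,
# sequence, hidden, and layer counts are invented, not Qwen3.5's real sizes.
def activation_bytes(batch, seq_len, hidden, n_layers, bytes_per_value):
    # one activation tensor of shape (batch, seq_len, hidden) kept per layer
    return batch * seq_len * hidden * n_layers * bytes_per_value

cfg = dict(batch=8, seq_len=32_768, hidden=4_096, n_layers=48)
bf16 = activation_bytes(**cfg, bytes_per_value=2)  # BF16: 2 bytes per value
fp8 = activation_bytes(**cfg, bytes_per_value=1)   # FP8: 1 byte per value
print(f"BF16: {bf16 / 2**30:.1f} GiB, FP8: {fp8 / 2**30:.1f} GiB, ratio: {fp8 / bf16:.2f}")
```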

Reinforcement learning is supported by a fully asynchronous framework capable of handling models of all sizes, improving hardware utilization, load balancing, and fault recovery. Techniques such as FP8 end‑to‑end training, speculative decoding, rollout router replay, and multi‑turn rollout locking help maintain consistency and reduce gradient staleness. The system is built to support large‑scale agent workflows, enabling seamless multi‑turn interactions and broad generalization across environments.

Users can interact with Qwen3.5 through Qwen Chat, which offers Auto, Thinking, and Fast modes depending on the task. The model is also available through Alibaba Cloud’s ModelStudio, where advanced features such as reasoning, web search, and code execution can be enabled through simple parameters. Integration with third‑party coding tools allows developers to adopt Qwen3.5 into existing workflows with minimal friction.
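For developers who want to try the ModelStudio route, the minimal sketch below assumes the service’s OpenAI‑compatible endpoint; the base URL, model identifier, and the flag used to switch on the reasoning mode are assumptions here and should be checked against Alibaba Cloud’s ModelStudio documentation.

```python
from openai import OpenAI

# Minimal sketch, assuming an OpenAI-compatible ModelStudio endpoint.
# The base_url, model name, and "enable_thinking" flag are assumptions,
# not confirmed identifiers for Qwen3.5.
client = OpenAI(
    api_key="YOUR_MODELSTUDIO_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-397b-a17b",  # placeholder model ID
    messages=[{"role": "user", "content": "Explain this stack trace and suggest a fix: ..."}],
    extra_body={"enable_thinking": True},  # assumed switch for the reasoning mode
)
print(response.choices[0].message.content)
```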

According to the Qwen team, Qwen3.5 establishes a foundation for universal digital agents through its hybrid architecture and native multimodal reasoning. Future development will focus on system‑level integration, including persistent memory for cross‑session learning, embodied interfaces for real‑world interaction, self‑directed improvement mechanisms, and economic awareness for long‑term autonomous operation. The objective is to move beyond task‑specific assistants toward coherent, persistent agents capable of managing complex, multi‑day objectives with reliable, human‑aligned judgment.
