2w ago

Alibaba releases QwQ-32B, an open-source reasoning model, on Hugging Face and ModelScope, claiming performance similar to DeepSeek-R1 with lower compute needs.

qwenlm.github.io QwQ-32B: Embracing the Power of Reinforcement Learning

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For in...

Technology @lemmy.ml

☆ Yσɠƚԋσʂ ☆ @lemmy.ml

2w ago

QwQ-32B, a 32-billion-parameter language model, achieves performance comparable to DeepSeek-R1 (671 billion parameters) by scaling reinforcement learning

qwenlm.github.io/blog/qwq-32b/