This qwen2 think-rl model was trained using DeepSeek GRPO to perform thinking / reasoning before provide an answer
8-bit