Qwen: Qwen2.5 VL 32B Instruct
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.
Parameters
32B
Context Window
128,000
tokens
Input Price
$0.9
per 1M tokens
Output Price
$0.9
per 1M tokens
Capabilities
Model capabilities and supported modalities
Performance
Excellent reasoning capabilities with strong logical analysis
Strong mathematical capabilities, handles complex calculations well
Specialized in code generation with excellent programming capabilities
-
Modalities
text
text
LLM Price Calculator
Calculate the cost of using this model
Monthly Cost Estimator
Based on different usage levels