LogoTop AI Hubs

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama3
Text
Paid

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages; Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.

Parameters

49B

Context Window

131,072

tokens

Input Price

$0.1

per 1M tokens

Output Price

$0.4

per 1M tokens

Capabilities

Model capabilities and supported modalities

Performance

Reasoning

Excellent reasoning capabilities with strong logical analysis

Math

Strong mathematical capabilities, handles complex calculations well

Coding

Specialized in code generation with excellent programming capabilities

Knowledge

Good knowledge foundation across many domains

Modalities

Input Modalities

text

Output Modalities

text

LLM Price Calculator

Calculate the cost of using this model

$0.000150
$0.001200
Input Cost:$0.000150
Output Cost:$0.001200
Total Cost:$0.001350
Estimated usage: 4,500 tokens

Monthly Cost Estimator

Based on different usage levels

Light Usage
$0.0050
~10 requests
Moderate Usage
$0.0500
~100 requests
Heavy Usage
$0.5000
~1000 requests
Enterprise
$5.0000
~10,000 requests
Note: Estimates based on current token count settings per request.
Last Updated: 1970/01/21