Nvidia Nemotron Nano v2 AWQ
Collection: NVIDIA Nemotron Nano v2 quantized to AWQ using LLM Compressor, with Gemini-3-Pro-Preview outputs as the calibration dataset.
This model, NVIDIA-Nemotron-Nano-12B-v2-AWQ, was converted to AWQ format from nvidia/NVIDIA-Nemotron-Nano-12B-v2 using llm-compressor version 0.9.0.1 (https://github.com/vllm-project/llm-compressor.git), with the TeichAI/gemini-3-pro-preview-high-reasoning-1000x dataset (Gemini 3 Pro Preview outputs) used for calibration.
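The conversion above can be sketched with llm-compressor's one-shot AWQ flow. This is a hedged outline, not the exact script used for this model: the `W4A16` scheme, `max_seq_length`, and `num_calibration_samples` values are assumptions, and the calibration dataset may need loading/preprocessing before being passed to `oneshot`.

```python
def build_awq_recipe():
    """Sketch of an AWQ recipe: 4-bit weights, 16-bit activations,
    skipping the output head (assumed settings, not the author's exact recipe)."""
    from llmcompressor.modifiers.awq import AWQModifier
    return [AWQModifier(ignore=["lm_head"], scheme="W4A16", targets=["Linear"])]


def quantize(
    model_id="nvidia/NVIDIA-Nemotron-Nano-12B-v2",
    dataset_id="TeichAI/gemini-3-pro-preview-high-reasoning-1000x",
    output_dir="NVIDIA-Nemotron-Nano-12B-v2-AWQ",
):
    """One-shot AWQ quantization sketch; requires a GPU and the model weights."""
    from llmcompressor import oneshot
    oneshot(
        model=model_id,
        dataset=dataset_id,          # may need a preprocessed Dataset object
        recipe=build_awq_recipe(),
        max_seq_length=2048,         # assumption
        num_calibration_samples=512, # assumption
        output_dir=output_dir,
    )
```

The resulting checkpoint is saved in compressed-tensors format, which is why serving below passes `--quantization compressed-tensors`.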
Download the model:

```shell
hf download nicklas373/NVIDIA-Nemotron-Nano-12B-v2-AWQ
```

Serve it with vLLM:

```shell
vllm serve nicklas373/NVIDIA-Nemotron-Nano-12B-v2-AWQ \
  --chat-template '/home/xxx/.cache/huggingface/hub/models--nicklas373--NVIDIA-Nemotron-Nano-12B-v2-AWQ/snapshots/HASH_CODE/chat_template.jinja' \
  --chat-template-content-format string \
  --disable-fastapi-docs \
  --dtype auto \
  --enable-auto-tool-choice \
  --mamba_ssm_cache_dtype float32 \
  --quantization compressed-tensors \
  --served-model-name NVIDIA-Nemotron-Nano-12B-v2-AWQ \
  --seed 0 \
  --tool-call-parser 'nemotron_json' \
  --tool-parser-plugin '/home/xxx/.cache/huggingface/hub/models--nicklas373--NVIDIA-Nemotron-Nano-12B-v2-AWQ/snapshots/HASH_CODE/nemotron_toolcall_parser_streaming.py' \
  --tokenizer 'nvidia/NVIDIA-Nemotron-Nano-12B-v2' \
  --trust-remote-code
```
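Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal stdlib-only client sketch, assuming the default port 8000 (the `chat` helper and its defaults are illustrative, not part of the model card):

```python
import json
from urllib import request


def chat(prompt, base_url="http://localhost:8000/v1"):
    """Send one chat-completion request to the vLLM server started above
    and return the assistant's reply text."""
    body = json.dumps({
        "model": "NVIDIA-Nemotron-Nano-12B-v2-AWQ",  # matches --served-model-name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the server is started with `--enable-auto-tool-choice` and the `nemotron_json` parser, requests may also include a `tools` array and receive structured tool calls back.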
Base model: nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base