Nvidia Nemotron Nano v2 AWQ
Collection: NVIDIA Nemotron Nano v2 quantized to AWQ using LLM Compressor, with Gemini-3-Pro-Preview outputs as the calibration dataset.
This model, NVIDIA-Nemotron-Nano-12B-v2-AWQ, was converted to AWQ format from nvidia/NVIDIA-Nemotron-Nano-12B-v2 using llm-compressor version 0.9.0.1 (https://github.com/vllm-project/llm-compressor.git), with the TeichAI/gemini-3-pro-preview-high-reasoning-1000x dataset (Gemini 3 Pro Preview outputs) used for calibration.
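The conversion above can be sketched with llm-compressor's one-shot AWQ flow. This is a hedged outline, not the exact script used for this model: the `W4A16` scheme, `max_seq_length`, and `num_calibration_samples` values are assumptions, and the calibration dataset may need loading/preprocessing before being passed to `oneshot`.

```python
def build_awq_recipe():
    """Sketch of an AWQ recipe: 4-bit weights, 16-bit activations,
    skipping the output head (assumed settings, not the author's exact recipe)."""
    from llmcompressor.modifiers.awq import AWQModifier
    return [AWQModifier(ignore=["lm_head"], scheme="W4A16", targets=["Linear"])]


def quantize(
    model_id="nvidia/NVIDIA-Nemotron-Nano-12B-v2",
    dataset_id="TeichAI/gemini-3-pro-preview-high-reasoning-1000x",
    output_dir="NVIDIA-Nemotron-Nano-12B-v2-AWQ",
):
    """One-shot AWQ quantization sketch; requires a GPU and the model weights."""
    from llmcompressor import oneshot
    oneshot(
        model=model_id,
        dataset=dataset_id,          # may need a preprocessed Dataset object
        recipe=build_awq_recipe(),
        max_seq_length=2048,         # assumption
        num_calibration_samples=512, # assumption
        output_dir=output_dir,
    )
```

The resulting checkpoint is saved in compressed-tensors format, which is why serving below passes `--quantization compressed-tensors`.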
Download the model:

```shell
hf download nicklas373/NVIDIA-Nemotron-Nano-12B-v2-AWQ
```

Serve it with vLLM:

```shell
vllm serve nicklas373/NVIDIA-Nemotron-Nano-12B-v2-AWQ \
  --chat-template '/home/xxx/.cache/huggingface/hub/models--nicklas373--NVIDIA-Nemotron-Nano-12B-v2-AWQ/snapshots/HASH_CODE/chat_template.jinja' \
  --chat-template-content-format string \
  --disable-fastapi-docs \
  --dtype auto \
  --enable-auto-tool-choice \
  --mamba_ssm_cache_dtype float32 \
  --quantization compressed-tensors \
  --served-model-name NVIDIA-Nemotron-Nano-12B-v2-AWQ \
  --seed 0 \
  --tool-call-parser 'nemotron_json' \
  --tool-parser-plugin '/home/xxx/.cache/huggingface/hub/models--nicklas373--NVIDIA-Nemotron-Nano-12B-v2-AWQ/snapshots/HASH_CODE/nemotron_toolcall_parser_streaming.py' \
  --tokenizer 'nvidia/NVIDIA-Nemotron-Nano-12B-v2' \
  --trust-remote-code
```
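Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal stdlib-only client sketch, assuming the default port 8000 (the `chat` helper and its defaults are illustrative, not part of the model card):

```python
import json
from urllib import request


def chat(prompt, base_url="http://localhost:8000/v1"):
    """Send one chat-completion request to the vLLM server started above
    and return the assistant's reply text."""
    body = json.dumps({
        "model": "NVIDIA-Nemotron-Nano-12B-v2-AWQ",  # matches --served-model-name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the server is started with `--enable-auto-tool-choice` and the `nemotron_json` parser, requests may also include a `tools` array and receive structured tool calls back.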
Base model: nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base