Training data and inference scripts (tool calling, web search, and so on), plus training scripts
Will you also publicly release:
- the training data
- the inference scripts you use (tool calling and web search)
- the scripts you used for training the model?
Also, some additional questions:
- Did you ever consider training it natively in NVFP4 or MXFP4 instead of fp16?
- I ran a benchmark keeping the KV cache in fp32 while the weights stayed in fp16, and the scores improved slightly, by about 2 percentage points (a minimal sketch of the idea follows these questions).
- Are you planning to train an MoE based on this model, since it performs extremely well? Something like 40B4A (40B total parameters, 4B active)?
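On the fp32-KV-cache point: here is a minimal PyTorch sketch of the idea, not the actual benchmark setup. It keeps weights and activations in fp16 but stores cached keys/values in fp32, upcasting the query at score time; the single-head shapes and names are purely illustrative.

```python
import torch

class MixedPrecisionKVCache:
    """Stores cached keys/values in fp32 while the model runs in fp16."""

    def __init__(self):
        self.keys = []    # one fp32 tensor per decode step
        self.values = []

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # Upcast new entries to fp32 regardless of the compute dtype.
        self.keys.append(k.float())
        self.values.append(v.float())
        return torch.cat(self.keys, dim=0), torch.cat(self.values, dim=0)

def attend(q_fp16, k_fp16, v_fp16, cache):
    k32, v32 = cache.update(k_fp16, v_fp16)
    q32 = q_fp16.float()                        # match the cache dtype
    scores = (q32 @ k32.T) / k32.shape[-1] ** 0.5
    out = torch.softmax(scores, dim=-1) @ v32
    return out.half()                           # back to fp16 for the rest of the model

# One single-head decode step with d_model = 64.
cache = MixedPrecisionKVCache()
q, k, v = (torch.randn(1, 64, dtype=torch.float16) for _ in range(3))
print(attend(q, k, v, cache).dtype)  # torch.float16
```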
This is a very good release, thank you a lot; it works very well with ShinkaEvolve for improving algorithms.
To reduce overthinking, it might be possible to penalize the model when the same solution has already appeared more than twice in the thinking process (just an idea that might help; rough sketch below).
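For what it's worth, here is a rough sketch of that suggestion as a reward-shaping term. It assumes candidate solutions can be pulled out of the thinking trace with a simple pattern; the `answer:` marker, threshold, and weight are all made up for illustration.

```python
import re
from collections import Counter

def repetition_penalty(thinking: str, max_repeats: int = 2, weight: float = 0.1) -> float:
    """Penalty for a thinking trace that repeats the same candidate solution.

    Assumes candidates are marked like `answer: <value>`; the pattern,
    threshold, and weight are illustrative, not from the release.
    """
    candidates = re.findall(r"answer:\s*(\S+)", thinking, flags=re.IGNORECASE)
    counts = Counter(candidates)
    # Every occurrence beyond the allowed repeats adds to the penalty.
    excess = sum(max(0, n - max_repeats) for n in counts.values())
    return -weight * excess

trace = "... answer: 42 ... answer: 42 ... answer: 42 ... answer: 42 ..."
print(repetition_penalty(trace))  # -0.2 (two occurrences past the limit)
```

Added to the RL reward during training, or used as an early-stopping heuristic at inference, this would discourage re-deriving the same solution over and over.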
Thanks for sharing this discussion! 👋
I’ve been exploring Nanbeige4.1-3B as well, and this thread has been really helpful. Just wanted to add that I’ve noticed improvements in generation quality with longer context inputs compared to previous versions. I’m curious if others have tested performance on domain-specific prompts (e.g., technical or scientific text), and how well the model maintains coherence over extended responses.
Would love to hear about other users’ experiences or tips for optimizing prompts! 😊
Thanks a lot for the thoughtful feedback and support.
Low-precision training, MoE scaling, and efficient thinking without sacrificing performance are all part of our ongoing research.
Stay tuned for our future releases.