Training data and inference scripts (tool calling, web search, and so on), plus training scripts
Will you also publicly release:
- the training data
- the inference scripts you use (tool calling and web search)
- the scripts you used for training the model?
Also, some additional questions:
- Did you ever consider training it natively in NVFP4 or MXFP4 instead of fp16?
- I ran a benchmark keeping the KV cache in fp32 while the weights stayed in fp16, and the scores improved slightly, by about 2 percentage points (a minimal sketch of the idea follows these questions).
- Are you planning to train an MoE based on this model, since it performs extremely well? Something like 40B4A (40B total parameters, 4B active)?
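On the fp32-KV-cache point: here is a minimal PyTorch sketch of the idea, not the actual benchmark setup. It keeps weights and activations in fp16 but stores cached keys/values in fp32, upcasting the query at score time; the single-head shapes and names are purely illustrative.

```python
import torch

class MixedPrecisionKVCache:
    """Stores cached keys/values in fp32 while the model runs in fp16."""

    def __init__(self):
        self.keys = []    # one fp32 tensor per decode step
        self.values = []

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # Upcast new entries to fp32 regardless of the compute dtype.
        self.keys.append(k.float())
        self.values.append(v.float())
        return torch.cat(self.keys, dim=0), torch.cat(self.values, dim=0)

def attend(q_fp16, k_fp16, v_fp16, cache):
    k32, v32 = cache.update(k_fp16, v_fp16)
    q32 = q_fp16.float()                        # match the cache dtype
    scores = (q32 @ k32.T) / k32.shape[-1] ** 0.5
    out = torch.softmax(scores, dim=-1) @ v32
    return out.half()                           # back to fp16 for the rest of the model

# One single-head decode step with d_model = 64.
cache = MixedPrecisionKVCache()
q, k, v = (torch.randn(1, 64, dtype=torch.float16) for _ in range(3))
print(attend(q, k, v, cache).dtype)  # torch.float16
```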
This is a very good release, thank you a lot; it works very well with ShinkaEvolve for improving algorithms.
To reduce overthinking, it might be possible to penalize the model when the same solution has already appeared more than twice in the thinking process (just an idea that might help; rough sketch below).
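For what it's worth, here is a rough sketch of that suggestion as a reward-shaping term. It assumes candidate solutions can be pulled out of the thinking trace with a simple pattern; the `answer:` marker, threshold, and weight are all made up for illustration.

```python
import re
from collections import Counter

def repetition_penalty(thinking: str, max_repeats: int = 2, weight: float = 0.1) -> float:
    """Penalty for a thinking trace that repeats the same candidate solution.

    Assumes candidates are marked like `answer: <value>`; the pattern,
    threshold, and weight are illustrative, not from the release.
    """
    candidates = re.findall(r"answer:\s*(\S+)", thinking, flags=re.IGNORECASE)
    counts = Counter(candidates)
    # Every occurrence beyond the allowed repeats adds to the penalty.
    excess = sum(max(0, n - max_repeats) for n in counts.values())
    return -weight * excess

trace = "... answer: 42 ... answer: 42 ... answer: 42 ... answer: 42 ..."
print(repetition_penalty(trace))  # -0.2 (two occurrences past the limit)
```

Added to the RL reward during training, or used as an early-stopping heuristic at inference, this would discourage re-deriving the same solution over and over.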
Thanks for sharing this discussion! 👋
I’ve been exploring Nanbeige4.1-3B as well, and this thread has been really helpful. Just wanted to add that I’ve noticed improvements in generation quality with longer context inputs compared to previous versions. I’m curious if others have tested performance on domain-specific prompts (e.g., technical or scientific text), and how well the model maintains coherence over extended responses.
Would love to hear about other users’ experiences or tips for optimizing prompts! 😊
Thanks a lot for the thoughtful feedback and support.
Low-precision training, MoE scaling, and efficient thinking without sacrificing performance are all part of our ongoing research.
Stay tuned for our future releases.