# voxtral-model-q8
This model is a quantized version of Mistral's [Voxtral-Mini-4B-Realtime-2602](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) and is meant to be used with the voxtral.c implementation originally created by antirez: https://github.com/antirez/voxtral.c

Int8 quantization is currently experimental and is supported in this fork: https://github.com/drbh/voxtral.c
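For context, int8 ("q8") quantization stores each weight as a signed 8-bit integer plus a float scale, trading a little accuracy for roughly 4x less memory than f32. The fork's exact on-disk format isn't documented here, so the sketch below is only a minimal illustration of symmetric per-row quantization; all names are hypothetical:

```c
/*
 * Minimal sketch of symmetric per-row int8 quantization, for intuition
 * only. This is NOT necessarily the scheme voxtral.c uses (it may use
 * per-block scales or a different layout).
 */
#include <math.h>
#include <stdint.h>

typedef struct {
    int8_t *q;     /* rows * cols quantized weights    */
    float  *scale; /* one dequantization scale per row */
    int rows, cols;
} QTensor;

/* Quantize an f32 weight matrix: map each row's max |w| to 127. */
void quantize_q8(const float *w, QTensor *t) {
    for (int r = 0; r < t->rows; r++) {
        float amax = 0.0f;
        for (int c = 0; c < t->cols; c++) {
            float a = fabsf(w[r * t->cols + c]);
            if (a > amax) amax = a;
        }
        float s = amax / 127.0f;
        t->scale[r] = s;
        for (int c = 0; c < t->cols; c++)
            t->q[r * t->cols + c] =
                (s > 0.0f) ? (int8_t)lrintf(w[r * t->cols + c] / s) : 0;
    }
}

/* y = W x, dequantizing on the fly: one multiply by the row scale. */
void matvec_q8(const QTensor *t, const float *x, float *y) {
    for (int r = 0; r < t->rows; r++) {
        float acc = 0.0f;
        for (int c = 0; c < t->cols; c++)
            acc += (float)t->q[r * t->cols + c] * x[c];
        y[r] = t->scale[r] * acc;
    }
}
```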
## Usage
```sh
git clone https://github.com/drbh/voxtral.c
cd voxtral.c
make mps
./download_q8_model.sh
```
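`./download_q8_model.sh` fetches the quantized weights, presumably into `./voxtral-model-q8`, the directory the `-d` flag points at below. You can then transcribe one of the bundled samples: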
```sh
./voxtral -d voxtral-model-q8 -i samples/jfk.wav
# Loading weights...
# Metal GPU: 4061.2 MB
# Model loaded.
# Audio: 176000 samples (11.0 seconds)
# And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.
# Encoder: 1496 mel -> 187 tokens (657 ms)
# Decoder: 26 text tokens (149 steps) in 3548 ms (prefill 436 ms + 21.0 ms/step)
```
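A quick sanity check on those numbers: 176000 samples over 11.0 seconds implies a 16 kHz input rate, the encoder compresses 1496 mel frames into 187 audio tokens (an 8x reduction), and (3548 - 436) ms over 149 decode steps matches the reported ~21.0 ms/step.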
Or stream from your microphone:
```sh
./voxtral -d voxtral-model-q8 --from-mic
# Loading weights...
# Metal GPU: 4061.2 MB
# Model loaded.
# Listening (Ctrl+C to stop)...
# Check, check. Is the microphone streaming working? Okay, cool.^C
# Stopping...
# Encoder: 1562 mel -> 195 tokens (832 ms)
# Decoder: 14 text tokens (157 steps) in 3951 ms (prefill 487 ms + 22.2 ms/step)
```
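Since the log above implies 16 kHz input, you will likely need to resample your own recordings before transcribing them. The container, channel count, and bit depth below are assumptions rather than documented voxtral.c behavior:

```sh
# convert arbitrary input to 16 kHz mono 16-bit PCM WAV (assumed expected format)
ffmpeg -i recording.mp3 -ar 16000 -ac 1 -c:a pcm_s16le recording.wav
./voxtral -d voxtral-model-q8 -i recording.wav
```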