LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling • Article • 4 days ago
Open Coding Agents Specialization • Collection • Ai2 Open Coding Agents: Django, Sphinx, SymPy data • 6 items • Updated 6 days ago
Multilingual PII & De-Identification • Collection • Multilingual models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 113 items • Updated 4 days ago
compar:IA ranking: from user votes to a participatory ranking of models • Article • Nov 3, 2025
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data • Paper • 2602.06669 • Published 10 days ago
Community Evals: Because we're done trusting black-box leaderboards over the community • Article • 13 days ago
Instruction Pretrained Experiments • Collection • Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text • Paper • 2601.22975 • Published 17 days ago
MMFineReason • Collection • High-quality STEM reasoning dataset for multimodal LLM post-training. • 14 items • Updated 13 days ago
Continually pre-trained models • Collection • Language-specific LLMs continually pre-trained from fully open English base models • 2 items • Updated 26 days ago
MS MARCO Mined Triplets • Collection • These datasets contain MS MARCO triplets built by mining hard negatives with various models. Each dataset has several subsets. • 16 items • Updated 18 days ago