SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 153
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 28 days ago • 54
Everchanging Quest is out! It is an LLM-controlled roguelike in which the LLM receives a markdown representation of the map and must generate a JSON with the objective to fulfill on the map, as well as the necessary objects and their placements. Come test it on the Space: Jofthomas/Everchanging-Quest
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 91
A Dataset and Strong Baselines for Classification of Czech News Texts Paper • 2307.10666 • Published Jul 20, 2023