🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 11 items • Updated about 19 hours ago • 49
Qwen2-Audio Collection Audio-language model series based on Qwen2 • 4 items • Updated Nov 28, 2024 • 51
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3, 2024 • 103
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 14 items • Updated May 21, 2024 • 11
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27, 2024 • 191
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68