Article Agentic RAG Stack (1/5) - Index and retrieve documents for vector search using Sentence Transformers and DuckDB By davidberenstein1957 • 15 days ago • 15
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated Apr 1, 2024 • 43
Synthetic Data Generator Collection A collection of tools and datasets related to no-code synthetic data generation. • 21 items • Updated 1 day ago • 7
Article Mastering Tensor Dimensions in Transformers By not-lain • about 1 month ago • 43
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state-of-the-art open post-training recipes. • 33 items • Updated about 21 hours ago • 68
Article Let’s make a generation of amazing image generation models By burtenshaw and 4 others • Nov 26, 2024 • 34
Article How to optimize your data labelling project with custom interfaces By burtenshaw and 9 others • Oct 16, 2024 • 18
Article 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30, 2024 • 37
Article ⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2 By burtenshaw • Jun 3, 2024 • 26
Article 🧑‍⚖️ "Replacing Judges with Juries" using distilabel By alvarobartt • May 3, 2024 • 17
Article ⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together By burtenshaw • Apr 29, 2024 • 29