Activity Feed

AI & ML interests

Democratize NLP in Spanish and encourage its application to generate social impact 💛

Recent Activity

somosnlp's activity

davidberenstein1957 
posted an update 1 day ago
davidberenstein1957 
posted an update 5 days ago
davidberenstein1957 
posted an update 6 days ago
tadeodonegana 
posted an update 7 days ago
At RooMix(dot)ai we’re looking for an expert in generative image models for a short consulting gig. Any recommendations?
davidberenstein1957 
posted an update 7 days ago
davidberenstein1957 
posted an update 12 days ago
TL;DR: Parquet is awesome, and so is DuckDB!

Datasets on the Hugging Face Hub rely on Parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDB’s features is vector similarity search, which can be used with or without an index.
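
To make the "without an index" case concrete, here is a minimal sketch of what an index-free similarity search computes: score every stored embedding against the query and keep the best matches. This is plain Python rather than DuckDB, and the rows, embeddings, and function names are invented for illustration; in practice DuckDB would run the equivalent comparison in SQL over the Parquet data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Brute-force search: score every row, then keep the k most similar."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical (text, embedding) rows standing in for a dataset column.
ROWS = [
    ("parquet files", [1.0, 0.0, 0.0]),
    ("duckdb queries", [0.9, 0.1, 0.0]),
    ("image models", [0.0, 0.0, 1.0]),
]

results = top_k([1.0, 0.0, 0.0], ROWS, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Scanning every row like this is exact but grows linearly with the dataset; an index trades some of that exactness or build time for faster lookups.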

Blog: https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend
davidberenstein1957 
posted an update 15 days ago
davidberenstein1957 
posted an update 21 days ago
davidberenstein1957 
posted an update 25 days ago
nataliaElv 
posted an update 25 days ago
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. 

Any feedback for improvements welcome!

https://huggingface.co/learn/nlp-course/chapter10
davidberenstein1957 
posted an update 28 days ago
nataliaElv 
posted an update about 1 month ago
davidberenstein1957 
posted an update about 1 month ago
davidberenstein1957 
posted an update about 1 month ago
davidberenstein1957 
posted an update about 2 months ago
nataliaElv 
posted an update about 2 months ago
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU
davidberenstein1957 
posted an update about 2 months ago
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process that lets anyone create datasets and models in minutes.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator