Article Mini-R1: Reproduce DeepSeek R1 "aha moment": an RL tutorial By open-r1 • 11 days ago • 33
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 20 days ago • 314
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 90
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 11
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 8
The Perfect Blend: Redefining RLHF with Mixture of Judges Paper • 2409.20370 • Published Sep 30, 2024 • 5
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16, 2024 • 35
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Dec 13, 2024 • 145
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Paper • 2405.21046 • Published May 31, 2024 • 4
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 64
Binary Classifier Optimization for Large Language Model Alignment Paper • 2404.04656 • Published Apr 6, 2024 • 2
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 13