Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Jaward's activity

liked a model 4 days ago

HuggingFaceTB/SmolLM2-135M-Instruct

Text Generation • Updated 6 days ago • 131k • 113

posted an update 8 days ago

ByteDance drops OmniHuman🔥
This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/

upvoted 2 papers 8 days ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published 8 days ago • 53

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published 9 days ago • 168

posted an update 11 days ago

The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The hack: let the data self-normalize. The completions sampled for the same prompt compete against their group mean, so there's no value model and no extra params, just clean, efficient RL that cuts memory usage by ~50% while maintaining SOTA performance. btw, it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
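
The group-relative baseline fits in a few lines. This is a minimal sketch of the outcome-reward advantage described in the DeepSeekMath paper (each reward minus the group mean, divided by the group std), assuming one scalar reward per sampled completion. The function name, the `eps` term, and the choice of population std are illustrative assumptions, and the rest of the GRPO objective (a PPO-style clipped probability ratio plus a KL penalty to a reference model) is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(group_rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each completion against the other completions sampled for the SAME prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps). No learned value model is involved:
    the group's own mean acts as the baseline.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std (assumption; any std estimate works here)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Hypothetical rule-based rewards for 4 completions of one prompt:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct answers get a positive advantage, wrong ones a negative one
```

If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no policy-gradient signal; the `eps` term only keeps that case from dividing by zero.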

upvoted an article 12 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • 15 days ago • 706

liked a model 17 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 3 days ago • 2.94M • 8.31k

liked a Space 19 days ago

DeepSeek-R1 WebGPU 🧠

Next-generation reasoning model that runs locally in-browser

upvoted a paper 23 days ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published 26 days ago • 105

reacted to mlabonne's post with 🧠 26 days ago

🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

liked a model 26 days ago

unsloth/phi-4-GGUF

Text Generation • Updated 29 days ago • 75.6k • 141

posted an update 29 days ago

Minimal single-script implementation of knowledge distillation in LLMs. In this implementation, we use GPT-2 (124M) as the student model and GPT-2 Medium (355M) as the teacher, distilling via reverse Kullback-Leibler (KL) divergence and training on a small chunk of OpenWebText.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb
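
The reverse-KL objective can be sketched per token position. With the student's next-token distribution q and the teacher's p, reverse KL is KL(q || p) = Σ_v q(v) (log q(v) − log p(v)), so the expectation is taken under the student, which pushes it toward the teacher's high-probability modes instead of spreading mass over everything the teacher allows. This is plain Python over toy logits, assuming a shared vocabulary and simple averaging over positions; all function names are hypothetical, and the linked notebook's actual loss (batching, masking, temperature, any mixing with cross-entropy) may differ.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl(log_a: list[float], log_b: list[float]) -> float:
    """KL(a || b) = sum_v a(v) * (log a(v) - log b(v)), given log-probabilities."""
    return sum(math.exp(la) * (la - lb) for la, lb in zip(log_a, log_b))


def reverse_kl(student_logits: list[float], teacher_logits: list[float]) -> float:
    """Reverse KL: KL(q_student || p_teacher); the expectation is under the student."""
    return kl(log_softmax(student_logits), log_softmax(teacher_logits))


def forward_kl(student_logits: list[float], teacher_logits: list[float]) -> float:
    """Forward KL: KL(p_teacher || q_student), shown only for contrast."""
    return kl(log_softmax(teacher_logits), log_softmax(student_logits))


def sequence_distill_loss(student_seq: list[list[float]], teacher_seq: list[list[float]]) -> float:
    """Average per-token reverse KL over a sequence of next-token logit vectors."""
    losses = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(losses) / len(losses)


# Hypothetical logits over a toy 3-token vocabulary at two positions.
student = [[2.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
teacher = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(sequence_distill_loss(student, teacher))
# The two KL directions generally give different values for the same pair.
print(reverse_kl(student[0], teacher[0]), forward_kl(student[0], teacher[0]))
```

In a real training loop the same computation would run on `[batch, seq, vocab]` tensors with the teacher's logits detached, so gradients flow only into the student.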

liked a model 30 days ago

deepseek-ai/DeepSeek-V3

Text Generation • Updated 19 days ago • 1.26M • 3.36k

posted an update about 1 month ago

Huge AI win in medicine👏
"Large language of life model" just dropped!!
Full paper: https://www.nature.com/articles/s41586-024-08391-z

upvoted a collection about 1 month ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated 26 days ago • 259

posted an update about 1 month ago

Damn, I love NVIDIA's bullish stance on taking AI to the edge: from being the overlord of compute to cutting-edge physical AI with SOTA multiverse simulation engines that bring the scaling laws under your control!!

My favorite: Cosmos, a fully open-source, open-weight, physics-based video generation platform. What an incredible way to start off the year ✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf

liked a model about 1 month ago

Qwen/QVQ-72B-Preview

Image-Text-to-Text • Updated about 1 month ago • 157k • 541

posted an update about 2 months ago

nanoBLT: a simplified, lightweight implementation of a character-level Byte Latent Transformer (BLT) in under 500 lines of code. The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and is trained on ~1M bytes of Tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
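
With a patch size of 4, the larger latent stack runs over roughly 4x fewer positions than the byte-level encoder and decoder. A minimal sketch of fixed-size byte patching and the resulting per-stage sequence lengths, assuming static patches with zero padding. The BLT paper's main method instead segments bytes dynamically using a small byte-level model's next-byte entropy and pools bytes into patches with cross-attention, and the function names here are hypothetical, not taken from the nanoBLT notebook.

```python
def bytes_to_patches(text: str, patch_size: int = 4, pad_byte: int = 0) -> list[list[int]]:
    """Split UTF-8 bytes into fixed-size patches, right-padding the last patch."""
    data = list(text.encode("utf-8"))
    remainder = len(data) % patch_size
    if remainder:
        data += [pad_byte] * (patch_size - remainder)
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


def stage_lengths(num_bytes: int, patch_size: int = 4) -> dict[str, int]:
    """Sequence length seen by each stage of a 2x4x2 encoder/latent/decoder stack."""
    num_patches = -(-num_bytes // patch_size)  # ceiling division
    return {
        "local_encoder (2 layers, bytes)": num_bytes,
        "latent_transformer (4 layers, patches)": num_patches,
        "local_decoder (2 layers, bytes)": num_bytes,
    }


patches = bytes_to_patches("To be, or not to be", patch_size=4)
print(len(patches), patches[0])
print(stage_lengths(256, patch_size=4))
```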

liked a model about 2 months ago

deepseek-ai/DeepSeek-V3-Base

Updated 19 days ago • 29.8k • 1.55k

replied to their post about 2 months ago

btw the background songs in the videos are actually what I listen to during implementation