Aymeric Roucher

m-ric

http://aymeric-roucher.github.io

AI & ML interests

MLE at Hugging Face 🤗 LLMs, Agents, RAG, Multimodal.

Articles

Organizations

m-ric's activity

upvoted an article 1 day ago

wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR??

Article • 8 days ago • 31

upvoted a paper 1 day ago

Were RNNs All We Needed?

Paper • 2410.01201 • Published 4 days ago • 23

upvoted an article 3 days ago

RAG chatbot using llama3

Article • Jul 7 • 73

upvoted a paper 4 days ago

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 8 days ago • 73

upvoted an article 4 days ago

A Short Summary of Chinese AI Global Expansion

Article • 4 days ago • 12

upvoted a paper 8 days ago

AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Paper • 2402.05602 • Published Feb 8 • 4

upvoted a paper 9 days ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 10 days ago • 92

upvoted 4 papers 11 days ago

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Paper • 2202.11214 • Published Feb 22, 2022 • 1

LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

Paper • 2409.13373 • Published 16 days ago • 2

PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

Paper • 2206.10498 • Published Jun 21, 2022 • 1

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published 15 days ago • 45

upvoted 2 articles 12 days ago

Does Daily Software Engineering Work Need Reasoning Models?

Article • 12 days ago • 5

Exploring the Daily Papers Page on Hugging Face

Article • 13 days ago • 25

upvoted a collection 17 days ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 17 days ago • 224

upvoted an article 17 days ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • 18 days ago • 144

upvoted an article 18 days ago

Accelerate 1.0.0

Article • 23 days ago • 34

upvoted 2 papers 19 days ago

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Paper • 2402.12875 • Published Feb 20 • 12

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 24 days ago • 63

upvoted 2 articles 19 days ago

Fine-tuning Parler TTS on a Specific Language

Article • 20 days ago • 19

Introducing Community Tools on HuggingChat

Article • 20 days ago • 26

upvoted a paper 24 days ago

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Paper • 2409.04431 • Published 29 days ago • 1

upvoted an article 24 days ago

Tool Use, Unified

Article • Aug 12 • 54

upvoted 2 papers 25 days ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 78

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Paper • 2406.04520 • Published Jun 6 • 10

upvoted a paper 26 days ago

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

upvoted 2 papers about 1 month ago

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

Paper • 2408.14158 • Published Aug 26 • 2

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

upvoted an article about 1 month ago

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Article • Jul 25 • 18

upvoted 3 papers about 1 month ago

Human Feedback is not Gold Standard

Paper • 2309.16349 • Published Sep 28, 2023 • 5

Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

Paper • 2406.00507 • Published Jun 1 • 1

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Paper • 2408.02442 • Published Aug 5 • 18

upvoted an article about 1 month ago

Scaling robotics datasets with video encoding

Article • Aug 27 • 33

upvoted 5 papers about 1 month ago

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

Paper • 2406.05534 • Published Jun 8 • 3

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper • 2401.04398 • Published Jan 9 • 20

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 111

upvoted an article about 1 month ago

Serverless Inference with Hugging Face and NVIDIA NIMs

Article • Jul 29 • 26

upvoted a collection about 1 month ago

Jamba-1.5

Collection

The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22 • 75

upvoted an article about 1 month ago

The 5 Most Under-Rated Tools on Hugging Face

Article • Aug 22 • 81

upvoted 2 papers about 2 months ago

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6 • 33

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

upvoted 2 articles about 2 months ago

Introduction to ggml

Article • Aug 13 • 100

XetHub is joining Hugging Face!

Article • Aug 8 • 78

upvoted 4 papers 2 months ago

Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Paper • 2407.10718 • Published Jul 15 • 17

SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain

Paper • 2407.19584 • Published Jul 28 • 60

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 103

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Paper • 2312.15234 • Published Dec 23, 2023 • 3

upvoted an article 2 months ago

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Article • Jul 23 • 198

upvoted a paper 3 months ago

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

Paper • 2303.10130 • Published Mar 17, 2023 • 3

upvoted an article 3 months ago

The Rise of Agentic Data Generation

Article • Jul 15 • 74

upvoted 4 papers 3 months ago

RouteLLM: Learning to Route LLMs with Preference Data

Paper • 2406.18665 • Published Jun 26 • 5

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023 • 4

BERT Rediscovers the Classical NLP Pipeline

Paper • 1905.05950 • Published May 15, 2019 • 2

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 33

upvoted 3 articles 3 months ago

Banque des Territoires (CDC Group) x Polyconseil x Hugging Face: Enhancing a Major French Environmental Program with a Sovereign Data Solution

Article • Jul 9 • 4

Google Cloud TPUs made available to Hugging Face users

Article • Jul 9 • 19

Announcing New Dataset Search Features

Article • Jul 8 • 22

upvoted 2 papers 3 months ago

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Paper • 2406.18518 • Published Jun 26 • 23

Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1 • 42

Articles

Our Transformers Code Agent beats the GAIA benchmark!

Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖

License to Call: Introducing Transformers Agents 2.0

Open-source LLMs as LangChain Agents
