Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

theainerd's activity

upvoted a collection about 1 month ago

🪐 SmolLM

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 174

upvoted a paper 2 months ago

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

upvoted an article 2 months ago

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Article • Published Jul 29 • 212

upvoted a paper 2 months ago

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Paper • 2407.18961 • Published Jul 18 • 38

upvoted 3 collections 2 months ago

Research projects on top of vLLM

Papers cited in https://blog.vllm.ai/2024/07/25/lfai-perf.html • 6 items • Updated Jul 29 • 12

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 11 days ago • 587

Preference Datasets for DPO

This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Jul 30 • 28

upvoted 2 collections 3 months ago

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 57

Qwen2

Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 18 days ago • 341

upvoted a paper 3 months ago

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 154

upvoted an article 3 months ago

The Rise of Agentic Data Generation

Article • Published Jul 15 • 74

upvoted a paper 3 months ago

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Paper • 2407.01370 • Published Jul 1 • 85

upvoted 2 collections 3 months ago

Gemma 2 Release

15 items • Updated 27 days ago • 177

OLMo Suite

Artifacts for the first set of OLMo models. • 18 items • Updated 11 days ago • 57

upvoted 2 collections 4 months ago

Nemotron 4 340B

Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 5 days ago • 156

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated 18 days ago • 474

upvoted a paper 4 months ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42

upvoted an article 4 months ago

Benchmarking Text Generation Inference

Article • Published May 29 • 27

upvoted an article 5 months ago

Hugging Face x LangChain: A new partner package in LangChain

Article • Published May 14 • 105

upvoted a collection 5 months ago

[lecture artifacts] aligning open language models

artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17 • 56

upvoted 4 papers 5 months ago

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2 • 59

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 114

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

upvoted 2 papers 6 months ago

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

Paper • 2403.18421 • Published Mar 27 • 21

Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27 • 23

upvoted 5 papers 7 months ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 47

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper • 2403.07816 • Published Mar 12 • 39

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12 • 75

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5 • 93

upvoted a collection 7 months ago

💫 StarCoder2

StarCoder2 models and datasets! • 8 items • Updated Mar 1 • 80

upvoted a paper about 1 year ago

Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 40