Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024
MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition Paper • 2302.13750 • Published Feb 27, 2023
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024
Article From PyTorch DDP to 🤗 Accelerate to 🤗 Trainer, mastery of distributed training with ease • Oct 21, 2022
Tuna: Instruction Tuning using Feedback from Large Language Models Paper • 2310.13385 • Published Oct 20, 2023
Datasets: A Community Library for Natural Language Processing Paper • 2109.02846 • Published Sep 7, 2021
Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published Jun 18, 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models Paper • 2406.11289 • Published Jun 17, 2024
Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published Jun 27, 2024
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774 • Published Jun 28, 2024
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated Jun 26
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 43 items
Article Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28, 2024
Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • Jun 4, 2024
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items
Article Assisted Generation: a new direction toward low-latency text generation • May 11, 2023