BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27 • 51
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 53
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 34
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 104
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 59
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding Paper • 2403.01487 • Published Mar 3 • 15