Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • 8 days ago • 31
AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers Paper • 2402.05602 • Published Feb 8 • 4
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 10 days ago • 92
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators Paper • 2202.11214 • Published Feb 22, 2022 • 1
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published 16 days ago • 2
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change Paper • 2206.10498 • Published Jun 21, 2022 • 1
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published 15 days ago • 45
Article Does Daily Software Engineering Work Need Reasoning Models? By onekq • 12 days ago • 5
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 17 days ago • 224
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Paper • 2402.12875 • Published Feb 20 • 12
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 24 days ago • 63
Theory, Analysis, and Best Practices for Sigmoid Self-Attention Paper • 2409.04431 • Published 29 days ago • 1
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 78
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning Paper • 2406.04520 • Published Jun 6 • 10
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning Paper • 2408.14158 • Published Aug 26 • 2
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25 • 18
Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization Paper • 2406.00507 • Published Jun 1 • 1
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 18
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing Paper • 2406.05534 • Published Jun 8 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9 • 20
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 111
Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22 • 75
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Paper • 2407.10718 • Published Jul 15 • 17
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 60
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems Paper • 2312.15234 • Published Dec 23, 2023 • 3
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models Paper • 2303.10130 • Published Mar 17, 2023 • 3
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Paper • 2310.06770 • Published Oct 10, 2023 • 4
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33
Article Banque des Territoires (CDC Group) x Polyconseil x Hugging Face: Enhancing a Major French Environmental Program with a Sovereign Data Solution Jul 9 • 4
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26 • 23
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1 • 42