to_read - a hitchhiker3010 Collection

hitchhiker3010 's Collections

to_read

to_read

updated about 21 hours ago

DocGraphLM: Documental Graph Language Model for Information Extraction

Paper • 2401.02823 • Published Jan 5 • 34
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 61
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 178
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 36
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 17
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 36
Unifying Vision, Text, and Layout for Universal Document Processing

Paper • 2212.02623 • Published Dec 5, 2022 • 10
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Paper • 2406.15334 • Published Jun 21 • 8
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Paper • 2407.07071 • Published Jul 9 • 11
Transformer Layers as Painters

Paper • 2407.09298 • Published Jul 12 • 13
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Paper • 2402.03216 • Published Feb 5 • 3
Visual Text Generation in the Wild

Paper • 2407.14138 • Published Jul 19 • 8
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17 • 19
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 111
Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 138
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

Paper • 2307.03692 • Published Jul 5, 2023 • 24
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Paper • 2307.06304 • Published Jul 12, 2023 • 26
Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published 2 days ago • 24
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Paper • 2410.02762 • Published 2 days ago • 7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published 3 days ago • 12
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Paper • 2410.01731 • Published 4 days ago • 11