taesiri (taesiri)

upvoted a paper about 8 hours ago

Not All LLM Reasoners Are Created Equal

Paper • 2410.01748 • Published 3 days ago • 22

upvoted a collection 1 day ago

Emu3

Collection

3 items • Updated 9 days ago • 47

upvoted a paper 2 days ago

Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published 2 days ago • 24

upvoted a collection 2 days ago

LLaVA-Critic

Collection

as a general evaluator for assessing model performance • 6 items • Updated 1 day ago • 5

upvoted 5 papers 2 days ago

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published 2 days ago • 25

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published 2 days ago • 31

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published 2 days ago • 46

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

Paper • 2410.01044 • Published 4 days ago • 34

The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published 5 days ago • 4

upvoted 4 papers 3 days ago

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Paper • 2410.01731 • Published 3 days ago • 11

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published 3 days ago • 21

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published 4 days ago • 28

Law of the Weakest Link: Cross Capabilities of Large Language Models

Paper • 2409.19951 • Published 6 days ago • 48

upvoted 3 papers 4 days ago

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

Paper • 2409.18943 • Published 8 days ago • 26

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published 5 days ago • 27

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published 5 days ago • 43

upvoted 2 papers 5 days ago

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Paper • 2409.18964 • Published 8 days ago • 20

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 8 days ago • 73

upvoted a collection 6 days ago

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 10 days ago • 218

upvoted a paper 6 days ago

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published 10 days ago • 45

upvoted 5 papers 9 days ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 10 days ago • 92

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Paper • 2409.16925 • Published 10 days ago • 6

upvoted a collection 10 days ago

Llama 3.2

Collection

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 10 days ago • 327

upvoted a paper 10 days ago

OmniBench: Towards The Future of Universal Omni-Language Models

Paper • 2409.15272 • Published 12 days ago • 24

upvoted 4 papers 12 days ago

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published 13 days ago • 27

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

Paper • 2409.14674 • Published 13 days ago • 40

A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

Paper • 2409.15277 • Published 12 days ago • 34

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published 16 days ago • 66

upvoted 3 papers 16 days ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 16 days ago • 128

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published 16 days ago • 35

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published 17 days ago • 46

upvoted 5 papers 17 days ago

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 17 days ago • 35

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 17 days ago • 121

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published 18 days ago • 30

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 17 days ago • 69

V-STaR: Training Verifiers for Self-Taught Reasoners

Paper • 2402.06457 • Published Feb 9 • 8

upvoted an article 17 days ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • 18 days ago • 144

upvoted 3 papers 18 days ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 18 days ago • 66

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 18 days ago • 81

Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types

Paper • 2409.09269 • Published 22 days ago • 7

upvoted 3 papers 19 days ago

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published 23 days ago • 10

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published 22 days ago • 30

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17 • 17

upvoted a paper 20 days ago

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 24 days ago • 63

upvoted 2 collections 22 days ago

VideoGameBunny

Collection

7 items • Updated Sep 2 • 1

VLMs are Blind!

Collection

4 items • Updated Aug 3 • 1

upvoted 3 papers 22 days ago

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

Paper • 2409.08240 • Published 23 days ago • 15

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 30 days ago • 41

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 23 days ago • 42

upvoted 6 papers 24 days ago

MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis

Paper • 2409.07129 • Published 25 days ago • 6

Self-Harmonized Chain of Thought

Paper • 2409.04057 • Published 30 days ago • 16

MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications

Paper • 2409.07314 • Published 24 days ago • 50

Agent Workflow Memory

Paper • 2409.07429 • Published 24 days ago • 27

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published 25 days ago • 19

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 25 days ago • 54

upvoted 2 papers 25 days ago

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published 28 days ago • 22

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published 29 days ago • 20

