samusenps

AI & ML interests

Foundational Architectures, Multi-Modality, Interpretability, Benchmarking with simulations, Robotics, Integration with a non-invasive, open-source RISC-V BCI stack. Extremely high-quality training data. Fully open-source ML/AI.

samusenps's activity

upvoted 28 papers 5 days ago

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Paper • 2409.16925 • Published 10 days ago • 6

NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models

Paper • 2409.16493 • Published 11 days ago • 7

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Paper • 2409.16299 • Published 26 days ago • 9

Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing

Paper • 2409.16629 • Published 11 days ago • 9

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

Paper • 2409.17058 • Published 10 days ago • 9

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

Paper • 2409.17145 • Published 10 days ago • 11

AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark

Paper • 2409.15041 • Published 12 days ago • 12

Boosting Healthcare LLMs Through Retrieved Context

Paper • 2409.15127 • Published 12 days ago • 18

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published 10 days ago • 58

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 10 days ago • 92

The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends

Paper • 2409.14195 • Published 14 days ago • 10

Pixel-Space Post-Training of Latent Diffusion Models

Paper • 2409.17565 • Published 10 days ago • 18

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Paper • 2409.17422 • Published 10 days ago • 22

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published 9 days ago • 23

Instruction Following without Instruction Tuning

Paper • 2409.14254 • Published 14 days ago • 25

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published 9 days ago • 34

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published 9 days ago • 32

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published 10 days ago • 43

LML: Language Model Learning a Dataset for Data-Augmented Prediction

Paper • 2409.18957 • Published 8 days ago • 8

MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making

Paper • 2409.16686 • Published 11 days ago • 7

A Survey on the Honesty of Large Language Models

Paper • 2409.18786 • Published 8 days ago • 28

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

Paper • 2409.17433 • Published 10 days ago • 8

MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published 8 days ago • 22

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Paper • 2409.18964 • Published 8 days ago • 20

Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult

Paper • 2409.17545 • Published 10 days ago • 16

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published 10 days ago • 22

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published 10 days ago • 45

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 8 days ago • 73

upvoted 7 papers 17 days ago

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

Paper • 2409.12139 • Published 17 days ago • 11

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published 18 days ago • 30

GRIN: GRadient-INformed MoE

Paper • 2409.12136 • Published 17 days ago • 14

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 17 days ago • 121

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 17 days ago • 35

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Paper • 2409.11564 • Published 18 days ago • 18

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 17 days ago • 69

upvoted 10 papers 21 days ago

Can OOD Object Detectors Learn from Foundation Models?

Paper • 2409.05162 • Published 27 days ago • 6

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Paper • 2409.07239 • Published 25 days ago • 11

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Paper • 2409.08270 • Published 23 days ago • 9

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

Paper • 2409.08278 • Published 23 days ago • 10

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

Paper • 2409.08248 • Published 23 days ago • 12

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published 23 days ago • 15

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

Paper • 2409.08240 • Published 23 days ago • 15

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 30 days ago • 41

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 23 days ago • 42

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 24 days ago • 63

upvoted 4 papers 24 days ago

LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation

Paper • 2409.06703 • Published 25 days ago • 2

INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

Paper • 2409.06210 • Published 26 days ago • 24

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 25 days ago • 54

GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering

Paper • 2409.06595 • Published 25 days ago • 37

upvoted 11 papers 25 days ago

Evaluating Multiview Object Consistency in Humans and Image Models

Paper • 2409.05862 • Published 26 days ago • 8

Insights from Benchmarking Frontier Language Models on Web App Code Generation

Paper • 2409.05177 • Published 27 days ago • 5

Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak

Paper • 2409.04269 • Published 29 days ago • 9

UniDet3D: Multi-dataset Indoor 3D Object Detection

Paper • 2409.04234 • Published 30 days ago • 7

Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

Paper • 2409.05865 • Published 26 days ago • 14

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published 28 days ago • 22

Benchmarking Chinese Knowledge Rectification in Large Language Models

Paper • 2409.05806 • Published 26 days ago • 14

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published 29 days ago • 20

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Paper • 2409.05591 • Published 27 days ago • 26

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Paper • 2409.05152 • Published 27 days ago • 29

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published 26 days ago • 45