Julien BLANCHON PRO

blanchon

AI & ML interests

Math

blanchon's activity

New activity in enzostvs/lora-studio 4 months ago

add-animate-flip-everywhere

#14 opened 4 months ago by

New activity in codys12/MergeLlama 4 months ago

fix dataset

#1 opened about 1 year ago by

commented 58 papers 4 months ago

Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4 • 15

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Paper • 2406.02523 • Published Jun 4 • 9

V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

Paper • 2406.02511 • Published Jun 4 • 8

I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 15

Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3 • 18

To Believe or Not to Believe Your LLM

Paper • 2406.02543 • Published Jun 4 • 31

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4 • 29

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Paper • 2406.02886 • Published Jun 5 • 7

Item-Language Model for Conversational Recommendation

Paper • 2406.02844 • Published Jun 5 • 8

Searching Priors Makes Text-to-Video Synthesis Better

Paper • 2406.03215 • Published Jun 5 • 11

PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Paper • 2406.02884 • Published Jun 5 • 14

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5 • 17

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Paper • 2406.01014 • Published Jun 3 • 30

Parrot: Multilingual Visual Instruction Tuning

Paper • 2406.02539 • Published Jun 4 • 35

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 36

Open-Endedness is Essential for Artificial Superhuman Intelligence

Paper • 2406.04268 • Published Jun 6 • 11

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Paper • 2406.04151 • Published Jun 6 • 17

VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 22

SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Paper • 2406.04271 • Published Jun 6 • 27

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published Jun 6 • 26

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published Jun 6 • 36

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Paper • 2305.10973 • Published May 18, 2023 • 31

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper • 2305.10601 • Published May 17, 2023 • 10

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 74

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 56

Do Large Language Models Latently Perform Multi-Hop Reasoning?

Paper • 2402.16837 • Published Feb 26 • 24

CroissantLLM: A Truly Bilingual French-English Language Model

Paper • 2402.00786 • Published Feb 1 • 25

Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 40

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published May 16 • 19

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 34

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published May 23 • 35

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23 • 33

DiffusionGPT: LLM-Driven Text-to-Image Generation System

Paper • 2401.10061 • Published Jan 18 • 27

Not All Language Model Features Are Linear

Paper • 2405.14860 • Published May 23 • 39

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9 • 33

FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 35

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 28

LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Paper • 2403.18795 • Published Mar 27 • 17

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 29

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29 • 26

MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27

Exploration into Translation-Equivariant Image Quantization

Paper • 2112.00384 • Published Dec 1, 2021

Muse: Text-To-Image Generation via Masked Generative Transformers

Paper • 2301.00704 • Published Jan 2, 2023

Improving language models by retrieving from trillions of tokens

Paper • 2112.04426 • Published Dec 8, 2021 • 1

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14 • 24

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1 • 14

EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation

Paper • 2205.14756 • Published May 29, 2022

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 35

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

Paper • 2307.02421 • Published Jul 5, 2023 • 34

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19 • 29

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29 • 49