video temporal grounding

dense video caption

video highlight detection

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

If our project helps you, please give us a star ⭐ on GitHub and cite our paper!

📰 News

[2024.10.10] 🔥 Our code and paper are released!
[2024.10.10] 🔥 Our checkpoints are available now!

Overview

In this work:

We model videos as a series of events and propose a causal event modeling framework to capture the inherent structure of videos.
We present TRACE, a novel task-interleaved video LLM tailored to implement the causal event modeling framework through the sequential encoding/decoding of timestamps, salient scores, and textual captions.
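
The event-by-event view described above can be illustrated with a minimal sketch. It assumes each event carries a time span, a salient score, and a caption, and that events are emitted in temporal order with the three parts written in that sequence. The `Event` class, the `serialize` function, and the `<time>`/`<score>`/`<caption>` tags are hypothetical placeholders for illustration, not TRACE's actual tokens or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    start: float   # event start time in seconds
    end: float     # event end time in seconds
    score: float   # salient score (the scale here is an assumption)
    caption: str   # textual caption of the event


def serialize(events: List[Event]) -> str:
    """Emit events in temporal order; each event is written as
    timestamps, then salient score, then caption."""
    parts = []
    for ev in sorted(events, key=lambda e: e.start):
        # The tag names below are illustrative only.
        parts.append(f"<time>{ev.start:.1f} {ev.end:.1f}</time>")
        parts.append(f"<score>{ev.score:.1f}</score>")
        parts.append(f"<caption>{ev.caption}</caption>")
    return "".join(parts)


events = [
    Event(12.0, 20.5, 3.5, "pour the batter"),
    Event(0.0, 10.0, 2.0, "crack eggs"),
]
print(serialize(events))
```

Writing the events one after another in this way means that a decoder producing the sequence left to right sees all earlier events before it produces the next one.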

Model Zoo

Checkpoints	Description	URL
Initialization	Weights initialized from VideoLLaMA2	trace-init
Stage-1	Model checkpoints trained after stage-1	trace-stage1
Stage-2	Model checkpoints trained after stage-2	trace
FT-Charades	Fine-tuned on Charades-STA dataset	trace-ft-charades
FT-Youcook2	Fine-tuned on Youcook2 dataset	trace-ft-youcook2
FT-QVHighlights	Fine-tuned on QVHighlights dataset	trace-ft-qvhighlights
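
A checkpoint from the table above could be fetched roughly as follows. This sketch assumes every entry in the URL column is a repository under the `Yongxin-Guo` namespace on the Hugging Face Hub (only `Yongxin-Guo/trace` is confirmed by this page) and that the `huggingface_hub` package is installed. The `repo_id` and `download` helpers are hypothetical names introduced for illustration.

```python
NAMESPACE = "Yongxin-Guo"  # assumption: all checkpoints live under this namespace

# Checkpoint label -> repository name, copied from the Model Zoo table.
CHECKPOINTS = {
    "Initialization": "trace-init",
    "Stage-1": "trace-stage1",
    "Stage-2": "trace",
    "FT-Charades": "trace-ft-charades",
    "FT-Youcook2": "trace-ft-youcook2",
    "FT-QVHighlights": "trace-ft-qvhighlights",
}


def repo_id(checkpoint: str) -> str:
    """Build the Hub repository id for a checkpoint label."""
    return f"{NAMESPACE}/{CHECKPOINTS[checkpoint]}"


def download(checkpoint: str) -> str:
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so the mapping above works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id(checkpoint))


print(repo_id("Stage-2"))
```

Calling `download("Stage-2")` would then pull the weights into the local Hub cache. Loading them for inference requires the project's own code, which is not shown here.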

Results

Youcook2 (Zero-Shot)	CIDEr	METEOR	SODA_c	F1
TRACE	8.1	2.8	2.2	22.4

Charades-STA (Zero-Shot)	R@1 (IoU=0.3)	R@1 (IoU=0.5)	R@1 (IoU=0.7)	mIoU
TRACE	58.6	40.3	19.4	38.7

QVHighlights (Zero-Shot)	mAP	Hit@1
TRACE	26.8	42.7

ActivityNet-DVC	CIDEr	METEOR	SODA_c	F1
TRACE	25.9	6.0	6.4	39.3

ActivityNet-MR	R@1 (IoU=0.3)	R@1 (IoU=0.5)	R@1 (IoU=0.7)	mIoU
TRACE	53.0	37.7	24.0	39.0
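
For readers unfamiliar with the moment retrieval columns (Charades-STA and ActivityNet-MR), the sketch below shows how such numbers are commonly computed. It assumes the 0.3/0.5/0.7 columns are the percentage of queries whose top prediction reaches that temporal IoU with the ground truth, and that mIoU is the mean IoU over queries. The function names are hypothetical, and the project's own evaluation script may differ in details.

```python
from typing import Dict, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def summarize(
    preds: Sequence[Span],
    gts: Sequence[Span],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Recall at each IoU threshold and mean IoU, both in percent."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    out = {f"R@1 IoU={t}": 100.0 * sum(i >= t for i in ious) / n for t in thresholds}
    out["mIoU"] = 100.0 * sum(ious) / n
    return out


# Toy example: one exact match and one half-overlapping prediction.
print(summarize([(0.0, 10.0), (0.0, 10.0)], [(0.0, 10.0), (5.0, 15.0)]))
```

In the toy example the second prediction overlaps the ground truth for 5 of 15 seconds, so its IoU of about 0.33 counts at the 0.3 threshold but not at 0.5 or 0.7.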

Model size: 7.55B params (Safetensors, tensor type BF16)
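
As a rough guide, the size of the weights alone can be estimated from the figures above. The sketch assumes every parameter is stored in BF16 (2 bytes) and ignores activations, the KV cache, and any video encoder buffers, so actual memory use at inference time will be higher.

```python
PARAMS = 7.55e9        # parameter count from the model card
BYTES_PER_PARAM = 2    # BF16 uses 16 bits per value

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.1f} GB ({weight_gib:.1f} GiB)")  # prints "15.1 GB (14.1 GiB)"
```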

·

Inference API

Unable to determine this model's library. Check the docs .

Model tree for Yongxin-Guo/trace

Base model: mistralai/Mistral-7B-Instruct-v0.2 (this model is a fine-tune of it)

Collection including Yongxin-Guo/trace: TRACE (TRACE: Temporal Grounding Video LLM via Causal Event Modeling)