
Multimodal AI - Video and temporal understanding

Vision-language models can process video and image sequences, and many are trained to ground their responses in time, allowing them to indicate when a particular event or shift occurs in a video. This session will explore the use of vision-language models for analyzing and interpreting moving images.

Image: Elise Racine & Digit / Woven Dialogues / Licensed under CC BY 4.0

Date:
Wednesday, April 1, 2026
Time:
3:00pm - 4:00pm
Location:
Commons Library Classroom (D112)
Campus:
Commons Library
Audience:
  Princeton Student  
Categories:
  Data & Computation  

Registration is required. There are 23 in-person seats and 5 online seats available.

To request disability-related accommodations for this event, please contact pulcomm@princeton.edu at least 3 working days in advance.