Multimodal AI - Video and temporal understanding
Vision-language models can process video and image sequences. Because they are trained to ground their responses in time, they can indicate when a particular event or shift occurs in a video. This session will explore how vision-language models can be used to analyze and interpret moving images.
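As a rough illustration of the idea behind temporal grounding, the sketch below samples evenly spaced timestamps from a clip. In practice, each sampled frame would be paired with its timestamp and passed to a vision-language model, letting the model cite when an event occurs. The model call itself is omitted here; only the frame-sampling arithmetic is shown, and all function names are illustrative.

```python
# Sketch: choose which moments of a clip to show a vision-language model.
# Pairing each sampled frame with its timestamp is what lets the model
# "ground" an answer in time (e.g., "the speaker stands up at ~00:42").

def sample_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Return evenly spaced timestamps (in seconds) at the given sampling rate."""
    step = 1.0 / fps
    n = int(duration_s * fps)
    return [round(i * step, 3) for i in range(n)]

def to_clock(t: float) -> str:
    """Format seconds as MM:SS for a human-readable prompt."""
    m, s = divmod(int(t), 60)
    return f"{m:02d}:{s:02d}"

# Example: a 90-second clip sampled at 0.5 frames per second.
stamps = sample_timestamps(90, fps=0.5)
labels = [to_clock(t) for t in stamps]
# Each (frame, label) pair would then be sent to the model together,
# so its answer can reference the timestamp where an event occurs.
```

This is only the preprocessing step; the session itself covers how the models interpret the resulting frames.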
Image: Elise Racine & Digit / Woven Dialogues / Licensed under CC BY 4.0
- Date:
- Wednesday, April 1, 2026
- Time:
- 3:00pm - 4:00pm
- Location:
- Commons Library Classroom (D112)
- Campus:
- Commons Library
- Audience:
- Princeton Students
- Categories:
- Data & Computation
To request disability-related accommodations for this event, please contact pulcomm@princeton.edu at least 3 working days in advance.