Willison: The killer app of Gemini Pro 1.5 is video

Simon Willison tries out Gemini Pro 1.5 on video and suggests that its 1M-token context window opens up powerful new opportunities for video prompts.

JACK IVERS ELSEWHERE 1 MIN READ

Willison’s experience with, and reaction to, Gemini 1.5 Pro extracting structured output from video prompts parallels my own experience using GPT-4 Vision to extract structure from heirloom recipe images (often handwritten and horribly mangled):

… I’m pretty astonished by this.

… I find those results pretty astounding.

The ability to analyze video like this feels SO powerful. Being able to take a 20 second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.
