general

AI Agent Counts Jensen's 121 "AI" Mentions at CES

A developer used an AI agent with Model Context Protocol servers to automatically count and extract all 121 instances of Jensen Huang saying "AI" during his CES 2025 keynote.

What It Is

A developer used an AI agent to automatically count and compile every instance of Jensen Huang saying “AI” during his CES 2025 keynote - all 121 times. The process involved chaining together multiple Model Context Protocol (MCP) servers through Dive, an open-source MCP client available at https://github.com/OpenAgentPlatform/Dive. The workflow downloaded the keynote from YouTube, extracted word-level timestamps from subtitles, cut 121 individual video clips at precise moments, and concatenated them into a single compilation video.

The technical implementation relied on two custom MCP servers: yt-dlp-mcp for downloading YouTube content with subtitle data, and ffmpeg-mcp-lite for video editing operations. A single natural language prompt orchestrated the entire pipeline, demonstrating how MCP servers can work together to automate complex multi-step tasks that would normally require manual scripting across different tools.
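The timestamp-extraction step can be sketched in plain Python. This is an illustrative reconstruction, not the agent's actual code: it assumes YouTube's JSON3 subtitle layout, where each event carries a base `tStartMs` and word-level segments with `tOffsetMs` offsets, and the helper name `find_word_times` is invented here.

```python
import json

def find_word_times(json3_text, word):
    """Scan JSON3 subtitle events for a word and return a list of
    (start_ms, end_ms) pairs, one per occurrence."""
    data = json.loads(json3_text)
    hits = []
    for event in data.get("events", []):
        base = event.get("tStartMs", 0)
        segs = event.get("segs") or []
        for i, seg in enumerate(segs):
            if seg.get("utf8", "").strip().upper() != word.upper():
                continue
            start = base + seg.get("tOffsetMs", 0)
            # A word ends where the next word begins; the last word
            # ends with the event itself.
            if i + 1 < len(segs) and "tOffsetMs" in segs[i + 1]:
                end = base + segs[i + 1]["tOffsetMs"]
            else:
                end = base + event.get("dDurationMs", 0)
            hits.append((start, end))
    return hits
```

With 121 hits returned, each pair becomes the cut point for one clip in the next stage of the pipeline.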

Why It Matters

This example showcases the practical power of Model Context Protocol beyond simple chatbot interactions. MCP servers act as standardized interfaces between AI agents and external tools, allowing developers to compose workflows without writing custom integration code for each combination of services.

The video compilation task illustrates several important capabilities. First, the agent handled format conversions and data parsing autonomously - extracting timestamps from JSON3 subtitle format and calculating precise clip boundaries with padding. Second, it managed state across multiple operations, tracking 121 separate video segments through the cutting and concatenation process. Third, it coordinated between two different MCP servers that had no direct knowledge of each other.
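The clip-boundary calculation with padding, for example, reduces to a few lines. A minimal sketch, assuming the ~50-100ms padding mentioned in the prompt (the function name and defaults are illustrative):

```python
def pad_clip(start_ms, end_ms, pad_ms=75, video_len_ms=None):
    """Widen a clip by ~50-100 ms per side so the cut does not clip
    the word's onset or tail, clamped to the video's bounds."""
    start = max(0, start_ms - pad_ms)
    end = end_ms + pad_ms
    if video_len_ms is not None:
        end = min(end, video_len_ms)
    return start, end
```

The point is less the arithmetic than that the agent inferred and applied logic like this from one line of the prompt, without a developer writing it out.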

For developers building AI-powered automation, this pattern of chaining specialized MCP servers offers an alternative to monolithic tools or custom scripts. Teams can create focused MCP servers for specific capabilities (video processing, API access, file operations) and let AI agents orchestrate them based on natural language instructions.

Getting Started

To replicate this workflow, developers need Dive installed and two MCP servers configured. The yt-dlp-mcp server (https://github.com/kevinwatt/yt-dlp-mcp) wraps the popular yt-dlp tool for downloading videos with subtitle data. The ffmpeg-mcp-lite server (https://github.com/kevinwatt/ffmpeg-mcp-lite) provides video editing capabilities through ffmpeg.

The original prompt structure provides a template:

Task: Create a compilation video of every exact moment Jensen Huang says "AI".
Video source: https://www.youtube.com/watch?v=0NBILspM4c4

Download video in 720p + subtitles in JSON3 format (word-level timestamps)
Parse JSON3 to find every "AI" instance with precise start/end times
Use ffmpeg to cut clips (~50-100ms padding for natural sound)
Concatenate all clips chronologically
Output: Jensen_CES_AI.mp4

The agent interprets these instructions and calls appropriate MCP server functions in sequence. Developers can adapt this pattern for other compilation tasks - extracting specific phrases from podcasts, creating highlight reels based on keyword mentions, or analyzing speaking patterns across multiple videos.
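The analysis variants are especially easy to adapt. As a hypothetical example, once word-level timestamps are extracted, analyzing where a keyword clusters over the course of a talk is a one-liner (names are illustrative):

```python
from collections import Counter

def mentions_per_minute(word_times_ms):
    """Bucket (start_ms, end_ms) keyword hits by minute of the video,
    a quick way to see where mentions of a word cluster."""
    return Counter(start // 60_000 for start, _ in word_times_ms)
```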

Context

Traditional approaches to this task would require writing custom scripts that integrate yt-dlp’s command-line interface with ffmpeg operations, handling file paths, parsing subtitle formats, and managing temporary files. The MCP approach shifts complexity from integration code to natural language task description.
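For comparison, the manual-scripting route might start with helpers like these. This is a sketch, not the workflow's actual code: the flags shown are standard yt-dlp and ffmpeg options, but the format selector, filenames, and function names are assumptions made here.

```python
def yt_dlp_cmd(url, out="keynote.mp4"):
    """Command line to fetch a 720p video plus auto-generated
    JSON3 subtitles with yt-dlp."""
    return ["yt-dlp", "-f", "bv*[height<=720]+ba/b[height<=720]",
            "--write-auto-subs", "--sub-format", "json3",
            "-o", out, url]

def ffmpeg_cut_cmd(src, start_ms, end_ms, out):
    """Command line to cut one clip, re-encoding so the 121 short
    segments concatenate cleanly afterwards."""
    return ["ffmpeg", "-y",
            "-ss", f"{start_ms / 1000:.3f}",
            "-t", f"{(end_ms - start_ms) / 1000:.3f}",
            "-i", src, "-c:v", "libx264", "-c:a", "aac", out]
```

Each command would then be run via `subprocess.run`, with the developer also handling subtitle parsing, temp files, and the concat step; that glue code is exactly what the MCP agent absorbs.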

However, MCP-based workflows have limitations. Performance depends on the AI model’s ability to correctly interpret instructions and call functions in the right sequence. Complex video editing operations may still require traditional scripting for precise control. The approach works best for tasks that can be described clearly in natural language and don’t require real-time processing.

Alternative tools like Zapier or n8n offer visual workflow builders for automation, but they require pre-configured integrations and don’t handle novel combinations of tools as flexibly. Custom Python scripts with libraries like moviepy provide more control but require more upfront development time.

The Jensen Huang compilation demonstrates MCP’s sweet spot: automating multi-step technical tasks that are tedious to script manually but straightforward to describe. As more MCP servers emerge for different tools and services, this pattern of agent-orchestrated workflows becomes increasingly practical for everyday development tasks.