AgentHandover: AI Skill Builder from Screen Activity
AgentHandover is an AI skill builder that learns from screen activity to automate repetitive tasks, enabling users to train intelligent agents by demonstrating workflows instead of writing instructions.
What It Is
AgentHandover is a Mac menu bar application that observes on-screen activity and automatically generates reusable skill definitions for AI agents. The tool runs the Gemma 4 vision model locally through Ollama to watch workflows, then converts those observations into structured files that any agent can execute. Instead of manually documenting processes or writing instructions for agents, the system learns by watching.
The application offers two capture modes. Focus Record lets users explicitly mark when they want a workflow documented, useful for one-time demonstrations of specific tasks. Passive Discovery runs continuously in the background, identifying repeated patterns across multiple sessions and building skill definitions from recurring behaviors. As the system observes the same workflow multiple times, it refines the captured steps, adds guardrails, and adjusts confidence scores.
The processing happens through an 11-stage pipeline that runs entirely on-device. Screen captures never leave the local machine, and all data remains encrypted at rest. Generated skills integrate with agents through the Model Context Protocol (MCP), making them immediately available to tools like Claude Code, Cursor, and other MCP-compatible systems. A command-line interface provides an alternative to the menu bar for terminal-focused workflows.
Why It Matters
Most agent frameworks require explicit instruction sets or API documentation before they can perform tasks. This creates a documentation burden where developers spend time writing guides for processes they already know how to execute. AgentHandover inverts this model by treating demonstration as documentation.
The privacy-first architecture addresses a significant concern with screen recording tools. Running vision models locally through Ollama means sensitive workflows involving credentials, proprietary data, or confidential information never transmit to external services. For teams working with regulated data or internal tools, this local-first approach makes workflow automation feasible where cloud-based alternatives would fail compliance requirements.
The iterative refinement mechanism creates a feedback loop between human execution and agent capability. Early skill captures might miss edge cases or optional steps, but repeated observations let the system identify variations and update definitions accordingly. This gradual improvement means skills become more robust without manual editing.
MCP integration provides immediate practical value. Rather than building custom integrations for each agent framework, the protocol creates a standard interface. Any tool that implements MCP can access the skill library without additional configuration, reducing the friction between skill capture and agent execution.
Getting Started
Install Ollama from https://ollama.ai and pull the Gemma 4 model:
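On macOS this might look like the following; Homebrew is shown as one install option, and the model tag is a placeholder — substitute the exact Gemma tag named in the AgentHandover README:

```shell
# Install Ollama (macOS; Homebrew shown as one option —
# a direct download from https://ollama.ai also works)
brew install ollama

# Pull the vision model; replace the placeholder with the
# Gemma tag the AgentHandover README specifies
ollama pull <gemma-vision-model>
```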
Clone the AgentHandover repository and follow the build instructions:
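Using the repository URL given under Context below, a typical sequence might be the following; the build step is an assumption (the repo's own instructions take precedence):

```shell
git clone https://github.com/sandroandric/AgentHandover.git
cd AgentHandover

# Build per the repository's documentation; for a Swift menu bar app
# this is usually done in Xcode, or from the terminal with something like:
# xcodebuild -scheme AgentHandover build   # scheme name is an assumption
```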
The repository includes setup documentation for running the menu bar app. Once launched, the application sits in the Mac menu bar with options to start Focus Record for immediate capture or enable Passive Discovery for background observation.
For MCP integration, configure the agent tool to point at the AgentHandover skill directory. The exact configuration varies by agent framework, but most MCP-compatible tools accept a file path or directory reference in their settings.
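As one concrete shape, many MCP-compatible clients read a JSON config with an `mcpServers` map. The server name, command, and skill directory path below are illustrative assumptions, not AgentHandover's documented settings:

```json
{
  "mcpServers": {
    "agenthandover": {
      "command": "agenthandover-mcp",
      "args": ["--skills", "/Users/me/.agenthandover/skills"]
    }
  }
}
```

The key the client looks for (`mcpServers`) is a common convention in Claude Desktop and similar tools; check the specific agent framework's documentation for its exact config location and schema.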
Skills export as structured files containing step sequences, expected outcomes, and confidence metadata. These files can be version controlled, shared across teams, or manually edited to add context the vision model might miss.
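As an illustration only — the actual schema is not documented here, and every field name below is hypothetical — a captured skill might serialize along these lines:

```json
{
  "skill": "export-monthly-report",
  "confidence": 0.82,
  "steps": [
    { "action": "open_app", "target": "Numbers", "expected": "app window visible" },
    { "action": "click", "target": "File > Export To > PDF", "expected": "export dialog shown" }
  ]
}
```

A plain-text structure like this is what makes the files practical to diff in version control and to hand-edit when the vision model misses context.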
Context
Traditional workflow automation tools like Keyboard Maestro or AppleScript require users to explicitly program each action. AgentHandover sits between manual scripting and fully autonomous agents, capturing intent without requiring programming knowledge but producing structured output rather than just screen recordings.
Vision-based screen understanding remains challenging for local models. Gemma 4 provides reasonable accuracy for UI element detection and action recognition, but complex interfaces or rapid context switching may produce incomplete captures. The iterative refinement helps, but initial skill definitions often need validation before production use.
Alternative approaches include cloud-based screen understanding services like Adept or browser-specific automation tools like Playwright. Cloud services typically offer better accuracy through larger models but sacrifice privacy. Browser automation provides more reliable element targeting but only works within web contexts.
The 11-stage pipeline represents a tradeoff between processing time and accuracy. Running entirely on-device means slower processing compared to cloud alternatives, particularly on older hardware. Users working with time-sensitive workflows may find the processing delay disruptive during active recording sessions.
The Apache 2.0 license at https://github.com/sandroandric/AgentHandover allows commercial use and modification, making it viable for teams wanting to customize the pipeline or integrate with proprietary agent systems.