Music Maker Uses JSON Output Over UI Automation

What It Is

A developer recently built a music generation tool that sidesteps the complexity of UI automation by having Claude output songs as structured JSON data. Instead of instructing an AI model to click buttons, drag sliders, and navigate a graphical interface, the system asks Claude to generate music notation in a simple format:

{
 "tempo": 120,
 "beats": [1, 0, 1, 0, 1, 1, 0, 0],
 "melody": ["C4", "E4", "G4", "C5"]
}

The tool then parses this JSON and renders it as playable audio. This approach treats musical composition as a data structure problem rather than an interface automation challenge. Claude Code (Opus 4.6) generates the JSON, and a separate rendering layer handles playback without requiring the model to interact with any visual elements.
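
That pipeline can be sketched in a few lines. The snippet below is a rough sketch rather than the project's actual code: it turns the tempo/beats/melody JSON shown above into timed note events that any audio layer could then play.

```javascript
// Convert the tempo/beats/melody JSON into timed note events.
// The field names follow the example above; the event shape is
// an assumption about what a playback layer might consume.
function parseSong(json) {
  const song = JSON.parse(json);
  const beatSeconds = 60 / song.tempo; // duration of one beat in seconds
  const events = [];
  let melodyIndex = 0;
  song.beats.forEach((hit, i) => {
    if (hit === 1) {
      events.push({
        pitch: song.melody[melodyIndex % song.melody.length],
        time: i * beatSeconds, // absolute start time in seconds
      });
      melodyIndex++;
    }
  });
  return events;
}

const events = parseSong(JSON.stringify({
  tempo: 120,
  beats: [1, 0, 1, 0, 1, 1, 0, 0],
  melody: ["C4", "E4", "G4", "C5"],
}));
// At 120 BPM each beat lasts 0.5 s, so the four hits land at 0, 1, 2, and 2.5 s.
```

Because the model only has to emit data, all of the timing logic lives in ordinary, testable code like this rather than in the model's behavior.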

Why It Matters

This project highlights a fundamental tension in AI tool design: when should developers use sophisticated automation versus simple data interchange formats? Computer-use mode represents an impressive technical achievement, allowing models to control applications through simulated mouse movements and keyboard input. However, this capability introduces fragility. Screen layouts change, buttons move, and timing issues create unpredictable failures.

JSON output offers reliability that UI automation struggles to match. Language models excel at generating structured text, having been trained on massive amounts of code and data formats. When a model outputs JSON, developers get predictable, parseable results that integrate cleanly with existing software pipelines.

For music applications specifically, this matters because composition involves discrete, well-defined parameters. Tempo, note sequences, rhythms, and instrument choices all map naturally to key-value pairs. Teams building creative tools can focus on the rendering and playback logic rather than debugging why the AI occasionally clicks the wrong button.

The broader implication extends beyond music. Many domains that seem to require visual interaction actually work better with structured data exchange. Configuration management, data visualization, and workflow automation often benefit more from JSON schemas than from simulated user actions.

Getting Started

The music maker is available at https://lumpy-judicious-ocelot.instavm.site/ for hands-on experimentation. Developers interested in building similar tools can follow this basic pattern:

First, define a clear JSON schema for the domain. For music, this might include:

{
  "tempo": 120,
  "timeSignature": "4/4",
  "tracks": [
    {
      "instrument": "piano",
      "notes": [
        {"pitch": "C4", "duration": 0.5, "time": 0},
        {"pitch": "E4", "duration": 0.5, "time": 0.5}
      ]
    }
  ]
}
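
Since model output can occasionally drift from the schema, it is worth validating before rendering. The checks below are illustrative; the ranges, the pitch regex, and the error messages are assumptions, not the project's actual rules.

```javascript
// Minimal validation for the track-based schema above.
// Field names mirror the example; the specific limits are assumptions.
function validateSong(song) {
  const errors = [];
  if (typeof song.tempo !== "number" || song.tempo < 20 || song.tempo > 300) {
    errors.push("tempo must be a number between 20 and 300");
  }
  if (!Array.isArray(song.tracks) || song.tracks.length === 0) {
    errors.push("tracks must be a non-empty array");
  } else {
    song.tracks.forEach((track, i) => {
      if (typeof track.instrument !== "string") {
        errors.push(`tracks[${i}].instrument must be a string`);
      }
      (track.notes || []).forEach((note, j) => {
        if (!/^[A-G][#b]?\d$/.test(note.pitch)) {
          errors.push(`tracks[${i}].notes[${j}].pitch is not a valid note name`);
        }
        if (typeof note.duration !== "number" || note.duration <= 0) {
          errors.push(`tracks[${i}].notes[${j}].duration must be positive`);
        }
      });
    });
  }
  return errors;
}
```

Returning a list of errors rather than throwing makes it easy to feed the failures back to the model and ask it to regenerate.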

Next, craft prompts that instruct the model to output valid JSON matching this schema. Include examples in the prompt and specify constraints like valid note ranges or tempo limits.
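
One way to assemble such a prompt, along with a small helper for pulling JSON out of a reply that may arrive wrapped in prose or code fences. The wording, the constraint limits, and the extractJson helper are all illustrative, not taken from the project.

```javascript
// Embed a schema example and explicit constraints directly in the prompt.
const schemaExample = {
  tempo: 120,
  timeSignature: "4/4",
  tracks: [{ instrument: "piano", notes: [{ pitch: "C4", duration: 0.5, time: 0 }] }],
};

const prompt = [
  "Compose a short melody and respond with ONLY valid JSON, no prose.",
  "Match this schema exactly:",
  JSON.stringify(schemaExample, null, 2),
  "Constraints:",
  "- tempo between 60 and 180",
  "- pitches between C3 and C6",
  "- durations in beats: 0.25, 0.5, 1, or 2",
].join("\n");

// Models sometimes wrap JSON in markdown fences or add commentary;
// grabbing the outermost {...} span is a simple, common workaround.
function extractJson(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  return match ? JSON.parse(match[0]) : null;
}
```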

Finally, build a parser and renderer that converts the JSON into the desired output format. For music, a library like Tone.js or the browser's Web Audio API handles playback. For other domains, the rendering layer might generate visualizations, configuration files, or API calls.
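
At the heart of any such renderer is mapping note names to frequencies. A library like Tone.js does this internally, but the standard equal-temperament conversion (A4 = 440 Hz) is short enough to sketch directly:

```javascript
// Semitone offsets within an octave, C = 0.
const SEMITONES = { C: 0, "C#": 1, D: 2, "D#": 3, E: 4, F: 5, "F#": 6, G: 7, "G#": 8, A: 9, "A#": 10, B: 11 };

// Convert a note name like "C4" to a frequency in Hz
// (equal temperament, A4 = 440 Hz).
function noteToFrequency(name) {
  const match = name.match(/^([A-G]#?)(\d)$/);
  if (!match) throw new Error(`invalid note: ${name}`);
  const [, pitchClass, octave] = match;
  // MIDI numbering: C-1 is 0, so C4 = 60 and A4 = 69.
  const midi = (Number(octave) + 1) * 12 + SEMITONES[pitchClass];
  return 440 * Math.pow(2, (midi - 69) / 12);
}
```

With frequencies in hand, the renderer can drive oscillators, schedule samples, or hand the values to whatever audio backend is available.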

Context

This approach contrasts sharply with computer-use implementations that attempt to control existing music software like GarageBand or Ableton. While those tools offer sophisticated features, automating them introduces unnecessary complexity. The model must understand spatial layouts, handle timing delays, and recover from UI changes.

The JSON method has limitations. It works best for domains with clear data models and where custom rendering is acceptable. Applications requiring pixel-perfect control of existing software, or those where the UI itself provides essential feedback, may still need computer-use capabilities.

Alternative approaches include MIDI file generation, which offers standardization across music software, or direct audio synthesis using models trained on waveforms. However, JSON strikes a practical balance between simplicity and expressiveness for many use cases.

The project draws inspiration from Google’s Song Maker, demonstrating that sometimes the most effective AI integration involves rethinking the interface entirely rather than automating the existing one. When models already handle a data format reliably, building around that strength often beats forcing them to navigate visual interfaces designed for humans.