Tencent's HY-Motion Generates 3D Animation from Text

Tencent launches HY-Motion 1.0, a billion-parameter text-to-3D animation model that converts natural language descriptions into skeletal character motion

What It Is

HY-Motion 1.0 is Tencent’s new text-to-3D animation model that generates character motion sequences from natural language descriptions. The system translates prompts like “character running and jumping over an obstacle” into skeletal animation data that loads directly into 3D software such as Blender and game engines such as Unity.

The model has 1 billion parameters and uses a flow-matching architecture rather than a traditional diffusion approach. This technical choice produces smoother transitions between poses and more physically plausible movement patterns. The training dataset spans over 200 motion categories, covering everything from basic locomotion to complex actions like combat sequences and dance choreography.
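The core flow-matching idea can be sketched in a few lines of plain Python. This is an illustrative toy, not Tencent's implementation: flow matching trains a velocity field v(x, t) along straight-line paths from noise to data, and generation integrates the resulting ODE from t = 0 (noise) to t = 1 (a sample), which is what yields the smooth, deterministic trajectories the article describes.

```python
# Toy flow-matching sampler (illustrative only; not HY-Motion's code).
# Training pairs noise x0 with data x1 along the linear path
# x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
# Here, think of x as a flattened pose vector for one animation frame.

def velocity(x, t, x1):
    # For the linear path, the ideal velocity toward data point x1 is
    # (x1 - x) / (1 - t): it carries x exactly onto x1 at t = 1.
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

def sample(x0, x1, steps=100):
    # Euler integration of the probability-flow ODE dx/dt = v(x, t).
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t, x1)
        x = [a + vi * dt for a, vi in zip(x, v)]
    return x

noise = [0.3, -1.2, 0.8]    # stand-in for a noisy pose vector
target = [1.0, -2.0, 0.5]   # stand-in for a "data" pose
result = sample(noise, target)
```

In a real model the velocity field is a learned network conditioned on the text prompt rather than a closed-form expression, but the sampling loop has the same shape.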

What sets HY-Motion apart from earlier text-to-motion systems is its reinforcement learning training phase. After initial supervised fine-tuning, the model underwent additional optimization to improve instruction following. This means it generates animations that more accurately match specific prompt details rather than defaulting to generic interpretations of movement types.
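The article does not specify which RL algorithm Tencent used, but the general idea of nudging a policy toward higher-reward outputs can be illustrated with a toy best-of-N update. Everything below is a hypothetical stand-in: a 1-D "policy" (a Gaussian mean) is pulled toward samples that score well under a reward function standing in for "how closely the motion matches the prompt".

```python
import random

def reward(x, target=3.0):
    # Stand-in reward: higher when the output is closer to the target.
    return -(x - target) ** 2

def rl_step(mean, sigma=0.5, n=32, lr=0.2, rng=random.Random(0)):
    # Sample n candidates from the current policy, keep the highest-reward
    # one, and move the policy a small step toward it.
    samples = [rng.gauss(mean, sigma) for _ in range(n)]
    best = max(samples, key=reward)
    return mean + lr * (best - mean)

mean = 0.0
for _ in range(100):
    mean = rl_step(mean)
# mean has drifted from 0.0 toward the reward-maximizing value 3.0
```

Real systems use far more sophisticated objectives, but the feedback loop is the same: supervised training gets the model close, and reward signals sharpen instruction following.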

Why It Matters

Animation production remains one of the most time-intensive bottlenecks in game development and 3D content creation. Traditional keyframe animation requires specialized skills and hours of manual work for even simple motion sequences. Motion capture offers realism but demands expensive equipment, studio space, and cleanup time.

HY-Motion addresses this production gap for indie developers, small studios, and prototyping workflows. A solo developer can now generate placeholder animations or even production-ready sequences without hiring animators or booking mocap sessions. The model’s export compatibility with standard 3D formats means animations integrate directly into existing pipelines without conversion tools or middleware.

The reinforcement learning component has broader implications for generative AI development. Most text-to-motion models struggle with precise instruction following because they’re trained purely on paired text-motion datasets. Adding RL optimization creates a feedback loop that rewards prompt fidelity, an approach other motion models may adopt.

For the AI research community, HY-Motion demonstrates that flow matching can outperform diffusion models in temporal domains. While diffusion has dominated image and video generation, flow-based approaches may prove superior for sequential data like animation where temporal coherence matters more than frame-by-frame quality.

Getting Started

The model is available through multiple channels. Developers can access the code repository at https://github.com/Tencent-Hunyuan/HY-Motion-1.0 or download model weights from https://huggingface.co/tencent/HY-Motion-1.0. The official documentation and demo interface live at https://hunyuan.tencent.com/motion.

For local deployment, the basic workflow involves installing dependencies and loading the model:


# Import path is assumed; check the repository README for the actual package name.
from hy_motion import HYMotion

model = HYMotion.from_pretrained("tencent/HY-Motion-1.0")
animation = model.generate("warrior swinging sword in combat stance")
animation.export("output.fbx")  # FBX export for downstream 3D tools

The exported FBX files contain skeletal animation data that imports cleanly into Blender, Maya, Unity, and Unreal Engine. No retargeting or cleanup should be necessary for standard humanoid rigs, though custom character proportions may require minor adjustments.

Hardware requirements are reasonable for a billion-parameter model. Generation runs on consumer GPUs with 8GB+ VRAM, though inference speed scales with available compute. CPU-only operation is possible but significantly slower.
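A back-of-envelope calculation shows why 8 GB is a comfortable floor. The figures below are rules of thumb, not official Tencent numbers: weight memory is parameter count times bytes per parameter, and activations plus intermediate buffers consume the remaining headroom.

```python
# Rough VRAM estimate for a 1B-parameter model (rule-of-thumb arithmetic,
# not official figures).

def weight_memory_gb(params, bytes_per_param):
    # Memory occupied by the weights alone, in GiB.
    return params * bytes_per_param / 1024**3

PARAMS = 1_000_000_000
fp32_gb = weight_memory_gb(PARAMS, 4)  # full precision: ~3.7 GiB
fp16_gb = weight_memory_gb(PARAMS, 2)  # half precision: ~1.9 GiB
```

Even at full precision, weights fit in under 4 GiB, leaving several gigabytes of an 8 GB card for activations and framework overhead, which is consistent with the consumer-GPU claim above.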

Context

HY-Motion enters a growing field of text-to-motion models. OpenAI’s work on motion generation and academic projects like MotionGPT have explored similar territory, but few offer production-ready output with this level of category coverage.

The main limitation is creative control. Text prompts provide high-level direction but lack the precision of keyframe animation. Animators can’t specify exact timing, pose holds, or subtle weight shifts through natural language alone. The model works best for generating base animations that artists can refine, not as a complete replacement for manual animation.

Quality varies across motion categories. Common actions like walking and running benefit from abundant training data, while niche movements may produce less polished results. The 200+ category coverage is impressive but still represents a fraction of possible human motion.

Flow matching’s computational efficiency compared to diffusion models makes real-time or near-real-time generation feasible. This could enable interactive workflows where developers iterate on prompts and immediately preview results, fundamentally changing how animation prototyping works.