Kimi K2.5 System Prompt Leaked on GitHub (5k tokens)
Kimi K2.5's system prompt has been leaked on GitHub, revealing approximately 5,000 tokens of instructions that guide the AI model's behavior and responses.
A 5,000-token system prompt for Moonshot AI’s Kimi K2.5 model surfaced on GitHub, revealing detailed instructions that shape how this Chinese AI assistant handles everything from creative writing to code generation.
The Discovery
The leaked prompt appeared in a public repository at https://github.com/moonshot-ai/kimi-system-prompts, exposing the design decisions behind one of China’s most capable language models. Compared with system prompts that run 200-500 tokens, this configuration is roughly ten times longer, and it shows how developers use verbose instructions to control model behavior across diverse scenarios.
The document contains explicit guidelines for maintaining character consistency in fiction, formatting mathematical equations, and handling multilingual queries. Moonshot AI’s engineers embedded specific rules about when to switch between Chinese and English, how to structure code blocks, and protocols for declining inappropriate requests. This level of detail reflects a broader industry trend where companies invest significant effort in prompt engineering to differentiate their AI products.
Structural Components
The leaked prompt divides into several distinct modules, each governing specific capabilities. A 1,200-token section addresses creative writing, instructing the model to maintain narrative voice, track character development across long conversations, and avoid common plot inconsistencies. Another segment dedicates 800 tokens to mathematical reasoning, specifying LaTeX formatting requirements and step-by-step problem-solving approaches.
Code generation receives particular attention with guidelines spanning multiple programming languages. The prompt explicitly tells Kimi to include error handling, add inline comments for complex logic, and suggest testing strategies. For Python specifically, it mandates type hints and adherence to PEP 8 standards:
def calculate_metrics(data: list[dict]) -> dict[str, float]:
    """Calculate summary statistics from structured data.

    Args:
        data: List of dictionaries containing numeric values

    Returns:
        Dictionary with mean, median, and standard deviation
    """
    # Implementation with proper error handling
    if not data:
        raise ValueError("Input data cannot be empty")
Safety instructions occupy roughly 1,000 tokens, establishing boundaries around political topics, personal data, and harmful content. The prompt uses a tiered approach, distinguishing between hard refusals for illegal requests and softer redirections for sensitive subjects.
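The tiered approach can be pictured as a small routing table. The following is a minimal sketch, not the prompt’s actual logic: the category names, the `Tier` enum, and the choice to route political topics and personal data to the softer tier are all assumptions made for illustration. The article only states that illegal requests get hard refusals and sensitive subjects get softer redirections.

```python
from enum import Enum


class Tier(Enum):
    HARD_REFUSAL = "hard_refusal"    # decline outright
    SOFT_REDIRECT = "soft_redirect"  # steer toward a safer framing
    ALLOW = "allow"                  # answer normally


# Hypothetical category-to-tier mapping, for illustration only.
TIER_BY_CATEGORY = {
    "illegal": Tier.HARD_REFUSAL,
    "political": Tier.SOFT_REDIRECT,
    "personal_data": Tier.SOFT_REDIRECT,
}


def route(category: str) -> Tier:
    """Return the handling tier for a category; unknown categories are allowed."""
    return TIER_BY_CATEGORY.get(category, Tier.ALLOW)


if __name__ == "__main__":
    for name in ("illegal", "political", "cooking"):
        print(name, "->", route(name).value)
```

Keeping the mapping in one table makes the boundary between tiers easy to audit, which is harder when the same rules are spread through free-form prompt text.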
Implementation Patterns
Moonshot’s engineers employed several sophisticated techniques within the prompt structure. Conditional logic appears throughout, with instructions like “if the user asks in Chinese about technical topics, provide terminology in both languages.” This bilingual handling addresses a specific need in the Chinese market where developers frequently mix languages.
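In the prompt this rule is plain-language guidance that the model interprets itself. An application-layer analogue of the same condition might look like the sketch below. The keyword list and function names are hypothetical, and the Chinese-character check only covers the basic CJK Unified Ideographs range.

```python
import re

# Basic CJK Unified Ideographs block; rarer extension blocks are ignored here.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")

# Hypothetical keyword list standing in for real topic detection.
TECH_KEYWORDS = ("api", "数据库", "线程", "编译", "函数")


def contains_chinese(text: str) -> bool:
    """True if the text contains at least one basic CJK ideograph."""
    return CJK_PATTERN.search(text) is not None


def is_technical(text: str) -> bool:
    """True if any of the illustrative technical keywords appears in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in TECH_KEYWORDS)


def needs_bilingual_terms(text: str) -> bool:
    """Mirror the quoted rule: Chinese query plus technical topic."""
    return contains_chinese(text) and is_technical(text)


if __name__ == "__main__":
    print(needs_bilingual_terms("如何优化数据库查询?"))
    print(needs_bilingual_terms("How do I call this API?"))
```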
The prompt also includes meta-instructions about conversation memory. Kimi receives explicit directions to reference earlier exchanges, synthesize information across multiple turns, and acknowledge when context from previous messages influences current responses. These instructions may help explain why Kimi performs well in extended dialogues compared to models with simpler system configurations.
Format specifications consume significant token budget. The prompt details exact markdown syntax for headers, lists, tables, and code blocks. It specifies when to use bold versus italic emphasis, how to structure multi-step tutorials, and requirements for citing sources with URLs when discussing technical documentation.
Practical Implications
This leak offers valuable insights for developers building AI applications. The token allocation reveals priorities: Moonshot spent more tokens on output formatting and safety than on personality traits or conversational style. This suggests that Moonshot prioritized consistent, well-structured responses over a distinct character voice.
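The figures quoted in this article can be tallied to see how much of the budget is accounted for. The sketch below uses only the approximate numbers given above (1,200, 800, and roughly 1,000 tokens out of about 5,000). The article gives no token count for formatting, so it falls into the "unattributed" remainder along with everything else. The function and variable names are illustrative.

```python
TOTAL_TOKENS = 5_000  # approximate total reported for the prompt

# Section sizes as reported in this article (all approximate).
REPORTED_SECTIONS = {
    "creative_writing": 1_200,
    "math_reasoning": 800,
    "safety": 1_000,
}


def budget_shares(total: int, sections: dict[str, int]) -> dict[str, float]:
    """Return each section's share of the total, plus the unattributed remainder."""
    shares = {name: count / total for name, count in sections.items()}
    shares["unattributed"] = (total - sum(sections.values())) / total
    return shares


if __name__ == "__main__":
    for name, share in budget_shares(TOTAL_TOKENS, REPORTED_SECTIONS).items():
        print(f"{name}: {share:.0%}")
```

The three sections with reported sizes add up to 3,000 tokens, or 60% of the total, leaving about 2,000 tokens for formatting, multilingual handling, code guidelines, and the rest.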
The modular organization provides a template for complex system prompts. Rather than one continuous block of text, Moonshot separated concerns into discrete sections that the model can reference independently. Developers working with GPT-4, Claude, or other models can adopt this pattern by grouping related instructions and using clear section markers.
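One way to apply this pattern is to keep each concern in its own named section and join them with explicit markers at build time. This is a minimal sketch: the `## ` marker, the section titles, and the placeholder bodies are assumptions, not text from the leaked prompt.

```python
# Placeholder section bodies; real instructions would be longer.
SECTIONS = {
    "Creative Writing": "Maintain narrative voice and track character development.",
    "Mathematical Reasoning": "Format equations in LaTeX and show steps.",
    "Code Generation": "Include error handling and inline comments.",
    "Safety": "Refuse illegal requests; redirect sensitive subjects.",
}


def build_system_prompt(sections: dict[str, str], marker: str = "## ") -> str:
    """Join named sections into one prompt, each under its own marker line."""
    blocks = [f"{marker}{title}\n{body.strip()}" for title, body in sections.items()]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_system_prompt(SECTIONS))
```

Storing sections separately also makes it straightforward to test a variant with one section removed or shortened.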
However, the 5,000-token length raises questions about efficiency. Each token in the system prompt counts against the context window and adds processing cost on every request. Teams should evaluate whether such extensive instructions actually improve performance or whether more concise prompts achieve similar results. Testing different prompt lengths against consistent evaluation metrics helps identify the right balance.
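The overhead is easy to estimate with back-of-the-envelope arithmetic. In the sketch below, the 128,000-token context window, the request volume, and the price per million input tokens are hypothetical values chosen for illustration, not Moonshot’s actual limits or pricing. Only the 5,000-token prompt length comes from the article.

```python
def prompt_overhead(
    system_tokens: int,
    context_window: int,
    requests: int,
    price_per_million_tokens: float,
) -> dict[str, float]:
    """Estimate context share and total input cost attributable to the system prompt."""
    return {
        "context_share": system_tokens / context_window,
        "total_cost": system_tokens * requests * price_per_million_tokens / 1_000_000,
    }


if __name__ == "__main__":
    # All values except system_tokens are hypothetical.
    estimate = prompt_overhead(
        system_tokens=5_000,
        context_window=128_000,
        requests=1_000_000,
        price_per_million_tokens=0.50,
    )
    print(f"context share: {estimate['context_share']:.1%}")
    print(f"total cost: {estimate['total_cost']:.2f}")
```

Running the same function with a shorter candidate prompt gives the saving to weigh against any drop in evaluation scores.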
The leak also highlights risks in prompt-based control systems. Competitors now understand Kimi’s behavioral constraints and can potentially craft inputs that exploit gaps in the instructions. Organizations relying on system prompts for safety should implement additional guardrails at the application layer rather than depending solely on prompt engineering.
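An application-layer guardrail checks the model’s output independently of whatever the system prompt says. The sketch below is one simple form of that idea. The `call_model` callable is a stand-in for a real API client, and the single blocked pattern (a US-style ID number shape) is a hypothetical example rather than a complete policy.

```python
import re
from collections.abc import Callable

# Hypothetical output rule: block anything shaped like 123-45-6789.
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]

FALLBACK_REPLY = "Sorry, I can't share that."


def guarded_reply(user_input: str, call_model: Callable[[str], str]) -> str:
    """Call the model, then withhold the reply if it matches a blocked pattern."""
    reply = call_model(user_input)
    if any(pattern.search(reply) for pattern in BLOCKED_PATTERNS):
        return FALLBACK_REPLY
    return reply


if __name__ == "__main__":
    # Stub model used only to demonstrate the wrapper.
    def leaky_model(prompt: str) -> str:
        return "The number on file is 123-45-6789."

    print(guarded_reply("What is on file?", leaky_model))
```

Because the check runs outside the model, it still applies when a crafted input gets the model to ignore its prompt instructions.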
For researchers studying AI alignment, this document demonstrates how commercial developers translate abstract safety principles into concrete operational rules. The specific phrasing choices and edge case handling reveal practical challenges that academic papers often overlook.