LLMs Use Universal Internal Language Across Languages
What It Is
Research into transformer model internals has revealed something unexpected: large language models appear to develop a language-agnostic internal representation when processing information. When analyzing the middle layers of these models, identical content translated into different languages (such as Chinese and English) produces more similar activation patterns than completely different content written in the same language.
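The cross-language comparison boils down to a similarity measure over activation vectors. A minimal sketch of the metric, using made-up toy vectors in place of real middle-layer activations (the dimensions, noise level, and sentence labels here are illustrative assumptions, not values from the study):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for middle-layer activations. Real ones would come from
# the model's hidden states (e.g. averaged over the tokens of a sentence).
rng = np.random.default_rng(0)
english_sentence = rng.normal(size=128)                        # some English content
chinese_translation = english_sentence + 0.1 * rng.normal(size=128)  # same content, translated
unrelated_english = rng.normal(size=128)                       # different content, same language

same_content = cosine_similarity(english_sentence, chinese_translation)
same_language = cosine_similarity(english_sentence, unrelated_english)

# The universal-representation hypothesis predicts translated content
# sits closer in activation space than unrelated same-language content.
print(same_content > same_language)
```

With real models, the vectors would be extracted from a chosen middle layer via `output_hidden_states=True`; the comparison logic is the same.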
This discovery emerged from experiments with model expansion techniques. Rather than training larger models from scratch, researchers found they could repeat transformer blocks in the middle layers of existing models to create more capable variants. The RYS (Repeat Your Self) series demonstrates this approach, building on Qwen3.5-27B by duplicating middle-layer blocks at different scales. Four variants exist with increasing repetition counts - S (small), M (medium), L (large), and XL (extra-large) - all quantized to FP8 for efficiency.
The technique works because middle layers handle semantic understanding rather than language-specific encoding or decoding. Early layers convert tokens into internal representations and final layers convert those representations back into text, while the middle layers operate on meaning itself - apparently in a format that transcends individual languages.
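The block-repetition idea itself is simple to express. A toy sketch, assuming the layer stack can be treated as a plain list of blocks (real transformer code needs the duplicated weights deep-copied and the config's layer count updated; the function name and layer counts here are illustrative):

```python
import copy

def expand_by_repetition(layers, start, end, repeats):
    """Build an expanded stack by repeating the middle slice
    layers[start:end] `repeats` times (repeats=1 reproduces the original)."""
    middle = layers[start:end]
    expanded = list(layers[:start])
    for _ in range(repeats):
        # Deep-copy so each repetition can later be fine-tuned independently.
        expanded.extend(copy.deepcopy(middle))
    expanded.extend(layers[end:])
    return expanded

# Toy "blocks": labeled dicts standing in for transformer layers.
original = [{"block": i} for i in range(8)]
expanded = expand_by_repetition(original, start=3, end=5, repeats=3)

print(len(original), "->", len(expanded))  # 8 -> 12
```

The expanded stack runs blocks 3 and 4 three times in sequence, which is the structural change the RYS variants make at different scales.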
Why It Matters
This finding has immediate practical implications for multilingual AI development. If models truly process meaning in a universal format, training on high-quality data in one language should improve performance across all languages the model supports. Teams working with limited resources in low-resource languages could potentially achieve better results by focusing on semantic understanding rather than language-specific training.
The model expansion approach also offers a cost-effective path to larger models. Training a 27B parameter model from scratch requires enormous computational resources, but expanding an existing model through layer repetition provides a shortcut. The XL variant shows particular promise for fine-tuning applications, potentially reaching state-of-the-art performance in its size category after task-specific training.
For researchers studying model interpretability, the universal representation hypothesis provides a new lens for understanding how transformers work. Rather than viewing these models as sophisticated pattern matchers operating on text statistics, the evidence suggests they build abstract semantic representations that exist independently of surface-level language features.
Getting Started
All four RYS-Qwen3.5-27B variants are available on Hugging Face:
- https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S
- https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M
- https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L
- https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL
Loading these models follows standard Hugging Face patterns:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dnhkng/RYS-Qwen3.5-27B-FP8-XL",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dnhkng/RYS-Qwen3.5-27B-FP8-XL")
The complete technical writeup, including cross-language similarity analysis and methodology details, is available at https://dnhkng.github.io/posts/rys-ii/
Teams interested in fine-tuning should start with the XL variant, which shows the strongest response to task-specific training. The FP8 quantization keeps memory requirements manageable while preserving most of the model’s capabilities.
Context
Traditional model scaling follows a “bigger is better” philosophy - more parameters, more training data, more compute. The RYS approach challenges this by demonstrating that architectural modifications to existing models can yield significant improvements without starting from scratch.
However, layer repetition has limits. Each duplicated block adds computational cost during inference, and there’s likely a point of diminishing returns where additional repetitions provide minimal benefit. The researchers are developing new model formats optimized for duplicated layers, suggesting current implementations may not fully exploit this technique’s potential.
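The inference cost of repetition can be roughed out with simple arithmetic: if every block costs roughly the same to run, per-token compute grows in proportion to the added layers. A back-of-the-envelope sketch with assumed layer counts (the 48-layer base and 16-layer middle slice are illustrative, not published figures):

```python
def relative_inference_cost(total_layers: int, repeated_layers: int, repeats: int) -> float:
    """Approximate per-token compute of the expanded model relative to the
    base model, assuming each transformer block costs the same to run."""
    expanded_layers = total_layers + repeated_layers * (repeats - 1)
    return expanded_layers / total_layers

# Assumed numbers: a 48-layer base model whose 16-layer middle slice
# is repeated different numbers of times.
for repeats in (1, 2, 3):
    cost = relative_inference_cost(total_layers=48, repeated_layers=16, repeats=repeats)
    print(f"{repeats}x middle slice -> {cost:.2f}x per-token compute")
```

Capability gains would have to outpace this linear cost growth, which is one way to frame where the point of diminishing returns lies.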
Alternative expansion methods exist, including mixture-of-experts architectures and sparse activation patterns. These approaches offer different tradeoffs between model size, inference speed, and capability. The universal representation finding may apply across these architectures, but validation would require similar cross-language analysis on different model types.
The broader implication remains speculative: if models develop language-independent semantic representations, what does this reveal about the nature of meaning itself? The similarity between model internals and human conceptual processing deserves further investigation.