LongPage Dataset: 6K Books with Hierarchical Plans
Someone found a pretty interesting dataset for training AI to write full-length novels.
LongPage just got updated with 6K+ complete books, each paired with hierarchical planning traces that break down stories from outline to chapters to scenes. The training data shows how novels are structured from high-level concept down to individual scenes.
Dataset link:
https://huggingface.co/datasets/Pageshift-Entertainment/LongPage
The team is currently training a full-book writing model on this data and plans to release it once the output quality is decent. Early checkpoints are already running internally.
For anyone experimenting with long-form creative AI, this beats the usual paragraph-level training data. The hierarchical traces give models actual structure to follow instead of just predicting the next token blindly.
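The dataset's actual schema isn't documented here, but the "outline → chapters → scenes" idea is easy to picture. As a rough sketch only, with hypothetical field names that are not the dataset's real ones, a hierarchical plan flattened into a conditioning prompt might look like:

```python
from dataclasses import dataclass, field

# Hypothetical structures illustrating a hierarchical writing plan;
# field names are illustrative, not LongPage's actual schema.
@dataclass
class Scene:
    summary: str

@dataclass
class Chapter:
    outline: str
    scenes: list[Scene] = field(default_factory=list)

@dataclass
class BookPlan:
    concept: str
    chapters: list[Chapter] = field(default_factory=list)

def plan_to_prompt(plan: BookPlan) -> str:
    """Flatten the hierarchy top-down: concept, then chapter outlines, then scenes."""
    lines = [f"CONCEPT: {plan.concept}"]
    for i, ch in enumerate(plan.chapters, 1):
        lines.append(f"CHAPTER {i}: {ch.outline}")
        for j, sc in enumerate(ch.scenes, 1):
            lines.append(f"  SCENE {i}.{j}: {sc.summary}")
    return "\n".join(lines)

plan = BookPlan(
    concept="A lighthouse keeper discovers the sea is receding.",
    chapters=[
        Chapter(
            outline="The first low tide.",
            scenes=[
                Scene("The keeper notices exposed rocks."),
                Scene("Villagers dismiss the warning."),
            ],
        )
    ],
)
print(plan_to_prompt(plan))
```

Conditioning generation on a flattened trace like this, rather than on raw text alone, is what gives the model explicit structure to follow at each level.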
Related Tips
DeepSeek Quietly Tests Updated Model with Recent Knowledge
DeepSeek is quietly testing an updated AI model that incorporates more recent knowledge and information.
GPT-OSS 120B Uncensored: Zero Refusals Reported
GPT-OSS 120B Uncensored is an open-source language model reportedly designed without content restrictions, claiming to fulfill all user requests without refusals.
Nvidia's DMS Cuts LLM Memory Usage by 8x
Nvidia introduces Dynamic Memory Scheduling, which reduces large language model memory consumption by eight times, enabling more efficient AI inference.