
LongPage Dataset: 6K Books with Hierarchical Plans

The LongPage Dataset contains 6,000 books paired with hierarchical writing plans that break down each book's structure into multiple levels of organization.

Someone found a pretty interesting dataset for training AI to write full-length novels.

LongPage just got updated with 6K+ complete books, each paired with hierarchical planning traces. The traces break every story down from the high-level concept to an outline, then chapters, then individual scenes, so the training data shows how a novel is structured at each level.
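To make the outline-to-chapters-to-scenes idea concrete, here is a minimal sketch of what such a hierarchy could look like in code. The class names, field names, and the toy story are all made up for illustration; they are assumptions, not the actual LongPage schema, so check the dataset card for the real fields.

```python
from dataclasses import dataclass, field


# Hypothetical three-level plan: book -> chapters -> scenes.
# These names are illustrative only and do not mirror the dataset's columns.
@dataclass
class Scene:
    summary: str


@dataclass
class Chapter:
    title: str
    scenes: list[Scene] = field(default_factory=list)


@dataclass
class BookPlan:
    premise: str
    chapters: list[Chapter] = field(default_factory=list)


def render_outline(plan: BookPlan) -> str:
    """Flatten the plan top-down: premise first, then chapters, then scenes."""
    lines = [f"Premise: {plan.premise}"]
    for c_idx, chapter in enumerate(plan.chapters, start=1):
        lines.append(f"  Chapter {c_idx}: {chapter.title}")
        for s_idx, scene in enumerate(chapter.scenes, start=1):
            lines.append(f"    Scene {c_idx}.{s_idx}: {scene.summary}")
    return "\n".join(lines)


# Toy data, not taken from the dataset.
plan = BookPlan(
    premise="A lighthouse keeper finds a map",
    chapters=[
        Chapter("The Storm", [Scene("The keeper spots a wreck"),
                              Scene("A chest washes ashore")]),
        Chapter("The Map", [Scene("The keeper opens the chest")]),
    ],
)
print(render_outline(plan))
```

The indentation in the rendered outline mirrors the levels of the plan, which is roughly the kind of coarse-to-fine trace the post describes.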

Dataset link:

https://huggingface.co/datasets/Pageshift-Entertainment/LongPage

The team is currently training a full-book writing model on this data and plans to release it once the output quality is decent. Early checkpoints are already running internally.

For anyone experimenting with long-form creative AI, this looks more useful than the usual paragraph-level training data. The hierarchical traces give a model an explicit plan to condition on, instead of leaving it to pick up book-length structure from raw text alone.
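One way to picture how a plan gives the model structure to follow: put the plan in front of the prose in each training example, so the plan is already in context when the prose tokens are predicted. The sketch below is only an assumption about how that could be done. The `<plan>` and `<book>` delimiters and the `build_training_text` helper are hypothetical, not the format the LongPage team uses.

```python
def build_training_text(outline: str, book_text: str) -> str:
    """Place the plan before the prose.

    With a left-to-right language model, every prose token is then
    predicted with the full plan already in its context window.
    The <plan>/<book> tags are made-up delimiters for this sketch.
    """
    return f"<plan>\n{outline}\n</plan>\n<book>\n{book_text}\n</book>"


# Toy inputs, not taken from the dataset.
outline = "Chapter 1: The Storm\n  Scene 1.1: The keeper spots a wreck"
book_text = "Rain hammered the lantern room as the keeper climbed the stairs."

example = build_training_text(outline, book_text)
print(example)
```

The model is still trained on next-token prediction here; the difference is that the prediction is conditioned on an explicit plan rather than on the preceding prose alone.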