Abstract | Classical Chinese was once the written medium across East Asia but has since fallen out of use, leaving a large body of texts inaccessible to the general public. Expert-produced sentence segmentations are crucial to understanding classical Chinese texts. This study proposes applying statistical models widely used in NLP to automate such segmentation, framing it as a sequence labeling problem. Segmentations produced by automated models (HMM, CRF, and bidirectional LSTM), as well as by humans attempting the same task, are all validated against expert segmentation. CRF models outperform human work on accuracy metrics and are therefore promising for real-life use. Fast and accurate automated segmentation improves the accessibility of historical texts both in their home culture and in the rest of the world. Note: The source code, complete results, and sample segmented texts of this study can be found at github.com/xlhdh/classycn. |
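To illustrate what framing segmentation as sequence labeling can look like, the following is a minimal sketch and not the study's actual code. It assumes a hypothetical two-tag scheme in which each character is labeled "B" if a sentence break follows it and "N" otherwise, and it assumes "/" as the break marker in the expert-segmented text. The function names `to_labels` and `from_labels` are illustrative only.

```python
def to_labels(segmented, delimiter="/"):
    """Turn expert-segmented text into (characters, labels).

    Assumed tag scheme (hypothetical): "B" = a break follows this
    character, "N" = no break follows.
    """
    chars, labels = [], []
    for ch in segmented:
        if ch == delimiter:
            # Mark the previous character as ending a segment.
            if labels:
                labels[-1] = "B"
        else:
            chars.append(ch)
            labels.append("N")
    return chars, labels


def from_labels(chars, labels, delimiter="/"):
    """Rebuild segmented text from characters and predicted labels."""
    out = []
    for ch, lab in zip(chars, labels):
        out.append(ch)
        if lab == "B":
            out.append(delimiter)
    return "".join(out)


chars, labels = to_labels("學而時習之/不亦說乎/")
print(list(zip(chars, labels)))
print(from_labels(chars, labels))
```

Under this framing, a model such as an HMM, CRF, or bidirectional LSTM would be trained to predict the label sequence from the unsegmented characters, and its output could be compared label by label against the expert segmentation.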