
Classical Chinese Sentence Segmentation as Sequence Labeling

Hu, Yizhou
Date
2014
Additional date(s)
2014-12-01
Abstract
Classical Chinese was the medium of writing in East Asia and has since become extinct, leaving a large number of texts inaccessible to the general public. Expert-produced sentence segmentations are crucial to understanding classical Chinese texts. This study proposes treating such segmentation as a sequence labeling problem and automating it with statistical models widely used in NLP. Results produced by automated models (HMM, CRF, and bidirectional LSTM) and by comparable human reproduction are all validated against expert segmentation. CRF models outperform human work on accuracy metrics and are therefore promising for real-life use. Fast and accurate automated segmentation improves the accessibility of historical texts both in their home cultures and in the rest of the world. Note: The source code, complete results, and sample segmented texts of this study can be found at github.com/xlhdh/classycn.
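To illustrate what framing segmentation as sequence labeling can look like, the sketch below converts an expert-segmented passage into one tag per character and scores a predicted tagging against it. This is a minimal illustration and not the study's code: the two-tag scheme (`E` for a character that ends a sentence, `N` otherwise), the function names `to_labels`, `from_labels`, and `boundary_prf`, and the choice of boundary precision, recall, and F1 as the metric are all assumptions made for this example. The sample passage is the opening of the Analects.

```python
from typing import List, Tuple


def to_labels(sentences: List[str]) -> Tuple[str, List[str]]:
    """Flatten segmented sentences into unsegmented text plus one tag per character.

    Assumed tag scheme (hypothetical, for illustration only):
    "E" marks the last character of a sentence, "N" marks every other character.
    """
    chars = "".join(sentences)
    labels: List[str] = []
    for sentence in sentences:
        if sentence:  # skip empty strings so tags stay aligned with characters
            labels.extend(["N"] * (len(sentence) - 1) + ["E"])
    return chars, labels


def from_labels(chars: str, labels: List[str]) -> List[str]:
    """Rebuild sentences from characters and their tags."""
    sentences: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, labels):
        current.append(ch)
        if tag == "E":
            sentences.append("".join(current))
            current = []
    if current:  # trailing characters with no closing "E" tag
        sentences.append("".join(current))
    return sentences


def boundary_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over predicted sentence boundaries ("E" tags)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == "E" and p == "E")
    pred_pos = pred.count("E")
    gold_pos = gold.count("E")
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Expert ("gold") segmentation of a short passage.
gold_sentences = ["學而時習之", "不亦說乎", "有朋自遠方來", "不亦樂乎"]
text, gold_tags = to_labels(gold_sentences)

# A stand-in for a model's output: identical to gold except one missed boundary.
pred_tags = list(gold_tags)
pred_tags[4] = "N"

print(from_labels(text, pred_tags))
print(boundary_prf(gold_tags, pred_tags))
```

Under this framing, any tagger that assigns one label per character (an HMM, a CRF, or a bidirectional LSTM) can be trained on the tags and evaluated against the expert tags in the same way.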
Subject
classical Chinese
natural language processing
Department
Computer Science