dc.contributor.advisor: Sanchez-Aguilar, Antonio
dc.contributor.author: Hu, Yizhou
dc.date: 2014-12-01
dc.date.accessioned: 2016-02-19T15:38:18Z
dc.date.available: 2016-02-19T15:38:18Z
dc.date.issued: 2014
dc.identifier.uri: https://repository.tcu.edu/handle/116099117/10350
dc.description.abstract: Classical Chinese was the medium of writing in East Asia and has since fallen out of use, leaving a large number of texts inaccessible to the general public. Expert-produced sentence segmentations are crucial to understanding classical Chinese texts. This study proposes automating such segmentation by treating it as a sequence labeling problem and applying statistical models widely used in NLP. Results produced by automated models such as HMM, CRF, and bidirectional LSTM, along with comparable segmentations reproduced by humans, are all validated against expert segmentation. CRF models outperform human work on accuracy metrics and are therefore promising for real-life use. Fast and accurate automated segmentation improves the accessibility of historical texts both in their home culture and in the rest of the world. Note: The source code, complete results, and sample segmented texts of this study can be found at github.com/xlhdh/classycn.
dc.subject: classical Chinese
dc.subject: natural language processing
dc.title: Classical Chinese Sentence Segmentation as Sequence Labeling
etd.degree.department: Computer Science
local.college: College of Science and Engineering
local.college: John V. Roach Honors College
local.department: Computer Science

