Hoshi: A Japanese Morphological Adorner for TEI XML
Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality for English texts. However, its practical use is limited when the input XML contains branching text, e.g. when <choice> appears, because MorphAdorner modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, to determine how this issue can best be handled with minimal input modification and preprocessing. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like MorphAdorner's, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, by the software currently available for morphological analysis in that language, and by the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee the integrity of the input document. Our approach to handling these hurdles is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.
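To make the branching-text problem concrete, the following minimal sketch shows how a single TEI <choice> element yields two different linear readings of the same sentence, and how each reading can be extracted without altering the parsed input. It is an illustration only, not Hoshi's implementation: the sample sentence, the function name `linear_text`, and the use of Python's standard `xml.etree.ElementTree` module are all assumptions introduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: <orig> holds a historical kana spelling,
# <reg> holds its modern regularization.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<choice><orig>けふ</orig><reg>きょう</reg></choice>は晴れ。"
    "</p>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}")[-1]


def linear_text(elem, prefer):
    """Return the text of elem, following only the `prefer` branch of each <choice>.

    The tree is only read, never modified.
    """
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) == "choice":
            # Keep only the preferred alternative; skip its siblings.
            for alt in child:
                if local_name(alt.tag) == prefer:
                    parts.append(linear_text(alt, prefer))
        else:
            parts.append(linear_text(child, prefer))
        # Text following the child belongs to the enclosing element.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
before = ET.tostring(root, encoding="unicode")

original_reading = linear_text(root, "orig")
regularized_reading = linear_text(root, "reg")

# The parsed document is serialized again to confirm it was left untouched.
after = ET.tostring(root, encoding="unicode")
```

Each reading could then be passed to a morphological analyzer separately. The harder step, which this sketch does not attempt, is aligning the analyzer's tokens back to the original element boundaries.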
Jerry Bonnell, Mitsunori Ogihara, Hoshi: A Japanese morphological adorner for TEI XML, Digital Scholarship in the Humanities, fqaa003, https://doi.org/10.1093/llc/fqaa003