This book describes rule-based methods developed for natural language processing of Turkish and briefly explains Rule-Based Automatical Corpus Generation (RB-CorGen), an infrastructure implemented to apply them. To test RB-CorGen on Turkish, the roots, stems and suffixes were obtained in cooperation with the Turkish Language Association (Türk Dil Kurumu, TDK) and the Linguistics Department of the College of Literature at Dokuz Eylul University; the defined tags and grammatical rules were stored in an XML file; and documents containing nearly 95 million wordforms were collected in electronic form from five Turkish newspapers. Three new methods were developed and analysed: Rule-Based Sentence Boundary Detection (RB-SBD), Rule-Based Morphological Analyser (RB-MA) and Rule-Based POS Tagging (RB-POST). The success rates of these methods were observed to increase as the number of rules grows.