  • Paperback

Product description
In database record linkage and natural language processing tasks, one often runs into problems when working with data or texts that contain noise, typos, and other kinds of errors. This thesis investigates the use of modified Levenshtein edit distances to deal with these problems. For linking distinct records that represent the same entity in a database, we used and extended the WEKA API for machine learning, obtaining good precision and recall results. For searching and annotating occurrences of specified words in natural-language texts, we implemented an approximate Gazetteer for GATE, the General Architecture for Text Engineering.
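For readers unfamiliar with the technique, the following is a minimal sketch of a Levenshtein edit distance with a pluggable substitution cost, which is one common way such a distance can be "modified". It is not the thesis's own variant: the function names `weighted_levenshtein` and `case_insensitive_cost`, and the cost value 0.5, are assumptions made purely for illustration.

```python
def weighted_levenshtein(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance between strings a and b.

    sub_cost is a function (x, y) -> cost of replacing character x with y.
    With the default costs this is the classic Levenshtein distance.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,              # delete ca
                curr[j - 1] + ins_cost,          # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


def case_insensitive_cost(x, y):
    """Hypothetical modified cost: a pure case change is a cheaper error."""
    if x == y:
        return 0.0
    if x.lower() == y.lower():
        return 0.5  # illustrative value, not taken from the thesis
    return 1.0
```

With the default costs, `weighted_levenshtein("kitten", "sitting")` gives the classic distance of 3, while passing `case_insensitive_cost` makes "Vienna" and "vienna" closer to each other than two strings that differ by an unrelated letter. Record linkage or approximate word lookup would then typically compare such a distance against a threshold.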
About the author
The author obtained his double M.Sc. degree in Computational Logic at the Dresden University of Technology, Germany, and at the Vienna University of Technology, Austria. He currently works as a research assistant in the Theory and Logic Group of the Institute for Computer Languages at the Vienna University of Technology.