51,99 €
incl. VAT
Free shipping*
Ready to ship in 6-10 days
  • Paperback

Product description
In this thesis, we investigate information retrieval techniques for Indonesian. Stemming is the process of reducing morphological variants of a word to a common stem form. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which gives better performance. We empirically explore these stemming algorithms, propose novel extensions to the best algorithm, develop a new Indonesian stemmer, and show that these can improve stemming correctness.

We propose a range of techniques to enhance the performance of Indonesian information retrieval. Our experiments show that many of these techniques can increase retrieval performance.

We also address the problem of automatic creation of parallel corpora, which are essential for cross-lingual information retrieval and other natural language processing tasks, including machine translation. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. We also investigate the applicability of our identification algorithms to other languages that use the Latin alphabet, including German and French.
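
To make the idea of stemming concrete, here is a minimal Python sketch of dictionary-checked affix stripping for a few Indonesian words. It is an illustration only and not the algorithm developed in the thesis: the names `ROOTS`, `SUFFIXES`, `PREFIXES` and `stem` are hypothetical, the root list is a toy one, and the affix lists are deliberately incomplete (real Indonesian stemmers also handle spelling changes at prefix boundaries, which this sketch ignores).

```python
# Toy root-word list (an assumption for this sketch); a real stemmer
# would consult a full dictionary of Indonesian root words.
ROOTS = {"makan", "baca", "main", "ajar"}

# Small, incomplete affix lists chosen for illustration only.
SUFFIXES = ("lah", "kah", "nya", "kan", "an", "i")
PREFIXES = ("mem", "men", "me", "di", "ber", "ter", "pe")


def stem(word, roots=ROOTS):
    """Return a root from `roots` reachable by removing at most one
    suffix and one prefix; return the lowercased word if none is found."""
    word = word.lower()
    if word in roots:
        return word

    # Candidate forms: the word itself, plus the word with one suffix removed.
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            candidates.append(word[: -len(suffix)])

    for candidate in candidates:
        if candidate in roots:
            return candidate
        # Try removing one prefix and check the remainder against the dictionary.
        for prefix in PREFIXES:
            if candidate.startswith(prefix) and candidate[len(prefix):] in roots:
                return candidate[len(prefix):]

    # No dictionary root found: leave the word unchanged.
    return word
```

With this toy dictionary, `stem("makanan")`, `stem("dimakan")` and `stem("membaca")` return `"makan"`, `"makan"` and `"baca"` respectively, while a word with no matching root, such as `"komputer"`, is returned unchanged. Checking every candidate against the dictionary is the design choice that keeps this sketch from stripping letters that only look like affixes.
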
About the author
Jelita completed her PhD at RMIT University, Australia, in 2007. Her research interests include information retrieval (mono- and cross-lingual), natural language processing, machine translation, and corpus construction. She currently works as a C/C++ software engineer.