EXTRACTING PARALLEL PHRASES FROM ENGLISH-PUNJABI CORPORA

AN INTEGRATED APPROACH

Manpreet Singh Lehal

Paperback

Other customers were also interested in

Ajay Dubey
Bilingual Dictionaries Using Comparable and Quasi Comparable Corpora

24,99 €
Md. Farukuzzaman Khan
Generation of Text and Speech Corpora

47,99 €
Bao Thy Vuong
English for Techno-Food Processing

36,99 €
Maha Alawdat
Israeli English Teachers' Perception of Using ePortfolios

19,99 €
Gragn Kedir
Qebena-English Bilingual Dictionary: Using Machine Translation

36,99 €
Segun Adebayo
Design and Construction of Jatropha Seed Oil Extracting Machine

32,99 €
Sami Hasan
Rapidly-Fabricated Architectures of Parallel Multidimension Algorithms

49,99 €

Product description

This study presents a novel approach to extracting parallel data from a comparable English-Punjabi corpus, addressing the scarcity of parallel corpora for this language pair. Unlike previous research, the approach focuses on creating high-precision parallel data using minimal resources. The data is drawn from diverse sources, including Wikipedia articles, TDIL's noisy parallel sentences, and Gyan Nidhi reports. The methodology consists of three phases: extracting and aligning documents, translating the Punjabi texts into English using OpenNMT-py, and calculating content similarity with three measures (Euclidean distance, cosine similarity, and Jaccard similarity). The measures are first run individually, and their results are then integrated to improve accuracy. By combining the scores of all three measures, the system achieves a precision of 93% and an accuracy of 86%. This integrated approach significantly enhances parallel data extraction for English-Punjabi corpora and holds potential for improving Statistical Machine Translation (SMT) models.
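The three similarity measures named above can be illustrated with a minimal Python sketch. It is not taken from the book: the whitespace tokenizer, the conversion of Euclidean distance into a similarity, the simple averaging used to combine the scores, and the 0.6 threshold are all assumptions made for illustration, since the description does not say how the scores are integrated.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def euclidean_distance(a, b):
    """Euclidean distance between the term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in set(ca) | set(cb)))


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_score(english, translated):
    """Average the three measures (hypothetical integration rule, not the author's)."""
    a, b = tokenize(english), tokenize(translated)
    # Map the distance into (0, 1] so that a higher value means "more similar"
    # for all three scores before they are averaged.
    euclid_sim = 1.0 / (1.0 + euclidean_distance(a, b))
    return (euclid_sim + cosine_similarity(a, b) + jaccard_similarity(a, b)) / 3.0


def is_parallel(english, translated, threshold=0.6):
    """Accept a sentence pair when the combined score reaches an assumed threshold."""
    return combined_score(english, translated) >= threshold
```

Here `english` would be a sentence from the English side of the corpus and `translated` the machine-translated English rendering of a Punjabi sentence. Requiring agreement across measures, for instance by voting rather than averaging, is another plausible way to trade recall for precision.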

Product details

Publisher: LAP Lambert Academic Publishing
Pages: 204
Publication date: 25 October 2024
Language: English
Dimensions: 220 mm x 150 mm x 13 mm
Weight: 322 g
ISBN-13: 9786208225414
ISBN-10: 6208225418
Item no.: 72066965

About the author

Dr. Manpreet Singh Lehal is an expert in Natural Language Processing, specializing in English-to-Punjabi translation. With over 18 years of experience, his work has led to multiple national and international patents. He has been honored by the state government for his contributions.