Statistical machine translation (SMT) offers many
benefits over rule based and example based
machine translation especially ease to train and
robustness. It represents state-of-the-art in machine
translation (MT) but it has to deal with certain
issues that are not trivial to solve in a statistical
framework: correct distortion, correct agreement,
morphology issues, etc. In order to solve distortion
issues, different models were proposed: syntax-based
MT, enhanced distortion models, clause restructuring.
All of these define more complex distortion models
than monotonic distortion models (which favor lack of
word reordering). These proposed models don't
completely solve the issue of easily adding
linguistic knowledge into the MT decoder. The work
proposes a model designed to augment SMT models with
linguistic knowledge, either in a rule-based fashion
or in a probabilistic fashion. The theoretical
framework proposed in this work was implemented in
Phramer statistical phrase-based decoder and tested
using various levels of knowledge - surface , part
of speech, constituency parse trees. The experimental
results show improvement in the quality of the
translation.
benefits over rule based and example based
machine translation especially ease to train and
robustness. It represents state-of-the-art in machine
translation (MT) but it has to deal with certain
issues that are not trivial to solve in a statistical
framework: correct distortion, correct agreement,
morphology issues, etc. In order to solve distortion
issues, different models were proposed: syntax-based
MT, enhanced distortion models, clause restructuring.
All of these define more complex distortion models
than monotonic distortion models (which favor lack of
word reordering). These proposed models don't
completely solve the issue of easily adding
linguistic knowledge into the MT decoder. The work
proposes a model designed to augment SMT models with
linguistic knowledge, either in a rule-based fashion
or in a probabilistic fashion. The theoretical
framework proposed in this work was implemented in
Phramer statistical phrase-based decoder and tested
using various levels of knowledge - surface , part
of speech, constituency parse trees. The experimental
results show improvement in the quality of the
translation.