This book develops techniques and methodologies for
the examination of the complex systems that are
lexicalized statistical parsing models. The primary
idea is treating the model as data, which is not a
particular method, but a paradigm and a research
methodology. I argue that lexicalized statistical
parsing models have become increasingly complex, and
therefore require thorough scrutiny, both to achieve
the scientific aim of understanding what has been
built thus far, and to achieve both the scientific
and engineering goals of using that understanding for
progress. In this book, I take a particular, dominant
type of parsing model and perform a macro analysis
to reveal its core (and design a software engine that
modularizes the periphery); crucially, I also
perform a detailed analysis, which provides, for the
first time, a window onto the efficacy of specific
parameters. These analyses have not only yielded
insight into the core model, but they have also
enabled the identification of inefficiencies in the
baseline model, such that those inefficiencies can be
reduced to form a more compact model, or exploited
for finding a better-estimated model with higher
accuracy, or both.