Outliers, or unusually extreme values, have
traditionally been viewed as a nuisance to
researchers. Classical statistical analysis can lead
to completely opposite conclusions depending on whether
outliers are present. Such points can, however,
alert the researcher to unexpected features hidden
within a data set, and lead down paths of
surprising discovery. Identifying outliers can even be the
primary purpose of an investigation. Credit card
fraud, electronic network intrusions, and
unusual stock characteristics preceding a large move,
for instance, can all be seen as outliers whose
presence is important to establish as quickly as
possible. Several methods have been
proposed to identify outliers, but many of these are
not computationally suitable for large data sets.
This book presents a review of multivariate outlier
identification with particular emphasis on large data
sets, and investigates a new method. The intended
audience is statistics practitioners and data
analysts who wish to detect outliers, as well as
those interested in the historical development of the
field. Basic familiarity with statistical concepts is
assumed.