32,99 €
incl. VAT
Free shipping*
Ready to ship in 6-10 days
  • Paperback

Product description
Duplicate records do not share a common key but refer to the same entity. Databases that contain such records often include errors that make matching duplicates a complex problem. These errors are typing errors, incomplete information such as abbreviations, disregard of standard formats, or a combination of these factors. This book works with a database in which typing errors are more frequent than the other error types. The database contains real estate information in four fields: name, surname, property address and property area. The goals of this book are: a review of existing algorithms for identifying duplicate data in these fields, namely Edit-distance, Smith-Waterman, Jaro, Jaro-Winkler, LCS and N-gram; a description of two proposed algorithms intended to improve the efficiency and increase the precision of duplicate identification, a token-based algorithm and an algorithm based on typing errors; and a comparison of the efficiency of these algorithms on a large Persian database.
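To give a sense of the kind of field comparison the reviewed algorithms perform, here is a minimal Python sketch of two of the named measures: edit distance (in its common Levenshtein form) and an N-gram similarity (here a Dice coefficient over character bigrams, which is one of several possible N-gram variants). The function names and the sample strings are assumptions made for illustration; this is not the book's implementation, and it does not represent the proposed token-based or typing-error-based algorithms.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams (assumed variant)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        # Strings too short to have bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Hypothetical surname field with a typing error (two letters swapped).
print(edit_distance("Ahmadi", "Ahmdai"))
print(bigram_similarity("Ahmadi", "Ahmdai"))
```

A small edit distance or a high bigram similarity between two field values suggests that the records may be duplicates; where to set the threshold depends on the data and is not specified here.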
About the author
She received her master's degree in computer engineering and has worked in the database field. She is currently a lecturer in the Computer Engineering Department, Sarvestan Branch, Islamic Azad University, Sarvestan, Iran.