Automatic Disambiguation of Author Names in Bibliographic Repositories

This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity found in digital bibliographic repositories. It occurs when an author publishes works under distinct names or when distinct authors publish works under similar names. This problem may arise for a number of reasons, including the lack of standards and common practices and the decentralized generation of bibliographic content. As a consequence, author name ambiguity may severely affect the quality of the main services of digital bibliographic repositories, such as search, browsing, and recommendation. The book focuses on automatic methods, since manual solutions do not scale to the size of current repositories or to the speed at which they are updated. Accordingly, we provide a broad view of the problem of automatic author name disambiguation. The book summarizes more than a decade of research on this topic conducted by our group, reported in more than a dozen publications that have received over 900 citations so far, according to Google Scholar. We start by discussing the motivation for the problem (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized overview of the literature on the topic (Chapter 3). We then organize, summarize, and integrate our group's efforts to develop solutions for the problem that, at the time they were proposed, achieved state-of-the-art disambiguation quality. Chapter 4 covers HHC (Heuristic-based Clustering), an author name disambiguation method based on two specific real-world assumptions regarding scientific authorship.
Then, Chapter 5 describes SAND (Self-training Author Name Disambiguator), and Chapter 6 presents two incremental author name disambiguation methods, namely INDi (Incremental Unsupervised Name Disambiguation) and INC (Incremental Nearest Cluster). Finally, Chapter 7 provides an overview of recent author name disambiguation methods that explore new approaches, such as graph-based representations, alternative predefined similarity functions, visualization facilities, and artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern-matching function for comparing proper names, used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work produced in the area of author name disambiguation over the last decade, with the aim of consolidating a solid basis for future developments in the field.

Anderson A. Ferreira holds a B.S. degree in Computer Science from the Universidade Federal de Viçosa, Brazil, and M.Sc. and Ph.D. degrees in Computer Science from the Universidade Federal de Minas Gerais, Brazil, under the supervision of Dr. Marcos André Gonçalves and Prof. Alberto H. F. Laender. In 2011, he joined the Computing Department of the Universidade Federal de Ouro Preto, where he is currently an Associate Professor. He has published several articles in major conferences and journals in the digital libraries and databases areas, such as JCDL, SBBD/JIDM, JASIST, IP&M, DocEng, LA-Web, TKDD, Information Sciences, World Digital, International Journal on Digital Libraries, and SIGMOD Record. Dr. Ferreira has also served as an ad hoc referee for several journals, such as JASIST, Scientometrics, Information Science, The Knowledge Engineering Review, Informetrics, Online Information Review, IP&M, Internet Services and Applications, Machine Learning Research, KNOSYS, and Natural Language Engineering.

Marcos André Gonçalves holds a B.S. degree in Computer Science (1995) from the Universidade Federal do Ceará, Brazil, an M.Sc. degree in Computer Science (1997) from the Universidade Estadual de Campinas (UNICAMP), Brazil, and a Ph.D. degree in Computer Science (2004) from Virginia Tech, USA. He joined the Computer Science Department of the Universidade Federal de Minas Gerais in 2005, where he is currently an Associate Professor, co-heading the Data Management Research Group along with Prof. Alberto Laender. Dr. Gonçalves has served as a program committee member for several international and national conferences on information retrieval, digital libraries, and Web-related topics, among them ACM SIGIR, ACM CIKM, ACM/IEEE JCDL, ACM RecSys, TPDL, and SBBD. He has also served as a reviewer for journals such as ACM Transactions on Information Systems, Information Processing & Management, Journal of the Association for Information Science and Technology (JASIST), and Information Sciences. He was an Affiliated Member of the Brazilian Academy of Sciences (2008-2012). He has received numerous awards, including the prestigious Premio Capes for Best Brazilian Doctoral Dissertation in Computer Science (as co-advisor of Fabiano Belem), awards as advisor of the Best Ph.D. Dissertation and Best Master's Thesis from the Brazilian Computer Society (2019, 2018, 2013, 2012), the Best Student Paper Award (ACM/IEEE JCDL 2004 and 2014), the Premio Mauro Castilho-SBBD Best Paper Award (2011, 2010, 2008), and Google Research Awards (2014-2017). He is the author of more than 300 refereed journal and conference papers. His current research interests include Information Retrieval, Machine Learning, and Social Networks.

Alberto H. F. Laender holds a B.Sc. degree in Electrical Engineering (1974) and an M.Sc. degree in Computer Science (1979), both from the Universidade Federal de Minas Gerais, Brazil, and a Ph.D. degree in Computing (1984) from the University of East Anglia, UK. He joined the Computer Science Department of the Universidade Federal de Minas Gerais in 1975, where he is a Full Professor and heads the Data Management Research Group. In 1997, he was a Visiting Scholar at HP Labs in Palo Alto, California. He has served on the advisory committees of several Brazilian research funding agencies. He was also a member of ACM SIGMOD's Advisory Board (2006-2010) and of SIGMOD's Jim Gray Ph.D. Dissertation Award Committee (2008-2011). Prof. Laender has also served as a program committee member for several national and international conferences on databases, digital libraries, and Web-related topics, among them SBBD, KdMiLe, VLDB, CIKM, SIGIR, JCDL, TPDL, WWW, SPIRE, and ICDE. He is a founding member of the Brazilian Computer Society and one of the co-founders of Akwan Information Technologies, a Brazilian search technology company that was acquired by Google Inc. in 2005 to become its Research and Development Center for Latin America. Prof. Laender is a member of the Brazilian Academy of Sciences and of the Brazilian National Academy of Engineering, and in 2010 he was awarded the National Order of Scientific Merit by the President of Brazil. He is the author of more than 200 refereed journal and conference papers. His current research interests include Data Management, Digital Libraries, Social Networks, and Bibliometrics.