Embeddings have undoubtedly been one of the most influential research areas in Natural Language Processing (NLP). Encoding information into a low-dimensional vector representation, which is easy to integrate into modern machine learning models, has played a central role in the development of NLP. Embedding techniques initially focused on words, but attention soon shifted to other forms: from graph structures, such as knowledge bases, to other types of textual content, such as sentences and documents. This book provides a high-level synthesis of the main embedding techniques in NLP, in the broad sense. It starts by explaining conventional word vector space models and word embeddings (e.g., Word2Vec and GloVe) and then moves to other types of embeddings, such as word sense, sentence and document, and graph embeddings. It also provides an overview of recent developments in contextualized representations (e.g., ELMo and BERT) and explains their potential in NLP. Throughout the book, the reader can find both the essential information needed to understand a topic from scratch and a broad overview of the most successful techniques developed in the literature.
Mohammad Taher Pilehvar is an Assistant Professor at the Tehran Institute for Advanced Studies (TeIAS) and an Affiliated Lecturer at the University of Cambridge. Taher's research is primarily in Lexical Semantics, with a special focus on representation learning for word senses. Taher has co-instructed multiple tutorials at *ACL conferences and co-organized four SemEval tasks and an EACL workshop on semantic representation. Taher has contributed to the field of lexical semantics with several publications in recent years, including two best paper nominations at ACL (2013 and 2017) and a survey on vector representations of meaning.
Jose Camacho-Collados is a UKRI Future Leaders Fellow and a Lecturer at the School of Computer Science and Informatics at Cardiff University (United Kingdom). Previously, he was a Google Doctoral Fellow, completed his Ph.D. at Sapienza University of Rome (Italy), and had pre-doctoral experience as a statistical research engineer in France. His educational background includes an Erasmus Mundus Masters in Human Language Technology and a 5-year B.Sc. degree in Mathematics (Spain). Jose's main area of expertise is Natural Language Processing (NLP), particularly computational semantics or, in other words, how to make computers understand language. On this topic, together with Taher Pilehvar, he has written a well-received survey on vector representations of meaning, which was published in the Journal of Artificial Intelligence Research and established the basis of this book. His research has centered on both scientific contributions, through regular publications in top AI and NLP venues such as ACL, EMNLP, AAAI, and IJCAI, and applications with direct impact on society, with a special focus on social media and multilinguality. He has also organized several international workshops, tutorials, and open challenges with hundreds of participants across the world.
Table of Contents
Preface.- Introduction.- Background.- Word Embeddings.- Graph Embeddings.- Sense Embeddings.- Contextualized Embeddings.- Sentence and Document Embeddings.- Ethics and Bias.- Conclusions.- Bibliography.- Authors' Biographies.