Term weighting is a technique for extracting important features from textual documents, and it provides a basis for many text mining approaches. The objective of this work is to study existing term weighting algorithms for feature extraction and to develop an efficient term weighting algorithm for mining salient features from Internet-based newswire sources. TF*PDF (Term Frequency * Proportional Document Frequency) is the most popular term weighting algorithm for extracting influential features from news archives. TF*PDF captures a basic property of features in news documents, namely frequency, and is therefore more accurate than other term weighting algorithms such as Binary, TF (Term Frequency), and TF-IDF (Term Frequency-Inverse Document Frequency) and its variants. However, frequency alone is not sufficient for salient topic extraction. To overcome this limitation, this book presents an innovative and effective term weighting algorithm that considers Position, Scattering, and Topicality along with Frequency to extract both short-lived and long-running events.
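To make the frequency-based baseline concrete, the following is a minimal Python sketch of TF*PDF as it is commonly stated in the literature: a term's weight is the sum, over news channels, of its normalized frequency in the channel multiplied by the exponential of the fraction of that channel's documents containing it. The function name `tf_pdf`, the input layout (channels as lists of tokenized documents), and the toy data are assumptions made for illustration; the book's own formulation and its extended algorithm may differ.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF*PDF weights (illustrative sketch, not the book's code).

    channels: list of channels; each channel is a list of documents;
              each document is a list of string tokens.
    Returns a dict mapping each term to its weight.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue  # an empty channel contributes nothing
        term_freq = Counter()  # occurrences of each term in this channel
        doc_freq = Counter()   # number of documents containing each term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize frequencies by the channel's Euclidean norm so that
        # large channels do not dominate small ones.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        n_docs = len(docs)
        for term, freq in term_freq.items():
            # Proportional document frequency: exp(share of docs with term).
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Toy example with two hypothetical channels.
channels = [
    [["quake", "rescue"], ["quake", "aid"]],
    [["quake", "election"]],
]
w = tf_pdf(channels)
ranked = sorted(w, key=w.get, reverse=True)
```

In this toy data, "quake" appears in every document of both channels and so receives the highest weight, which illustrates why a purely frequency-driven score favors widely repeated terms regardless of where they appear in a document or how they are spread over time.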