The large volume of data generated online allows data scientists to analyze it and draw conclusions across many fields. Real-time data, however, is often imbalanced, which degrades data quality and poses a significant challenge for machine learning. Sampling-based techniques and algorithm-based models are the two main families of methods for handling data imbalance. This thesis presents three distinct techniques for managing different levels of imbalance in real-time data.

The first approach is a sampling-based technique integrated with a bagging mechanism. The model identifies class-level imbalance and oversamples each available class. The bagging mechanism builds subsets of the training data so that the level of imbalance varies from subset to subset, with the aim of making prediction more robust. Even so, the effect of imbalance persists at prediction time, and several minority classes are still misclassified.
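The combination of per-class oversampling and bagging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name the oversampling method, the base learner, or the order of the two steps, so the sketch assumes random duplication of minority rows, a nearest-centroid classifier, and oversampling applied inside each bootstrap subset. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(samples, rng):
    """Duplicate rows at random until every class matches the largest class.

    Assumption: plain random oversampling; the source does not say which
    oversampling method is used.
    """
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


def fit_centroids(samples):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_bagged(samples, n_models=5, seed=0):
    """Train one model per bootstrap subset, oversampling inside each subset.

    Sampling with replacement makes the class ratio differ between subsets,
    which is the varying imbalance level the text refers to.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = [rng.choice(samples) for _ in range(len(samples))]
        models.append(fit_centroids(oversample(subset, rng)))
    return models


def predict_bagged(models, features):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_centroid(model, features) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: 20 majority rows near the origin, 5 minority rows near (5, 5).
    data = [((0.1 * i, 0.0), "majority") for i in range(20)]
    data += [((5.0 + 0.1 * i, 5.0), "minority") for i in range(5)]
    ensemble = fit_bagged(data)
    print(predict_bagged(ensemble, (5.0, 5.0)))
```

Because a bootstrap subset can, by chance, contain very few or no rows of a rare class, some ensemble members may never learn that class. This is one plausible way imbalance can still leak into the final prediction, as the abstract notes.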