This book proposes an audio-visual speaker identification system based on hybrid feature fusion and hybrid decision fusion, and evaluates its performance in challenging environments. The proposed system is investigated under typical office environmental conditions. Two approaches that combine speech utterances with visual features are analyzed for improving speaker identification performance in acoustically and visually challenging environments. The first approach seeks to remove noise from the acoustic and visual features by applying speech and facial image pre-processing techniques. The second approach combines speech and facial features, which are used by multiple Discrete Hidden Markov Model classifiers with different variations of audio and visual features. Although traditional HMM-based audio-visual speaker identification systems are very sensitive to variation in the speech parameters, the proposed hybrid feature and decision fusion based system is found to be stable and performs well, improving the robustness and naturalness of human-computer interaction.