Textual information extracted from digital video can be exploited for semantic indexing and retrieval in digital video libraries. A number of such retrieval systems have been studied previously for detecting text written in the Latin alphabet. The detection of Urdu captions, however, has not yet been explored. This book describes a system that automatically detects and localizes Urdu caption text appearing in video sequences, such as those broadcast by news channels. The system uses edge features for text localization. The candidate text regions are then fed into an Artificial Neural Network (ANN) for validation. Finally, the text is extracted from the validated candidate regions. The output of this process could be fed to an Urdu Optical Character Recognition (OCR) system to recognize the textual content of the video images, and the extracted keywords could then be used to index the video. Users would then be able to query the indexed video library with a keyword and find all the videos, and the occurrences within each video, that contain it.
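To make the edge-based localization step more concrete, here is a minimal sketch of how caption candidates might be found from edge density. The abstract does not say which edge features or thresholds the system actually uses, so everything here is an assumption made for illustration: the function name `edge_density_rows`, the use of a simple horizontal gradient, and the values of `edge_thresh` and `row_frac`. It relies only on the observation that caption text tends to produce many strong intensity transitions along a row.

```python
import numpy as np


def edge_density_rows(gray, edge_thresh=50.0, row_frac=0.2):
    """Return (start, end) row bands with a high density of horizontal edges.

    Hypothetical helper for illustration only: `edge_thresh` and `row_frac`
    are assumed values, not parameters taken from the described system.
    `end` is exclusive, so a band covers rows start .. end - 1.
    """
    gray = np.asarray(gray, dtype=float)

    # Horizontal gradient: absolute difference between neighbouring columns.
    grad = np.abs(np.diff(gray, axis=1))

    # A pixel pair counts as an edge when the gradient exceeds the threshold.
    edges = grad > edge_thresh

    # Fraction of edge positions in each row.
    density = edges.mean(axis=1)
    active = density >= row_frac

    # Group consecutive active rows into candidate bands.
    bands = []
    start = None
    for y, on in enumerate(active):
        if on and start is None:
            start = y
        elif not on and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(active)))
    return bands
```

Under these assumptions, each returned band is only a candidate region. In the pipeline described above, such candidates would still need to be validated (by the ANN) before any text is extracted from them.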