Previous studies have shown that infants use prosodic information as an aid for discriminating and recognising words. The present work builds on a word learning model that automatically extracts target words from raw speech input paired with a label for the target word. This model is extended by incorporating prosodic information. In addition, an unsupervised model is developed that does not rely on labels of any kind. Although prosodic information does not improve the performance of the unsupervised model, it is shown that incorporating pitch increases performance in the supervised case and that the unsupervised model yields effective results.