Music transcription and the classification of audio and speech data are two important tasks in audio signal processing. Transcription of polyphonic music involves identifying the fundamental frequencies (pitches) of several notes played simultaneously. Its difficulty stems from the fact that the harmonics of different tones tend to overlap, especially in Western music. In this thesis, we introduce transcription and classification methods that are based on representing the data in a meaningful manner. For the transcription of polyphonic music, we present an algorithm based on sparse representations in a structured dictionary suited to the spectra of music signals. For the classification of audio data, we propose integrating a non-linear manifold learning technique, namely "diffusion maps", into traditional classification methods. Finally, we empirically evaluate the performance of the proposed solutions.
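
The harmonic overlap mentioned above can be illustrated with a minimal sketch. It is not part of the proposed algorithm; it assumes 12-tone equal temperament with A4 = 440 Hz, ideal integer-multiple harmonics, and an arbitrary 5-cent tolerance for calling two partials "overlapping". The function names are hypothetical and chosen for illustration only.

```python
import math

# Assumption (not from the thesis): equal temperament tuned to A4 = 440 Hz.
A4_HZ = 440.0


def note_frequency(semitones_from_a4: int) -> float:
    """Fundamental frequency of the note a given number of semitones from A4."""
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


def overlapping_harmonics(f0_a: float, f0_b: float,
                          count: int = 8,
                          tolerance_cents: float = 5.0) -> list[tuple[int, int]]:
    """Return index pairs (i, j) where the i-th harmonic of note A lies within
    `tolerance_cents` of the j-th harmonic of note B.

    Harmonics are modelled as exact integer multiples of the fundamental,
    which is an idealisation of real instrument spectra.
    """
    pairs = []
    for i in range(1, count + 1):
        for j in range(1, count + 1):
            distance_cents = abs(1200.0 * math.log2((i * f0_a) / (j * f0_b)))
            if distance_cents <= tolerance_cents:
                pairs.append((i, j))
    return pairs


# C4 and G4 form a perfect fifth, a very common interval in Western music.
c4 = note_frequency(-9)  # about 261.63 Hz
g4 = note_frequency(-2)  # about 392.00 Hz

pairs = overlapping_harmonics(c4, g4)
print(pairs)  # prints [(3, 2), (6, 4)]
```

Under these assumptions, the third harmonic of C4 and the second harmonic of G4 fall within about two cents of each other, so the two partials are practically indistinguishable in a magnitude spectrum. This is the kind of ambiguity that a transcription method has to resolve.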