- Paperback
Building on the success of the first edition, Digital Speech offers extensive new, updated and revised material based upon the latest research. This Second Edition continues to provide the fundamental technical background required for low bit rate speech coding, together with the latest developments in digital speech coding techniques that are applicable to evolving communication systems.
- Features new chapters on Pitch Estimation and Voiced-Unvoiced Classification of Speech, Harmonic Speech Coding and Multimode Speech Coding
- Presents a comprehensively revised chapter entitled Analysis by Synthesis LPC Coding, including specific examples of popular speech coders such as CELP (Code-Excited Linear Predictive) Coding
- Contains an updated chapter on Efficient LPC Quantization Methods, including MSVQ and anti-aliasing filtering
- Discusses Voice Activity Detection (VAD) methods
- Offers expanded coverage of speech enhancement techniques such as echo cancellation and noise suppression
Written by a well-known, highly respected academic, this authoritative volume will be invaluable to practising engineers, network designers, computer scientists and advanced students in communications, electrical and electronic engineering.
Note: This item can only be delivered to a German delivery address.
Product details
- Wiley Series in Communication and Distributed Systems
- Publisher: Wiley & Sons
- 2nd edition
- Number of pages: 464
- Publication date: 29 October 2004
- Language: English
- Dimensions: 234mm x 156mm x 25mm
- Weight: 875g
- ISBN-13: 9780470870082
- ISBN-10: 0470870087
- Item no.: 08453355
Professor Kondoz joined the University of Surrey as a PhD student in October 1984. From 1986 to 1988 he was employed as a research fellow in the communications group. After completing his PhD, he was appointed as a lecturer in 1988. In 1995 he became a Reader, and in 1997 Professor and Deputy Director of the Centre for Communication Systems Research (CCSR). He has taught digital signal processing, telecommunications theory and source coding at both undergraduate and postgraduate levels. In research, he has headed the Multimedia Communication Research Group since 1990. To date, Professor Kondoz has supervised 20 successful PhD students in speech, video and channel coding, source data packetisation, error-resilient speech and video transmission, and mobile multimedia communications. His current research interests are low bit rate speech, image and video coding, error-resilient video transmission, mobile multimedia communications, robust wireless ATM, and real-time terminal design and implementation for mobile communications. Outside the University, Professor Kondoz has been a member of both the IEE and IEEE. He is a CEng and served on E5. He is on the EPSRC College for signal processing and communications.
Preface
Acknowledgements
1 Introduction
2 Coding Strategies and Standards
2.1 Introduction
2.2 Speech Coding Techniques
2.2.1 Parametric Coders
2.2.2 Waveform-approximating Coders
2.2.3 Hybrid Coding of Speech
2.3 Algorithm Objectives and Requirements
2.3.1 Quality and Capacity
2.3.2 Coding Delay
2.3.3 Channel and Background Noise Robustness
2.3.4 Complexity and Cost
2.3.5 Tandem Connection and Transcoding
2.3.6 Voiceband Data Handling
2.4 Standard Speech Coders
2.4.1 ITU-T Speech Coding Standard
2.4.2 European Digital Cellular Telephony Standards
2.4.3 North American Digital Cellular Telephony Standards
2.4.4 Secure Communication Telephony
2.4.5 Satellite Telephony
2.4.6 Selection of a Speech Coder
2.5 Summary
Bibliography
3 Sampling and Quantization
3.1 Introduction
3.2 Sampling
3.3 Scalar Quantization
3.3.1 Quantization Error
3.3.2 Uniform Quantizer
3.3.3 Optimum Quantizer
3.3.4 Logarithmic Quantizer
3.3.5 Adaptive Quantizer
3.3.6 Differential Quantizer
3.4 Vector Quantization
3.4.1 Distortion Measures
3.4.2 Codebook Design
3.4.3 Codebook Types
3.4.4 Training, Testing and Codebook Robustness
3.5 Summary
Bibliography
4 Speech Signal Analysis and Modelling
4.1 Introduction
4.2 Short-Time Spectral Analysis
4.2.1 Role of Windows
4.3 Linear Predictive Modelling of Speech Signals
4.3.1 Source Filter Model of Speech Production
4.3.2 Solutions to LPC Analysis
4.3.3 Practical Implementation of the LPC Analysis
4.4 Pitch Prediction
4.4.1 Periodicity in Speech Signals
4.4.2 Pitch Predictor (Filter) Formulation
4.5 Summary
Bibliography
5 Efficient LPC Quantization Methods
5.1 Introduction
5.2 Alternative Representation of LPC
5.3 LPC to LSF Transformation
5.3.1 Complex Root Method
5.3.2 Real Root Method
5.3.3 Ratio Filter Method
5.3.4 Chebyshev Series Method
5.3.5 Adaptive Sequential LMS Method
5.4 LSF to LPC Transformation
5.4.1 Direct Expansion Method
5.4.2 LPC Synthesis Filter Method
5.5 Properties of LSFs
5.6 LSF Quantization
5.6.1 Distortion Measures
5.6.2 Spectral Distortion
5.6.3 Average Spectral Distortion and Outliers
5.6.4 MSE Weighting Techniques
5.7 Codebook Structures
5.7.1 Split Vector Quantization
5.7.2 Multi-Stage Vector Quantization
5.7.3 Search Strategies for MSVQ
5.7.4 MSVQ Codebook Training
5.8 MSVQ Performance Analysis
5.8.1 Codebook Structures
5.8.2 Search Techniques
5.8.3 Perceptual Weighting Techniques
5.9 Inter-frame Correlation
5.9.1 LSF Prediction
5.9.2 Prediction Order
5.9.3 Prediction Factor Estimation
5.9.4 Performance Evaluation of MA Prediction
5.9.5 Joint Quantization of LSFs
5.9.6 Use of MA Prediction in Joint Quantization
5.10 Improved LSF Estimation Through Anti-Aliasing Filtering
5.10.1 LSF Extraction
5.10.2 Advantages of Low-pass Filtering in Moving Average Prediction
5.11 Summary
Bibliography
6 Pitch Estimation and Voiced-Unvoiced Classification of Speech
6.1 Introduction
6.2 Pitch Estimation Methods
6.2.1 Time-Domain PDAs
6.2.2 Frequency-Domain PDAs
6.2.3 Time- and Frequency-Domain PDAs
6.2.4 Pre- and Post-processing Techniques
6.3 Voiced-Unvoiced Classification
6.3.1 Hard-Decision Voicing
6.3.2 Soft-Decision Voicing
6.4 Summary
Bibliography
7 Analysis by Synthesis LPC Coding
7.1 Introduction
7.2 Generalized AbS Coding
7.2.1 Time-Varying Filters
7.2.2 Perceptually-based Minimization Procedure
7.2.3 Excitation Signal
7.2.4 Determination of Optimum Excitation Sequence
7.2.5 Characteristics of AbS-LPC Schemes
7.3 Code-Excited Linear Predictive Coding
7.3.1 LPC Prediction
7.3.2 Pitch Prediction
7.3.3 Multi-Pulse Excitation
7.3.4 Codebook Excitation
7.3.5 Joint LTP and Codebook Excitation Computation
7.3.6 CELP with Post-Filtering
7.4 Summary
Bibliography
8 Harmonic Speech Coding
8.1 Introduction
8.2 Sinusoidal Analysis and Synthesis
8.3 Parameter Estimation
8.3.1 Voicing Determination
8.3.2 Harmonic Amplitude Estimation
8.4 Common Harmonic Coders
8.4.1 Sinusoidal Transform Coding
8.4.2 Improved Multi-Band Excitation, INMARSAT-M Version
8.4.3 Split-Band Linear Predictive Coding
8.5 Summary
Bibliography
9 Multimode Speech Coding
9.1 Introduction
9.2 Design Challenges of a Hybrid Coder
9.2.1 Reliable Speech Classification
9.2.2 Phase Synchronization
9.3 Summary of Hybrid Coders
9.3.1 Prototype Waveform Interpolation Coder
9.3.2 Combined Harmonic and Waveform Coding at Low Bit-Rates
9.3.3 A 4 kb/s Hybrid MELP/CELP Coder
9.3.4 Limitations of Existing Hybrid Coders
9.4 Synchronized Waveform-Matched Phase Model
9.4.1 Extraction of the Pitch Pulse Location
9.4.2 Estimation of the Pitch Pulse Shape
9.4.3 Synthesis using Generalized Cubic Phase Interpolation
9.5 Hybrid Encoder
9.5.1 Synchronized Harmonic Excitation
9.5.2 Advantages and Disadvantages of SWPM
9.5.3 Offset Target Modification
9.5.4 Onset Harmonic Memory Initialization
9.5.5 White Noise Excitation
9.6 Speech Classification
9.6.1 Open-Loop Initial Classification
9.6.2 Closed-Loop Transition Detection
9.6.3 Plosive Detection
9.7 Hybrid Decoder
9.8 Performance Evaluation
9.9 Quantization Issues of Hybrid Coder Parameters
9.9.1 Introduction
9.9.2 Unvoiced Excitation Quantization
9.9.3 Harmonic Excitation Quantization
9.9.4 Quantization of ACELP Excitation at Transitions
9.10 Variable Bit Rate Coding
9.10.1 Transition Quantization with 4 kb/s ACELP
9.10.2 Transition Quantization with 6 kb/s ACELP
9.10.3 Transition Quantization with 8 kb/s ACELP
9.10.4 Comparison
9.11 Acoustic Noise and Channel Error Performance
9.11.1 Performance under Acoustic Noise
9.11.2 Performance under Channel Errors
9.11.3 Performance Improvement under Channel Errors
9.12 Summary
Bibliography
10 Voice Activity Detection
10.1 Introduction
10.2 Standard VAD Methods
10.2.1 ITU-T G.729B/G.723.1A VAD
10.2.2 ETSI GSM-FR/HR/EFR VAD
10.2.3 ETSI AMR VAD
10.2.4 TIA/EIA IS-127/733 VAD
10.2.5 Performance Comparison of VADs
10.3 Likelihood-Ratio-Based VAD
10.3.1 Analysis and Improvement of the Likelihood Ratio Method
10.3.2 Noise Estimation Based on SLR
10.3.3 Comparison
10.4 Summary
Bibliography
11 Speech Enhancement
11.1 Introduction
11.2 Review of STSA-based Speech Enhancement
11.2.1 Spectral Subtraction
11.2.2 Maximum-likelihood Spectral Amplitude Estimation
11.2.3 Wiener Filtering
11.2.4 MMSE Spectral Amplitude Estimation
11.2.5 Spectral Estimation Based on the Uncertainty of Speech Presence
11.2.6 Comparisons
11.2.7 Discussion
11.3 Noise Adaptation
11.3.1 Hard Decision-based Noise Adaptation
11.3.2 Soft Decision-based Noise Adaptation
11.3.3 Mixed Decision-based Noise Adaptation
11.3.4 Comparisons
11.4 Echo Cancellation
11.4.1 Digital Echo Canceller Set-up
11.4.2 Echo Cancellation Formulation
11.4.3 Improved Performance Echo Cancellation
11.5 Summary
Bibliography
Index