Pattern matching plays a central role in different
aspects of biological sequence analysis, and has been
used in applications as diverse as shotgun
sequencing, multiple sequence alignments, gene
finding, analysis of repetition structures, searching
for unique oligonucleotides, prediction of protein
function and structure, sequence homology search,
and finding DNA-binding protein motifs. Suffix trees
and suffix arrays are primary data structures used in
rapid pattern matching, and suffix sorting is the
fundamental problem in constructing suffix arrays.
This book presents the basic concepts of suffix
trees, suffix arrays and suffix sorting, and proposes
a direct suffix sorting algorithm that rearranges the
biological sequences of interest and facilitates
high-throughput pattern query, retrieval and storage
in linear time. The
direct suffix sorting algorithm is then applied to
solve practical problems in multiple sequence
alignment and data compression. The book serves both
as a reference for computer scientists, computational
biologists and bioinformatics professionals and as
essential study material for graduate and advanced
courses on computational biology.