Genome sequencing has identified many predicted
proteins with
unknown biological functions. Protein interaction
networks link
proteins/genes together, provide a global context for
interacting
proteins, and enable studies at the systems level.
High-throughput
methods were developed to detect protein-protein
interactions.
However, current techniques show high levels of false
positives and
false negatives. They are also labor-intensive and
expensive to
perform. Computational methods are needed to reduce the
testing
space and enhance testing efficiency. In this work, a
two-stage
statistical scoring model was built to assign
confidence scores to
fruit fly yeast two-hybrid interactions.
High-confidence predictions
are significantly enriched with biologically
meaningful interactions.
Cross-validation showed good prediction performance,
which was
also validated by independent data sources. We
obtained 24798 new
interactions involving 4702 proteins. The work should
be especially
interesting to researchers in interactome or systems
biology fields.
It also appeals to professionals practicing machine
learning or data
mining in computational biology and bioinformatics.