Annotation by Combination of Experts
(Bioinformatics, Vol. 18, No. 1, pp. 19-27, 2002)
Computational research on genomic annotation has produced several
widely used systems, such as Genscan, HMMer, and HMMGene. As the number of such systems grows, the
need for a rigorous way to combine their predictions becomes more
pressing. We propose a Bayesian network framework for combining gene predictions from multiple systems.
The framework allows us to treat the problem as combining the advice of
multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We
introduce the use of Hidden Input/Output Markov models for combining gene predictions.
The framework was applied to the analysis of the Adh region in Drosophila,
and we obtained excellent results (over 25% improvement in exon-level
prediction).
|       | OR    | AND   | SNB   | SFB   | OHMM  | IHMM  | fSNB  | fSFB  | fIHMM |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Sn    | 94.90 | 70.71 | 94.40 | 93.35 | 95.17 | 33.33 | 94.27 | 90.26 | 11.67 |
| Sp    | 50.13 | 89.49 | 83.66 | 83.16 | 50.86 | 67.38 | 79.16 | 83.15 | 19.75 |
| ME    |  4.00 | 22.55 |  4.46 |  5.21 |  4.01 |  7.88 |  4.46 |  7.12 | 15.16 |
| WE    | 39.83 |  6.97 | 12.85 | 12.82 | 39.65 |  7.03 | 16.10 | 12.05 |  9.68 |
| ME+WE | 43.84 | 29.52 | 17.31 | 18.02 | 43.66 | 14.92 | 20.56 | 19.17 | 24.84 |
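To make the simplest columns concrete, here is a minimal sketch of the OR, AND, and majority-voting combiners together with exon-level scoring. It assumes the convention common in gene-finding evaluation, which the text above does not spell out: Sn is the percentage of true exons predicted exactly, Sp the percentage of predicted exons that are exactly right, ME the percentage of true exons that no prediction overlaps, and WE the percentage of predicted exons that overlap no true exon. The function names and toy coordinates are hypothetical, and the Bayesian network and HMM combiners are not shown.

```python
from collections import Counter


def combine(predictions, min_votes):
    """Keep every exon (start, end) reported by at least min_votes experts."""
    votes = Counter(exon for expert in predictions for exon in set(expert))
    return {exon for exon, n in votes.items() if n >= min_votes}


def overlaps(a, b):
    """True if the closed intervals a and b share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_scores(predicted, truth):
    """Return Sn, Sp, ME, WE as percentages (assumed definitions, see lead-in)."""
    exact = predicted & truth
    missed = [t for t in truth if not any(overlaps(t, p) for p in predicted)]
    wrong = [p for p in predicted if not any(overlaps(p, t) for t in truth)]

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "Sn": pct(len(exact), len(truth)),
        "Sp": pct(len(exact), len(predicted)),
        "ME": pct(len(missed), len(truth)),
        "WE": pct(len(wrong), len(predicted)),
    }


# Toy data (hypothetical coordinates): three experts, three true exons.
truth = {(100, 200), (300, 400), (500, 600)}
experts = [
    {(100, 200), (300, 400), (700, 800)},
    {(100, 200), (310, 400)},
    {(100, 200), (300, 400), (500, 600)},
]

n = len(experts)
or_set = combine(experts, 1)                 # OR: any expert suffices
and_set = combine(experts, n)                # AND: all experts must agree
majority_set = combine(experts, n // 2 + 1)  # strict majority

for name, exons in [("OR", or_set), ("AND", and_set), ("majority", majority_set)]:
    print(name, exon_level_scores(exons, truth))
```

On this toy input OR recovers every true exon at the cost of extra predictions, while AND keeps only the exon all three experts agree on, which is the same trade-off the OR and AND columns show.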
Joint work with Simon Kasif (BU) and Ashutosh Garg (UIUC). Part
of this work was done at Compaq CRL.
Comparative Prediction of Translation Initiation Codons
(Nucleic Acids Research, Vol. 30, No. 14, 2002)
Comparative modeling can be successfully used for homology-based genomic
annotation. We propose a new probabilistic comparative method to improve the accuracy of gene identification systems by finding precise
translation start sites that are notoriously difficult to predict. The
method employs a novel architecture in the form of a product hidden Markov
model (PHMM) that jointly models statistics of pairs of orthologous DNA sequences.
Using this methodology we are able to significantly improve prediction
of prokaryotic translation start sites and outperform two traditional
methods (based on TBLASTX and TBLASTN).
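One reason start sites are hard to pin down is that an open reading frame usually contains several in-frame candidate start codons, so a predictor has to choose among them. The sketch below only enumerates those candidates and measures how far a chosen one lies from a reference start. It is not the PHMM, and the start codon set (ATG, GTG, TTG), the function names, and the toy sequence are assumptions made for illustration.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed prokaryotic start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_pos):
    """List in-frame candidate start positions upstream of the stop at stop_pos.

    Walks upstream one codon at a time and halts at the previous in-frame
    stop codon, which bounds the open reading frame.
    """
    if seq[stop_pos:stop_pos + 3] not in STOP_CODONS:
        raise ValueError("stop_pos does not point at a stop codon")
    candidates = []
    pos = stop_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            candidates.append(pos)
        pos -= 3
    return sorted(candidates)


def offset(predicted_start, true_start):
    """Signed error in bases; negative means the prediction is upstream."""
    return predicted_start - true_start


# Toy sequence (hypothetical), written codon by codon for readability.
seq = "".join(["TAA", "ATG", "AAA", "GTG", "CCC", "ATG", "GGG", "TGA"])
stop_pos = 21

starts = candidate_starts(seq, stop_pos)
longest_orf_start = starts[0]  # naive rule: take the most upstream candidate
print(starts, offset(longest_orf_start, true_start=9))
```

The naive longest-ORF rule used in the last lines is just a placeholder for whatever predictor is being evaluated; the offset it produces is the kind of error a comparison of methods would tabulate.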
Offsets (errors) in prediction of translation
initiation sites for Pyrococcus abyssi (using comparison with P.
horikoshii) using TBLASTX and PHMM.
Joint work with Megon Walker (BU) and Simon Kasif (BU).
Comparative Human-Mouse Annotation
(Work in progress.)
Identification and prediction of genes in the human genome can be
brought to a new level using comparative analysis with genomes of other
related species. We consider a new model of comparative gene
annotation using an evidence-integration framework. Unlike the
PHMM, this framework focuses on annotation of one genome at a time using
a variety of information sources, among them a measure of similarity
with the mouse genome. We are developing a full eukaryotic gene
finder that does exactly that!
Joint work with Lingang Zhang (BU) and Simon Kasif (BU).