Predictive Systems for Reasoning About Biological Data
Overview

One recurring theme in current computational biology research is the need to build simplified initial models of biological systems that let researchers extract meaningful information from otherwise hopelessly noisy data. In this project I propose a systematic computational framework, based on Bayesian networks, for predictive reasoning about biological data.
Comparative Genomic Annotation

Annotation by Combination of Experts

(Bioinformatics, Vol. 18, No. 1, pp. 19-27, 2002)

Computational research on genomic annotation has produced several widely used systems, such as Genscan, HMMer, and HMMGene. As the number of such systems grows, the need for a rigorous way to combine their predictions becomes more pressing. We propose a Bayesian network framework for combining gene predictions from multiple systems, which lets us treat the problem as combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We introduce the use of hidden input/output Markov models for combining gene predictions. The framework was applied to the analysis of the Adh region in Drosophila, and we obtained excellent results (over 25% improved prediction at the exon level!).
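To make the "combination of experts" idea concrete, here is a minimal Python sketch of three simple per-position combiners (OR, AND, majority vote) next to a naive Bayes combiner that weights each expert by its reliability. It is an illustration only, not the input/output HMM framework from the paper: the function names, the 0/1 coding labels, and the per-expert sensitivity and specificity values are all assumptions made for this example.

```python
from math import log

# Each expert is a list of 0/1 labels, one per sequence position
# (1 = "coding", 0 = "non-coding"). All names here are hypothetical.

def combine_or(votes):
    """Call a position coding if any expert does."""
    return [int(any(col)) for col in zip(*votes)]

def combine_and(votes):
    """Call a position coding only if every expert does."""
    return [int(all(col)) for col in zip(*votes)]

def combine_majority(votes):
    """Call a position coding if more than half of the experts do."""
    n = len(votes)
    return [int(2 * sum(col) > n) for col in zip(*votes)]

def combine_naive_bayes(votes, sens, spec, prior=0.5):
    """Naive Bayes combiner: experts are assumed conditionally
    independent given the true label.

    sens[i] = P(expert i says 1 | truly coding)
    spec[i] = P(expert i says 0 | truly non-coding)
    All probabilities must lie strictly between 0 and 1.
    """
    combined = []
    for col in zip(*votes):
        # Log-odds of "coding" versus "non-coding" at this position.
        log_odds = log(prior / (1 - prior))
        for vote, se, sp in zip(col, sens, spec):
            if vote:
                log_odds += log(se / (1 - sp))
            else:
                log_odds += log((1 - se) / sp)
        combined.append(int(log_odds > 0))
    return combined

experts = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
]
print(combine_or(experts))        # [0, 1, 1, 1, 0, 1]
print(combine_and(experts))       # [0, 0, 1, 0, 0, 0]
print(combine_majority(experts))  # [0, 1, 1, 1, 0, 0]

# One reliable expert can outweigh two weak ones, unlike majority voting.
reliab = [0.95, 0.55, 0.55]
print(combine_naive_bayes([[1], [0], [0]], sens=reliab, spec=reliab))  # [1]
```

The last call shows why a probabilistic combiner can beat voting: a single accurate expert overrides two near-random ones. Unlike an HMM-based combiner, this sketch treats every position independently and ignores dependencies between neighboring positions.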

Metric  OR     AND    SNB    SFB    OHMM   IHMM   fSNB   fSFB   fIHMM
Sn      94.90  70.71  94.40  93.35  95.17  33.33  94.27  90.26  11.67
Sp      50.13  89.49  83.66  83.16  50.86  67.38  79.16  83.15  19.75
ME       4.00  22.55   4.46   5.21   4.01   7.88   4.46   7.12  15.16
WE      39.83   6.97  12.85  12.82  39.65   7.03  16.10  12.05   9.68
ME+WE   43.84  29.52  17.31  18.02  43.66  14.92  20.56  19.17  24.84
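The page does not define the row labels, so the following Python sketch assumes the exon-level measures commonly used to evaluate gene finders: sensitivity (Sn), specificity (Sp), missing exons (ME), and wrong exons (WE). The function name, the `(start, end)` exon representation, and the example coordinates are hypothetical.

```python
def exon_level_metrics(actual, predicted):
    """Exon-level accuracy, in percent, under assumed standard definitions.

    Exons are (start, end) tuples with inclusive coordinates.
      Sn: actual exons predicted with both boundaries exact
      Sp: predicted exons that match an actual exon exactly
      ME: actual exons with no overlapping prediction
      WE: predicted exons with no overlapping actual exon
    """
    actual, predicted = set(actual), set(predicted)
    if not actual or not predicted:
        raise ValueError("need at least one actual and one predicted exon")

    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    correct = actual & predicted
    missing = [a for a in actual
               if not any(overlaps(a, p) for p in predicted)]
    wrong = [p for p in predicted
             if not any(overlaps(p, a) for a in actual)]
    return {
        "Sn": 100 * len(correct) / len(actual),
        "Sp": 100 * len(correct) / len(predicted),
        "ME": 100 * len(missing) / len(actual),
        "WE": 100 * len(wrong) / len(predicted),
    }

# Made-up coordinates: one exact match, one boundary error,
# two exons missed entirely, and one spurious prediction.
actual = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (300, 410), (900, 950)]
metrics = exon_level_metrics(actual, predicted)
print(metrics["Sn"], metrics["ME"])  # 25.0 50.0
```

Under these definitions an exon predicted with a wrong boundary counts as neither correct nor missing, so Sn and ME need not sum to 100.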

Joint work with Simon Kasif (BU) and Ashutosh Garg (UIUC).  Part of this work was done at Compaq CRL.

Comparative Prediction of Translation Initiation Codons

(Nucleic Acids Research, Vol. 30, No. 14, 2002)

Comparative modeling can be successfully used for homology-based genomic annotation. We propose a new probabilistic comparative method that improves the accuracy of gene identification systems by finding precise translation start sites, which are notoriously difficult to predict. The method employs a novel architecture in the form of a product hidden Markov model (PHMM) that jointly models the statistics of pairs of orthologous DNA sequences. Using this methodology we significantly improve the prediction of prokaryotic translation initiation sites and outperform two traditional methods (based on TBLASTX and TBLASTN).

Offsets (errors) in the prediction of translation initiation sites for Pyrococcus abyssi (using comparison with P. horikoshii) using TBLASTX and PHMM.

Joint work with Megon Walker (BU) and Simon Kasif (BU).

Comparative Human-Mouse Annotation

(Work in progress.)

Identification and prediction of genes in the human genome can be brought to a new level using comparative analysis with genomes of other related species.  We consider a new model of comparative gene annotation using an evidence-integration framework.  Unlike the PHMM, this framework focuses on annotation of one genome at a time using a variety of information sources, among them a measure of similarity with the mouse genome.  We are developing a full eukaryotic gene finder that does exactly that!

Joint work with Lingang Zhang (BU) and Simon Kasif (BU).

 

Functional Annotation

Functional Annotation by Combination of Evidence

(Work in progress - more information will be available soon.)

Publications

Poster presented at the Snowbird Learning Workshop, April 2-5, 2002.


Comments  |  06 February 2002