Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish

M Hiller, S Agarwal, JH Notwell, R Parikh, H Guturu, AM Wenger, G Bejerano
Nucleic Acids Research, 2013 (academic.oup.com)
Abstract
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified by zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs), which are promising cis-regulatory elements with biological importance, is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous CNE sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.
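The idea of alignment transitivity can be illustrated with a toy example. If a zebrafish position aligns to a position in an intermediate genome, and that position in turn aligns to human, the zebrafish position can be projected onto human even without a direct zebrafish-human alignment. The sketch below is a minimal illustration under simplifying assumptions, not the authors' implementation. It assumes base-level alignments stored as plain dictionaries, ignores strand, alignment chains and quality control, and does not cover ancestral reconstruction. The function name `compose`, the genome labels and all coordinates are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: a pairwise alignment maps a
# (sequence name, position) in genome A to a (sequence name, position) in genome B.
Pos = Tuple[str, int]
Alignment = Dict[Pos, Pos]


def compose(a_to_b: Alignment, b_to_c: Alignment) -> Alignment:
    """Project an A->B alignment through a B->C alignment.

    An A position is kept only if its aligned B position is itself
    aligned in C; positions without a partner in C are dropped.
    """
    return {pa: b_to_c[pb] for pa, pb in a_to_b.items() if pb in b_to_c}


# Toy data (hypothetical coordinates): zebrafish -> intermediate fish -> human.
zebrafish_to_fish = {
    ("chr1", 100): ("scaf7", 500),
    ("chr1", 101): ("scaf7", 501),
    ("chr1", 102): ("scaf7", 502),
}
fish_to_human = {
    ("scaf7", 500): ("chr2", 9000),
    ("scaf7", 502): ("chr2", 9002),
}

zebrafish_to_human = compose(zebrafish_to_fish, fish_to_human)
print(zebrafish_to_human)
# {('chr1', 100): ('chr2', 9000), ('chr1', 102): ('chr2', 9002)}
```

Zebrafish position `("chr1", 101)` is lost because its intermediate partner has no human alignment. Composing through several intermediate genomes, rather than just one, is one way to recover more of such positions.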
Oxford University Press