Published online before print
December 8, 2004, 10.1101/gr.3007205 Genome Res. 15:184-194, 2005 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05 $5.00
Chicken Special/Resource
Mulan: Multiple-sequence local alignment and visualization for studying function and evolution
1 Energy, Environment, Biology and Institutional Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
2 Genome Biology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
3 Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
5 Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
Multiple-sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes, and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/
A significant growth in sequencing the genomes of complex organisms, including the recent completion of the chicken genome, opens new horizons in the field of comparative genomics and compels improvements on current tools and methodologies devoted to the identification of functional regions in multiple sequence alignments. It has now been well established that blocks of evolutionary conservation identified by cross-species comparative analysis correlate with functionally important DNA regions such as protein-coding genes (Pennacchio et al. 2001
Several available Web-based tools implement multiple-sequence analysis either as a series of pairwise alignments with a selected reference sequence (Mayor et al. 2000
Here we report a new integrative comparative tool, Mulan, that dynamically and rapidly generates multiple-sequence local alignments (MSLAs), and we present several examples for the application of this tool to study phenotypic differences in vertebrate species. The Mulan alignment engine consists of several data analysis and visualization schemes for high-throughput identification of functional coding and noncoding elements conserved across large evolutionary distances. Mulan (1) determines phylogenetic relationships among the input sequences and generates phylogenetic trees, (2) constructs graphical and textual alignments, (3) dynamically detects evolutionary conserved regions (ECRs) in alignments, and (4) presents users with several visual display options for the generated conservation profiles. This tool is also able to implement the phylogenetic shadowing strategy for identifying slow-mutating elements in comparisons of multiple closely related species (Ovcharenko et al. 2004a
Alignment strategy
Mulan employs two alignment strategies that allow comparative analysis of multiple sequences that are present in either (1) draft or (2) finished configuration. The first approach constructs an alignment for multiple draft-quality sequences and subsequently permits effective order-and-orientation (O&O) of unfinished sequences based on the reference genome. The second approach operates on multiple high-quality single-contig (finished) sequences and is the main subject of this paper.
Genomic sequences submitted to Mulan are aligned by the threaded blockset aligner (TBA) program for finished sequences and by the refine program for draft sequences (Blanchette et al. 2004
Mulan alignment visualization is based on the zPicture display design (Ovcharenko et al. 2004b
Visualization and data analysis strategies for multiple-sequence local alignments
Multiple-sequence comparative analysis is a challenging task in terms of generating highly reliable alignments and graphically displaying the alignment results. To address the complexity stemming from user input sequence files that potentially consist of a large number of sequences of varying lengths and different phylogenetic relationships, we provide a set of different visualization options applicable to any finished MSLA. For example, the reference sequence can be dynamically changed, and the new stacking order of conservation profiles with the rest of the species will be automatically determined using the evolutionary relationship of each sequence to the reference sequence, where more closely related species are at the bottom.
"Color density by interspecies conservation" illustrates a relationship between the color density of a conserved element and the number of species that share a particular region (Fig. 2) such that, the more species share a sequence, the darker the conservation profile will be displayed. (This analysis is performed for every pixel-wide region of the conservation plot. The number of ECRs from different species that overlap with a particular pixel count towards the number of species sharing this region.) In a recent study, it was observed that regions conserved in multiple species often correlate with functional elements (Frazer et al. 2004
Two additional data representation modules are implemented in the Mulan tool: phylogenetic shadowing and "summary of conservation." While "summary of conservation" collects all the shared nucleotide similarities from all the pairwise comparisons into a single conservation profile, the phylogenetic shadowing option effectively collects all the cumulative nucleotide mismatches (Ovcharenko et al. 2004a
Evaluation of alignment tools
To evaluate and compare the performance of the refine and TBA programs (the tools underlying the draft and finished Mulan alignment schemes, respectively), we followed the approach of Blanchette et al. (2004). To be consistent in comparing aligners, the same BLASTZ parameters (C = 0, Y = 3400, K = 2000) were used for all data sets. TBA uses the guiding tree ((((human chimp) baboon)(rat mouse)) ((cow pig)(cat dog))). Refine uses human as the reference sequence. The performance of the aligners with respect to the agreement score is illustrated in Figure 4. Only representative pairs are shown, to illustrate how the performance of an aligner varies with evolutionary distance.
Several observations can be made about the graph in Figure 4. First, for sequences at very short evolutionary distance, such as human versus chimp and human versus baboon, all methods work well. Second, refine performs as well as or a little better than BLASTZ alone for pairs containing the reference sequence, for example human versus mouse and human versus dog. However, for sequences being pulled together by refine instead of direct pairwise alignment, the performance is worse, for example, rat versus mouse and cat versus dog. Third, TBA performs as well as or better than BLASTZ alone for all comparisons. For closely related species, TBA does not lose accuracy, while for distantly related species, TBA significantly improves accuracy (e.g., human vs. mouse). At the same time, TBA performs as well as or better than refine. TBA outperforms refine dramatically for cat versus dog and especially rat versus mouse. TBA builds alignments starting from leaves of the phylogenetic tree, utilizing the fact that pairwise alignment between two species with closer evolutionary relationship is more reliable than with distantly related species. For instance, TBA directly uses the rat-mouse alignment, whereas refine aligns rat to mouse based on information about how the two align to a distant intermediary. For example, a human region might align to mouse but not to rat (rat is evolving slightly faster than mouse), though the corresponding mouse and rat regions are easily aligned to each other; TBA will correctly match the human, mouse, and rat regions, but refine will match only human and mouse.
Fourth, the regions of disagreement in an alignment are composed of mismatches, unidentified alignments, and false alignments. By regarding mismatches within five base positions as correct matches, TBA_5 shows a substantial increase in agreement score. In other words, mismatches in an alignment produced by TBA are frequently very close to their correct match positions. For some analyses, close agreement with the true aligned position is adequate. Although the performance of TBA is better than that of refine in certain cases, the running time for TBA is much longer than for refine. For aligning nine species each with a length of
From sequence evolution to genome biology
Multiple-sequence Mulan alignments identified all the coding exons of the GATA3 gene as conserved segments in all of the species, highlighting the functional importance of this protein and suggesting that interspecies differences associated with the GATA3 protein most likely originate from differences in noncoding sequences. This is supported by noncoding conservation patterns that significantly differ in comparison of the human sequence with different species (Fig. 5B). Three main groups of conservation were identified: human/rodents, amphibian/fish, and chicken. Five ECRs (ECR1-ECR5) are shared by at least four different species (including human). One of them, the intronic ECR5, was present in all species, suggesting a key role of this element for the GATA3 locus. For example, it could be a general enhancer element responsible for the expression of this gene. Three other ECRs, upstream ECR1 and ECR2 and intronic ECR4, are shared only by humans, rodents, and chicken and are not detected in either frog or fish lineages (there are no remains of sequence conservation of these ECRs in indicated species that would be displayed as short lines on Fig. 5B otherwise), suggesting a putative differential expression of the GATA3 gene in these two groups of genomes as regulated by this subset of three ECRs. One could speculate that the key involvement of the GATA3 gene in the hair/feathers growth regulation pathway could be indeed regulated by one of these three ECRs, and that their absence from the frog and fish genomes may be responsible for the lack of hair in these species. More interesting is the conservation of the ECR3 element across multiple species. This element is present in all but the chicken genome. While the conservation with fish suggests functionality of this element (Ghanem et al. 2003
It is also interesting to mention that the local alignment nature of the TBA aligner (which constitutes the core of the Mulan tool) enables the correct recapitulation of the conservation profile for the GATA3 locus with all the species. In particular, the draft quality of the zebrafish genome represents this locus as a combination of forward- and reverse-strand sequences joined together (Fig. 5C). The synteny breakpoint appearing after the first GATA3 exon is probably just an artifact of the assembly of this locus. Otherwise it would destroy the integrity of the GATA3 ORF in zebrafish.
Multiple-sequence conservation of transcription factor binding sites
We used the Mulan/multiTF combination to analyze the distribution of TFBS in ECR3 from the GATA3 locus that is shared by all vertebrate species but chicken (Fig. 6). This analysis was aimed at providing an in silico evidence for the bone-specific function of this element to support the hypothesis that the absence of this element could possibly be related to the process of wing formation in birds. PWM matrices for 399 vertebrate TFBS families available from the TRANSFAC Professional 7.3 library (http://www.biobase.de/
Interestingly, only one putative TFBS corresponding to the CRE-BP1 regulatory protein was detected by multiTF in the scan of the ECR3 multiple-sequence alignment to be shared by all the species using almost 400 other TFBS matrices (Fig. 6). CRE-BP1, also known as ATF2, has been shown to trigger the development of primary fibrosarcomas in the chicken wing (van Dam and Castellazzi 2001
To demonstrate the cumulative effect of searching for TFBS in multiple sequence alignments and the dramatic functional enrichment resulting from each additional sequence incorporated into the comparison, we analyzed several regions encompassing known functional sites (Table 1). We selected three genomic regions ranging in size from 150 kb to 230 kb and corresponding to PAX6, NKX2.5, and NKX2.9/PAX9 genomic loci. It has been shown that PAX6 has autoregulatory activity mediated through a PAX6 TFBS located in an intron (Kleinjan et al. 2004
Mulan-GALA interconnection and finding orthologous regions
The database of genomic DNA sequence alignments and annotations (GALA; http://globin.cse.psu.edu/gala/
The interconnection link of GALA to Mulan is established through forwarding a list of homologous regions from different species from GALA to Mulan. One of the critical steps in generating a multiple alignment in a locus is identifying the homologous DNA intervals in the other species. This is complicated by the existence of paralogs of many sequences, generated by transposition, segmental duplications, and chromosomal rearrangements. Thus, a given DNA interval, say in human, may match to multiple locations in the mouse genome. Furthermore, a long DNA segment in human may match to several orthologous regions in mouse that could have a different order and orientation than the human sequence (Kent et al. 2003
We have implemented a partial, but quite useful solution, by using the chains and nets (Kent et al. 2003
As an example, the ZFPM1 gene, which encodes a multiple Zn-finger protein called Friend of GATA1 (FOG1), was identified in GALA and the orthologous regions were found in mouse, rat, and chicken. These were automatically transferred to Mulan, which also picked up the annotation from the knownGenes track at the UCSC Genome Browser (Kent et al. 2002
Interconnection with the UCSC genome browser database
Mulan is dynamically linked to the UCSC genome browser database (Karolchik et al. 2003
The exponential growth of available DNA sequences produced by international genome-sequencing cohorts is creating an invaluable, enormous collection of genomic sequences from different eukaryotic and prokaryotic organisms. Particularly the addition of the chicken genome, Gallus gallus, marks a multifaceted advance in biology, largely due to the importance of this organism in agriculture and as a model for nonmammalian vertebrate development, but equally importantly due to its strategic evolutionary position in the tree of life between mammals and fish. The chicken genome provides a priceless substrate for genomic comparisons, and will allow us to better understand the overall genomic structure and evolution of vertebrates. To fully capitalize on this information-rich genome, we require innovative methods and tools for conducting creative comparative multispecies sequence analysis. Here, we described the Mulan tool that introduces a novel reliable approach to generate MSLAs. The tool is capable of producing fast and accurate alignments for both distantly and closely related organisms, such as humans, primates, fish, and chicken, properly taking into account the complexity of evolutionary sequence rearrangements such as inversions, transpositions, and subsequence reshuffling.
Mulan introduces several novel options for users to manipulate both the textual alignments and the graphical conservation displays, so that the conservation structure of closely and distantly related species can be addressed differently. In particular, the option of coloring conserved regions using a gradient based on the number of species in which the region is conserved, coupled with a module that filters out ECRs that are shared by fewer than a requested number of species, permits straightforward identification of elements that are shared by a subset of species. This is illustrated through comparisons of the chicken GATA3 locus to the orthologous regions from humans, rodents, frog, and fish. This type of analysis can be important for generating hypotheses about the function of ECRs shared by a limited number of species (Frazer et al. 2004).
The speed with which Mulan is capable of handling megabase-long genomic sequences (on the order of minutes) and the dynamic character of the user interface are remarkable. Interactive conservation profiles allow user-selection of an ECR that displays the multiple-sequence alignment for that element. The dynamic interconnection between Mulan and the multiTF tool presents an effective way to identify transcription factor binding sites shared by multiple species. These tools can be used to predict the function of anonymous noncoding ECRs and to approach the description of gene regulation methods and networks. In addition, the draft alignment option of the Mulan tool allows easy O&O of chicken BAC contigs using the WGS assembly as the reference sequence. In sharp contrast to several other available global multiple-sequence alignment tools, the threaded blockset alignment strategy implemented by Mulan detects and properly processes DNA rearrangements often characteristic of synteny among distantly related genomes. It also highlights subsequent reshufflings in order to restore all the changes responsible for the evolutionary history of multiple related sequences. Because of these features, Mulan permits the dynamic interchange of reference sequences and will accordingly generate textual (and graphical) MSLAs interactively and very rapidly.
Generating alignments
Mulan aligns draft and finished sequences using different alignment strategies. The draft approach employs a combination of BLASTZ and refine programs (Schwartz et al. 2003b
High-quality finished sequences (contiguous single-sequence FASTA files) are aligned using a modified version of the TBA program previously described (Blanchette et al. 2004
Phylogenetic tree guidance for TBA alignments
The phylogenetic relationship of the input sequences is essential; however, we do not require the user to manually input this information, as this could be a nontrivial task. Instead, Mulan predicts a phylogenetic tree describing the evolutionary history of the input species and only asks the user to verify its correctness prior to the final step of the TBA alignment. The phylogenetic tree prediction is generated from an intermediate limited multiple-sequence local alignment, which is produced by the refine program. The user has the option to change the structure of the automatically generated phylogenetic tree by altering its textual representation. No corrections were necessary while testing Mulan on several input examples with significantly diverged sequences.
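Because the tree is verified and edited in its textual form, it can help to check that an edited tree is still well formed before it is passed on. The following is a minimal sketch, assuming the parenthesized, whitespace-separated notation used for the TBA guiding tree; the function names `parse_tree` and `leaves` are hypothetical and are not part of Mulan or TBA.

```python
from typing import List, Tuple, Union

# A tree is either a species name (leaf) or a tuple of subtrees (hypothetical type).
Tree = Union[str, Tuple["Tree", ...]]


def parse_tree(text: str) -> Tree:
    """Parse a tree such as "((human chimp) baboon)" into nested tuples.

    Raises ValueError when the parentheses are unbalanced or a group is empty.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of tree")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a species name
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            children.append(parse_node())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing parenthesis
        if not children:
            raise ValueError("empty group")
        return tuple(children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing text after the tree")
    return tree


def leaves(tree: Tree) -> List[str]:
    """List the species names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]
```

For example, `parse_tree("((human chimp) baboon)")` returns `(("human", "chimp"), "baboon")`, while a tree with one closing parenthesis too many is rejected with a `ValueError`.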
Neighbor-joining method for phylogenetic tree construction
where the branching distances are in the number of single nucleotide mutations per kb. Mulan also generates a graphical representation of the phylogenetic tree (see Fig. 5A, for example). At the intermediate step of the optional manual curation of the phylogenetic relationships among the input species, the user is not required to indicate branching distances, but just to regroup the nodes by altering the textual representation of the phylogenetic tree.
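The neighbor-joining method named in the heading above (Saitou and Nei 1987) can be illustrated with a short sketch. This is a textbook version of the algorithm under stated assumptions, not Mulan's own code: the function name `neighbor_joining`, the Newick-style output, and the example distances are all hypothetical, and the input is assumed to be a symmetric matrix of pairwise distances (for instance, mutations per kb).

```python
from typing import Dict, Sequence


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Build an unrooted tree by neighbor joining and return it in Newick form."""
    if len(names) < 3 or len(set(names)) != len(names):
        raise ValueError("need at least three uniquely named sequences")
    # D[a][b] is the current distance between clusters a and b; clusters are
    # keyed by their Newick fragment, so joined clusters carry their subtree.
    D: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names) if j != i}
        for i, a in enumerate(names)
    }
    while len(D) > 3:
        n = len(D)
        total = {x: sum(D[x].values()) for x in D}
        pairs = [(x, y) for x in D for y in D[x] if x < y]
        # Join the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * D[p[0]][p[1]]
                   - total[p[0]] - total[p[1]])
        d_ab = D[a][b]
        len_a = 0.5 * d_ab + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        joined = f"({a}:{len_a:g},{b}:{len_b:g})"
        others = [c for c in D if c not in (a, b)]
        # Distance from the new internal node to every remaining cluster.
        row = {c: 0.5 * (D[a][c] + D[b][c] - d_ab) for c in others}
        del D[a], D[b]
        for c in others:
            del D[c][a], D[c][b]
            D[c][joined] = row[c]
        D[joined] = row
    # The last three clusters meet at a single internal node.
    x, y, z = D
    len_x = 0.5 * (D[x][y] + D[x][z] - D[y][z])
    len_y = 0.5 * (D[x][y] + D[y][z] - D[x][z])
    len_z = 0.5 * (D[x][z] + D[y][z] - D[x][y])
    return f"({x}:{len_x:g},{y}:{len_y:g},{z}:{len_z:g});"
```

With four hypothetical sequences A, B, C, D and additive distances AB = 3, AC = 8, AD = 9, BC = 9, BD = 10, CD = 9, the sketch groups A with B and C with D and recovers the branch lengths that generated the matrix.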
Phylogenetic shadowing and summary conservation profiles
Practical implementation of phylogenetic shadowing in Mulan is based on the differentiation of shaded and fully conserved nucleotides (that are exactly the same in all sequences in the alignment) and treating them as a set of simple matches and mismatches projected to the reference sequence (Ovcharenko et al. 2004a
The "summary conservation" option of the Mulan tool is very similar in implementation to the phylogenetic shadowing option, but differs in the underlying assumption and the produced graphical visualization profile. Instead of identifying fully conserved nucleotides, Mulan identifies nucleotides from the reference sequence that have matches with at least one other species. Basically, a nucleotide is called conserved in this method if it is conserved in any of the pairwise comparisons. (One can refer to the phylogenetic shadowing and "summary conservation" methods as AND and OR logical operators applied to a multiple-sequence alignment). Application of the "summary conservation" option will be beneficial in the cases of divergent degeneration and complementation of duplicated genes when different gene duplicates can display different data sets of gene regulatory elements (Prince and Pickett 2002
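The AND/OR analogy can be made concrete on a toy alignment. The sketch below is an illustration only, assuming the alignment is given as equal-length gapped rows with "-" as the gap character; the function name `conservation_profiles` is hypothetical.

```python
from typing import List, Sequence, Tuple


def conservation_profiles(ref_row: str,
                          other_rows: Sequence[str]) -> Tuple[List[bool], List[bool]]:
    """Return (shadowing, summary) profiles projected onto the reference.

    shadowing[i] is True when reference nucleotide i matches in ALL other
    sequences (logical AND); summary[i] is True when it matches in AT LEAST
    ONE other sequence (logical OR). Gap columns in the reference are skipped.
    """
    if any(len(row) != len(ref_row) for row in other_rows):
        raise ValueError("all alignment rows must have the same length")
    shadowing: List[bool] = []
    summary: List[bool] = []
    for col, base in enumerate(ref_row):
        if base == "-":
            continue  # no reference nucleotide to project onto
        matches = [row[col].upper() == base.upper() for row in other_rows]
        shadowing.append(all(matches))
        summary.append(any(matches))
    return shadowing, summary
```

For the reference row `ACG-TA` aligned with `ACGTTC` and `ATG-AG`, the first and third reference nucleotides are fully conserved, the second and fourth are conserved in one species only, and the last is conserved in none.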
Multiple-sequence conservation of transcription factor binding sites
In the first step, putative TFBSs are identified in all the original sequences by using TRANSFAC PWM matrices to define consensus sequences and the tfSearch utility to map consensus sequences of TFBSs to the genomic sequences of different species (Wingender et al. 1996). The second step excludes all of the TFBS predictions overlapping with coding exons. Gene annotation of only one of the sequences (e.g., the reference sequence) is sufficient at this step. The final step detects TFBS predictions that are shared by all of the species and are located at the same position as defined by the alignment. To do so, we scan through all of the anchor, or fully conserved, nucleotides (nucleotides that are identical in all of the species in the multiple-sequence alignment; Fig. 8). If a TFBS from the reference sequence is found to overlap with an anchor nucleotide, we project this TFBS position to all of the other species by using the alignment and excluding gaps (Fig. 8). Starting and ending positions of the footprint of the reference sequence TFBS are compared to the starting and ending positions for the same TFBS on the same strand as detected by the initial TFBS annotation. If a corresponding TFBS can be identified in all of the species in the alignment, multiTF reports it.
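The second and final steps described above can be sketched as follows. This is a simplified illustration, not the multiTF implementation: it assumes the per-species TFBS predictions from the first step are already available, uses 0-based inclusive coordinates on the ungapped sequences, and all names (`Site`, `column_maps`, `project`, `shared_sites`) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Site(NamedTuple):
    factor: str
    strand: str
    start: int  # 0-based, inclusive, in ungapped sequence coordinates
    end: int    # inclusive


def column_maps(row: str) -> Tuple[List[int], List[Optional[int]]]:
    """Map sequence positions to alignment columns and back (None at gaps)."""
    seq_to_col: List[int] = []
    col_to_seq: List[Optional[int]] = []
    for col, base in enumerate(row):
        if base == "-":
            col_to_seq.append(None)
        else:
            col_to_seq.append(len(seq_to_col))
            seq_to_col.append(col)
    return seq_to_col, col_to_seq


def project(site: Site, cols: Sequence[int],
            col_to_seq: Sequence[Optional[int]]) -> Optional[Site]:
    """Project a reference site onto another species, excluding gaps."""
    positions = [col_to_seq[c] for c in cols if col_to_seq[c] is not None]
    if not positions:
        return None
    return Site(site.factor, site.strand, positions[0], positions[-1])


def shared_sites(rows: Dict[str, str], sites: Dict[str, List[Site]], ref: str,
                 ref_exons: Sequence[Tuple[int, int]] = ()) -> List[Site]:
    """Return reference sites found at the aligned position in every species."""
    ref_row = rows[ref]
    if any(len(r) != len(ref_row) for r in rows.values()):
        raise ValueError("all alignment rows must have the same length")
    ref_to_col, _ = column_maps(ref_row)
    # Anchor columns: the same nucleotide in every species.
    anchors = {c for c, base in enumerate(ref_row)
               if base != "-" and all(r[c] == base for r in rows.values())}
    others = {sp: column_maps(r)[1] for sp, r in rows.items() if sp != ref}
    shared = []
    for site in sites.get(ref, []):
        # Step 2: drop predictions overlapping coding exons of the reference.
        if any(site.start <= e_end and e_start <= site.end
               for e_start, e_end in ref_exons):
            continue
        cols = ref_to_col[site.start:site.end + 1]
        if not any(c in anchors for c in cols):
            continue
        # Step 3: the projected footprint must match an annotated site of the
        # same factor and strand in every other species.
        if all(project(site, cols, m) in sites.get(sp, [])
               for sp, m in others.items()):
            shared.append(site)
    return shared
```

Requiring an exact match of projected start and end positions mirrors the comparison described in the text; a tolerance of a few bases would be an easy variation if the annotation is noisy.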
List of options provided from the Mulan results Web page
In summary, upon generating an alignment, Mulan provides the user with the following list of options:
We thank Colleen Elso for her critical suggestions on the manuscript. W.M. and R.H. were supported by NHGRI grant HG02238; G.G.L. was supported by an LLNL LDRD-04-ERD-052 grant; I.O. was in part supported by a DOE SCW0345 grant. The work was performed under the auspices of the U.S. Department of Energy by the Univ. of California, Lawrence Livermore National Laboratory Contract #W-7405-Eng-48.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3007205. Article published online before print in December 2004.
6 Corresponding author.
Aerts, S., Thijs, G., Coessens, B., Staes, M., Moreau, Y., and De Moor, B. 2003. Toucan: Deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res. 31: 1753-1764.
Andl, T., Ahn, K., Kairo, A., Chu, E.Y., Wine-Lee, L., Reddy, S.T., Croft, N.J., Cebra-Thomas, J.A., Metzger, D., Chambon, P., et al. 2004. Epithelial Bmpr1a regulates differentiation and proliferation in postnatal hair follicles and is essential for tooth development. Development 131: 2257-2268.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14: 708-715.
Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391-1394.
Bray, N., Dubchak, I., and Pachter, L. 2003. AVID: A global alignment program. Genome Res. 13: 97-102.
Brown III, C.O., Chi, X., Garcia-Gras, E., Shirai, M., Feng, X.H., and Schwartz, R.J. 2004. The cardiac determination factor, Nkx2-5, is activated by mutual cofactors GATA-4 and Smad1/4 via a novel upstream enhancer. J. Biol. Chem. 279: 10659-10669.
Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., and Batzoglou, S. 2003. LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
Elnitski, L., Li, J., Noguchi, C.T., Miller, W., and Hardison, R. 2001. A negative cis-element regulates the level of enhancement by hypersensitive site 2 of the
Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F., and Cox, D.R. 2004. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14: 367-372.
Ghanem, N., Jarinova, O., Amores, A., Long, Q., Hatch, G., Park, B.K., Rubenstein, J.L., and Ekker, M. 2003. Regulatory roles of conserved intergenic domains in vertebrate Dlx bigene clusters. Genome Res. 13: 533-543.
Giardine, B., Elnitski, L., Riemer, C., Makalowska, I., Schwartz, S., Miller, W., and Hardison, R.C. 2003. GALA, a database for genomic sequence alignments and annotations. Genome Res. 13: 732-741.
Gilligan, P., Brenner, S., and Venkatesh, B. 2002. Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences. Gene 294: 35-44.
Hardison, R.C., Chiaromonte, F., Kolbe, D., Wang, H., Petrykowska, H., Elnitski, L., Yang, S., Giardine, B., Zhang, Y., Riemer, C., et al. 2003. Global predictions and tests of erythroid regulatory regions. In Genome of Homo sapiens, pp. 335-344. Cold Spring Harbor Press, Cold Spring Harbor, NY.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J., et al. 2003. The UCSC Genome Browser Database. Nucleic Acids Res. 31: 51-54.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996-1006.
Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. 2003. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. 100: 11484-11489.
Kleinjan, D.A., Seawright, A., Childs, A.J., and van Heyningen, V. 2004. Conserved elements in Pax6 intron 7 involved in (auto)regulation and alternative transcription. Dev. Biol. 265: 462-477.
Kolbe, D., Taylor, J., Elnitski, L., Eswara, P., Li, J., Miller, W., Hardison, R., and Chiaromonte, F. 2004. Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat. Genome Res. 14: 700-707.
Lawoko-Kerali, G., Rivolta, M.N., and Holley, M. 2002. Expression of the transcription factors GATA3 and Pax2 during development of the mammalian inner ear. J. Comp. Neurol. 442: 378-391.
Lenhard, B., Sandelin, A., Mendoza, L., Engstrom, P., Jareborg, N., and Wasserman, W.W. 2003. Identification of conserved regulatory elements by comparative genome analysis. J. Biol. 2: 13.
Lettice, L.A., Heaney, S.J., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E., and de Graaff, E. 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12: 1725-1735.
Lim, K.C., Lakshmanan, G., Crawford, S.E., Gu, Y., Grosveld, F., and Engel, J.D. 2000. Gata3 loss leads to embryonic lethality due to noradrenaline deficiency of the sympathetic nervous system. Nat. Genet. 25: 209-212.
Loots, G.G. and Ovcharenko, I. 2004. rVISTA 2.0: Evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 32: W217-W221.
Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288: 136-140.
Loots, G.G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E.M. 2002. rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 12: 832-839.
Mayor, C., Brudno, M., Schwartz, J.R., Poliakov, A., Rubin, E.M., Frazer, K.A., Pachter, L.S., and Dubchak, I. 2000. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046-1047.
Nobrega, M.A., Ovcharenko, I., Afzal, V., and Rubin, E.M. 2003. Scanning human gene deserts for long-range enhancers. Science 302: 413.
Ovcharenko, I., Boffelli, D., and Loots, G.G. 2004a. eShadow: A tool for comparing closely related sequences. Genome Res. 14: 1191-1198.
Ovcharenko, I., Loots, G.G., Hardison, R.C., Miller, W., and Stubbs, L. 2004b. zPicture: Dynamic alignment and visualization tool for analyzing conservation profiles. Genome Res. 14: 472-477.
Pennacchio, L.A., Olivier, M., Hubacek, J.A., Cohen, J.C., Cox, D.R., Fruchart, J.C., Krauss, R.M., and Rubin, E.M. 2001. An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing. Science 294: 169-173.
Prince, V.E. and Pickett, F.B. 2002. Splitting pairs: The diverging fates of duplicated genes. Nat. Rev. Genet. 3: 827-837.
Saitou, N. and Nei, M. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
Santagati, F., Abe, K., Schmidt, V., Schmitt-John, T., Suzuki, M., Yamamura, K., and Imai, K. 2003. Identification of cis-regulatory elements in the mouse Pax9/Nkx2-9 genomic region: Implication for evolutionary conserved synteny. Genetics 165: 235-242.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker: A web server for aligning two genomic DNA sequences. Genome Res. 10: 577-586.
Schwartz, S., Elnitski, L., Li, M., Weirauch, M., Riemer, C., Smit, A., Green, E.D., Hardison, R.C., and Miller, W. 2003a. MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res. 31: 3518-3524.
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. 2003b. Human-mouse alignments with BLASTZ. Genome Res. 13: 103-107.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
van Dam, H. and Castellazzi, M. 2001. Distinct roles of Jun:Fos and Jun:ATF dimers in oncogenesis. Oncogene 20: 2453-2464.
Van Esch, H. and Bilous, R.W. 2001. GATA3 and kidney development: Why case reports are still important. Nephrol. Dial. Transplant 16: 2130-2132.
Welch, J.J., Watts, J.A., Vakoc, C.R., Yao, Y., Wang, H., Hardison, R.C., Blobel, G.A., Chodosh, L.A., and Weiss, M.J. 2004. Global regulation of erythroid gene expression by transcription factor GATA-1. Blood (in press).
Wingender, E., Dietze, P., Karas, H., and Knuppel, R. 1996. TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24: 238-241.
http://www.bx.psu.edu/miller_lab/; Source code for the aligners and the aligner-evaluation software.
http://globin.cse.psu.edu/gala/; GALA.
http://mulan.dcode.org; Mulan.
http://rvista.dcode.org/; rVista 2.0.
http://www.jgi.doe.gov/; Joint Genome Institute sequencing facility.
http://www.biobase.de/; TRANSFAC.
Received July 14, 2004; accepted in revised form August 31, 2004.