Genome Res. 14:789-801, 2004 ©2004 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/04 $5.00
Analysis of Segmental Duplications and Genome Assembly in the Mouse
1 Department of Genetics, Center for Computational Genomics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 4410, USA
2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
3 Dipartimento di Anatomia Patologica e di Genetica, Sezione di Genetica, University of Bari, Bari 70126, Italy
Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method and a novel assembly-independent method, designed to take advantage of the reduced allelic variation of the C57BL/6J strain. We conservatively estimate that 57% of all highly identical segmental duplications (≥90%) were misassembled or collapsed within the working draft WGS assembly. The WGS approach often leaves duplications fragmented and unassigned to a chromosome when compared with the clone-ordered-based approach. Our preliminary analysis suggests that 1.7%-2.0% of the mouse genome is part of recent large segmental duplications (about half of what is observed for the human genome). We have constructed a mouse segmental duplication database to aid in the characterization of these regions and their integration into the final mouse genome assembly. This work suggests significant biological differences in the architecture of recent segmental duplications between human and mouse. In addition, our unique method provides the means for improving whole-genome shotgun sequence assembly of mouse and future mammalian genomes.
The era of whole-genome sequencing has created an opportunity to assess fundamental biological processes of genome evolution in a global fashion (Eichler and Sankoff 2003
Understanding the nature and pattern of recent segmental duplications is important for both practical and biological reasons. First, duplication of genomic sequence followed by subsequent mutation is one of the primary forces of functional and structural evolution (Muller 1936
The detection of recent segmental duplications is sensitive to the quality of the underlying sequence assembly. Large blocks of highly homologous duplications pose challenges to both clone-ordered and whole-genome shotgun sequence-based assembly methods (Green 1997
To overcome some of the limitations associated with the detection of highly homologous duplication, we developed two independent in silico detection strategies. The first method, termed whole-genome assembly comparison (WGAC), is a BLAST-based approach that performs an all-by-all comparison of assembled genomic sequence (Bailey et al. 2001
We provide an assessment of the duplication content of the mouse genome on the basis of these two fundamentally different methods. To increase our power to detect duplicated regions, we implemented a sequence quality filter that allowed us to take advantage of the reduced allelic variation. For the purpose of this study, we chose to analyze the published working-draft sequence of the C57BL/6J mouse genome (MGSCv3, 2.5 Gb), as well as a smaller BAC-based assembly of only finished sequence (NCBI build 29, 440 Mb). It should be pointed out that the latter represents a small proportion of the mouse genome, in which clone choice selection may be biased. These two assemblies provide a direct comparison of the strengths and weaknesses of BAC-based clone-ordered versus whole-genome shotgun approaches for estimating global segmental duplication content. On the basis of the results of our analysis, we have constructed an integrated mouse segmental duplication database that will provide a framework for future evolutionary analyses. In addition, the resource should provide valuable information in directing finishing and sequencing efforts within the mouse.
Human Versus Mouse Genome Assembly Comparisons for Segmental Duplication
We compared the duplication content (≥90% sequence identity) of mouse (MGSCv3) and human draft genomes on the basis of the published sequence assemblies (IHGSC 2001 90% and 10 kb in length; Table 1; Supplemental Table 1; Supplemental Figure 1). The median length of the alignments was significantly shorter for the mouse (13.7 kb) than for human (26.5 kb). The most striking feature was that the number of pairwise alignments differed by nearly an order of magnitude (6552 human pairwise alignments as compared with 732 mouse pairwise alignments; Table 2). Further, the majority of the mouse alignments (425) involved the unassigned mouse chromosome, suggesting that the treatment of mouse duplications was problematic during the assembly.
Whole-Genome Shotgun Versus Clone-Ordered Assembly of the Mouse Genome
Because two different methodologies, clone-ordered-based versus whole-genome shotgun sequencing, were used to assemble the human and mouse genomes, respectively, the apparent dearth of highly identical duplications within the mouse assembly may have resulted from collapse of whole-genome shotgun sequence reads during the assembly process (MGSCv3). To test this hypothesis, we compared the duplication content of the NCBI clone-ordered assembly (build 29) of mouse C57BL/6J BACs with that of the published mouse genome assembly (Table 1; Supplemental Table 2). An examination of 439 Mb of build 29, approximately one-fifth of the genome, predicted a significant increase in the length (mean 23.4 kb), frequency (1.74% of the genome), and the number of pairwise alignments (241 alignments) (Tables 1 and 2). If build 29 is representative of the entire mouse genome, these data predict that 60% (1 - 0.0070/0.0174) of segmental duplications may have been collapsed inadvertently during the assembly. Both WGAC analyses suggest that intrachromosomal duplications predominate in mouse, in contrast to human, in which interchromosomal and intrachromosomal pairwise alignments are equally prevalent (Figs. 1 and 2). If we limit our analysis to more divergent duplications (<94% identity, which can be easily resolved by WGS assembly methods), there is virtually a complete absence of interchromosomal duplications within MGSCv3 (Fig. 2).
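The collapse estimate quoted above is a simple ratio of the two duplication frequencies. The following minimal Python sketch reproduces that arithmetic; the function and variable names are illustrative and do not come from the study, and the calculation assumes, as the text does, that build 29 is representative of the whole genome.

```python
def collapsed_fraction(draft_dup_fraction: float, reference_dup_fraction: float) -> float:
    """Fraction of duplicated sequence apparently missing from the draft assembly.

    Both arguments are genome fractions (0.0174 means 1.74% of the genome).
    """
    if reference_dup_fraction <= 0:
        raise ValueError("reference duplication fraction must be positive")
    return 1.0 - draft_dup_fraction / reference_dup_fraction


# Values quoted in the text: 0.70% for MGSCv3 and 1.74% for build 29.
mgscv3_fraction = 0.0070
build29_fraction = 0.0174

estimate = collapsed_fraction(mgscv3_fraction, build29_fraction)
print(f"Estimated collapsed fraction: {estimate:.1%}")
```

The result is just under 0.60, which the text rounds to 60%.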
Whole-Genome Shotgun Sequence Detection of Mouse Duplications
As an independent approach to detect highly homologous mouse segmental duplications, we applied a previously described whole-genome shotgun sequence detection (WSSD) method (Table 3; Bailey et al. 2002a 95% sequence identity (Bailey et al. 2002a
To test the utility of this method, we established a baseline for comparison by calibrating a collection of unique (2052 kb) and duplicated (952 kb) mouse BACs (Supplemental Table 3). For each reference sequence, both the depth of coverage and average percent sequence identity were measured for sequence reads within 5-kb windows. (Each window corresponded to 5 kb of genomic sequence in which known repetitive sequences were excluded.) Because the C57BL/6J mouse represents a highly inbred strain with limited allelic variation, we examined more closely the degree of sequence variation. To improve our power, we considered only high-quality (phred quality score ≥30) bases during our calculation of sequence identity (Ewing et al. 1998
Figure 3 depicts a typical comparison of two mouse BACs containing known unique and duplicated sequence before and after quality masking. Due to the homogeneous nature of the mouse genome, the ratio of the number of diverged sequence reads to identical sequence reads (termed the divergent read ratio) was used to provide a crude estimate of the copy number for a given reference segment. Among unique regions of the genome, this ratio should approximate zero. In contrast, a region of the mouse genome duplicated once would possess a divergent read ratio of one, as half of the reads map to a separate locus. This, of course, assumes that there will be at least 2-bp differences per read (approximately 700 bp on average) between duplicated loci for all paralogous reads. Identical sequence duplications could not be detected simply by the divergent read ratio. Significant differences in both the depth-of-coverage and the divergent read ratio were observed between unique and duplicated reference sequences (Fig. 3). Although both measures (depth-of-coverage and divergent read ratio) could effectively discriminate highly homologous (>95%) duplications, the divergent read ratio showed the greatest sensitivity (Supplemental Table 3) in our analysis.
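The divergent read ratio described above can be illustrated with a short Python sketch. It assumes each read aligned to a window has already been reduced to a count of high-quality mismatches against the reference; the function names and the rule that any mismatch makes a read "diverged" are simplifications for illustration, not the study's implementation. The copy-number formula extends the two cases given in the text (ratio 0 for unique sequence, ratio 1 for one extra copy) under the added assumption that every copy is sampled equally.

```python
from typing import Sequence


def divergent_read_ratio(mismatches_per_read: Sequence[int]) -> float:
    """Ratio of diverged reads to identical reads within one window.

    A read counts as identical when it has zero high-quality mismatches
    against the reference (an illustrative simplification).
    """
    identical = sum(1 for m in mismatches_per_read if m == 0)
    diverged = len(mismatches_per_read) - identical
    if identical == 0:
        return float("inf")  # no identical reads: ratio is unbounded
    return diverged / identical


def crude_copy_number(ratio: float) -> float:
    """Crude copy-number estimate: ratio 0 -> 1 copy, ratio 1 -> 2 copies."""
    return 1.0 + ratio


unique_window = [0, 0, 0, 0, 0, 0]        # every read matches the reference
duplicated_window = [0, 3, 0, 2, 4, 0]    # half the reads derive from a paralog

print(divergent_read_ratio(unique_window))
print(crude_copy_number(divergent_read_ratio(duplicated_window)))
```

As the text notes, perfectly identical duplications contribute no diverged reads, so a measure of this kind cannot detect them on its own; depth of coverage is needed for those cases.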
We applied the whole-genome shotgun sequence detection (WSSD) strategy separately to MGSCv3 (2475 Mb) and to all available finished C57BL/6J BACs (706 Mb; 4298 BACs). This entailed a computationally intensive analysis of 40.7 million reads against both reference genomes, assessing the depth-of-coverage and divergent read ratio in 5-kb windows (overlapping 1 kb; see Methods). We identified all regions in which at least five consecutive windows were consistent with duplication (a divergent read ratio
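The rule of requiring at least five consecutive windows consistent with duplication can be sketched as a run-length scan. The Python below is a minimal illustration only: the per-window criterion is cut off in the text above, so the `ratio_threshold` argument is a hypothetical placeholder that the caller must supply, and the function names are not taken from the study.

```python
from typing import List, Sequence, Tuple


def flag_windows(ratios: Sequence[float], ratio_threshold: float) -> List[bool]:
    """Mark windows whose divergent read ratio meets a caller-supplied threshold.

    The threshold is hypothetical; the source text does not give it here.
    """
    return [r >= ratio_threshold for r in ratios]


def duplicated_runs(flags: Sequence[bool], min_run: int = 5) -> List[Tuple[int, int]]:
    """Return (first, last) window indices of runs of >= min_run flagged windows."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a new run begins
        elif not flagged and start is not None:
            if i - start >= min_run:       # run ended; keep it if long enough
                runs.append((start, i - 1))
            start = None
    if start is not None and len(flags) - start >= min_run:
        runs.append((start, len(flags) - 1))  # run reaches the final window
    return runs


example_ratios = [0.0, 0.9, 1.1, 1.0, 1.2, 0.8, 0.1, 1.0, 1.0]
example_flags = flag_windows(example_ratios, ratio_threshold=0.5)
print(duplicated_runs(example_flags))
```

In the example, windows 1 through 5 form a qualifying run, while the two flagged windows at the end are too short to be reported.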
We experimentally validated our detection strategy by FISH. Previous analyses suggest good correlation between high-sequence-identity duplications and the presence of multisite FISH signals (Bailey et al. 2001
A Comparison of Duplication Detection and Genome Assembly Methods
Table 3 compares different duplication detection methods and genome assembly strategies. In general, the whole-genome assembly comparison estimate of duplication content from build 29 is more consistent with the duplication estimate on the basis of whole-genome shotgun sequence detection (Table 1). Approximately 49% of the bases that were detected by WGAC (≥90% sequence identity and ≥10 kb) of build 29 were also positive by the WSSD method. Because build 29 does not represent a complete genome sequence, regions that score positive by WSSD, but not WGAC, are expected (Supplemental Fig. 2). Regions that score positive by WGAC, but not WSSD, likely represent missing sequence overlaps in the assembly. In contrast, for MGSCv3, only 12% (159/954) of the potential duplicated regions were concordant between WSSD and WGAC (16% by duplication of the bases). These data confirm potential sequence collapse of segmental duplications during assembly of the mouse genome. If only WSSD regions are considered, then the duplication estimate for the draft genome begins to approximate that of the clone-ordered assembly. Finally, it should be noted that the average length of duplicated sequences within MGSCv3 is substantially shorter than that for build 29. Only eight alignments in MGSCv3 were >30 kb in length (maximum 114 kb). The four largest alignments were highly similar (>95%) tandem duplications completely contained within the small number of finished BAC sequences that were incorporated into the assembly. The four other alignments outside of finished sequence were 30-40 kb in length and <95% identical.
Gene Content Analysis
The published mouse genome sequence (MGSCv3) represented one of the first attempts to publicly sequence and assemble a mammalian genome based largely on whole-genome shotgun sequence read data. A particular concern of such an approach has been the treatment of large high-copy repeats and segmental duplications that share a high degree of sequence identity (Green 1997 6%-7%). The resolution of such regions is, therefore, important to the genetics community and remains one of the most difficult tasks in the completion of the human genome. It is currently unknown whether the duplication-rich and gene-rich content of the human genome is characteristic of mammalian genome organization. An assessment of the duplication content and its relationship to the proteome are therefore critical issues in not only directing finishing efforts, but also in understanding the biology of the organism. Whereas it is typically expected that WGS sequence assemblies will underestimate the true duplication content (Bailey et al. 2001
In this study, we examined the duplication content using two different approaches, a sequence assembly-based approach (termed WGAC) and a whole-genome shotgun sequence detection measure (termed WSSD). The latter, which is not dependent upon the assembly, was used previously as a robust method to detect large, highly identical duplications within the human (Bailey et al. 2002a
There are a few important conclusions from this study with respect to genome assembly. A strict whole-shotgun sequence approach such as Arachne (Batzoglou et al. 2002
As expected, clone-ordered-based approaches for sequence assembly appear to more effectively resolve duplication overlaps, although artifactual duplications are more frequently encountered. Build 29 shows the best correlation between duplications confirmed by WGAC and WSSD in our analysis (Table 3). In contrast, 74% of the duplications within the whole-genome shotgun sequence assembly could only be detected by WSSD, and most of these mapped to the unplaced chromosome. If the experimental cytogenetic data are used to estimate false positives (22%), we conclude that 57% of the large duplications (≥95% and ≥10 kb) have not yet been resolved within the assembly. Our data suggest that a combined approach using whole-genome shotgun sequence detection to identify regions of duplication within a WGS assembly followed by targeted high-quality BAC clone sequencing could provide the most affordable and effective means for resolving these complex regions of the genome. In this study, we pinpoint a small fraction ( 1%) of the mouse genome that should be targeted for finished sequence within BAC clones. These regions are unlikely to be properly assembled and mapped, irrespective of increased depths of whole-genome shotgun sequencing. We have constructed an integrated mouse segmental duplication database (http://mouseparalogy.gene.cwru.edu
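One plausible reading of how the unresolved fraction follows from the two figures quoted above is to discount the WSSD-only share by the FISH-derived false-positive rate. The Python sketch below shows that arithmetic only; the text does not spell out the calculation, so treating it as a simple product is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the text.
wssd_only_fraction = 0.74    # duplications detectable only by WSSD
false_positive_rate = 0.22   # estimated from the cytogenetic (FISH) data

# Assumed calculation: keep only the WSSD-only calls expected to be genuine.
unresolved_fraction = wssd_only_fraction * (1.0 - false_positive_rate)

print(f"Estimated unresolved fraction: {unresolved_fraction:.3f}")
```

Under this assumption the value comes out slightly below 0.58, close to the 57% reported in the text.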
Biologically, some interesting differences in the pattern and organization of segmental duplications can be deduced when compared with human. Our analysis shows that only 0.54% of the annotated RefSeqs fall into duplicated regions, even though 1.5%-2.0% of the genome is predicted to be duplicated by the WSSD method. This is in contrast to the human, where 6.1% of the RefSeqs fell into duplicated regions, with 5.2% of the genome predicted to be duplicated (Bailey et al. 2002a
The identification of coding sequence within these duplicated regions of the mouse genome has some interesting practical and biological implications. The mouse has been an invaluable tool for dissecting gene function, due to the ability to directly manipulate the genome and assess the phenotypic consequences in vivo (van der Weyden et al. 2002
Whole-Genome Assembly Comparison of Mouse Draft Genome (MGSCv3)
To analyze mouse segmental duplications, we applied a BLAST-based whole-genome assembly comparison (Bailey et al. 2001 90% identity) primate-specific segmental duplications ( 1 kb). We applied this method to the mouse, but detected an excess of smaller putative segmental duplications (870,969 seeding alignments 500 bp and 88% identity, 10-fold greater than standard human analysis). Upon inspection, the vast majority of these alignments corresponded to incompletely masked high-copy repeats (mainly LTR and LINE elements). In mouse, both LTR and L1 elements show increased activity as well as complicated evolutionary histories (DeBerardinis et al. 1998 2500 bp; Supplement 1). At this threshold, many uncharacterized transposable element alignments were still present. To avoid these larger transposable elements, we set a 10-kb threshold for most analyses, which excludes all but possibly the largest full-length endogenous retroviral elements.
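The length threshold described above, combined with the 90% identity cutoff used in the Results, amounts to a simple filter over pairwise alignments. The following Python sketch illustrates that filter; the `PairwiseAlignment` record and its field names are hypothetical and do not reflect the data model of the actual WGAC pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PairwiseAlignment:
    """Hypothetical record for one pairwise alignment between two loci."""
    query: str
    subject: str
    length_bp: int
    identity: float  # fraction of identical aligned bases, 0.0 to 1.0


def filter_alignments(
    alignments: Iterable[PairwiseAlignment],
    min_length_bp: int = 10_000,   # 10-kb threshold stated in the text
    min_identity: float = 0.90,    # 90% identity cutoff used in the Results
) -> List[PairwiseAlignment]:
    """Keep alignments long and similar enough to be treated as duplications."""
    return [
        a for a in alignments
        if a.length_bp >= min_length_bp and a.identity >= min_identity
    ]


candidates = [
    PairwiseAlignment("chr1", "chr1", 13_700, 0.96),   # retained
    PairwiseAlignment("chr2", "chrUn", 6_500, 0.98),   # too short, e.g. an LTR element
    PairwiseAlignment("chr7", "chr7", 25_000, 0.88),   # too divergent
]
kept = filter_alignments(candidates)
print(len(kept))
```

Raising the length cutoff in this way trades sensitivity for specificity: shorter true duplications are lost, but most full-length transposable elements fall below the threshold.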
Whole-Genome Alignment Comparison: NCBI Build 29 Finished Clone-Based Sequence
It has been shown previously that clone-ordered genome assemblies are more apt to overestimate segmental duplication content (as much as threefold) due to a failure to correctly merge sequence overlaps (Bailey et al. 2001
Whole-Genome Shotgun Sequence Detection of Duplications
Each reference mouse genome sequence was compared by Megablast against the entire set of mouse WGS (whole-genome shotgun) sequence reads (40,782,208 sequences; 31,117,512,375 bp). Reference sequences were initially lowercase-masked for repeat elements showing <5% divergence from the consensus sequence, with the exception of LTR and LINE elements, which were masked at 15% and 10% divergence from consensus, respectively. This increased our sensitivity in removing lineage-specific elements. Megablast alignments were performed using lowercase masking parameters (-D 3 -J F -P 93 -U T -F m -s 220), which allow for greedy-algorithm extension into adjacent repetitive regions. The query sequence (genomic piece) was assumed to be of high quality. Aligned bases from the read with a PHRED score of <30 (error rate >1/1000) were ignored in determining the percent identity. This process corrects for sequencing errors in an unbiased way (regardless of match or mismatch). The program paralogy_detector was then run on every segment. Alignments were only considered if they were >400 bp, represented 90% of the read, and had at least 300 bp within the unique regions, with a rescored similarity of >94% and
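The quality-aware rescoring of identity described above can be sketched as follows. This Python example assumes a gapless alignment in which the reference segment, the read, and the read's per-base PHRED scores have equal length; real Megablast alignments contain gaps, and the function name and interface are illustrative rather than those of paralogy_detector.

```python
from typing import Optional, Sequence


def quality_masked_identity(
    reference: str,
    read: str,
    read_quals: Sequence[int],
    min_qual: int = 30,   # bases with PHRED < 30 are ignored, as in the text
) -> Optional[float]:
    """Identity over aligned positions whose read base has PHRED >= min_qual.

    Low-quality positions are dropped whether they match or mismatch, so the
    masking does not favor either outcome. Returns None if no position passes.
    """
    if not (len(reference) == len(read) == len(read_quals)):
        raise ValueError("gapless sketch requires equal-length inputs")
    kept = 0
    matches = 0
    for ref_base, read_base, qual in zip(reference, read, read_quals):
        if qual < min_qual:
            continue                      # masked position
        kept += 1
        if ref_base.upper() == read_base.upper():
            matches += 1
    return matches / kept if kept else None


ref_seq = "ACGTACGT"
read_seq = "ACGAACGT"                          # mismatch at the fourth base
low_q = [40, 40, 40, 10, 40, 40, 40, 40]       # mismatch falls on a low-quality base
high_q = [40] * 8                              # mismatch falls on a high-quality base

print(quality_masked_identity(ref_seq, read_seq, low_q))
print(quality_masked_identity(ref_seq, read_seq, high_q))
```

With the low-quality mismatch masked, the read is scored as identical; when the same mismatch has a high quality score, it counts against identity and the read would be treated as diverged.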
A read-based detection method based on the number of reads in 5-kb windows (1-kb overlap slide) has been described previously. In general, mouse unique sequence read depth showed slightly increased variability (40.3 +/- 13.5 reads per 5-kb reference) when compared with a similar analysis performed with human data (50.4 +/- 12.8 reads per 5-kb reference; Bailey et al. 2002a
FISH Analysis
A subset of mouse BAC clones with large (>20 kb) regions of duplication as determined by WSSD detection were subsequently examined by FISH. Metaphase nuclei were examined to identify interchromosomal or intrachromosomal duplications that were interspersed by 5 Mb or more. More intense FISH signals, which localized to a single site, were subsequently examined in interphase nuclei. Interphase analyses were controlled for replication by comparing cells at both G1 and G2 stages of arrest. At least 10 interphase nuclei were examined for each preparation. The number of interphase nuclei signals and the signal intensity were compared with unique hybridizing clones to provide a relative estimate of copy number. Because probe signal intensity may vary due to sequence property differences, copy-number estimates provided in Supplemental Table 3 should be considered approximate.
Gene Content Analysis
As part of the annotation pipeline, the proteins from translated RefSeq mRNAs are compared by BLAST (Altschul et al. 1990
We thank the large-scale sequencing centers (Baylor College of Medicine, Cold Spring Harbor Laboratory, Genome Therapeutics Corporation, Harvard Partners Genome Center, Joint Genome Institute, The NIH Intramural Sequencing Center, The UK-MRC Sequencing Consortium, The University of Oklahoma Advanced Center for Genome Technology, The University of Texas Southwest, The Whitehead Institute for Biomedical Research, The Washington University Genome Sequencing Center, and the Wellcome Trust Sanger Institute) for access to all large-scale finished sequence, genome assembly, and trace sequence data from the mouse genome prior to publication. We thank Ilya Dondoshansky for modifying megaBLAST output into a form that significantly increased the speed of analysis. We thank Royden Clark and Ulrich Neuss for technical assistance. This work was supported, in part, by NIH grants GM58815 and HG002385 to E.E.E., an NIH Career Development Program in Genomic Epidemiology of Cancer (CA094816) to J.A.B., Telethon, CEGBA (Centro di Eccellenza Geni in campo Biosanitario e Agroalimentare), MIUR (Ministero Italiano della Universita' e della Ricerca; Cluster C03, Prog. L.488/92) to M.R., the W.M. Keck Foundation, and the Charles B. Wang Foundation. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2238404.
4 Corresponding author. [Supplemental material is available online at www.genome.org.]
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Armengol, L., Pujana, M.A., Cheung, J., Scherer, S.W., and Estivill, X. 2003. Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum. Mol. Genet. 12: 2201-2208.
Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J., and Eichler, E.E. 2001. Segmental duplications: Organization and impact within the current human genome project assembly. Genome Res. 11: 1005-1017.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002a. Recent segmental duplications in the human genome. Science 297: 1003-1007.
Bailey, J.A., Yavor, A.M., Viggiano, L., Misceo, D., Horvath, J.E., Archidiacono, N., Schwartz, S., Rocchi, M., and Eichler, E.E. 2002b. Human-specific duplication and mosaic transcripts: The recent paralogous structure of chromosome 22. Am. J. Hum. Genet. 70: 83-100.
Bailey, J.A., Liu, G., and Eichler, E.E. 2003. An Alu transposition model for the origin and expansion of human segmental duplications. Am. J. Hum. Genet. 73: 823-834.
Batzoglou, S., Jaffe, D.B., Stanley, K., Butler, J., Gnerre, S., Mauceli, E., Berger, B., Mesirov, J.P., and Lander, E.S. 2002. ARACHNE: A whole-genome shotgun assembler. Genome Res. 12: 177-189.
Cheung, V.E., Nowak, N., Jang, W., Kirsch, I.R., Zhao, S., Chen, X.-N., Furey, T.S., Kim, U.-J., Kuo, W.-L., Olivier, M., et al. 2001. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409: 953-958.
Cheung, J., Estivill, X., Khaja, R., MacDonald, J.R., Lau, K., Tsui, L.C., and Scherer, S.W. 2003a. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 4: R25.
Cheung, J., Wilson, M.D., Zhang, J., Khaja, R., MacDonald, J.R., Heng, H.H., Koop, B.F., and Scherer, S.W. 2003b. Recent segmental and gene duplications in the mouse genome. Genome Biol. 4: R47.
Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835-847.
Copley, R.R., Goodstadt, L., and Ponting, C. 2003. Eukaryotic domain evolution inferred from genome comparisons. Curr. Opin. Genet. Dev. 13: 623-628.
DeBerardinis, R.J., Goodier, J.L., Ostertag, E.M., and Kazazian Jr., H.H. 1998. Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat. Genet. 20: 288-290.
Dehal, P., Predki, P., Olsen, A.S., Kobayashi, A., Folta, P., Lucas, S., Land, M., Terry, A., Zhou, C.L.E., Rash, S., et al. 2001. Human chromosome 19 and related regions in mouse: Conservative and lineage specific evolution. Science 293: 104-111.
DiDonato, C.J., Chen, X.N., Noya, D., Korenberg, J.R., Nadeau, J.H., and Simard, L.R. 1997. Cloning, characterization, and copy number of the murine survival motor neuron gene: Homolog of the spinal muscular atrophy-determining gene. Genome Res. 7: 339-352.
Eichler, E.E. 1998. Masquerading repeats: Paralogous pitfalls of the Human Genome. Genome Res. 8: 758-762.
Eichler, E.E. 1999. Repetitive conundrums of centromere structure and function. Hum. Mol. Genet. 8: 151-155.
Eichler, E.E. 2001. Segmental duplications: What's missing, misassigned, and misassembled - and should we care? Genome Res. 11: 653-656.
Eichler, E.E. and Sankoff, D. 2003. Structural dynamics of eukaryotic chromosome evolution. Science 301: 793-797.
Estivill, X., Cheung, J., Pujana, M.A., Nakabayashi, K., Scherer, S.W., and Tsui, L.C. 2002. Chromosomal regions containing high-density and ambiguously mapped putative single nucleotide polymorphisms (SNPs) correlate with segmental duplications in the human genome. Hum. Mol. Genet. 11: 1987-1995.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175-185.
Giglio, S., Calvari, V., Gregato, G., Gimelli, G., Camanini, S., Giorda, R., Ragusa, A., Guerneri, S., Selicorni, A., Stumm, M., et al. 2002. Heterozygous submicroscopic inversions involving olfactory receptor-gene clusters mediate the recurrent t(4;8)(p16;p23) translocation. Am. J. Hum. Genet. 71: 276-285.
Gimelli, G., Pujana, M.A., Patricelli, M.G., Russo, S., Giardino, D., Larizza, L., Cheung, J., Armengol, L., Schinzel, A., Estivill, X., et al. 2003. Genomic inversions of human chromosome 15q11-q13 in mothers of Angelman syndrome patients with class II (BP2/3) deletions. Hum. Mol. Genet. 12: 849-858.
Green, P. 1997. Against a whole-genome shotgun. Genome Res. 7: 410-417.
International Human Genome Sequencing Consortium (IHGSC). 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.
Ji, Y., Walkowicz, M.J., Buiting, K., Johnson, D.K., Tarvin, R.E., Rinchik, E.M., Horsthemke, B., Stubbs, L., and Nicholls, R.D. 1999. The ancestral gene for transcribed, low-copy repeats in the Prader-Willi/Angelman region encodes a large protein implicated in protein trafficking, which is deficient in mice with neuromuscular and spermiogenic abnormalities. Hum. Mol. Genet. 8: 533-542.
Ji, Y., Eichler, E.E., Schwartz, S., and Nicholls, R.D. 2000. Structure of chromosomal duplicons and their role in mediating human genomic disorders. Genome Res. 10: 597-610.
Johnson, M.E., Viggiano, L., Bailey, J.A., Abdul-Rauf, M., Goodwin, G., Rocchi, M., and Eichler, E.E. 2001. Positive selection of a gene family during the emergence of humans and African apes. Nature 413: 514-519.
Kazazian Jr., H.H. 2000. Genetics. L1 retrotransposons shape the mammalian genome. Science 289: 1152-1153.
Lichter, P., Tang, C.J., Call, K., Hermanson, G., Evans, G.A., Housman, D., and Ward, D.C. 1990. High-resolution mapping of human chromosome 11 by in situ hybridization with cosmid clones. Science 247: 64-69.
Locke, D.P., Archidiacono, N., Misceo, D., Cardone, M.F., Dechamps, S., Roe, B.A., Rocchi, M., and Eichler, E.E. 2003. Refinement of a chimpanzee pericentric inversion breakpoint to a segmental duplication cluster. Genome Biol. 4: R50.
Lupski, J.R. 1998. Genomic disorders: Structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14: 417-422.
Mears, M.L. and Hutchison III, C.A. 2001. The evolution of modern lineages of mouse L1 elements. J. Mol. Evol. 52: 51-62.
Mouse Genome Sequencing Consortium (MGSC). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
Muller, H.J. 1936. Bar duplication. Science 83: 528-530.
Mural, R.J., Adams, M.D., Myers, E.W., Smith, H.O., Miklos, G.L., Wides, R., Halpern, A., Li, P.W., Sutton, G.G., Nadeau, J., et al. 2002. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296: 1661-1671.
Ohno, S., Wolf, U., and Atkin, N. 1968. Evolution from fish to mammals by gene duplication. Hereditas 59: 169-187.
Osborne, L.R., Li, M., Pober, B., Chitayat, D., Bodurtha, J., Mandel, A., Costa, T., Grebe, T., Cox, S., Tsui, L.C., et al. 2001. A 1.5 million-base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat. Genet. 29: 321-325.
Paulding, C.A., Ruvolo, M., and Haber, D.A. 2003. The Tre2 (USP6) oncogene is a hominoid-specific gene. Proc. Natl. Acad. Sci. 100: 2507-2511.
Pentao, L., Wise, C., Chinault, A., Patel, P., and Lupski, J. 1992. Charcot-Marie-Tooth type 1A duplication appears to arise from recombination at repeat sequences flanking the 1.5 Mb mono