Genome Research
INTRODUCTION
Large-scale sequencing of the human genome is now under way (Boguski
et al. 1996; Marshall and Pennisi 1996). Although at the beginning of
the Genome Project many doubted the scientific value of sequencing the
entire human genome, these doubts have evaporated almost entirely
(Gibbs 1995; Olson 1995). Primary reasons for generating the human
genomic sequence are listed in Table 1.
The approach being taken for human genomic sequencing is the same as that used for the Saccharomyces cerevisiae and Caenorhabditis elegans genomes, namely construction of overlapping arrays of large insert Escherichia coli clones, followed by complete sequencing of these clones one at a time. In this article, we outline an alternative approach to sequencing the human and other large genomes, which we argue is less costly and more informative than the clone-by-clone approach.
A Plan for Human Whole-Genome Shotgun Sequencing
Although there are many conceivable variations, the crux of our
plan involves high-quality, semiautomated sequencing from both ends of
very large numbers of randomly selected human genomic DNA fragments.
DNA of high molecular weight purified from at least a few different
human donors would be sheared, size-selected, and cloned into E. coli. Insert sizes would fall into two classes. Long inserts would
be 5-20 kb in size and would be cloned into plasmid, phage, or
possibly cosmid vectors. Short inserts would be 0.4-1.2 kb in size and
would be cloned into plasmid vectors. Read lengths would be of
sufficient magnitude so that the two sequence reads from the ends of
the short inserts overlap. The ratio of long to short inserts would be
1. Standard, gel-based methods would be utilized to generate at
least 30 billion nucleotides of raw sequence (10-fold coverage of the
genome). Many laboratories throughout the world could participate in
raw sequence generation, but all sequences would be deposited in a
common, public database, and only a few or possibly even one large
informatics group would assume the primary task of sequence assembly.
Following initial assembly, gaps in sequence coverage would need to be
filled and uncertainties in assembly would need to be resolved.
Sequencing from both ends of relatively long insert subclones is an
essential feature of the plan. Initially, Edwards and colleagues (1990)
and, more recently, several other groups (Chen et al. 1993; Smith et
al. 1994; Kupfer et al. 1995; Roach et al. 1995; Nurminsky and Hartl
1996) recognized that sequence information from both ends of relatively
long inserts dramatically improves the efficiency of sequence assembly.
In contrast to single sequence reads from one end of shotgun subclones,
the pairs of sequence reads from both ends have known spacing and
orientation. Use of relatively long insert subclones also aids in the
assembly of sequences containing interspersed repetitive elements.
Roach and colleagues (1995) showed that use of a mixture of long and
short inserts can be as effective in enhancing assembly as use of only long inserts. Precise knowledge of the length of the long insert clones
is not required to realize the advantages of end sequencing.
Another essential feature of the plan is the attachment of quality
values to the raw sequences. The quality values would indicate the
likelihood that each base call is correct. Quality values would aid
sequence assembly (Churchill and Waterman 1992; Giddings et al. 1993;
Lawrence and Solovyev 1994; Lipshutz et al. 1994), would help to
distinguish true DNA polymorphisms from sequencing errors, and would
also label uncertain sequences. Quality values would not obviate the
need for relatively low error rates in the sequencing (Fleischmann
et al. 1995). Low error rates would minimize the number of overlapping
nucleotides required for sequence joining and also the ultimate
sequence redundancy that is required. Frequent and appropriate quality
controls would need to be utilized to ensure that the raw sequence
generated is high quality. The quality of the combined sequences from
the ends of the short inserts would be enhanced because the overlapping
segment occurs at the ends of the sequence reads where base calling is
typically least reliable.
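As a rough illustration of how quality values could improve the combined sequence from a short insert, the sketch below merges the overlapping portion of two end reads by keeping, at each position, the base call with the lower estimated error probability. The function name, the representation of quality values as per-base error probabilities, and the assumption that the two segments are already aligned and in the same orientation are ours for illustration; they are not part of the plan described above.

```python
def merge_overlap(bases_a, err_a, bases_b, err_b):
    """Merge two aligned, equal-length overlap segments.

    bases_a, bases_b: base calls for the same positions from the two end reads
                      (read b is assumed to be already reverse-complemented).
    err_a, err_b:     hypothetical per-base error probabilities (quality values).
    Returns the merged bases and the error probability kept at each position.
    """
    if not (len(bases_a) == len(err_a) == len(bases_b) == len(err_b)):
        raise ValueError("all inputs must have the same length")
    merged, merged_err = [], []
    for a, ea, b, eb in zip(bases_a, err_a, bases_b, err_b):
        # Keep whichever call is less likely to be wrong (ties go to read a).
        if ea <= eb:
            merged.append(a)
            merged_err.append(ea)
        else:
            merged.append(b)
            merged_err.append(eb)
    return "".join(merged), merged_err


# Read a degrades toward its end while read b improves across the same
# positions, so each read covers the other's least reliable calls.
merged, errs = merge_overlap(
    "ACGTT", [0.01, 0.01, 0.05, 0.20, 0.30],
    "ACGAT", [0.30, 0.20, 0.05, 0.01, 0.01],
)
print(merged, errs)
```

In the overlap, the tail of one read aligns with an earlier and more reliable stretch of the other, which is why even this simple rule can discard most of the weakest base calls.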
Feasibility of Whole-Genome Shotgun Sequencing
The feasibility of human whole-genome shotgun sequencing was
evaluated by computer simulation designed to determine whether sufficient coverage and linkage information would result from such an
approach. The simulation considered sequencing from both ends of two
classes of inserts, long and short. The simulation also modeled both
short and long interspersed repetitive elements (SINEs and LINEs). To
be conservative, all interspersed repeats were considered to be
identical in sequence so that overlaps in reads that fell within
repetitive elements were useless for joining sequences. Many parameters
such as fold coverage of the genome, sequence read length, amount of
repetitive DNA, ratio of long to short inserts, and nucleotides of
overlap required to join sequences were varied in the simulations.
Default parameters (Table 2) are assumed to be in
force unless otherwise stated. The default value for LINE length was
conservatively chosen to be 1.5 kb, because although full-length LINE-1
(L1) elements are 6-7 kb in length, the vast majority of human L1
elements are truncated with average length ~0.7 kb (Smit et al. 1995;
A. Smit, pers. comm.). Note that the simulation does not solve an
assembly problem over simulated data, but instead analyzes the nature
of the sampling obtained. Details of the simulation, including source
code, can be obtained from Gene Myers (gene{at}cs.arizona.edu).
Two outcomes of the simulation, contig length and scaffold length, were
monitored particularly closely. Contigs are defined as sequence
assemblies without any discontinuities. Scaffolds (Roach et al. 1995)
are defined as collections of two or more contigs joined by long
inserts whose ends are in different contigs. Scaffolds, by definition,
contain discontinuities, but the positions and approximate sizes of the
discontinuities are known. The simulation confirmed that coverage of
the genome is largely a function of the amount of raw sequence
generated (Lander and Waterman 1988; Fleischmann et al. 1995). As shown
in Table 3, the average simulated contig length
increased dramatically as the fold coverage of the genome increased
from 0.5 to 10. Average contig length was also dependent on the amount
of interspersed repetitive DNA and the ratio of long to short inserts
(Fig. 1). Increasing amounts of repetitive DNA led to
shorter average contigs. Even at 50% total repetitive DNA, however,
maximum contig length was still near 100 kb. When long-to-short insert
ratios were greater than 1, contig length was largely independent of
the ratio. These results were only modestly affected by read length
(from 200 to 800 bases) and by the minimum overlap required for
sequence joining (from 20 to 60 bases) (data not shown).
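The dependence of coverage and contig length on the amount of raw sequence can be illustrated with the closed-form expectations of Lander and Waterman (1988) for randomly placed single reads. The sketch below is not the simulation described here: it ignores repeats, paired ends, and the long/short insert mix. The 3-Gb genome size, 500-base read length, and 40-base minimum overlap are assumed values, chosen inside the ranges mentioned in the text.

```python
import math


def lander_waterman(genome_size, read_length, coverage, min_overlap):
    """Lander-Waterman expectations for random, unpaired reads.

    Returns (fraction of genome covered,
             expected number of contigs, counting single-read "islands",
             expected contig length in bases).
    """
    n_reads = coverage * genome_size / read_length
    sigma = 1.0 - min_overlap / read_length  # usable fraction of each read
    fraction_covered = 1.0 - math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage * sigma)
    expected_contig_len = read_length * (
        (math.exp(coverage * sigma) - 1.0) / coverage + (1.0 - sigma)
    )
    return fraction_covered, expected_contigs, expected_contig_len


if __name__ == "__main__":
    for c in (0.5, 1, 2, 5, 10):
        covered, contigs, length = lander_waterman(3e9, 500, c, 40)
        print(f"{c:>4}x  covered={covered:.4f}  "
              f"contigs={contigs:,.0f}  mean contig={length:,.0f} bases")
```

Because the expected contig length grows roughly exponentially with fold coverage in this idealized model, the steep rise between 0.5-fold and 10-fold coverage is expected even before repeats and paired-end links are taken into account.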
Given the large number of contigs that would be generated with the whole-genome shotgun approach, a pivotal question is whether the simulation contigs could be ordered into scaffolds. For a hypothetical human chromosome, 400 Mb in size, one scaffold spanning the entire chromosome length was obtained in each of 100 simulation iterations. After assembly, an average of 160 contigs and six small scaffolds remained unconnected to the single, very large scaffold (scaffolds can overlap without being connected by common sequence).
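To make the notion of a scaffold concrete, the following sketch groups contigs that are connected by long inserts whose two end reads fall in different contigs. It uses a standard union-find structure. The function name and the input format (pairs of contig identifiers) are hypothetical, and the sketch only groups contigs; it does not use the spacing and orientation information that would be needed to order them or to estimate gap sizes.

```python
def build_scaffolds(n_contigs, links):
    """Group contigs into scaffolds.

    n_contigs: number of contigs, identified as 0..n_contigs-1.
    links:     iterable of (contig_i, contig_j) pairs, one per long insert
               whose end reads lie in two different contigs (hypothetical input).
    Returns a sorted list of scaffolds, each a sorted list of contig ids.
    Following the definition in the text, only groups of two or more
    contigs count as scaffolds.
    """
    parent = list(range(n_contigs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for contig in range(n_contigs):
        groups.setdefault(find(contig), []).append(contig)
    return sorted(g for g in groups.values() if len(g) >= 2)


# Contigs 0-1-2 are chained by two long inserts, 4-5 by one; contig 3 is unlinked.
print(build_scaffolds(6, [(0, 1), (1, 2), (4, 5)]))
```

In this toy input, contig 3 stays outside any scaffold, which corresponds to the contigs that remained unconnected to the large scaffold in the simulation.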
Using the default parameters, only ~16,000 gaps between contigs (0.04% of the genome) with average size of ~70 bp and maximum size <1700 bp remained after assembly. Although filling these gaps would certainly require a large effort, because the gaps are short, it should be possible to fill virtually all of them using PCR. Additional effort, if deemed necessary, would be required to sequence the complementary strand of segments with only single-strand coverage. Simulation results indicate that under default conditions, 616,000 of these single-stranded regions would exist with an average size of 106 bases.
Although a large amount of computing power would be required to perform
the sequence similarity searches necessary for assembly, such power is
already available. Using conservative and sensitive overlap detection
algorithms, it would currently be possible to span sequence-tagged
sites (STSs) spaced at 100 kb at a rate of at least one STS pair per
day per 100 mips (million instructions per second) workstation. With a cluster of
100 such workstations the assembly of the entire human genome would
take 300 days. By using less sensitive, but faster, overlap detection
software, this time could be reduced by nearly a factor of 10. Note
also that the power of computer processors has doubled every 18 months
for many years, and this trend is likely to continue (Patterson 1995).
If contemplated machines such as the 3-teraflop supercomputer planned
in 1998 for Lawrence Livermore National Laboratory (Macilwain 1996)
were recruited to the task of assembly, then the human genome could be
assembled, in principle, in 4 min.
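The 300-day estimate for the workstation cluster follows directly from the figures quoted above, as the short calculation below shows. The 3-Gb genome size is an assumption (it is consistent with 30 billion nucleotides representing 10-fold coverage), and the `speedup` parameter is a hypothetical knob standing in for faster overlap detection software.

```python
GENOME_SIZE_BP = 3_000_000_000   # assumed ~3 Gb genome
STS_SPACING_BP = 100_000         # STS spacing quoted in the text
PAIRS_PER_WORKSTATION_DAY = 1    # one STS pair per day per 100-mips workstation
WORKSTATIONS = 100


def assembly_days(genome_size=GENOME_SIZE_BP,
                  sts_spacing=STS_SPACING_BP,
                  pairs_per_day=PAIRS_PER_WORKSTATION_DAY,
                  workstations=WORKSTATIONS,
                  speedup=1.0):
    """Days needed to span every interval between adjacent STSs."""
    intervals = genome_size / sts_spacing           # about 30,000 STS pairs
    return intervals / (pairs_per_day * workstations * speedup)


print(assembly_days())             # baseline cluster
print(assembly_days(speedup=10))   # with ~10-fold faster overlap detection
```

With these inputs there are about 30,000 intervals to span, giving 300 days for 100 workstations and about 30 days if the software were a full factor of 10 faster.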
It is important to realize that because of significant progress in the
genetic and physical mapping of STSs (Olson et al. 1989), the real task
of shotgun sequence assembly would be greatly simplified to the task of
building contigs and scaffolds that span adjacent STSs. Each of the
STSs would serve as a nucleation site for this linking process. Already
>30,000 total human STSs, including >16,000 genes, have been
physically mapped, and the tally is increasing rapidly (Cox et al.
1994; Hudson et al. 1995; Schuler et al. 1996 and Web sites listed
therein). Expressed sequence tags (ESTs) (Adams et al. 1991, 1995;
Hillier et al. 1996) are particularly valuable for sequence assembly
because the coding sequences are often interrupted by introns. For the
purposes of assembly, a single EST will therefore usually be the
equivalent of an array of ordered STSs, a nearly ideal framework for
assembly. Plans to generate full-length cDNA sequences (Marshall 1996)
will only enhance the utility of these sequences for assembly. Some genes, such as the dystrophin and neurofibromatosis I genes, cover enormous segments of the genome (2.3 and 0.35 Mb, respectively) (Heim et al. 1995; Prior et al. 1995). Assuming, conservatively, a
total of 80,000 human ESTs and an average of three exons per sequence,
a grand total of >250,000 STSs with an average spacing of only 12 kb
is already available for assembly (Table 4).
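The arithmetic behind the STS framework can be checked in a few lines. The 3-Gb genome size is an assumption, and the function names are ours. The text's total of >250,000 evidently combines the EST-derived markers with the mapped STSs; because the overlap between the two sets is not stated, the sketch takes 250,000 from the text instead of deriving it.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed ~3 Gb genome


def est_derived_sts(n_ests, exons_per_est):
    """Each exon of an EST acts as one ordered STS-like anchor."""
    return n_ests * exons_per_est


def average_spacing_bp(genome_size_bp, n_markers):
    """Mean distance between markers if they were spread evenly."""
    return genome_size_bp / n_markers


print(est_derived_sts(80_000, 3))                   # anchors from ESTs alone
print(average_spacing_bp(GENOME_SIZE_BP, 250_000))  # spacing for the quoted total
```

Eighty thousand ESTs with three exons each give 240,000 anchors, and 250,000 evenly spread markers on a 3-Gb genome lie 12 kb apart, which matches the spacing quoted above.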
At present, the process for human whole-genome shotgun sequence
assembly can only be projected. Nevertheless, a possible scenario for
assembly would be to begin with all existing mapped STSs (including ESTs) within a specific chromosomal interval, to add shotgun reads in a
very conservative fashion utilizing only sequence overlaps of high
probability, to meld these growing assemblies to unmapped STSs within
the database, and then to add in lower probability overlapping
sequences. The sequence assemblies would continually be examined for
disagreements with EST structure or with existing map information and
also for the presence of forks or loops, which would indicate the
presence of unrecognized interspersed (forks) or tandem (loops)
repeats, or other errors in assembly or cloning artifacts. Software for
assembly on this scale does not exist, but we have begun work in this
direction. Our initial perception is that STS anchors provide
sufficient directional information to allow resolution of low copy
number repeats (of any scale) and that high copy number repeats can be
factored as a consensus sequence that can be resolved at specific sites
on a case-by-case basis. The development of such software poses
difficult technical questions, but we believe these are surmountable in
a several man-year horizon. We note, for example, that human coding
sequences have been assembled from individual reads by several groups
despite the presence of sequence errors, polymorphisms, alternative
splicing, and repetitive elements (Schuler et al. 1996). Also, software developed for assembly of human sequences would be applied in the
future to many other organisms.
Whole-genome shotgun sequencing would not result in a single unbroken
sequence for entire chromosomes. Even using recombination and
restriction-deficient E. coli strains (Chalker et al. 1988; Raleigh et al. 1988; Doherty et al. 1993), a small portion of the
genome would likely be resistant to cloning or would not yield stable
clones. Sequences from long arrays of tandem repeats such as
centromeric satellite DNA, rDNA repeats, and some minisatellites would
not be able to be assembled perfectly. Note, however, that these
limitations apply to both whole-genome shotgun and clone-by-clone sequencing approaches.
The feasibility of whole-genome shotgun sequencing was also supported
by the recent success achieved by Venter and colleagues in sequencing
three bacterial genomes with sizes ranging from 0.6 to 1.8 Mb
(Fleischmann et al. 1995; Fraser et al. 1995; Bult et al. 1996). Neither
raw sequence generation, sequence assembly, nor sequence finishing was
an impediment to the shotgun sequencing of the bacterial chromosomes.
Distances between human STSs are much smaller than the sizes of the
bacterial genomes.
Our strategy for whole-genome shotgun sequencing is also entirely
consistent with the bacterial artificial chromosome (BAC) end
sequencing strategy proposed recently by Venter et al. (1996). Although
we feel that large-scale BAC end sequencing would probably not be
absolutely required, it would certainly assist in the assembly of the
shotgun sequence fragments. BAC clones would likely span some arrays of
tandem repeats that are too large for our "long insert" clones.
Advantages of Whole-Genome Shotgun Sequencing
Whole-genome shotgun sequencing of human genomic DNA holds a number of important advantages compared to conventional clone-by-clone sequencing. Foremost among these advantages are detection of large numbers of DNA polymorphisms, more complete and less artifactual coverage of the genome, and improved speed and cost.
A significant fraction of all common human DNA polymorphisms can be
detected through shotgun sequencing. Polymorphisms are important
because they are used to map genes through linkage analysis
(Terwilliger and Ott 1994), to presymptomatically predict disease
status (Antonarakis 1989; Weber 1994), to detect submicroscopic
chromosomal rearrangements (Lupski et al. 1991), to identify
individuals in, for example, paternity and forensic testing (Hagelberg
et al. 1991; Frigeau and Fourney 1993; Smith 1995; Urquhart et al.
1995), and to study a wide range of biological phenomena such as
evolution (Bowcock and Cavalli-Sforza 1991; Bowcock et al. 1994; Jorde
et al. 1995), population biology (Edwards et al. 1992; Deka et al.
1995; Morell et al. 1995), and recombination (Tanzi et al. 1992; Weber
et al. 1993). Polymorphisms within coding and regulatory elements are
also the source of relative risk for many common diseases. Common
variants of the apolipoprotein E gene on chromosome 19, for example,
strongly influence an individual's risk of developing late onset
Alzheimer's disease (Saunders et al. 1993; Kamboh 1995; Kamboh et al.
1995). Many highly informative human DNA polymorphisms based on short
tandem repeats have already been identified, but the vast majority of
the much more frequent biallelic base substitution and short
insertion/deletion polymorphisms remain unknown (Kwok et al. 1994,
1996). Although allele frequencies vary widely, most human DNA
polymorphisms are common to all populations (Bowcock and Cavalli-Sforza
1991; Jorde et al. 1995; Bowcock et al. 1994; Deka et al. 1995; Edwards
et al. 1992; Morell et al. 1995).
DNA polymorphisms would not usually be detected through clone-by-clone sequencing because only one variant for each genomic region would be sampled. If the genome is sequenced through the clone-by-clone approach, then much additional funding would be required to identify the polymorphisms at a later date and many years would be lost. Calculation of the exact fraction of polymorphisms that would be identified through whole-genome shotgun sequencing requires a distribution of polymorphisms as a function of informativeness, which is not yet known. However, by generating 6 billion nucleotides of raw sequence from each of five unrelated individuals, it can be calculated that, for example, ~65% of all biallelic polymorphisms with 20% heterozygosity and >99% of all multiallelic polymorphisms with 80% heterozygosity would be detected. To optimize polymorphism detection, DNA should ideally be sequenced from donors with widely differing geographic ancestry.
Sequencing errors would likely be encountered much more frequently in
whole-genome shotgun sequencing than true polymorphisms. Sequencing
error rates would likely be at least 1%, whereas the rate of
polymorphisms would likely be on the order of 0.1%. Although confirmation may be necessary in many cases, several factors would allow many of the polymorphisms to be identified despite the background of sequencing errors. True polymorphisms would often have multiple sequence reads per allele, true polymorphisms would usually have high-quality values attached to each allele, and true polymorphisms do
not occur randomly throughout the genome. Specific sequence features
will spotlight polymorphisms. For example, it has been known for many
years that CpG dinucleotides are more commonly polymorphic than other
dinucleotides (Schumm et al. 1988; Deininger and Batzer 1993; Becker et
al. 1996; Sommer and Ketterling 1996).
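A back-of-the-envelope calculation shows why multiple reads per allele help to separate true polymorphisms from sequencing errors. The sketch assumes that errors in different reads are independent and that a miscalled base is equally likely to be any of the three wrong bases; both are simplifying assumptions of ours, and real errors are known to be context dependent. The 1% and 0.1% rates are those quoted above.

```python
def same_error_in_all_reads(error_rate, n_reads):
    """Probability that n independent reads all show the SAME wrong base
    at one position.

    The first read is wrong with probability error_rate; each additional
    read must also be wrong and must hit the same one of three wrong bases.
    """
    return error_rate * (error_rate / 3.0) ** (n_reads - 1)


ERROR_RATE = 0.01          # sequencing error rate quoted in the text
POLYMORPHISM_RATE = 0.001  # approximate polymorphism rate quoted in the text

for n in (1, 2, 3):
    p = same_error_in_all_reads(ERROR_RATE, n)
    print(f"{n} read(s): P(identical error) = {p:.2e}, "
          f"ratio to polymorphism rate = {p / POLYMORPHISM_RATE:.3g}")
```

Under these assumptions a variant seen in a single read is about 10 times more likely to be an error than a polymorphism, whereas the same variant seen in two independent reads is far more likely to be real.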
Rearrangements in the large insert contig clones and biases in the
coverage of these clones will, to a large degree, be eliminated by
whole-genome shotgun sequencing. Many of the cosmid clones projected
for use in sequencing have been developed from hybrid tissue culture
cell lines which, themselves, have been propagated for many cell
generations. Rearrangements and artifacts have undoubtedly been
introduced into the cloned material during this process. Although
BACs/PACs (P1-derived artificial chromosomes) appear to be more stable
than cosmids, artifacts such as chimeras and deletions still occur at a
significant frequency (Kim et al. 1996; Boysen et al. 1997). By
starting with total
human genomic DNA, many of these artifacts will be eliminated. The
cosmid or BAC/PAC assemblies will also likely exclude at least some
long arrays of tandem repeats. The genome will be more equally
represented with shotgun sequencing using small inserts. In addition,
overlaps between large insert clones will lead to largely unproductive duplicative sequencing or to the expenditure of resources to avoid this
duplication.
Whole-genome shotgun sequencing would also be less expensive and
therefore faster than the clone-by-clone approach. The steps of
preparation, mapping, storage, and tracking of tens of thousands of
sequence-ready large-insert clones; parallel generation, storage and
tracking of subclones for each of the large insert clones; and
avoidance of large-insert clone overlap would be entirely eliminated
with shotgun sequencing. The processes of sequence assembly and
sequence finishing could be carried out much more efficiently in
central facilities. Reducing the process of DNA sequencing to the core
task of raw sequence generation would also allow efforts to be focused
on driving down the costs of a few relatively straightforward
procedures in large factory-like operations. With shotgun sequencing
there would be no need to wait for expensive, sequence-ready
large-insert clone assemblies to be generated and no need to sequence
one chromosome or one chromosomal segment at a time. To date, no one
has generated overlapping cosmid or BAC/PAC assemblies that span even
significant portions of human chromosomes without many gaps (Ashworth
et al. 1995; Doggett et al. 1995). Perhaps this can be accomplished
eventually but only through great effort, time, and cost. The assertion
that collection of large-insert templates for sequencing is trivial is
simply wrong. Although initiation of genome-wide sequence assembly
would probably not be worthwhile until ~2.5-fold sequence coverage
was obtained, completion of partial cDNA sequences, identification of
regulatory regions, definition of intron/exon boundaries, and identification of polymorphisms are all tasks that could be undertaken continuously from the start of shotgun sequence generation. The large
number of laboratories worldwide undertaking positional cloning projects,
for example, could utilize the shotgun sequences from the outset.
Estimating the actual costs of human genomic sequencing is certainly hazardous. Nevertheless, our best effort is summarized in Table 5. Assuming optimistically that clone-by-clone sequencing of human DNA can be completed for $0.30 per finished base, and assuming that sequencing is completed by the end of the year 2003, an average cost per year of $130 million is calculated. Assuming conservatively a cost of $0.01 for generation of a single base of raw sequence, spending of $130 million per year would give 10-fold coverage by about the end of the millennium with $90 million remaining for software development and computer assembly. Filling gaps and resolving uncertainties would add additional costs to whole-genome shotgun sequencing in the next century.
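The cost figures in this paragraph can be reproduced with simple arithmetic. In the sketch below, the 3-Gb genome size, the 7-year span for clone-by-clone sequencing (1997 through 2003), and the 3-year span for shotgun sequencing (to about the end of the millennium) are our assumptions; the per-base costs and the $130 million annual budget come from the text.

```python
GENOME_BP = 3_000_000_000  # assumed ~3 Gb genome


def clone_by_clone_annual_cost(cost_per_finished_base, years):
    """Average yearly spending to finish the genome clone by clone."""
    return GENOME_BP * cost_per_finished_base / years


def shotgun_raw_cost(cost_per_raw_base, fold_coverage):
    """Cost of generating raw shotgun sequence to the given coverage."""
    return GENOME_BP * fold_coverage * cost_per_raw_base


annual_budget = 130e6                                # dollars per year, from the text
yearly = clone_by_clone_annual_cost(0.30, years=7)   # assumed 1997 through 2003
raw = shotgun_raw_cost(0.01, fold_coverage=10)
remaining = 3 * annual_budget - raw                  # assumed 3 years of spending

print(f"clone-by-clone: ${yearly / 1e6:.0f} million per year")
print(f"shotgun raw sequence: ${raw / 1e6:.0f} million")
print(f"left for software and assembly: ${remaining / 1e6:.0f} million")
```

Under these assumptions the clone-by-clone total of $900 million averages about $129 million per year, while 10-fold raw coverage costs $300 million and leaves $90 million of three years' budget for software development and assembly.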
We assert that the goals listed in Table 1 are the true motivation for
sequencing the human genome, not the accomplishment of some arbitrary,
mythical goal of 99.99% accuracy of a single, artifactual (in places)
and nonrepresentative copy of the genome. Most research laboratories,
both public and private, want discrete genomic sequence information,
and they want it as early as possible. They are interested in
information such as the intron/exon structure of specific genes, the
polymorphisms that may occur in specific coding and regulatory
sequences, and lists of coding sequences that lie within specific
chromosomal intervals. The sooner this critical information is
available, the sooner it can be applied to accelerating research
progress. Americans spend ~$35 billion per year, public and private,
on biomedical research (Silverstein et al. 1995). If the efficiency of
this research is improved by even 1%, and this is probably a gross
underestimate, then savings would be $350 million per year, far more
than the cost of sequencing. Whole-genome shotgun sequencing will allow
these savings to be realized far sooner than with clone-by-clone
sequencing. We should generate as much of the critical sequence
information as rapidly as possible and leave cleanup of gaps and
problematic regions for future years.
It is not too late to change strategies for sequencing the human genome. Only a few percent of the sequence has been generated at this time. Even if the human genome is not sequenced via the shotgun approach, there are still many other large genomes that will be sequenced in the future, including many agriculturally important species. It will likely be too expensive to sequence other large genomes via the clone-by-clone approach. A possible general strategy for sequencing other large genomes would be a random cDNA sequencing project, followed possibly by some radiation hybrid physical mapping of the ESTs, followed by whole-genome shotgunning.
About a decade ago, when the Genome Project was just being contemplated, Fred Blattner proposed whole-genome shotgun sequencing of both the E. coli and human genomes. His proposals were neglected. Today, no one considers for a moment sequencing bacterial genomes by any method other than whole-genome shotgun sequencing. Even at several dollars per finished base the human sequence is probably one of the greatest bargains in human history. We laud efforts now under way in several large sequencing centers to generate human genomic sequence. The reality, however, is that research dollars are always limited. We should sequence the human and other eukaryotic genomes using the most rapid, cost effective, and productive strategy.
FOOTNOTES
3 Corresponding author.
E-MAIL weberj{at}mfldclin.edu; FAX (715) 389-3808.
REFERENCES
[...] α/δ T-cell receptor locus with bacterial artificial chromosome clones. Genome Res. 7: 330-338.
[...] α1-antichymotrypsin polymorphism. Nature Genet. 10: 486-488.
[...] ε4 with late-onset familial and sporadic Alzheimer's disease. Neurology 43: 1467-1472.