Genome Research
ABSTRACT
The highly specific and sensitive PCR provides the basis for sequence-tagged sites (STSs), unique landmarks that have been used widely in the construction of genetic and physical maps of the human genome. Electronic PCR (e-PCR) refers to the process of recovering these unique sites in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing such that they could plausibly prime the amplification of a PCR product of the correct molecular weight. A software tool was developed to provide an efficient implementation of this search strategy and allow the sort of en masse searching that is required for modern genome analysis. Sample searches were performed to demonstrate a number of factors that can affect the likelihood of obtaining a match. Analysis of one large sequence database record revealed the presence of several microsatellite and gene-based markers and allowed the exact base-pair distances among them to be calculated. This example demonstrates how e-PCR can be used to integrate the growing body of genomic sequence data with existing maps, reveal relationships among markers that existed previously on different maps, and correlate genetic distances with physical distances.
INTRODUCTION
In recent years mapping strategies have focused
on the use of sequence-tagged sites (STSs) as landmarks of the genome
(Olson et al. 1989). Operationally, an STS is defined by a pair of
oligonucleotide primers that can be used in a PCR assay to detect a
site that is unique in the genome. In some cases, the size of the
amplified PCR product may be polymorphic, which allows the transmission of allelic variants within families to be studied. This property is
essential for STSs used in genetic mapping, whereas any STS can be used
for physical mapping. The chief advantage of STSs over other types of
markers is that there is no absolute requirement to maintain and
distribute any biological materials. Instead, markers can be stored in
computer databases and disseminated over electronic networks. This is
possible because oligonucleotides are easy and relatively inexpensive to
synthesize, allowing any laboratory around the
world to regenerate the necessary reagents to assay for a given marker.
Because STSs are defined by sequence, it is possible to identify these landmarks in DNA sequences by searching for subsequences of a query sequence that match the PCR primers and are in the correct order, orientation, and spacing to be consistent with the PCR product size. We call this procedure electronic PCR (e-PCR). The significance of this technique can be seen by considering that it is possible to determine the map location of a new sequence without performing a single experiment in the laboratory. This report describes a software tool for performing e-PCR in an efficient manner and discusses several potential applications for genomic research.
Sources of STS Data
The STS division of GenBank (dbSTS) is used within the mapping
community for bulk submission of STS sequences, PCR reaction conditions, and mapping information (Benson et al. 1996). Figure 1a
shows a few database fields from a typical record that are of primary
interest for use in e-PCR: various names and
identifiers, the sequences of the forward and reverse primers, the size
of the PCR product, and the full sequence of the amplified region (the
amplicon). Although many of the STSs in the database are of human
origin, several other model organisms are represented by smaller
numbers of STSs. Additional sources of human STS data include the
Genome Data Base (Fasman et al. 1996) and the Radiation Hybrid Database
(http://www.ebi.ac.uk/RHdb/).
Within the last 2 years, several "whole-genome" STS-based human
mapping projects have been described. One significant milestone is the
recent completion of the Généthon genetic map (Dib et al. 1996), which contains 5264 STSs developed from microsatellite sequences. Several physical maps have been published, such as those
developed by the Centre d'Etude du Polymorphisme Humain (CEPH), which
contains 2601 markers (Chumakov et al. 1995); the Whitehead Institute
and Généthon, which contains 15,086 markers (Hudson et al. 1995); Généthon and Cambridge University, which contains 850 markers (Gyapay et al. 1996); and Stanford University, which contains
5994 markers (Stewart et al., this issue). Finally, a transcript map,
which contains 20,128 cDNA-based markers representing ~16,000
distinct genes, has recently been constructed by an international consortium of mapping laboratories (Schuler et al. 1996). Projects such
as these, not to mention many chromosome-specific and regional maps,
have resulted in a substantial expansion in the number of STSs in the
database. Consider, for example, that the GenBank STS division
contained 7532 entries when it first appeared in October 1994, whereas the
number had swelled to 44,102 sequences by the December 1996 release,
a roughly 6-fold increase in a period of just over 2 years.
Software for e-PCR
The process of PCR is difficult to model in detail because a
variety of poorly understood factors affect whether or not a particular
pair of primers will lead to a successful outcome (Bangham 1991).
Despite this, a computational strategy for identifying the most obvious
STSs would be extremely useful. A straightforward attempt would involve
searching for subsequences exactly matching the two primers and then
checking to see that they are in the correct configuration (i.e., with
their 3′ ends pointing toward one another) and that their spacing
is consistent with the known size of the PCR product. For increased
sensitivity, some allowance could be made for mismatches in the primer
sites and polymorphism in the amplicon size.
One possible approach, which makes use of widely available software
(both public domain and commercial), is to specify each STS as a
pattern, or "regular expression," with the two primer sequences
separated by a (possibly variable-length) spacer of arbitrary
characters. It should be noted that STS databases store both primers in
their 5′→3′ orientations, which is perfectly natural
for those attempting to synthesize the oligonucleotides but
inconvenient for the purposes of sequence analysis because the reverse
primer must be inverted before attempting to match it against the
sequence (see Fig. 1b). Moreover, it is usually necessary to construct
two expressions per STS, one for each of the DNA strands, because most
regular expression search programs consider only one strand at a time.
Using "standard" regular expression syntax, it is not possible to
specify the exact length of the spacer between the primers, although
some programs may use an extended syntax to allow this. Search speed
may also be an issue in situations where it is desirable to search
large collections of sequences against the STS database. Regular
expression programs are designed for flexibility and the ability to
handle sophisticated expressions and may not be maximally efficient
when searching for the relatively simple patterns required by e-PCR.
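The regular-expression approach described above can be sketched in a few lines of Python. This is only an illustration, not part of the e-PCR program: the function names, the `margin` parameter, and the choice of `[ACGTN]` as the spacer alphabet are assumptions made for the example. Python's `re` module is one of the programs with an extended syntax (the bounded quantifier `{m,n}`) that allows the spacer length to be constrained.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sts_patterns(forward: str, reverse: str, size: int, margin: int = 50):
    """Build one pattern per strand for a single STS.

    forward, reverse: primers as stored in the database (both 5'->3').
    size: expected amplicon size, primers included.
    margin: tolerated deviation in amplicon size (hypothetical parameter).
    """
    fwd, rev = forward.upper(), reverse.upper()
    # The spacer is whatever lies between the two primer sites.
    lo = max(0, size - margin - len(fwd) - len(rev))
    hi = max(0, size + margin - len(fwd) - len(rev))
    spacer = "[ACGTN]{%d,%d}" % (lo, hi)
    # Plus strand: forward primer, spacer, inverted reverse primer.
    plus = re.compile(fwd + spacer + reverse_complement(rev))
    # Minus strand: reverse primer, spacer, inverted forward primer.
    minus = re.compile(rev + spacer + reverse_complement(fwd))
    return plus, minus
```

Calling `plus.search(query)` and `minus.search(query)` on an uppercase query then reports at most one exact-match site per strand; tolerating primer mismatches would require a different technique.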
For both convenience and performance considerations, it was useful to develop a special-purpose program for performing e-PCR. This program (simply called e-PCR) requires one file containing the PCR primers and amplicon sizes for the STSs of interest and one file containing arbitrarily large numbers of query sequences to be searched en masse. To be reported, matches to both primers must be found and the order and orientation of the primer sequences must be consistent with their role in priming the PCR reaction. That is, either the forward primer must be followed by the inverse (reverse complement) of the reverse primer (for a plus strand hit) or the reverse primer must be followed by the inverse of the forward primer (for a minus strand hit; see also Fig. 1b). The portion of the amplicon falling between the primers is not considered when evaluating a match.
A word-based strategy is used to significantly speed up the search for
the primers within the query sequence. This is done by extracting a
string of W consecutive letters (a "word") from the
3′ end of each primer and converting it to a unique integer (a
"hash value"). Using the hash value for indexed access to a table
of STS information allows potential matches to be evaluated efficiently
as the query sequence is scanned. Candidate STSs are recognized as
occurrences of the two words taken from the forward and reverse primers
that are spaced such that the predicted amplicon size is within a
"margin" of M bases on either side of the expected size.
When these criteria are satisfied, a secondary comparison is triggered
in which the complete primers are compared against the query sequence,
allowing up to N mismatching bases for each primer. In the
default mode of operation, no mismatches are allowed
(N = 0), which allows the program to report only sites
that are certain to be authentic STSs. This is useful for automated
analysis of large volumes of data because the results need not be
inspected manually. However, it should be recognized that some STSs
could be missed either because the primers do not match exactly or
because the query sequence may contain errors. Increasing the value of
N will allow more potential STSs to be found, but doing so may
also result in some false positives being reported. It should be noted
that when mismatches are allowed, they are not permitted within the W
bases used to compute the hash value. This apparent limitation is
justified by the fact that words are extracted from the 3′ ends of
the primers, where mismatches cannot be tolerated easily by PCR (Sommer
and Tautz 1989). The value of W can be reduced from its default
value of 7 to reduce the chance of missing a true STS, but speed is sacrificed in the process. The default value of M is 50, which
should accommodate length variations for most polymorphic STSs.
However, a larger setting might be desired for certain applications, for instance, to handle the case where primers are designed from cDNA
sequences but turn out to be in different exons in genomic DNA.
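The word-based strategy can be illustrated with a minimal Python sketch. It is not the e-PCR source code: the `STS` record, the function names, and the use of a Python dictionary in place of an integer hash table are assumptions made for the example, which also assumes uppercase primers of at least W bases and unique STS names. The parameters `W`, `M`, and `N` follow the definitions and defaults given above.

```python
from collections import defaultdict
from typing import NamedTuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; characters other than A, C, G, T are kept as-is."""
    return seq.translate(_COMP)[::-1]


class STS(NamedTuple):  # hypothetical record layout
    name: str
    forward: str  # 5'->3'
    reverse: str  # 5'->3'
    size: int     # expected amplicon size


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def scan_strand(seq, stss, W=7, M=50, N=0):
    """Find STSs whose forward primer lies on the given strand of seq."""
    fwd_index = defaultdict(list)  # 3' word of the forward primer
    rev_index = defaultdict(list)  # 3' word of the reverse primer, inverted
    for sts in stss:
        fwd_index[sts.forward[-W:]].append(sts)
        rev_index[revcomp(sts.reverse[-W:])].append(sts)
    pending = defaultdict(list)  # STS name -> candidate forward-site starts
    hits = []
    for i in range(len(seq) - W + 1):
        word = seq[i:i + W]
        for sts in fwd_index.get(word, ()):
            pending[sts.name].append(i + W - len(sts.forward))
        for sts in rev_index.get(word, ()):
            end = i + len(sts.reverse)  # end of the reverse-primer site
            for start in pending[sts.name]:
                observed = end - start
                if start < 0 or end > len(seq) or abs(observed - sts.size) > M:
                    continue
                # Secondary comparison of the complete primers.
                fwd_site = seq[start:start + len(sts.forward)]
                if (mismatches(fwd_site, sts.forward) <= N and
                        mismatches(seq[i:end], revcomp(sts.reverse)) <= N):
                    hits.append((sts.name, start, observed))
    return hits


def e_pcr(seq, stss, W=7, M=50, N=0):
    """Report (name, start, observed size, strand) for both strands."""
    seq = seq.upper()
    hits = [(n, s, size, "+") for n, s, size in scan_strand(seq, stss, W, M, N)]
    for n, s, size in scan_strand(revcomp(seq), stss, W, M, N):
        # Convert the position back to plus-strand coordinates.
        hits.append((n, len(seq) - (s + size), size, "-"))
    return hits
```

Because candidate sites are triggered only by exact occurrences of the two words, mismatches inside the W bases at the 3′ ends are never tolerated in this sketch, whatever the value of `N`.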
The e-PCR program is sufficiently rapid that on one common computer architecture it was possible to test all human sequences in GenBank (excluding the STS division; 609,257 sequences) against all of the human STSs (36,973 primer pairs) in <1 hr. The running time scales linearly with the aggregate query sequence length. The memory requirement is modest and increases linearly with the number of STSs searched. The source code for the e-PCR program is freely available (ftp://ncbi.nlm.nih.gov/pub/schuler/e-PCR/).
Why Not Just Use BLAST?
One might wonder why a special search tool need be created at all,
considering the widespread availability of general-purpose database
search tools such as BLAST (Altschul et al. 1990). dbSTS is one of the
standard databases available for searching when using the BLAST network
service. However, attempting to use BLAST to identify STSs can lead to
many false positives in some cases. With gene-based STSs for instance,
there will be sufficient sequence similarity among related gene family
members and pseudogenes to result in confounding matches being
reported. But perhaps the worst case is encountered when the query
sequence contains simple sequence repeats, such as those that are the
basis for most polymorphic markers.
To demonstrate the problem posed by repetitive sequences, the mRNA
sequence of Br-cadherin (GenBank accession no. L33477) (Selig et al. 1995) was used as the query in a BLAST search against the dbSTS
database. This sequence happens to contain a (CA)n
microsatellite sequence in its 3′-untranslated region (3′ UTR)
that corresponds to the Généthon marker D5S411. The output
from this search was quite voluminous, but a portion of it has been
reproduced in Figure 2. The best match was to the
sequence corresponding to D5S411 (GenBank accession no. Z16831), but
thousands of additional hits were also observed, including a few to
sequences from organisms other than human (by default BLAST shows only
the first 500 hits, but relaxing this limit resulted in a list of
>8000). A list of the best 20 matches is shown in Figure 2a. All but
the first one are false positives, showing sequence similarity only to
the (CA)n repeats and not to any flanking unique sequence
(see Fig. 2b for one example). The problem of simple sequence repeats
(also known as "low-complexity regions") causing false positives
in database searches has been noted previously and is dealt with
typically by "masking" such regions (by converting them to Ns)
prior to performing the search (Altschul et al. 1994; Wootton and
Federhen 1996). Although this does reduce the problem, it does not
eliminate it completely, so some manual inspection of the results must
still be performed.
It should be noted that BLAST was designed to solve the somewhat
different problem of finding sequences related to the query, allowing
for some level of mismatching, but extending over a long enough region
that there is sufficient information to distinguish an observed
similarity from a chance occurrence. BLAST is more analogous to
"electronic hybridization," with the scoring parameters taking
the place of the hybridization temperature in determining stringency.
Extending the sequence comparison to the longer sequence of the
amplicon, instead of focusing on the PCR primers alone, results in a
loss of specificity, just as hybridization is less specific than PCR in
the laboratory.
How Many Hits Can Be Expected?
An important question for users of the e-PCR tool is how many hits can be anticipated for a "typical" search. This is a difficult question to answer because the results depend on many factors such as the length of the query sequence, the size of the STS database, and other properties of both the query sequences and the STSs that might predispose them to matching.
When using STSs that are randomly distributed throughout the genome, it
is easy to see that longer query sequences would have an increased
likelihood of containing a matching STS than would shorter sequences.
To demonstrate this effect, human genomic sequences of various size
classes were tested for the presence of microsatellite-based STSs from
the Généthon genetic map (Dib et al. 1996). As expected, larger sequences were found to contain sites proportionately more often (see Table 1). For a sequence in
the 30- to 40-kb size range (about the size of a typical cosmid
insert), 5% of the sequences were found to contain a
Généthon marker. This increases steadily with sequence
length to a level of 34% for a sequence >100 kb (a typical size
range for bacterial artificial chromosome inserts). This is roughly
consistent with expectations: A random arrangement of 5264 markers (Dib
et al. 1996) over a genome of 3200 Mb (Morton 1991) should result in
~1.65 sites per Mb or, on average, 1 STS every 608 kb. This suggests
about a 1:6 chance of a 100-kb sequence containing a site or about
1:3 chance for a 200-kb sequence. In one instance, a 223-kb
sequence (GenBank accession no. U47924) was found to contain two
Généthon markers. The apparent number of sites per megabase
was lower with shorter sequences, but this is most likely attributable
to a bias in this fraction toward single gene-oriented entries,
as opposed to the random large-insert clones that predominate in the larger size categories; microsatellites may be less likely to occur
within genes than in random DNA.
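The back-of-the-envelope expectation above can be reproduced with a short Python sketch. The function name is hypothetical, and the Poisson model for randomly placed markers is an assumption added here rather than part of the analysis in the text. The 1:6 and 1:3 figures quoted above correspond to the expected number of sites per query; the Poisson probability of at least one site is slightly lower (about 15% for 100 kb and 28% for 200 kb).

```python
import math


def expected_sts_hits(n_markers: int, genome_mb: float, query_kb: float):
    """Expected STS content of a query under uniform random marker placement.

    Returns (sites per Mb, mean spacing in kb, expected sites in the query,
    Poisson probability that the query contains at least one site).
    """
    density_per_mb = n_markers / genome_mb
    spacing_kb = 1000.0 * genome_mb / n_markers
    mean_sites = density_per_mb * query_kb / 1000.0
    # Assumption: site counts in a window follow a Poisson distribution.
    p_at_least_one = 1.0 - math.exp(-mean_sites)
    return density_per_mb, spacing_kb, mean_sites, p_at_least_one


# Figures from the text: 5264 markers over a 3200-Mb genome.
for kb in (100, 200):
    density, spacing, mean, p = expected_sts_hits(5264, 3200, kb)
    print(f"{kb} kb: {density:.3f} sites/Mb, 1 per {spacing:.0f} kb, "
          f"expected {mean:.3f}, P(>=1) = {p:.3f}")
```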
Dependencies on sequence length and database size may be obvious, but
factors such as source of the material can be of even greater
consequence. For example, large numbers of STSs have been developed
from transcribed sequences (Schuler et al. 1996). Thus, data sources
that are enriched for gene sequences, cDNAs for instance, would be
expected to have an increased likelihood of bearing a match. To
illustrate this effect, several categories of human sequences were
compared against databases consisting of STSs derived from random DNA
fragments, microsatellite repeats, and transcribed sequences (see Table
2). The genomic sequences used in this test are the
same as those described above (Table 1) and, when normalized for the
differing sizes of the STS collections, show nearly identical hit rates
for all three types of STSs. As expected, however, sources of sequence
data derived from mRNAs, including single-pass expressed sequence tag
(EST) sequences, show substantially higher numbers of matches with
transcript-derived STSs compared to those derived from random DNA
fragments and microsatellite repeats. The most pronounced effect was
seen when searching with EST sequences from 3′ reads of
oligo(dT)-primed cDNA clones, in which the normalized hit rate was
about three orders of magnitude greater than was observed with random
STSs. Furthermore, the numbers of transcript sites for 3′ ESTs were
much greater than for 5′ ESTs, whereas no significant difference
between these two sequence categories was observed for random and
microsatellite markers. All of these observations can be explained by
the fact that 3′ ESTs have been the primary source of material used
in development of the transcript-based STSs.
An Example Application
GenBank entry U47924 was chosen to illustrate some practical
applications of e-PCR. As noted above, this 223-kb sequence contains
two Généthon markers. It originates from a gene-rich region
at 12p13 and contains 17 complete protein-coding genes (plus one
partial gene, one pseudogene, and one snRNA gene) (Ansari-Lari et al. 1996). Several of them are novel, but four of them correspond to
previously known chromosome 12 genes: cell-surface antigen CD4, the B3 subunit of G proteins (GNB3), triose
phosphate isomerase (TPI), and ubiquitin isopeptidase T
(ISOT). The results of analyzing this sequence by e-PCR using
a database consisting of all STSs from the human transcript map
(Schuler et al. 1996) and the Généthon genetic map (Dib et
al. 1996) are shown in Table 3. Hits to
Généthon markers D12S1623 and D12S1625 reveal two
polymorphic sites and firmly place the sequence on chromosome 12 at a
genetic position of 17.1-17.9 cM (see Fig. 3).
Furthermore, assuming the marker order of the Généthon map
to be correct, the orientation of the U47924 sequence with respect to
the centromere and the 12p telomere can be established. Among the nine
gene-based markers detected, all but one are consistent with the
chromosome 12 assignment, and they additionally indicate the positions
of expressed genes. In Figure 3a, the locations of the gene coding
sequences and the direction of transcription are indicated. It can be
seen that the sites reported for the transcript-based STSs very often
correspond to the 3′ ends of genes, which is consistent with the
strategy that was used in the development of these markers (Schuler et al. 1996). In the case of the CD4 gene, two markers
(SHGC-12737 and A007D38) were found, spaced ~1 kb apart but still
within the 3′ UTR. Overall, 8 of the 17 genes in this region
contained a match to a marker from the human transcript map. The
results of this analysis suggest that e-PCR could be used for automated
sequence annotation. It should be noted that the annotation of GenBank entry U47924 does include the locations of Généthon markers D12S1623 and D12S1625 (using alternate identifiers) but understandably lacks the transcript-based markers that were not published until after
the sequence was submitted.
Looking at the problem from the other point of view, analysis of the
sequence provides a useful way to validate the map, at least in a
localized region. In the construction of the human transcript map,
participating laboratories assigned gene-based STSs to various
intervals defined by Généthon microsatellite markers to
allow the results to be integrated with each other and with the genetic
map (Schuler et al. 1996). Based on the e-PCR analysis of accession no.
U47924, the true interval for all of the transcript markers shown in
Table 3 is between D12S1623 and D12S1625. It was therefore of interest
to see whether the transcript map positions reported for these markers
were consistent with this observation. One error is clearly apparent
involving the interval for marker WI-9250, which is listed as being on
chromosome 1, whereas the remaining body of evidence points to a
chromosome 12 assignment. It is perhaps noteworthy that this marker is
the only one in the region to have been placed by yeast artificial chromosome (YAC) contig mapping; all of the others in this region were
mapped using radiation hybrid panels. Consequently, this inconsistency
may be of use in diagnosing mapping artifacts, for instance, those
caused by doubly chimeric YACs. However, apart from this one error, the
intervals reported for the remaining cDNA markers are all correct,
albeit at lower resolution than can be deduced by sequence analysis.
This may be seen by comparing the intervals in Table 3 with the portion
of the Généthon map of chromosome 12 shown in Figure 3b.
DISCUSSION
This report describes the concept of e-PCR and a software tool that provides an efficient implementation of the basic search strategy. The number of expected STS hits depends on a variety of factors, but those that are reported (using the default parameters) are unequivocal. This differs from the experience using BLAST, in which many false positives were reported. The use of this program in the analysis of one large sequence record demonstrated some practical applications of the program, but several others may be envisioned.
One straightforward application of e-PCR is the large-scale assignment of sequence database records to map positions. This is especially useful for functionally cloned genes and ESTs, for which mapping information may not be initially available. In the case of large-scale genomic sequencing, it is common to use a "sequence-ready map" to select large-insert clones for sequencing so that to some extent the map position will be known in advance. Nonetheless, it is always encouraging to verify that STSs that should be present can be detected in the final sequence.
To simplify the process and make it more widely available, an e-PCR search facility recently has been added to the National Center for Biotechnology Information (NCBI) site on the World Wide Web (http://www.ncbi.nlm.nih.gov/cgi-bin/STS/nph-sts). It allows the user to insert one or more DNA sequences, which are then compared against all PCR primer pairs in dbSTS. In the output of such a search, nucleotide positions of the STSs within the query sequence are given, together with expected and observed amplicon sizes, marker names, and chromosome numbers. Hypertext links to GenBank and dbSTS records are provided for more detailed mapping information and PCR reaction conditions.
When developing new markers for mapping studies, e-PCR can be used to test potential primers in various ways before actually incurring the expense of oligonucleotide synthesis. For instance, one could determine whether a new STS is essentially a duplicate of one already in hand by searching the sequence database for entries that contain them both in close proximity. In addition, potential cross-reactivity with other members of a gene family (which would violate uniqueness of the site in human) or in their rodent homologs (which would be a concern for mapping techniques involving somatic hybrids with rodent cells) could be tested, provided that the relevant sequences are available. One common source of mapping failures is unwittingly selecting PCR primers in repetitive DNA. Although it would be trivial to screen candidate primers against a database of known repeats, these collections are likely to be incomplete considering that new classes of repeats are continually being discovered. An alternative test would be to match the proposed primers against all human genomic sequences in the database to determine whether the hits are greater in number than expected on statistical grounds or involve sequences from several different chromosomes.
With the accelerating pace of large-scale genomic sequencing, there is
significant interest in methods for annotating sequences that can be
fully automated. Most of the attention has, appropriately, been focused
on predicting genes. However, annotating the locations of STSs would
also be a valuable activity. In addition to the fact that they are
established landmarks of the genome, some STSs provide additional
information because they have been developed from specific sequence
sources, such as microsatellites (which indicate polymorphic sites),
CpG islands (which are often associated with the 5′ ends of genes),
and 3′ UTRs (which mark the 3′ gene boundaries). Moreover, the
use of e-PCR for STS annotation would be quite easy to automate because
the unequivocal nature of the results obviates the need for human
intervention.
In this study, localized map validation was demonstrated using e-PCR analysis of a single GenBank entry, but this will become increasingly feasible to do in a more widespread fashion as it becomes more common to have large sequence contigs spanning perhaps 1 Mb or more. This is somewhat analogous to common validation practices involving comparisons of restriction maps determined experimentally with those generated by computer analysis of the sequence. Furthermore, it will be possible, at least in localized regions, to integrate different STS-based maps with each other. Traditionally, map integration has been made difficult because of an insufficient number of markers that are shared across all maps. But once the sequence becomes known, this ceases to be a problem because all STSs will be detectable in the sequence regardless of the source.
One might argue that the completion of the human genomic sequence will make today's physical maps obsolete. But this is not true of genetic maps, which are the starting point for localizing disease susceptibility and other phenotypes to specific chromosomal regions. Thus, integrating genetic maps and sequence data will be of considerable interest for years to come. Moreover, identifying the positions of genetic markers in sequences will reveal the precise physical distances corresponding to genetic intervals, allowing "hot" and "cold" spots of meiotic recombination to be discerned.
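Once two genetic markers have been located in the same sequence, the local recombination rate follows from simple arithmetic, as in the Python sketch below. The function name is hypothetical. The genetic positions (17.1 and 17.9 cM) are those reported above for the two Généthon markers in U47924, but the base-pair coordinates are invented placeholders used only to show the calculation; they are not e-PCR results.

```python
def recombination_rate_cm_per_mb(cm_a: float, cm_b: float,
                                 bp_a: int, bp_b: int) -> float:
    """Local recombination rate between two markers, in cM per Mb.

    cm_a, cm_b: genetic map positions of the markers (cM).
    bp_a, bp_b: positions of the same markers in the sequence (bp).
    """
    physical_mb = abs(bp_b - bp_a) / 1e6
    if physical_mb == 0:
        raise ValueError("markers must lie at different sequence positions")
    return abs(cm_b - cm_a) / physical_mb


# Placeholder coordinates (NOT real data): markers 200 kb apart.
rate = recombination_rate_cm_per_mb(17.1, 17.9, 20_000, 220_000)
print(f"{rate:.1f} cM/Mb")
```

Intervals with rates well above or below the commonly cited human average of roughly 1 cM per Mb would be candidate "hot" or "cold" spots, although a single short interval on a genetic map carries considerable uncertainty.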
Even before the complete sequence of the human genome is known, we can begin to assemble a composite sequence map consisting of islands of sequence tethered to physical and genetic maps. This is precisely the strategy used in the construction of the human sequence map presented in the Entrez Genomes division (http://www3.ncbi.nlm.nih.gov/Entrez/). By making use of positional data from two physical maps (from the Whitehead Institute and Stanford University) and two genetic maps (from Généthon and the Cooperative Human Linkage Center), GenBank sequences have been anchored using STSs identified by e-PCR. It can be anticipated that this technique will continue to play a role in assembly and validation of the human genomic sequence as the Human Genome Project approaches completion.
ACKNOWLEDGMENTS
Special thanks go to Eric Green and Gerry Bouffard for helpful discussions and to Mark Boguski for critical review of the manuscript. Many suggestions for improvement of the software have been provided by Jinghui Zhang, Sergei Shavirin, and Gerry Bouffard. The NCBI's World Wide Web resource for performing e-PCR against dbSTS was developed by Sergei Shavirin. Maintenance of the dbSTS database is performed by Jane Weissman and Carolyn Tolstoschev. The integrated genetic, physical, and sequence maps in the Entrez Genomes division are maintained by Jinghui Zhang.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
1 E-MAIL schuler{at}ncbi.nlm.nih.gov; FAX (301) 480-9241.
Received December 27, 1996; accepted in revised form February 28, 1997.