Published online before print
March 20, 2002, 10.1101/gr.229202
Vol. 12, Issue 4, 656-664, April 2002
RESOURCES
ABSTRACT
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
INTRODUCTION
Some might wonder why in the year 2002 the world needs another sequence alignment tool. The local alignment problem between two short sequences was solved by the Smith-Waterman algorithm in 1981 (Smith and Waterman 1981). FASTA (Pearson and Lipman 1988) and the BLAST family of alignment programs, including NCBI BLAST (Altschul et al. 1990, 1997), MegaBLAST (Zhang et al. 2000), and WU-BLAST (Altschul et al. 1990; Gish and States 1993; States and Gish 1994), provide flexible and fast alignments involving large sequence databases, and are available free on many web sites. Sim4 (Florea et al. 1998) does a fine job of cDNA alignment. The SAM program (Karplus et al. 1998) and PSI-BLAST (Altschul et al. 1997) slowly but surely find remote homologs. Gotoh's many algorithms robustly deal with gaps (Gotoh 1990, 2000). SSAHA (Ning et al. 2001) maps sequence reads to the genome with blazing efficiency.
In the process of assembling and annotating the human genome, I was faced with two very large-scale alignment problems: aligning three million ESTs and aligning 13 million mouse whole-genome random reads against the human genome. These alignments needed to be done in less than two weeks' time on a moderate-sized (90 CPU) Linux cluster in order to have time to process an updated genome every month or two. To achieve this I developed a very-high-speed mRNA/DNA and translated protein alignment algorithm.
The new algorithm is called BLAT, which is short for
"BLAST-like alignment tool." BLAT is
similar in many ways to BLAST. The program rapidly scans
for relatively short matches (hits), and extends these into
high-scoring pairs (HSPs). However, BLAT differs from
BLAST in some significant ways. Where BLAST
builds an index of the query sequence and then scans linearly through
the database, BLAT builds an index of the database and
then scans linearly through the query sequence. Where
BLAST triggers an extension when one or two hits occur in
proximity to each other, BLAT can trigger extensions on
any number of perfect or near-perfect hits. Where BLAST
returns each area of homology between two sequences as separate
alignments, BLAT stitches them together into a larger
alignment. BLAT has special code to handle introns in
RNA/DNA alignments. Therefore, whereas BLAST delivers a
list of exons sorted by exon size, with alignments extending slightly
beyond the edge of each exon, BLAT effectively
"unsplices" mRNA onto the genome
giving a single alignment that
uses each base of the mRNA only once, and which correctly positions
splice sites.
BLAT is available in several forms. Since building an index of the whole genome is a relatively slow procedure, a BLAT server is available which builds the index and keeps it in memory. A BLAT client can then query the index through the server. The client/server version is especially suitable for interactive applications, and is available via a web interface at http://genome.ucsc.edu. A stand-alone BLAT is also available, which is more suitable for batch runs on one or more CPUs. Both the client/server and the stand-alone can do comparisons at the nucleotide, protein, or translated nucleotide level.
RESULTS
BLAT is currently used in three major applications in conjunction with http://genome.ucsc.edu. BLAT is used to produce the human EST and mRNA alignments. The human EST alignments compared 1.75 × 10^9 bases in 3.73 × 10^6 ESTs against 2.88 × 10^9 bases of human DNA and took 220 CPU hours on a Linux farm of 800 MHz Pentium IIIs. BLAT was used in translated mode to align a 2.5× coverage unassembled whole-genome shotgun of the mouse versus the masked human genome. This involved 7.51 × 10^9 bases in 1.33 × 10^7 reads and took 16,300 CPU hours. The client/server version of BLAT is used to power untranslated and translated interactive searches on http://genome.ucsc.edu. Researchers all over the world use BLAT to perform thousands of interactive sequence searches per day. The nucleotide server has sustained over 500,000 search requests per day from program-driven queries. We do ask those researchers who are doing more than a few thousand program-driven queries to obtain a copy of BLAT to use on their own servers. The nucleotide server is not as efficient as the stand-alone program, since to save memory it does not keep the genome in memory, only the index. The index uses approximately 1 gigabyte on unmasked DNA in untranslated mode, and approximately 2.5 gigabytes on masked DNA in translated mode. The translated mode server by default is less sensitive than the default stand-alone settings. It requires three perfect amino acid 4-mers to trigger an alignment. The untranslated server usually responds to a 1000-base cDNA query in less than a second. The translated server usually responds to a 400-amino acid protein query in <5 sec.
Evaluating mRNA/DNA Alignments
As a test of BLAT, I remapped 713 mRNAs corresponding to genes that the Sanger Centre has annotated on chromosome 22 (Dunham et al. 1999) back to chromosome 22 with BLAT and with Sim4 (Florea et al. 1998). When BLAT produced multiple alignments for an mRNA, only the highest scoring alignment was kept. In 99.99% of the annotated bases, the BLAT alignment agreed with the Sanger annotations. There were 107 bases in 10 genes where there was disagreement. In five of the 10 genes, the disagreement was only in the placement of nonstandard splice sites. In two cases, BLAT did not find small (<32-base) initial exons. In one case, an exon of six bases was present and aligning fully, but in a different place than annotated (where it also aligned fully, but with better flanking splice sites). In one case, BLAT positioned an intron to conform with the consensus sequence on the wrong strand. That is, the gap corresponding to the intron was positioned to have CT/AC rather than GT/AG ends. The final case was a 38-base sequence that BLAT was unable to place because the middle contained some degenerate sequence. The BLAT alignments were done at the default settings and took 26 sec.
The Sim4 alignments of the same data took 17,468 sec (almost 5 h). They agreed with the Sanger annotations in 99.66% of the bases. There were disagreements between the Sim4 alignments and the Sanger annotations from various causes in 52 of the genes. Most of these disagreements were small.
Evaluating Mouse/Human Translated Alignments
Though the translated modes of BLAT are relatively new, they are quick and effective. The translated mode of BLAT was inspired by the Exofish research at Genoscope (Roest Crollius et al. 2000). Exofish showed that a TBLASTX run using an identity matrix (where matches were weighted +15 and mismatches −12 for all amino acids) and a word size of 5 was quite effective in aligning coding regions conserved between Homo sapiens and Tetraodon nigroviridis. For human and mouse it has been shown that gapless alignments are in many ways preferable to gapped alignments for detecting coding regions (Wiehe et al. 2001). Table 1 shows the timings of BLAT and WU-TBLASTX run on a modest-sized data set at gapless Exofish-like settings. BLAT runs much faster, making it feasible to compare vertebrate genomes quickly enough to keep up with the vast output of today's sequencing centers.
|
Pankaj Agarwal provided a WU-TBLASTX alignment of 13 million mouse genomic reads versus human chromosome 22 run under a gapless setting that should theoretically be somewhat more sensitive than the matrix used for the Exofish settings because of the use of the BLOSUM62 matrix (P. Agarwal, pers. comm.). Table 2 shows a comparison between this alignment and a translated BLAT alignment done at the indicated setting. The results were quite comparable in sensitivity.
|
Other Usage Information
BLAT can also be used in translated mode to align proteins or mRNA from one species against genomic DNA of another species. In translated mRNA/translated DNA mode, BLAT has to align only one strand of the query sequence, speeding it up by a factor of two. In this mode it also becomes more tolerant of intron-induced gaps. BLAT can do protein-protein alignments as well, but it is not likely to be the tool of choice for these. The protein databases are still small enough that BLASTP can handle them easily, and BLASTP is more sensitive than BLAT.
BLAT can handle very long database sequences efficiently. It is more efficient at short query sequences than long query sequences. It is not recommended for query sequences longer than 200,000 bases. It is not necessary to mask the DNA for untranslated BLAT searches. Translated searches generally produce much quicker, cleaner results if the sequence is masked for repeats and low complexity sequence.
METHODS
Algorithm
All fast alignment programs that I am aware of break the alignment problem into two parts. Initially in a "search stage," the program detects regions of the two sequences which are likely to be homologous. The program then in an "alignment stage" examines these regions in more detail and produces alignments for the regions which are indeed homologous according to some criteria. The goal of the search stage is to detect the vast majority of homologous regions while reducing the amount of sequence that is passed to the alignment stage.
Searching With Single Perfect Matches
A simple and reasonably effective search stage is to look for subsequences of a certain size, K, which are shared by the query sequence and the database. In many practical implementations of this search, every K-mer in the query is compared against all nonoverlapping K-mers in the database. Let's examine the number of homologous regions that are missed, and the number of nonhomologous regions that are passed to the alignment stage using these criteria. First, we'll need some definitions:
K: The K-mer size. Typically this is 8-16 for nucleotide comparisons and 3-7 for amino acid comparisons.
M: The match ratio between homologous areas. This would typically be about 98% for cDNA/genomic alignments within the same species, and about 89% for protein alignments between human and mouse.
H: The size of a homologous area. For a human exon this is typically 50-200 bases.
G: The size of the database, for example 3 billion bases for the human genome.
Q: The size of the query sequence.
A: The alphabet size; 20 for amino acids, 4 for nucleotides.
Assuming that each letter is independent of the previous letter, the
probability that a specific K-mer in a homologous region of the
database matches perfectly the corresponding K-mer in the query is simply:
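$$p_1 = M^K \tag{1}$$
The number of nonoverlapping K-mers in a homologous region of length H is:
$$T = \lfloor H/K \rfloor \tag{2}$$
The probability that at least one nonoverlapping K-mer in the homologous region matches perfectly the corresponding K-mer in the query is:
$$P = 1 - (1 - p_1)^T = 1 - (1 - M^K)^T \tag{3}$$
The number of nonoverlapping K-mers that are expected to match by chance, assuming all letters are equally likely, is:
$$F = (Q - K + 1) \cdot \frac{G}{K} \cdot \left(\frac{1}{A}\right)^K \tag{4}$$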
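The single-perfect-match model can be explored numerically. The following is a minimal sketch, not BLAT code: the function name and the example parameter values are illustrative assumptions, and the formulas assume independent letters, T = floor(H/K) nonoverlapping K-mers per homologous region, and equally likely letters for chance matches.

```python
def single_perfect_match(K, M, H, G, Q, A):
    """Return (sensitivity, expected chance matches) for a single perfect K-mer hit.

    Assumes independent letters and uniformly distributed letters for chance matches.
    """
    p1 = M ** K                      # a specific K-mer in the region matches perfectly
    T = H // K                       # nonoverlapping K-mers in the homologous region
    sensitivity = 1.0 - (1.0 - p1) ** T
    # every overlapping query K-mer against every nonoverlapping database K-mer
    chance = (Q - K + 1) * (G / K) * (1.0 / A) ** K
    return sensitivity, chance


# Hypothetical example: same-species cDNA (98% identity), 100-base exon,
# 500-base query, 3-billion-base genome, nucleotide alphabet, K = 11.
sens, chance = single_perfect_match(K=11, M=0.98, H=100, G=3_000_000_000, Q=500, A=4)
print(f"sensitivity={sens:.6f} chance_matches={chance:.0f}")
```

Varying K in this sketch shows the trade-off discussed in the text: smaller K raises sensitivity but sharply increases the number of chance matches passed to the alignment stage.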
Searching With Single Almost Perfect Matches
What if instead of requiring perfect matches with a K-mer to trigger an alignment, we allow almost perfect matches, that is, hits where one letter may mismatch? The probability that a nonoverlapping K-mer in a homologous region of the database matches almost perfectly the corresponding K-mer in the query is:
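$$p_1 = K \cdot M^{K-1} \cdot (1 - M) + M^K \tag{5}$$
As before, the probability that at least one nonoverlapping K-mer in the homologous region matches almost perfectly is:
$$P = 1 - (1 - p_1)^T \tag{6}$$
The number of K-mers that are expected to match almost perfectly by chance, assuming all letters are equally likely, is:
$$F = (Q - K + 1) \cdot \frac{G}{K} \cdot \left( K \left(\frac{1}{A}\right)^{K-1} \left(1 - \frac{1}{A}\right) + \left(\frac{1}{A}\right)^K \right) \tag{7}$$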
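The near-perfect criterion can be sketched the same way. This is an illustrative calculation under the same independence assumptions, with a hypothetical function name and example values; it treats a hit as either a perfect match or a match with exactly one mismatched letter.

```python
def near_perfect_match(K, M, H, G, Q, A):
    """Return (sensitivity, expected chance matches) when one mismatch per K-mer is allowed."""
    # exactly one mismatch at any of K positions, or no mismatch at all
    p1 = K * M ** (K - 1) * (1 - M) + M ** K
    T = H // K                       # nonoverlapping K-mers in the homologous region
    sensitivity = 1.0 - (1.0 - p1) ** T
    a = 1.0 / A                      # chance that two random letters agree
    chance = (Q - K + 1) * (G / K) * (K * a ** (K - 1) * (1 - a) + a ** K)
    return sensitivity, chance


# Hypothetical example: 90% identity, 100-base region, nucleotide alphabet, K = 14.
sens, chance = near_perfect_match(K=14, M=0.9, H=100, G=3_000_000_000, Q=500, A=4)
print(f"sensitivity={sens:.4f} chance_matches={chance:.1f}")
```

Under these assumptions the chance-match count is exactly K*(A − 1) + 1 times the perfect-match count for the same K, which is why the near-perfect criterion is paired with a larger K.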
Searching With Multiple Perfect Matches
Another alternative search method is to require multiple perfect matches that are constrained to be near each other. Consider a situation where the K size is 10 and there are two hits: one starting at position 10 in the query and 1010 in the database, and another starting at position 30 in the query and 1030 in the database. These two hits could easily be part of a region of homology extending from positions 10-39 in the query and 1010-1039 in the database. If we subtract the query coordinate from the database coordinate, we get a "diagonal" coordinate. Consider the search criteria that there must be N perfect matches, each no further than W letters from each other in the target coordinate, and all having the same diagonal coordinate (Fig. 1). For N = 1, the probability that a nonoverlapping K-mer in a homologous region of the database matches perfectly the corresponding K-mer in the query is simply as before:
|
(8) |
|
(9) |
|
(10) |
|
(11) |
|
(12) |
(N − 1)th match, which gives the more general relationship
|
(13) |
|
(14) |
|
|
|
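One way to put numbers on the multiple-perfect-match criterion is sketched below. The formulas here are assumptions made for illustration rather than a transcription of the numbered equations: sensitivity is taken as the binomial probability that at least N of the T = floor(H/K) nonoverlapping K-mers match, and each additional chance match is assumed to fall among the W/K nonoverlapping database K-mers within W letters of the previous one.

```python
from math import comb


def multiple_perfect_matches(K, M, H, G, Q, A, N, W):
    """Return (sensitivity, expected chance matches) when N perfect hits are required."""
    p1 = M ** K                      # one specific K-mer matches perfectly
    T = H // K                       # nonoverlapping K-mers in the homologous region
    # probability that N or more of the T K-mers match (binomial tail)
    sensitivity = sum(
        comb(T, n) * p1 ** n * (1 - p1) ** (T - n) for n in range(N, T + 1)
    )
    a = 1.0 / A
    first = (Q - K + 1) * (G / K) * a ** K          # chance matches for N = 1
    again = 1.0 - (1.0 - a ** K) ** (W / K)         # another chance match within W letters
    chance = first * again ** (N - 1)
    return sensitivity, chance


# Hypothetical example: two hits required within a 100-letter window.
params = dict(K=11, M=0.98, H=100, G=3_000_000_000, Q=500, A=4, W=100)
sens1, chance1 = multiple_perfect_matches(N=1, **params)
sens2, chance2 = multiple_perfect_matches(N=2, **params)
print(sens1, chance1)
print(sens2, chance2)
```

With N = 1 the sketch reduces to the single-perfect-match case; raising N trades a small loss in sensitivity for a large drop in chance matches.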
Selecting Initial Match Criteria
Both single imperfect matches and multiple perfect matches have a significant advantage over single perfect matches. They drastically reduce the number of alignments which must be checked to achieve a given level of sensitivity, as shown in Tables 9 and 10. The multiple-perfect match criteria can be modified to allow small insertions and deletions within the homologous area by allowing matches to be clumped if they are near each other rather than identical on the diagonal coordinate. This improves real-world sensitivity at the expense of increasing the number of alignments that must be done. Allowing a single insertion or deletion increases the alignments by a factor of three, whereas allowing two increases the alignments by a factor of five. In general, two perfect matches with the appropriate K size give specificity for a given level of sensitivity similar to that given by three or more perfect matches. The near-perfect match criterion overall is similar to the two perfect match criteria. The near-perfect criterion cannot accommodate insertions or deletions, but it has superior performance on finding small regions of homology (Table 11). For finding coding exons in mouse/human alignments, whichever strategy is used, greater specificity is seen at the amino acid rather than the nucleotide level.
|
|
|
|
Clumping Hits and Identifying Homologous Regions
To implement the match criteria, BLAT builds up an index of nonoverlapping K-mers and their positions in the database. BLAT excludes K-mers that occur too often from the index, as well as K-mers containing ambiguity codes and optionally K-mers that are in lowercase rather than uppercase. BLAT then looks up each overlapping K-mer of the query sequence in the index. In this way, BLAT builds a list of "hits" where the query and the target match. Each hit contains a database position and a query position. The following algorithm is used to efficiently clump together multiple hits. The hit list is split into buckets of 64k each, based on the database position. Each bucket is sorted on the diagonal (database minus query positions). Hits that are within the gap limit are bundled together into proto-clumps. Hits within proto-clumps are then sorted along the database coordinate and put into real clumps if they are within the window limit on the database coordinate. To avoid missed clumps near the 64k bucket boundary, unclumped hits and clumps that are within the window limit are tossed into the next bucket for additional clumping opportunities. The sorting algorithm mSort, which is related to qSort, is used. The bucketing tends to keep N, the number of hits sorted at a time, relatively small. Clumps with fewer than the minimum number of hits are discarded, and the rest are used to define regions of the database which are homologous to the query sequence. Clumps which are within 300 bases or 100 amino acids in the database are merged together. Five hundred additional bases are added on each side to form the final homologous region.
Searching for Near Perfect Matches
BLAT has an option to allow one mismatch in a hit. This is implemented by scanning the index repeatedly for each K-mer in the query. Every possible K-mer that matches in all but one position, as well as the K-mer that matches at every position, is looked up. In all, K*(A − 1) + 1 lookups are required. For an amino-acid search with K = 8, this amounts to 153 lookups. Because a straight index of 8-mers would require 20^8 index positions or about 100 billion bytes, it is necessary to switch to a hashing scheme rather than an indexing scheme, further cutting efficiency. As a consequence, for a given level of sensitivity, the near-perfect match criterion runs 15× more slowly than the multiple-perfect match criterion in BLAT (Table 13). The near-perfect match criterion seems best suited for programs that hash the query sequence rather than the database. A query sequence is sufficiently small that each possible nearly matching K-mer could be hashed, and therefore the index would not have to be scanned repeatedly.
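The lookup count can be checked by enumerating the candidates directly. This is a small illustrative sketch; the function name and the example 8-mer are hypothetical, and the code only generates the K-mers to look up rather than consulting any real index.

```python
def near_perfect_neighbors(kmer, alphabet):
    """Yield the K-mer itself plus every K-mer that differs at exactly one position."""
    yield kmer
    for i, original in enumerate(kmer):
        for letter in alphabet:
            if letter != original:
                yield kmer[:i] + letter + kmer[i + 1:]


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20-letter alphabet, so A = 20
lookups = list(near_perfect_neighbors("ACDEFGHI", AMINO_ACIDS))
print(len(lookups))  # K*(A - 1) + 1 = 8*19 + 1 = 153
```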
|
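The index and clumping procedure described under "Clumping Hits and Identifying Homologous Regions" can be sketched in a few lines. This is a deliberately simplified illustration, not BLAT's implementation: it requires hits to lie on exactly the same diagonal (no gap limit), skips the 64k bucketing and the exclusion of overused K-mers, and all function names and the toy sequences are assumptions.

```python
from collections import defaultdict


def build_index(database, k):
    """Map each nonoverlapping K-mer of the database to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(database) - k + 1, k):
        index[database[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Look up every overlapping K-mer of the query; return (db_pos, q_pos) pairs."""
    hits = []
    for q_pos in range(len(query) - k + 1):
        for db_pos in index.get(query[q_pos:q_pos + k], ()):
            hits.append((db_pos, q_pos))
    return hits


def clump_hits(hits, window, min_hits):
    """Group hits on the same diagonal whose database positions are within `window`."""
    by_diagonal = defaultdict(list)
    for db_pos, q_pos in hits:
        by_diagonal[db_pos - q_pos].append(db_pos)   # diagonal = database minus query
    clumps = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= window:
                current.append(pos)
            else:
                if len(current) >= min_hits:
                    clumps.append((diagonal, current[0], current[-1]))
                current = [pos]
        if len(current) >= min_hits:
            clumps.append((diagonal, current[0], current[-1]))
    return clumps


# Toy example: a 16-base query embedded at position 20 of a 56-base database.
database = "T" * 20 + "ACGTACCGGTTAGCAT" + "T" * 20
query = "ACGTACCGGTTAGCAT"
index = build_index(database, k=4)
hits = find_hits(index, query, k=4)
clumps = clump_hits(hits, window=8, min_hits=2)
print(clumps)  # each clump is (diagonal, first_db_pos, last_db_pos)
```

Because only the query is scanned linearly, the cost of a search in this sketch depends on the query length and the number of hits, not on the size of the database.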
Alignment Stage
The alignment stage performs a detailed alignment between the query sequence and the homologous regions. For historical reasons, the alignment stage for nucleotide and protein alignments is quite different. Both have limitations, and are good candidates for future BLAT upgrades. On the other hand, both are quite useful in their present form for sequences which are not too divergent.
Nucleotide Alignments
The nucleotide alignment stage is based on a cDNA alignment program first used in the Intronerator (http://www.cse.ucsc.edu/~kent/intronerator) (Kent and Zahler 2000).
Protein Alignments
The protein alignment strategy is simpler. The hits from the search stage are kept and extended into maximally scoring ungapped alignments (HSPs) using a score function where a match is worth 2 and a mismatch costs 1. A graph is built with HSPs as nodes. If HSP A starts before HSP B in both query and database coordinates, an edge is placed from A to B. The edge is weighted by the score of B minus a gap penalty based on the distance between A and B. In the case where A and B overlap, a "crossover" point is selected which maximizes the sum of the scores of A up to the crossover and B starting at the crossover, and the difference between the full scores and the scores just up to the crossover is subtracted from the edge score. A dynamic program then extracts the maximal-scoring alignment by traversing this graph. The HSPs in the maximal-scoring alignment are removed, and if any HSPs are left the dynamic program is run again. The major limitation of this protein alignment strategy is that if there is an indel, part of the alignment will be lost unless the search stage manages to find both sides of the indel. For the translated mouse versus translated human genome job, which was the major motivation for protein BLAT, this limitation is not as serious as it would be when searching for more distant homologs. Indeed in the translated mouse/translated human case, this limit on indels is actually useful in some ways, as it reduces the number of pseudogenes found by BLAT more than it reduces the number of genes found. Even so, in the future we hope to replace this simplistic extension phase with a banded (only small gaps allowed) Smith-Waterman algorithm (Chao et al. 1992).
Stitching and Filling In
It is often the case that the alignment of a gene is scattered across multiple homologous regions found in the search phase. These alignments are stitched together using a minor variation of the algorithm used to stitch together protein HSPs. For DNA alignments at this stage, the gap penalty is equal to a constant plus the log of the size of the gap. For mRNA/genomic alignments, if after stitching there are gaps left between aligning blocks in both the database and query sequence, the nucleotide alignment algorithm is called on the gap to attempt to fill it in. This gives BLAT a chance to find small internal exons that are further away than 500 bases from other exons, and which are too small to be found by the search stage.
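The stitching step can be illustrated with a small dynamic program over aligned blocks. This is a sketch under stated assumptions, not the BLAT source: blocks are only chained when they do not overlap (the crossover handling for overlapping HSPs is omitted), the gap penalty is a constant plus the natural log of the gap size, and the constant 2.0, the `Block` type, and the function names are all hypothetical.

```python
import math
from typing import NamedTuple


class Block(NamedTuple):
    q_start: int
    q_end: int      # exclusive
    db_start: int
    db_end: int     # exclusive
    score: float


def gap_penalty(a, b, constant=2.0):
    """Constant plus log of the larger of the query and database gaps (at least 1)."""
    gap = max(b.q_start - a.q_end, b.db_start - a.db_end, 1)
    return constant + math.log(gap)


def best_chain(blocks):
    """Return (chain, score) for the highest-scoring ordered chain of blocks; O(N^2)."""
    if not blocks:
        return [], 0.0
    blocks = sorted(blocks, key=lambda b: (b.q_start, b.db_start))
    best = [b.score for b in blocks]        # best chain score ending at each block
    prev = [None] * len(blocks)
    for j, b in enumerate(blocks):
        for i in range(j):
            a = blocks[i]
            # a must end before b starts in both query and database coordinates
            if a.q_end <= b.q_start and a.db_end <= b.db_start:
                candidate = best[i] + b.score - gap_penalty(a, b)
                if candidate > best[j]:
                    best[j], prev[j] = candidate, i
    j = max(range(len(blocks)), key=best.__getitem__)
    total = best[j]
    chain = []
    while j is not None:
        chain.append(blocks[j])
        j = prev[j]
    return chain[::-1], total


# Toy example: two exons separated by a ~2 kb intron, plus a conflicting stray block.
exon1 = Block(0, 50, 1000, 1050, 50.0)
exon2 = Block(50, 120, 3000, 3070, 70.0)
stray = Block(10, 30, 5000, 5020, 20.0)
chain, total = best_chain([stray, exon2, exon1])
print(chain, round(total, 2))
```

The logarithmic penalty keeps long introns cheap relative to the score of a real exon, so the two exons are joined while the stray block is left out.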
Since the sort time is O(N log N), that is, proportional to N times log N, where N is the number of hits to be sorted, and the dynamic program time is O(N^2) where N is the number of HSPs, an additional step is necessary to make BLAT efficient on longer query sequences. Untranslated nucleotide queries longer than 5000 bases and translated queries longer than 1500 bases are broken into subqueries that have approximately 250 bases of overlap. Each subquery is aligned as above, and the resulting alignments are stitched together. Currently this subdividing and stitching is only available for the stand-alone BLAT, not the client/server version.
DISCUSSION
As shown above, BLAT is a very effective tool for doing nucleotide alignments between mRNA and genomic DNA taken from the same species. It is more accurate and orders of magnitude faster than Sim4. Sim4 in turn is more accurate and orders of magnitude faster than other published tools such as est_genome (Mott 1997; Florea et al. 1998). Although the alignment strategy BLAT uses for nucleotide alignments becomes less effective below 90% sequence identity, it efficiently "unsplices" mRNA, and accommodates the level of sequence divergence introduced by sequencing error. BLAT is able to unsplice all the human mRNA in GenBank, including the ESTs, in less than a day on a 100-CPU computer cluster. Since the human, mouse, and other large genome projects are updating sequences at a rapid rate, and GenBank continues to grow at a rapid rate, rapid alignment is needed to keep genome annotations in synchrony with improving genome assemblies.
BLAT working in translated mode is capable of rapidly aligning data across vertebrate species without significant compromise. While TBLASTX can be configured to be more sensitive than BLAT, at settings commonly used for mammal-mammal comparisons, BLAT runs approximately 50 times faster. Even using BLAT, an alignment of public mouse whole-genome shotgun data took 12 days on our 100-CPU cluster. It would be difficult to keep the mouse-human homology information up to date with a slower tool.
High-speed alignment programs have two major stages: a search stage that uses a heuristic to identify regions likely to be homologous, and an alignment stage that does detailed alignments of the previously defined homologous regions. To get adequate speed when operating at the scale of whole genomes, the search stage is crucial. An index of some sort is key to an efficient search stage. BLAT indexes the database rather than the query sequence. This more than anything is responsible for the relatively high speed of BLAT compared to Sim4 or TBLASTX. Rather than having to linearly scan through a database of gigabases of sequence looking for index matches, BLAT only has to scan through a relatively short query sequence. The program SSAHA indexes the database in a manner very similar to that of BLAT, and is an extremely effective tool for aligning genomic regions from the same organism against each other. Currently SSAHA does not implement unsplicing logic, and always uses a single perfect match as a seed.
The challenge to indexing the database is twofold: the size of the index and the time it takes to generate the index. Fortunately, computers with several gigabytes of RAM have become affordable and commonplace in the last few years, so the size of the index is not the problem it once was. By making the index relatively efficient (only four bytes per index entry vs. eight in the published version of SSAHA), and only indexing nonoverlapping words, BLAT is able to index the human genome at the nucleotide level in 0.9 gigabytes and to index a RepeatMasker masked, translated human genome in 2.5 gigabytes. The index does take some time to generate: 30 min for a translated index of the human genome. Fortunately the index is not generated often. In batch mode, BLAT generates the index once at the start of processing a batch of typically hundreds of thousands of query sequences. In interactive client/server mode, the index only has to be generated once for each genome assembly. The typical user simply pastes in a sequence to a web form, and quickly receives an alignment in response.
How an index is used is also important to the speed of an alignment algorithm. Triggering an alignment for each match to the index is not always the optimal strategy. For a sensitive search, it is desirable to index relatively small K-mers. However, small K-mers will return many false positives, potentially creating a bottleneck in the alignment stage if the alignment stage is computationally expensive. Requiring multiple nearby matches or using longer K-mers but tolerating a mismatch as search criteria both have much greater specificity for a given level of sensitivity than the criterion of a single perfect match. BLAT implements a very quick algorithm for finding multiple nearby perfect matches, which allows the search stage to be specific enough that the genome itself can be kept on disk and only the index kept in memory in the client/server mode.
The BLAT software in source and executable form is available without charge for nonprofit, academic, and personal uses. Nonexclusive commercial licenses are available from the author. The software can be downloaded from the source and executable links at http://www.soe.ucsc.edu/~kent.
ACKNOWLEDGMENTS
My warmest thanks to the Gene Cats group, Alan Zahler at the University of California Santa Cruz, and to Heidi, Mira, and Tisa at home for their encouragement, advice, and entertainment during this work. Thanks to the International Human Genome Project, the Mouse Sequencing Consortium, the Mammalian Gene Collection, and other genome and mRNA sequence projects for generating so much sequence data. Thanks to HHMI and NHGRI for funding the UCSC computer clusters where the software could be applied at a full-genome scale. Thanks to Tom Pringle, Heidi Brumbaugh, Webb Miller, Alan Zahler, and David Haussler for a thorough reading of the manuscript and helpful comments.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
FOOTNOTES
E-MAIL kent{at}biology.ucsc.edu
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.229202. Article published online before print in March 2002.
REFERENCES
Received December 19, 2001; accepted in revised form January 25, 2002.