Vol. 10, Issue 4, 431-445, April 2000
Systematic Management and Analysis of Yeast Gene Expression Data
Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115 USA, and the Lipper Center for Computational Functional Genomics, Boston, Massachusetts 02115 USA
We report steps toward the systematic management, standardization, and analysis of functional genomics data. We developed the ExpressDB database for yeast RNA expression data and loaded it with ~17.5 million pieces of data reported by 11 studies with three different kinds of high-throughput RNA assays. A web-based tool supports queries across the data from these studies. We examined comparability of data by converting data from 9 studies (217 conditions) into mRNA relative abundance estimates (ERAs) and by clustering of conditions by ERAs. We report on generation of ERAs and condition clustering for non-microarray data (5 studies, 63 conditions) and describe initial attempts to generate microarray-based ERAs (4 studies, 154 conditions), which exhibit increased error, on our web site http://arep.med.harvard.edu/ExpressDB. We recommend standards for data reporting, suggest research into improving comparability of microarray data through quantifying and standardizing control condition RNA populations, and also suggest research into the calibration of different RNA assays. We introduce a model for a database that integrates different kinds of functional genomics data, Biomolecule Interaction, Growth and Expression Database (BIGED).
Ever-growing amounts of sequence data for numerous
organisms, combined with readily available technology for large-scale
expression studies on the basis of oligonucleotide arrays, DNA
microarrays, Serial Analysis of Gene Expression (SAGE), and other
techniques (Velculescu et al. 1995
As these three components are dependent on each other and must
coevolve, the right mixture will only come together with experience. We
hope to jump start the process by describing here the working prototypes of parts of the required systems and examples of what can be
done with them. Specifically, we describe ExpressDB, a general
database for RNA expression data that has been loaded with data from 11 yeast studies using three different kinds of high-throughput RNA level
assays (see Table 1). We also describe EXD, an integrated web-based
application that supports user queries of ExpressDB data. ExpressDB and
EXD differ from existing research-specific databases (see web sites on
Table 1) in that they represent and manage data from multiple studies,
and complement databases such as ArrayDB (Ermolaeva et al. 1998
In each study whose data were loaded into ExpressDB, data were collected and prepared in ways appropriate to the study's particular experimental design. Because designs and methods are not generally coordinated across studies, data from different studies are not always easily compared. This has no impact on the success of each study individually, but data comparability assumes increased importance in a database context in which comparability improvements can translate into simpler and more meaningful queries, more efficient database structures, and opportunities for more effective data mining. To gauge comparability of currently available data, we therefore formulated what we considered to be an attainable ideal and assessed what would be required for the data on ExpressDB to meet it. We propose specific recommendations on the basis of this assessment (see Discussion).
We defined our ideal state of data comparability to be:
(1) All RNA expression data are provided in the form of estimated relative abundances (ERAs) of a defined set of functionally distinguishable RNA fragments (FDRs) which, in the present case, we take to be RNAs corresponding to ORFs. An ORF ERA represents the fractional abundance of the ORF's RNA with respect to the total population of ORF RNAs in cells in a particular experimental condition (defined by cell strain and environmental history).
(2) Analytical tools used to measure data comparability confirm that similar conditions have similar RNA expression profiles, regardless of which RNA assays are used for data collection.
The rationale for (1) is that ERAs are intuitive and unambiguous measures of RNA level that are theoretically directly comparable across conditions regardless of experimental methodologies.
In assessing ExpressDB data against this ideal, we converted as much
data on ExpressDB as possible to the form of ERAs, and explored
clustering of conditions by ORF expression profiles as a tool for
analytically investigating data comparability. We undertook these steps
for data generated from oligonucleotide arrays (4 studies, 60 conditions), SAGE (1 study, 3 conditions), and microarrays (4 studies,
154 conditions), generating a set of ORF ERAs for 217 conditions.
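As a rough illustration of the ERA definition given above, the following Python sketch divides each non-null value in one condition by the column total, so that the resulting values sum to 1. The function name `to_era` and the example values are hypothetical and are not taken from ExpressDB. For SAGE data the denominator used in this article is the total number of non-rejected tags for the condition rather than the column total, so this sketch applies only to the intensity-based case.

```python
def to_era(column):
    """Convert one condition's ORF values into estimated relative abundances.

    `column` maps ORF name -> intensity, with None standing for a null field.
    Each non-null value is divided by the column total; nulls stay null.
    """
    total = sum(v for v in column.values() if v is not None)
    if total <= 0:
        raise ValueError("column total must be positive")
    return {orf: (None if v is None else v / total) for orf, v in column.items()}


# Illustrative values only.
era = to_era({"ORF_A": 30.0, "ORF_B": 10.0, "ORF_C": None})
```

Because each condition is scaled by its own total, values from conditions measured on different intensity scales become fractions of the same kind.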
Issues with microarray data currently make it difficult to compute ERAs
from them and our best effort resulted in microarray ERA values that
exhibit increased variability compared with corresponding ratios
(coefficient of variation of ERAs = 3.3 times that of ratios) (see
Results). We focus here on methods and results for Affymetrix and
SAGE data. Readers interested in our microarray results may consult the
supplemental material on our web site
http://arep.med.harvard.edu/ExpressDB. A file containing ERAs for 213 conditions
Database
ExpressDB is a relational database for RNA expression data. We implemented it using Sybase SQL Server 11.0.3 on a shared DEC 3000 server running DEC Unix 4.0D. We conceive of ExpressDB as a generalized two-dimensional table that can subsume individual tables of expression data reported by researchers. We provide a high-level logical data model for the ExpressDB database and an example of how it operates as a generalized two-dimensional table in Figure 1. That figure also presents names of ExpressDB tables that will be used throughout this article. Note that names of these tables are always capitalized (e.g., Measure).
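One way to picture a generalized two-dimensional table is as a list of (ORF, Measure, value) records that can be collated back into one row per ORF and one column per Measure. The Python sketch below shows that idea only; the `DataPoint` type, the `pivot_by_orf` function, the Measure labels, and the values are all hypothetical, and the actual ExpressDB schema is the one given in Figure 1, which is not reproduced here.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class DataPoint(NamedTuple):
    orf: str                # row key
    measure: str            # column key: one Measure per reported column
    value: Optional[float]  # None stands for a null field


def pivot_by_orf(points, measures):
    """Collate long-format records into one row per ORF, one column per Measure."""
    rows = defaultdict(dict)
    for p in points:
        rows[p.orf][p.measure] = p.value
    # Measures with no record for an ORF come back as None (null).
    return {orf: [cols.get(m) for m in measures] for orf, cols in sorted(rows.items())}


# Illustrative records from two hypothetical studies.
points = [
    DataPoint("YAL001C", "study_A/cond_1", 12.0),
    DataPoint("YAL001C", "study_B/cond_7", 0.8),
    DataPoint("YAL002W", "study_A/cond_1", 3.5),
]
table = pivot_by_orf(points, ["study_A/cond_1", "study_B/cond_7"])
```

Storing records this way means a new study adds rows and Measure labels rather than new table columns, which is what lets tables from different sources sit in one structure.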
We developed a utility program EDBUpdate to load data from individual
tab-delimited files (load files) into ExpressDB. Load files must
present a systematically collected set of measurements or descriptive
information for a series of ORFs, in which each line of the file
presents information for an ORF and each column a particular
measurement or information field. Examples of measurements are
numerical values representing ORF mRNA abundance and data-quality indicators. An ORF description field would be an example of an information field. Measurement columns are represented by ExpressDB Measure records. Using EDBUpdate, we performed loads of all available data files associated with the studies described in Table 1 into the
database as well as data from two others (Eisen et al. 1998
Load files downloaded from public sources often required minor editing to put them into the proper format for loading. For instance, files presenting data collected with Affymetrix oligonucleotide arrays often give the ORF and common gene name in the same column separated by a slash (/), and we had to separate these into distinct columns. More extensive work was required to load the SAGE-based expression data from Vel because SAGE data take the form of counts of tag sequences in cDNAs, whereas ExpressDB imposes as a structural requirement that data be indexed by ORF. A key issue for this indexing is that some SAGE tag sequences cannot be assigned to a unique ORF. As a result, for each SAGE condition, we computed and loaded into ExpressDB both a minimum and a maximum tag count for each ORF, in which the minimum count includes only counts of tags uniquely assignable to the ORF (which we call unambiguous tags) and the maximum count includes these plus the counts of tags shared with other ORFs (which we call ambiguous tags). Additional details may be found on the ExpressDB database in the Expression Data Set record for the Vel experiments.
Database Query Application
Our web-based query interface for ExpressDB, the EXD system, can be
accessed at http://arep.med.harvard.edu/ExpressDB/. A JavaScript
2.1-supporting web browser such as Microsoft Internet Explorer 4.0+ or
Netscape Navigator 4.0+ is required. The logical flow of the EXD system
is depicted in Figure 2a. The main line of this logic
is that the user is prompted for successively more detailed
specifications concerning the query, starting with the Expression Data
Sets (see Fig. 1) of interest, moving on to the Measures of interest
within these Expression Data Sets, and finally to conditions that must
be satisfied by the ORFs or their Measure values. ExpressDB allows
Expression Data Set and Measure records to be marked private and these
are not offered for user selection by EXD; this option has been used
for the Coh set of experiments that are not yet published. The query
conditions offered for user specification are sensitive to the data
format of the Measures; thus the user is prompted for text matching
specifications when a Measure has a character format, and with
numerical equalities and inequalities for numerical formats.
Statistical specifications may also be indicated for numerical
measures; for example, it is possible to ask for all ORFs for which the
value of a measurement is greater than two standard deviations from the
mean. It is also possible to ask for only those ORFs that are either in
or not in a group of ORFs defined in the ExpressDB ORF Group table. To demonstrate this capability, we loaded this table with 207 functional groupings of yeast ORFs defined on the Munich Information Center for
Protein Sciences (MIPS) database (Mewes et al. 1999
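The statistical and group-membership conditions described in this paragraph can be sketched in a few lines of Python. This is not EXD's implementation: the function name `orfs_beyond_sd` and its parameters are hypothetical, the sample standard deviation is an assumption, and "greater than two standard deviations from the mean" is read here as an absolute deviation in either direction.

```python
from statistics import mean, stdev


def orfs_beyond_sd(values, n_sd=2.0, group=None, in_group=True):
    """Return ORFs whose value lies more than n_sd sample SDs from the mean.

    `values` maps ORF name -> numerical Measure value (None for null).
    If `group` is given, keep only ORFs in it (in_group=True) or
    only ORFs outside it (in_group=False).
    """
    present = {orf: v for orf, v in values.items() if v is not None}
    mu = mean(present.values())
    sd = stdev(present.values())  # assumption: sample SD, not population SD
    hits = [orf for orf, v in present.items() if abs(v - mu) > n_sd * sd]
    if group is not None:
        hits = [orf for orf in hits if (orf in group) == in_group]
    return sorted(hits)


# Illustrative values: ten unremarkable ORFs and one extreme one.
values = {f"ORF{i}": 1.0 for i in range(10)}
values["ORF_X"] = 50.0
hits = orfs_beyond_sd(values)
```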
Figure 2b provides an impression of what it is like to use EXD to
perform a typical ExpressDB query. On entry to the system, the user is
presented with a form listing data sets available on the database (Fig.
2b, step 1). The user selects one or more data sets; here the Der_diaux
and Der_tup data sets have been selected (see Table 1). On clicking the
Submit button, the user is brought to the next form (Fig. 2b, step 2)
which presents information fields and Measures available on the
database for the selected data sets, and the user chooses the ones he
or she wishes to see. Here, the user has asked to see the information
field SGDID (Saccharomyces Genome Database identifier) (Cherry
et al. 1999
We believe the EXD system to be the first query system that allows users to query simultaneously any of the expression data reported by experiments associated with different literature references and return the results collated by ORF name. Fundamentally, this derives from the fact that all of the data have been collected in one database, but it is also supported by EXD's ability to navigate ExpressDB's generalized two-dimensional table structure. At this time, however, we recommend use of EXD only for relatively simple queries involving ~10 or fewer Measures over all or a group of ORFs, partly because of performance issues with more complex queries and the database's shared computer environment, and partly because we need to develop an interface that makes it easier for users to find data items of interest from a set of >2000 available Measures and then specify query conditions for them.
Generation of ERAs
Generation of ERAs is straightforward for data derived from
Affymetrix oligonucleotide arrays and SAGE (see Methods), but microarray-derived data present a significant issue. Microarray-based experiments simultaneously collect intensity levels of fluorescently labeled cDNAs derived from an experimental condition, and intensity levels of cDNAs, labeled with a different fluorophore, derived from a
control condition. The two cDNA preparations are hybridized in parallel
to the same probe sequence spots on the array (DeRisi et al. 1997 Other issues that complicate both generation of ERAs and comparisons of data generally include (1) the frequent reporting (see Table 3) of multiple measurement values for an ORF from single experimental or control conditions derived from multiple spots for an ORF on a microarray or multiple probe sets for an ORF on an Affymetrix array, which raises the question of how values should be combined or selected for further analysis, and (2) use of different ORF names across different sets of experiments, making matching of ORF data across experiments difficult. In Affymetrix-based experiments, multiple values for an ORF arise from
distinct probe sets for the ORF that are distinguished by their probes
being located in different exons or by other general probe set
characteristics. We found different types of probe sets to have
different properties. For instance, we computed an aggregated measure
(see Methods) of the ratio of average PM-MM values of exon 1 probe
sets against those of exon 2 probe sets from values given in the Hol
and Cho sets of experiments, and found that in both cases, exon 1 probe
set values were, in aggregate, ~0.6 of exon 2 probe set values (Hol:
N = 90 ratios, i.e., 96 ratios minus 6 outliers; Cho:
N = 93 ratios, i.e., 100 ratios minus 7 outliers;
S.D. = 0.4 for each distribution). When presented with a
choice of expression measurements for the same ORF with different
values, one would ideally like to identify and use the measure of
highest quality, but the fact that exon 1 probe sets have the property
of yielding smaller measurement values than exon 2 probe sets does not
imply that exon 1 probe sets are of lower quality. Our strategy for
consolidating multiple Affymetrix probe set values therefore focused on
consistency. Because most probes for ORFs with single probe sets are
taken from the 3' ends of ORF sequences, we decided to handle ORFs
with probe sets for multiple exons by using the exon 2 values instead of exon 1 values. We also avoided probe sets with special feature set
indicators where possible (see Methods). Affymetrix GeneChip software
returns a "presence call" that describes when a gene product may be
considered to be present, marginally present, or absent in an RNA
sample (Lockhart et al. 1996 In the case of SAGE, multiple measurements for an ORF arise from counts
for distinct ORF SAGE tags. The sum of counts for unambiguous tags for
an ORF, maintained as minimum tag counts on ExpressDB, can be safely
attributed to RNA expression by that ORF, but counts for ambiguous tags
included in maximum tag counts cannot be safely attributed to that ORF,
as they may have come from the RNAs of different ORFs that happen to
share the same tag (Velculescu et al. 1997
In the end, we produced a file containing ERAs for all ORFs for which there were usable data for 217 conditions (60 Affymetrix, 3 SAGE, and 154 microarray). The number of ORFs (identified by name) for which data are provided in at least one condition is 6293. The process of generating ERAs included steps to resolve different names for the same ORF (see Methods), and of these 6293 all but 94 could be identified with SGDIDs. Wherever data for an ORF were not reported in a condition, or an ERA could not be computed for an ORF, a "null" value (empty field) is included in the table for that ORF and condition. Because the maximum number of ORFs for which ERAs are reported in a condition is 6221, all conditions contain some null values; some conditions, like the SAGE conditions noted above, contain large numbers of null values. As noted above, the version of the file that may be downloaded from our web site contains only 213 conditions (56 Affymetrix) because four conditions (Coh) have not been published previously.
Clustering of Experimental Conditions
Although clustering of ORFs on the basis of expression levels over
sets of conditions has often been reported (Cho et al. 1998 When clustering ERA data, we should generally expect that conditions will tend to segregate into clusters according to related series of experiments for two reasons: First, conditions in related series frequently use the same or similar strains and cell environments. Second, differences in technique and equipment used in different studies may have the effect of weighting individual ORF abundances from different series differently. Both of these factors will tend to make conditions in a related series more similar in ORF ERA profile than conditions from different series. A diagram depicting the highest level 14 clusters of 217 conditions grouped by similarity of pairwise correlation coefficients over transformed ORF estimated relative abundances (see Methods) is shown in Figure 3. It is evident that conditions cluster mainly with other conditions in the same related sets of experiments. To confirm that this and other observations below are not simply artifacts of the clustering algorithm, we also performed clustering by an alternative method, the clustering of conditions directly by transformed ORF ERAs rather than pairwise correlation coefficients of conditions over their ERAs (see Fig. 4 in the supplemental materials on our web site). In both exercises, we clustered subsets of high-expressing ORFs that showed evidence of induction across conditions, rather than clustering over all ORFs, to reduce noise that might be introduced from large numbers of low-expressing ORFs (see Methods). Despite some shuffling of clusters at the highest levels, it remains true that conditions in the same related sets of experiments are found to be closer to each other than to conditions in other sets. Details may be found on our web site.
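The general idea of grouping conditions by the correlation of their ORF profiles can be illustrated with the Python sketch below. It is a simplified stand-in, not the procedure used for Figure 3: the clustering algorithm is not specified in this passage, so average-linkage agglomerative clustering on the distance 1 - r is an assumption, and the function names and profile values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation over positions where both profiles are non-null.

    Assumes at least two shared positions and non-constant profiles.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def cluster_conditions(profiles, n_clusters):
    """Merge the two closest clusters (average linkage, distance 1 - r)
    until only n_clusters remain."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}

    def linkage(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Illustrative transformed profiles for four hypothetical conditions.
profiles = {
    "a1": [1.0, 2.0, 3.0, 4.0],
    "a2": [1.1, 2.0, 3.2, 3.9],
    "b1": [4.0, 3.0, 2.0, 1.0],
    "b2": [4.2, 2.9, 2.1, 1.0],
}
groups = cluster_conditions(profiles, 2)
```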
To assess the ability of condition clustering to capture
similarities and differences between experiments, we examined the Hol
set of 42 conditions. This set comprises 21 experiments with RNA
polymerase complex mutants and 21 corresponding wild-type controls.
Within this set of experiments, (1) the 21 experiments contain 10 pairs
of replicated experiments, and likewise the 21 controls contain 10 pairs of replicates, making a total of 20 pairs of replicated
conditions. Nine of these 20 pairs of replicated conditions are
clustered at the leaf level in Figure 3
(P =
The database and query tool described here represent preliminary versions of tools required in an integrated tool kit for exploring expression data. They can be modified to make them more sophisticated and complete. Some improvements involve relatively simple technical fixes. The current version of ExpressDB is yeast specific, but the design changes required to generalize it are small and an organism-general version will soon be available. The key changes allow results to be recorded for FDRs other than ORF RNAs, such as ESTs, cDNAs, and noncoding RNAs, that are frequently reported for higher organisms, and allow different sets of FDRs to be registered for different organisms. The EXD query system can also be modified to automatically pipeline results to downstream analysis tools such as clustering by gene and condition. Other technical issues, such as system performance, will require ongoing management. Whereas database software and application tuning and equipment upgrades can improve ExpressDB's current performance, its current 17.5 million records, resulting from only 11 sources, clearly only represent the tiny beginnings of an anticipated flood of expression data. Over time, more efficient database technologies and algorithms will need to be explored to ensure maintenance of performance levels. Standardization of data formats and contents (see below) will also help improve performance by providing opportunities to structure the database more efficiently. More involved issues raised by data comparability concerns must be addressed through standards and additional research. Here we propose several directions for development on the basis of our results. 
Because these directions apply beyond the case of yeast, we phrase them in terms of FDRs rather than ORFs.
Develop Methods that Will Allow Sets of Microarray-Derived Expression Data to Be Directly Compared with Each Other and with Sets of Expression Data Obtained Using Other Methodologies
By dint of its flexibility, relatively low cost, and public availability, microarray technology has made a huge contribution both to the science of functional genomics as a whole and to the number of RNA expression data sets available for analysis; but the full potential of these data will not be realized until methods are developed that allow microarray-derived ratios of FDR levels in experimental conditions relative to control conditions to be easily and directly compared with microarray-derived ratios on the basis of different control conditions, and with the results of other high-throughput RNA assays. One possibility would be to encourage the development of standard microarray control conditions. If the RNA species in such standards are quantified for abundance, it would then also be possible to generate ERAs from microarray-derived data. Some ideas for this are discussed on our web site.
Test Different RNA Expression Assays on Common RNA Samples to Determine Whether They Produce Equivalent Results, and Develop Standard Calibrations Where They Do Not
It is not enough that expression data collected with different methodologies be expressible in a common form such as ERAs; the actual data values must be shown to be equivalent regardless of their methodology of origin. We foresee a research project in which RNA extracts from several test combinations of strains and conditions are assayed on all key RNA expression assays. Condition clustering of results by test sample regardless of assay may be one good indicator of comparability of results and of the correctness of calibrations.
Protocols for sample preparation and labeling may also need to be considered, as these, too, may influence comparability of results. For instance, among Affymetrix-based experiments, Cho generated labeled double-stranded cDNAs, whereas Hol, Rot, and Coh generated labeled single-stranded cRNAs. With the cDNA protocol, cDNAs from adjacent or overlapping ORFs, especially convergent ORFs with 3'-end overlaps, could hybridize to both ORF probe sets, causing signal from one ORF to be reflected in both, whereas this would not arise in the cRNA case. This difference could be sufficient for data gathered from the same sample RNAs using the two different protocols to segregate into different clusters.
Establish Standards for Reporting Data that Cover All RNA Expression Assays
On the basis of issues that arose in generating the ERA file, we
propose that researchers publish versions of data files with the
following characteristics:
Some of these suggestions reaffirm the straw man standards of
(Bassett et al. 1999 As we noted previously, improvements in data comparability through
establishment of standards for expression data collection, preparation,
and reporting will make databases more useful. We emphasize that this
pertains not just to ExpressDB but to any RNA expression database as
the fundamental issue concerns limitations on the ability to compare
data meaningfully, not the computer structures by which they may be
stored and managed. Such improvements will also help streamline and
focus databases. Taking ExpressDB as a case in point, in the absence of
such standards, ExpressDB has both too much and too little data. On the
too much side, a large number of the 2503 Measure records defined in
ExpressDB and several million associated Expression Data Point (see
Fig. 1) records are of little general scientific interest. They cannot simply be ignored, however, because some prove essential to interpreting the data (e.g., microarray spot quality indicators). Moreover, because the meaning and potential use of many data fields are unclear, and because documentation is often too sketchy to distinguish potentially important fields from likely unimportant ones, there is no practical alternative to loading all reported data. On the too little
side, information that is critical to interpreting expression profiles,
especially strain and condition descriptions, is maintained in
ExpressDB only in unformatted text. As functional genomics develops, a
database will be required that maintains strain and condition
information in a structured form that can be queried precisely for such
characteristics as the presence of particular alleles in the strain,
certain compounds in the medium, or treatments of the cell culture
(e.g., heat shock). Moreover, ExpressDB's indexing of data at the RNA
level, currently being generalized from ORFs to FDRs, will require
further generalization. Not only are protein levels now being gathered
on a high throughput basis (Link et al. 1997
Database Model and Definitions
The ExpressDB database model and definitions were generated using PowerDesigner 6.1 (PowerSoft, Concord, MA). The full model is available at our web site http://arep.med.harvard.edu/ExpressDB/.
Database Loading
EDBUpdate was written in Perl and accesses the database using
the sybperl interface. (See http://www.mbay.net/~mpeppler/ for information on sybperl.) We edited load files where necessary using
text editors and Perl scripts to put them into the required tab-delimited format with ORF names in a dedicated column. ORF names
were converted to upper case. In some cases, we eliminated records from
load files that could not be identified as representing either ORFs or
controls. To enhance queriability of the data, we converted empty
column positions in ORF rows (null values) to non-null default values,
where meaningful defaults could be clearly identified; otherwise, null
values were loaded as null database fields. To load SAGE data from Vel,
we located SAGE tag sequences in yeast genome sequence downloaded from
SGD and assigned them to ORFs on the basis of SGD ORF tables (both
sequence and tables downloaded February 15, 1999) following rules for
ORF and strand matching from (Velculescu et al. 1997
EXD Query System
EXD is a collection of Common Gateway Interface (CGI) (Gundavaram
1996
Generation of ERA File
Here we report methods used for generation of Affymetrix and SAGE ERAs; we discuss generation of microarray ERAs on our web site. We extracted average PM-MM values (Affymetrix) and ORF tag counts (SAGE) from ExpressDB for all Affymetrix and SAGE data sets listed in Table 1, along with any relevant qualifiers (e.g., Affymetrix feature set identifiers). We used a specially written batch extract routine for all Affymetrix database extracts; for SAGE data, we used EXD to extract only those ORFs for which counts were entirely unambiguous (minimum count = maximum count) for all conditions and only considered counts of 1 or more. We applied a standard sequence of processing steps to each individual load file, with variations as appropriate, to handle name and SGDID resolution, multiple ORF value consolidation, and Affymetrix threshold processing. We standardized ORF names and assigned SGDIDs with a program that matched load file ORF names against an extract of the Name table of a prototype version of BIGED (J. Aach and W. Rindone, unpubl.; see web site), which had been loaded with all primary and alternate ORF names and all associations between ORF names and SGDIDs published on the SGD database since August 1998. Name resolution also detected and matched hyphenation variants for some ORF names. The name resolution program added columns to the output name-resolved files that preserve an audit trail of the different names consolidated to their target standardized names. For Affymetrix-derived files, the aggregate measure of the ratio of exon 1-based to exon 2-based probe set values mentioned in the text was computed as follows. For each ORF with both exon 1 and exon 2 probe sets, we averaged the average PM-MM values of the exon 1 probe set over all conditions (n = 42 for Hol, n = 17 for Cho), did the same for the exon 2 probe set, and took the ratio of the two averages. The aggregate measure is the outlier-excluded average of these per-ORF ratios.
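The aggregate exon 1 to exon 2 ratio just described can be sketched in Python as follows. The function name `exon_ratio` and the example values are hypothetical. The outlier rule is an assumption made for illustration only: the text does not state how outliers were identified, so this sketch drops ratios lying more than two sample standard deviations from the mean of all per-ORF ratios.

```python
from statistics import mean, stdev


def exon_ratio(exon1, exon2, n_sd=2.0):
    """Outlier-excluded average of per-ORF exon 1 / exon 2 ratios.

    `exon1` and `exon2` map ORF name -> list of average PM-MM values,
    one per condition. Only ORFs present in both are used.
    Returns (aggregate ratio, ratios kept, outliers dropped).
    """
    # Per-ORF ratio of condition-averaged exon 1 and exon 2 values.
    ratios = [mean(exon1[orf]) / mean(exon2[orf]) for orf in exon1 if orf in exon2]
    mu, sd = mean(ratios), stdev(ratios)
    # Assumed outlier rule: more than n_sd sample SDs from the mean.
    kept = [r for r in ratios if abs(r - mu) <= n_sd * sd]
    return mean(kept), len(kept), len(ratios) - len(kept)


# Illustrative data: ten ORFs with ratio 0.5 and one extreme ORF.
exon1 = {f"ORF{i}": [5.0, 5.0] for i in range(10)}
exon2 = {f"ORF{i}": [10.0, 10.0] for i in range(10)}
exon1["ORF_X"], exon2["ORF_X"] = [200.0, 200.0], [10.0, 10.0]
aggregate, n_kept, n_outliers = exon_ratio(exon1, exon2)
```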
We consolidated Affymetrix-based multiple probe set values for an ORF by examining Affymetrix probe set names and averaging the values of whichever group of an ORF's probe sets came first in the following sequence: (1) the probe set name is simply a gene name unqualified by exon or special feature set indicator (_i, _r, _f), (2) the probe set name is a gene name with an exon 2 designation with no special indicators, (3) the probe set name is a gene name with an exon 1 designation and no special indicators, (4) any other probe sets. Affymetrix probe set indicators such as _i, _r, and _f indicate that a probe set departs from desirable target rules for oligonucleotide probe sequence or probe set selection (Affymetrix Technical Help Desk, pers. comm.). The Rot Affymetrix-derived load file had already consolidated multiple ORF values and was exempted from this consolidation step. Following multiple ORF value consolidation, Affymetrix-derived files
with the exception of Rot were threshold adjusted to remove negative
average PM-MM values. By and large, these correspond to ORFs
considered absent by Affymetrix software (Lockhart et al. 1996 We collated all individual normalized load files into a single file
using the standardized ORF names, combining all name audit trail
information from each individual file. This file contained cleansed
intensity and SAGE count values for each ORF for each experiment, but
intensities are still on different scales for each condition. Finally,
we produced a consolidated ERA file by dividing each non-SAGE-derived
ORF intensity value by the total intensity value computed for its
column; for SAGE-derived values, we computed ERAs by dividing each ORF
count by the total number of non-rejected SAGE tags counted for the
condition (Velculescu et al. 1997
Condition Cluster Analysis
We transformed ERA data in preparation for clustering using a
variant of procedures in (Tavazoie and Church 1998 We then converted non-null log10 relative abundances for each
ORF to standard units across all conditions to generate standard unit
log10 relative abundances (SULRA). We performed the
clustering of Figure 3 on Pearson correlation coefficients over ORF
SULRAs between all pairs of our 217 conditions. However, many ORFs have relative abundances at or below the level of measurement noise and
variation of their SULRAs over conditions can be expected to reflect
noise as much as change in expression level. Also, some ORFs with
higher relative abundance varied so little across conditions that
difference terms between condition levels could also reflect noise.
We therefore considered subsets of ORFs exhibiting high relative
abundance levels and evidence of significant induction or
repression as defined by two criteria: (1) the median ERA of the ORF
over all conditions >= a percentile threshold p of the median ERAs of all ORFs, and (2) at least 10% of ratios of ERAs for
the ORF over all pairs of conditions >= a threshold r,
in which this latter was evaluated by ensuring that the ratio of the
kth largest and kth smallest relative abundance
level for the ORF
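The conversion to standard unit log10 relative abundances (SULRA) described in this section can be sketched in Python as follows. The function name `sulra` and the example values are hypothetical. Two points are assumptions: the population standard deviation is used because the text does not say which form was applied, and ERAs are taken to be strictly positive on input, since any earlier handling of zero or very small values is not reproduced here.

```python
from math import log10
from statistics import mean, pstdev


def sulra(eras):
    """Log10-transform one ORF's non-null ERAs and convert them to
    standard units across conditions. Nulls (None) stay null."""
    logs = [None if v is None else log10(v) for v in eras]
    present = [v for v in logs if v is not None]
    mu = mean(present)
    sd = pstdev(present)  # assumption: population SD
    if sd == 0:
        # A constant profile carries no variation to standardize.
        return [None if v is None else 0.0 for v in logs]
    return [None if v is None else (v - mu) / sd for v in logs]


# Illustrative ERAs for one ORF over four conditions, one of them null.
z = sulra([1e-5, None, 1e-3, 1e-4])
```

After this step each ORF has mean 0 and unit variance over the conditions in which it was measured, so correlations between conditions are not dominated by the most abundant ORFs.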
We thank Pat Brown, Paul Spellman, Vishwanath Iyer, Joseph DeRisi, Michael Eisen, and Rick Young for their assistance in helping us understand data and procedures from experiments included in this analysis. Within the Church Laboratory, we thank Barak Cohen for use of unpublished data, and Barak Cohen, Rob Mitra, Saeed Tavazoie, Martin Steffen, and others, for helpful discussions on data cleansing, analysis, clustering, and for critical comments on this manuscript. We also thank four anonymous reviewers for very helpful critical comments. Finally, we thank the Lipper Foundation, Hoechst Marion Roussel, DOE grant DE-FG02-87ER60565, and Howard Hughes Medical Institute for their funding of this work.
The update of the design of ExpressDB to be organism-independent, mentioned above, is now complete. Details on the new design are on our web site.
1 Corresponding author.
E-MAIL church{at}salt2.med.harvard.edu; FAX (617) 432-7663.
Received July 21, 1999; accepted in revised form February 16, 2000. 10:431-445 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/00 $5.00