Genome Research, Vol. 7, No. 8, pp. 802-819, August 1997
RESEARCH
Multiplex Sequencing of 1.5 Mb of the Mycobacterium leprae Genome
Douglas R. Smith,1,2 Peter Richterich,2 Marc Rubenfield,2 Philip W. Rice,2
Carol Butler,2 Hong-Mei Lee,2 Susan Kirst,2 Kristin Gundersen,2
Kari Abendschan,2 Qinxue Xu,2 Maria Chung,2 Craig Deloughery,2
Tyler Aldredge,2 James Maher,2 Ronald Lundstrom,2 Craig Tulig,2
Kathleen Falls,2 Joan Imrich,2 Dana Torrey,2 Marcy Engelstein,2
Gary Breton,2 Deepika Madan,2 Raymond Nietupski,2 Bruce Seitz,2
Steven Connelly,2 Steven McDougall,2 Hershel Safer,2 Rene Gibson,2
Lynn Doucette-Stamm,2 Karin Eiglmeier,5 Staffan Bergh,5 Stewart T. Cole,5
Keith Robison,4 Laura Richterich,4 Jason Johnson,4 George M. Church,1,3,4
and Jen-i Mao2
2 Genome Therapeutics Corporation, Collaborative Research Division, Waltham, Massachusetts 02154;
3 Howard Hughes Medical Institute and 4 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115;
5 Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 75724 Paris CEDEX 15, France
The nucleotide sequence of 1.5 Mb of genomic DNA from
Mycobacterium leprae was determined using computer-assisted
multiplex sequencing technology. This brings the 2.8-Mb M. leprae genome sequence to ~66% completion. The sequences,
derived from 43 recombinant cosmids, contain 1046 putative
protein-coding genes, 44 repetitive regions, 3 rRNAs, and 15 tRNAs. The
gene density of one gene per 1.4 kb is slightly lower than that of
Mycoplasma (one per 1.2 kb). Of the protein-coding genes, 44% have
significant matches to genes with well-defined functions. Comparison of
1157 M. leprae and 1564 Mycobacterium tuberculosis
proteins shows a complex mosaic of homologous genomic blocks with up to
22 adjacent proteins in conserved map order. Matches to known
enzymatic, antigenic, membrane, cell wall, cell division, multidrug
resistance, and virulence proteins suggest therapeutic and vaccine
targets. Unusual features of the M. leprae genome include
large polyketide synthase (pks) operons, inteins, and highly fragmented
pseudogenes.
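
As a quick arithmetic check, the reported gene density can be reproduced from the abstract's own figures. The sketch below assumes the rounded value of 1.5 Mb for the sequenced length and the count of 1046 putative protein-coding genes. The function name `gene_density_kb` is introduced here for illustration only and does not come from the paper.

```python
def gene_density_kb(sequence_bp: int, gene_count: int) -> float:
    """Return the average number of kilobases of sequence per gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return sequence_bp / gene_count / 1000


# Figures taken from the abstract; 1.5 Mb is a rounded value (assumption:
# the exact sequenced length is not given here).
SEQUENCED_BP = 1_500_000
PUTATIVE_GENES = 1046

density = gene_density_kb(SEQUENCED_BP, PUTATIVE_GENES)
print(f"{density:.1f} kb per gene")
```

With these inputs the result rounds to 1.4 kb per gene, in line with the density stated above.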
[The sequence data described in this paper
have been submitted to GenBank under accession nos. L78811-L78829,
U00010-U00023, U15180-U15184, U15186, U15187, L01095, L01536, L04666,
and L01263. On-line supplementary information for Table 1 is available
at http://www.cshl.org/gr.]
7:802-819 ©1997 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/97 $5.00