
Microbial Genomics: Standing on the Shoulders of Giants

Professor Sir David Hopwood, part III


Sixty years of Streptomyces genetics: from Petri dish to computer

Professor Sir David Hopwood is a pioneering microbiologist, geneticist and researcher into the biology of streptomycetes, the bacteria that produce the majority of antibiotics in clinical use around the world today.

From graduating with a degree in botany at Cambridge University, to a PhD in microbial genetics, to co-ordinating the sequencing of the genome of Streptomyces coelicolor, the largest microbial genome to be sequenced at the time, David Hopwood has had a momentous and fascinating career interacting with many great scientists and promoting the development of microbial genomics.

 

Part III – The first Streptomyces genome sequence at the turn of the millennium: glimpses of Aladdin’s cave

In the first half of the 1990s, genetics took a giant leap forward with the possibility of obtaining the complete genome sequence, potentially of any organism. Publication in July 1995 of the genome sequence of the bacterium Haemophilus influenzae by a group headed by Craig Venter at The Institute for Genomic Research (TIGR) in Rockville, USA, provided the first complete catalogue of the genetic potential of a free-living organism and heralded the dawn of the genomics era, which now permeates the whole of biology. It was soon followed by genome sequences for other microbes, including several important bacteria, as well as the model eukaryote, the yeast Saccharomyces cerevisiae.

We desperately wanted a genome sequence for Streptomyces coelicolor, but faced an enormous challenge to raise the required funding. The European Union seemed the best bet but in the summer of 1994 we learned that the vast majority of EU funds allocated to genome sequencing would be reserved for Bacillus, yeast and the model plant, Arabidopsis, with only a possible 6–8% available for other organisms where a pressing case could be made. The European Streptomyces community started to discuss the possibility of a bid, and John Cullum at the University of Kaiserslautern coordinated and submitted the bid at the end of 1995. After much heartache the application failed. In retrospect this was certainly a good thing. If the work had been done in many different labs, some inexperienced in significant sequencing, progress would have been slow and variable, and annotation of the sequence would have been patchy at best. In any case, in order for the bid to be financially realistic it was intended to cover only about one eighth of the S. coelicolor genome. It soon became apparent that the best way forward was to try to obtain enough funding to commission sequencing of the whole genome by a single professional organisation. Craig Venter would have been very happy for TIGR to do it, but the price tag was $4.15 million, which we didn’t have.

Meanwhile, a crucial research project came to fruition with the publication, in July 1996, of the results of years of painstaking work on the genome of S. coelicolor. Through Helen Kieser’s heroic efforts we had established a gross physical map of the chromosome, based on the ordering of large DNA fragments generated by rare-cutting restriction enzymes and separated by pulsed-field gel electrophoresis, and related it to the genetic linkage map built up over the decades. Carton Chen in Taipei had made the revolutionary discovery that the chromosome is linear, not, as we had believed ever since my mapping studies in the 1960s, circular like those of the vast majority of bacteria. And Matthias Redenbach in John Cullum’s lab had generated a population of cloned fragments of the genome in a cosmid vector and led a project to identify a subset of 319 of them to cover the whole chromosome with minimal overlaps, to which Helen had mapped many previously cloned genes. To do so she had obtained DNA samples suitable for hybridisation to the cosmid set from the world-wide Streptomyces community.

Availability of this set of overlapping cosmids and the impressive combined genetic and physical map (Figure 1) turned out to be crucial in lobbying for support for the goal of sequencing the genome. Committees and working parties of the BBSRC, the main source of support for non-medical biological research in the UK, held several discussions on their approach to genomics funding, to which I contributed, culminating in what seemed an almost surreal outcome in March 1997: that the whole budget of £1.5 million set aside for microbial genome sequencing would be allocated to our organism! In fact it was a rational decision, though I say it myself. The organism had both academic and industrial relevance; there was an active and well-coordinated community of UK (and worldwide) scientists ready to exploit the sequence; and the cosmid set would allow the sequencing to proceed in manageable steps, rather than by shotgun sequencing of the whole genome, which was deemed too big to assemble reliably with the computing power then available at the Sanger Institute in Cambridge, where the sequencing would be done.

Sequencing of the first cosmids started in July 1997 and the project was completed exactly four years later, in July 2001. The original BBSRC contract envisaged funding for 70% of the genome over three years, but luckily the Wellcome Trust came up with an additional grant to complete the sequence. During the project, the Sanger Institute’s involvement in a race with Craig Venter to sequence the human genome took up much of their time and resources, hence the extra year. Stephen Bentley, Co-Editor-in-Chief of Microbial Genomics, was the main annotator as he painstakingly and insightfully interrogated each cosmid for its secrets, and posted the results on the Sanger website. The piecemeal nature of the project meant that some of those keenly interested in a particular gene or other feature of the chromosome were rewarded early in the project while others had to wait much longer for their favourite cosmid to be sequenced, but finally the task was complete. The buzz of excitement from the worldwide Streptomyces community was audible and it was one of the most satisfying periods of my research career – even if I had formally retired by then! We felt like Aladdin as he entered a magical new world to explore, as told in Richard Burton’s classic 19th-century translation of The Arabian Nights: “Aladdin walked among the trees and gazed upon them and other things which surprised the sight and bewildered the wits; and, as he considered them, he saw that in lieu of common fruits the produce was of mighty fine jewels and precious stones, such as emeralds and diamonds; rubies, spinels and balasses, pearls and similar gems astounding the mental vision of man.”

The first and most obvious finding from the genome sequence was the large number of potential genes it contained. We had known from Helen’s physical mapping that the genome was about 8 Mb in size (it came in at 8,667,507 base pairs), around twice those of E. coli, Bacillus subtilis and Mycobacterium tuberculosis, but large segments of the genome near its ends had been virtually devoid of any of the genes for metabolic pathways, developmental events and basic housekeeping functions that had been identified by the isolation of typical mutant classes. And studies from several labs over the years had shown that streptomycete genomes can undergo deletion of megabase-sized segments of their genomes without loss of viability under laboratory conditions. Did such “silent” regions consist of “junk” or other non-coding DNA? No, the whole genome was equally rich in potential coding sequences, making 7,825 in all. This was the largest number for any microbe sequenced to date and, most surprisingly, considerably more than the estimated 5,000 or so genes in the eukaryotic yeast genome. Why so many?

Soil is a hugely complex habitat, with a vast range of chemical, physical and biotic challenges. One of the most striking features of the inventory of predicted gene functions was the huge number of genes whose products would fit the organism for its habitat. Thus, 12.3% of predicted proteins had regulatory functions, compared with much smaller proportions for microbes from more restricted habitats, including yeast. As just one example within this category, consider the extra-cytoplasmic function (ECF) class of RNA polymerase sigma factors, which tune transcription of sets of genes to events going on outside the cell. E. coli has just one gene encoding an ECF sigma factor out of its total of seven sigma factors, whereas S. coelicolor has 45 from a total of 65 predicted sigma factors. This gamut of regulatory genes would allow the organism to express many different subsets of genes depending on circumstances. Again, the ability of S. coelicolor to exploit nutrients in the ever-varying soil environment was evident from the prediction of 10.5% of genes encoding potentially secreted proteins, including many proteases, chitinases, cellulases and amylases.

One of the most striking features of the architecture of the chromosome was its apparent division into a more or less central “core” region representing about half the genome and two “arms”, including the ends of the linear structure (Figure 2). Nearly all genes expected to be unconditionally essential lie in the core, including those for cell division, DNA replication, transcription, translation and amino acid biosynthesis, while genes whose products would be expected to be adaptive only sporadically, like those for specialised metabolites and secreted enzymes, tend to be in the arms. This accounted for the “silent” regions of the original genetic map. Most revealingly, when the positions of putative gene homologues in S. coelicolor and the distantly related pathogenic actinobacterium M. tuberculosis were compared, there was a recognisable synteny between the core of the Streptomyces chromosome and the entire Mycobacterium genome, leaving the arms with no clear synteny; perhaps their genes have been acquired since divergence from a common actinobacterial ancestor. More recently, with the sequencing of many other streptomycete genomes, it is clear that the arm regions are much more divergent than the central core, suggesting that they reflect the accumulation of genes useful for specific adaptation, and the finding of many indications of transposon activity in the arms supports the idea of horizontal transfer and liquidity of these regions, rather like the plasmid pools of other groups of bacteria.

But what about the unrivalled capacity of the streptomycetes to make specialised, or “secondary”, metabolites, which include the antibiotics, insecticides and cytotoxic (anticancer) agents? This had undoubtedly been a major factor in acceptance of the manuscript describing the genome sequence by Nature magazine and their featuring one of Tobias Kieser’s beautiful pictures of S. coelicolor colonies making the blue actinorhodin antibiotic, on the cover of the 9 May 2002 issue with the caption, "Genome of an antibiotics factory" (Figure 3). We had previously identified the gene clusters for three antibiotics and a spore pigment, but analysis of the genome sequence by Greg Challis of the University of Warwick, one of the authors on the Nature paper, revealed a probable 18 other clusters of genes that looked likely to encode biosynthetic pathway enzymes for interesting compounds. Strikingly, sequencing of the genome of Streptomyces avermitilis by Satoshi Ōmura’s group at the Kitasato Institute in Tokyo, in parallel with our S. coelicolor project, revealed even more such clusters. It appeared that these soil-dwelling bacteria have a much greater capacity to make antibiotics and other useful compounds than they reveal under typical laboratory conditions. Could exploitation of this finding revolutionise the discovery of novel antibiotics in these times of ever-increasing resistance to currently used drugs?

Since 2002, information about the potential of Actinobacteria (as well as of other microbes, including filamentous fungi) to produce interesting natural products has exploded, made possible by the invention of revolutionary new rapid and affordable sequencing methods. While precise estimates are impossible to obtain, partly because of commercial considerations, there is no doubt that many hundreds of actinobacterial genomes have been sequenced, not to the degree of completeness of the first two sequences, but sufficiently comprehensively to reveal most of their potential specialised metabolite gene clusters. The average number of these per genome is probably around 30, mostly different from genome to genome so far, making a total count of clusters of well over 10,000 already. And the surface has only been scratched. It has been estimated that about 15,000 genome sequences will be needed to identify all the clusters, with over a million actinobacterial natural products still to be discovered. As with S. coelicolor, the vast majority of the clusters are “cryptic” or “sleeping” in cultivation and adaptive, and therefore expressed only under particular environmental conditions, so many groups are addressing the problem of waking them up to make their products accessible to characterisation chemically and biologically. Successful approaches include both environmental and genetic interventions: varying the medium composition and growth parameters, including adding agents likely to influence differential gene expression, challenging the organism by co-cultivation with another microbe, or over-expressing a pathway-specific transcriptional activator. The outcome has been a gamut of novel chemistry leading to structural novelty and, potentially, new biological activity. 
As with any antibiotic discovery approach, only a small minority of new compounds will make it to the clinic, but I am not the only one to be optimistic that these genomics-based strategies will have significant success.

Thus concludes my three-part narrative about my take on the course of Streptomyces genetics over a 60-year span. It has been an exciting journey, made infinitely enjoyable by all the friendships and interactions with a wonderful community of fellow scientists – in my group, in the wider institute and in laboratories around the world. From my vantage point as an Emeritus Fellow at the John Innes Centre, I continue to be excited by their discoveries; long may they continue.


Figure 1. The combined physical map of the S. coelicolor chromosome based on pulsed-field gel electrophoresis of DNA fragments generated by the two rare-cutting restriction enzymes AseI and DraI and hybridisation of cloned genes to the set of overlapping cosmids covering the whole chromosome (with three short gaps). From the cover of Molecular Microbiology, Volume 21, Issue 1, July 1996, reproduced with permission from John Wiley & Sons. © 1999–2016 John Wiley & Sons, Inc.

Figure 2. The Streptomyces coelicolor chromosome from the complete annotated genome sequence. The outermost circle indicates the core (dark blue) and arm (light blue) regions of the chromosome. Next, from the outside in, come all genes on the reverse and forward strands colour-coded by function, followed by selected ‘essential’ genes, for cell division, DNA replication, transcription, translation and amino-acid biosynthesis. Note that these genes are virtually all confined to the core region. Other features of the genome are shown on further circles of the figure. Reprinted with permission from Macmillan Publishers Ltd: ‘Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2)’ by Bentley et al., Nature 417, 141–147; 9 May 2002. (http://www.nature.com/nature/journal/v417/n6885/full/417141a.html)


Figure 3. The cover of the 9 May 2002 issue of Nature reproduced with permission from Macmillan Publishers.


