Brief Bioinform. 2019 Jul 19;20(4):1071-1084.
doi: 10.1093/bib/bbx113.

MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data

Claudine Médigue et al. Brief Bioinform.

Abstract

The overwhelming number of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step, one that ultimately determines the relevance of the thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports the systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. We then present the functionalities of the platform in a way that provides practical guidance to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original method recently developed (the pan-genome graph representation) are also described. Through their user communities, integrated environments such as MicroScope clearly help to maintain accurate resources.

Keywords: comparative genomics; gene function curation; metabolic networks; microbial genome annotation system; transcriptomics; variant detection.


Figures

Figure 1.
Evolution of the number of integrated genomes, user accounts and expert annotations stored in MicroScope since 2002. The red scale on the right refers to the number of integrated genomes (red curve) and to the number of user accounts (orange curve). The blue scale on the left refers to the cumulative number of expert annotations.
Figure 2.
Annotation pipelines for the analysis of newly sequenced genomes and genomes already annotated in public databanks.
Figure 3.
Submission of genomic data into the MicroScope platform. Four types of services are provided for the integration of (i) newly sequenced or publicly available genomes (Genome), (ii) genome assemblies/bins from metagenomic samples (Metagenome), (iii) RNA-seq data for quantitative transcriptomics (RNA-Seq) and (iv) DNA-seq data to identify genomic variations in evolved strains (Evolution). Following the three main steps of the procedure, the user is invited to complete the requested metadata describing the sequencing, genomes and experimental properties, to upload FASTA (genome assemblies) or FASTQ (RNA-seq or DNA-seq reads) files and, finally, to approve the terms of service. Users are then informed by e-mail of the progress of their integration request.
Figure 4.
MicroScope interface illustrating the ‘Search by keywords’ functionality. In the ‘multiple’ mode, a set of Staphylococcus species has been selected, and the BLASTP similarity results obtained against well-known resistance genes stored in the CARD database are queried with an amino acid identity threshold of at least 80% and the keywords ‘kanamycine tetracycline’. ‘At least one word’ must be selected to apply an ‘OR’ between the two keywords.
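The query in Figure 4 combines an identity cutoff with keyword matching joined by ‘OR’. The sketch below illustrates that filtering logic; the `BlastHit` record, the `search_hits` function, the locus tags and the descriptions are all hypothetical and are not MicroScope's data model or code, and case-insensitive substring matching is an assumption about how keywords are compared.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record: one BLASTP hit of a genome protein against a CARD protein."""
    gene: str          # locus tag in the selected genome (hypothetical)
    identity: float    # percent amino acid identity of the alignment
    description: str   # description of the matched reference protein


def search_hits(hits, keywords, min_identity=80.0, match_any=True):
    """Keep hits with identity >= min_identity whose description matches the keywords.

    match_any=True combines keywords with OR ('At least one word');
    match_any=False requires every keyword (AND).
    Matching is a case-insensitive substring test (an assumption).
    """
    terms = [k.lower() for k in keywords]
    combine = any if match_any else all
    return [
        h for h in hits
        if h.identity >= min_identity
        and combine(t in h.description.lower() for t in terms)
    ]


# Hypothetical hits for illustration only.
hits = [
    BlastHit("GENE_0001", 95.2, "Aminoglycoside phosphotransferase, kanamycin resistance"),
    BlastHit("GENE_0002", 85.0, "Tetracycline efflux MFS transporter"),
    BlastHit("GENE_0003", 60.1, "Tetracycline efflux MFS transporter"),
    BlastHit("GENE_0004", 99.0, "Class A beta-lactamase"),
]

selected = search_hits(hits, ["kanamycin", "tetracycline"])
print([h.gene for h in selected])  # ['GENE_0001', 'GENE_0002']
```

GENE_0003 matches a keyword but falls below the 80% identity threshold, and GENE_0004 passes the threshold but matches neither keyword. With `match_any=False` (an ‘AND’), none of these hypothetical hits would be returned.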
Figure 5.
MicroScope genome browser and synteny map. The first graphical map shows part of the genome being analyzed (here 30 kb of E. coli CFT073), in which the user can navigate using the moving and zooming functionalities. The predicted coding genes are drawn as red rectangles on the six reading frames, together with the coding prediction curves (computed with the gene model selected by the user in the ‘Matrix’ selection menu). Below this genome browser is the synteny map, in which each line shows the similarity results between the genome being annotated (E. coli CFT073) and one of the other selected genomes (here, 11 pathogenic and commensal E. coli strains, selected using the ‘Options’ functionality). On this map, a rectangle indicates that a gene homologous to the corresponding gene in the genome browser exists somewhere in the compared genome. If several co-localized CDSs in the annotated genome have several co-localized homologs in the compared genome, the rectangles are all of the same color; otherwise, the rectangle is white. A specific color therefore indicates a synteny group. A rectangle always has the same size as the reference gene in the genome browser, but it is colored only on the part of the gene that aligns with the compared protein. This allows the user to visualize partial alignments. There is one such case in E. coli 536, indicating that the idnK gene in this strain is a pseudogene compared with the idnK gene in CFT073. Unlike the genome browser, the synteny maps have no notion of scale: to see how homologous genes are organized in a synteny group, the user can click on one rectangle of that group.
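The coloring rule in Figure 5 can be approximated with a short sketch. This is a simplified illustration, not the MicroScope synteny algorithm: it assumes each reference gene has at most one homolog, given as a position in the compared genome's gene order, and treats two neighbouring reference genes as co-localized when their homologs lie at most `max_gap` genes apart in either orientation. The functions `synteny_groups` and `colored_fraction` and the `max_gap` parameter are hypothetical.

```python
from typing import List, Optional


def synteny_groups(homolog_pos: List[Optional[int]], max_gap: int = 1) -> List[List[int]]:
    """Simplified synteny grouping (not MicroScope's algorithm).

    homolog_pos[i] is the position, in the compared genome's gene order, of the
    homolog of reference gene i, or None when gene i has no homolog.
    Neighbouring reference genes are put in the same group when both have a
    homolog and those homologs are at most max_gap genes apart, on either strand.
    Only groups of two or more genes are returned; a gene with a homolog that
    belongs to no group corresponds to a white rectangle.
    """
    groups: List[List[int]] = []
    current: List[int] = []

    def close_current():
        # Keep the run under construction only if it spans at least two genes.
        if len(current) >= 2:
            groups.append(list(current))

    for i, pos in enumerate(homolog_pos):
        if pos is None:
            close_current()
            current = []
        elif current and abs(pos - homolog_pos[current[-1]]) <= max_gap:
            current.append(i)
        else:
            close_current()
            current = [i]
    close_current()
    return groups


def colored_fraction(aln_start: int, aln_end: int, gene_length: int) -> float:
    """Fraction of a reference gene (1-based, inclusive residue coordinates)
    covered by its alignment with the compared protein."""
    return (aln_end - aln_start + 1) / gene_length


print(synteny_groups([10, 11, 12, None, 40, 3, 4]))  # [[0, 1, 2], [5, 6]]
print(colored_fraction(1, 150, 300))                 # 0.5
```

In the example, gene 4 has a homolog (position 40) whose neighbours are not conserved, so it would be drawn white, while genes 0 to 2 and 5 to 6 form two synteny groups. A fraction of 0.5 would correspond to a rectangle colored over half its length, the kind of partial alignment the caption describes for idnK.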
Figure 6.
Comparative genomics tools of the MicroScope platform. The figure displays some of the tools available to perform in-depth comparative genomics analyses involving the bacterium of interest and one or a set of organisms: ‘Gene Phyloprofile’ (comparison of five Lactobacillus rhamnosus strains), ‘Line Plot’ (shared synteny groups found on the same DNA strand are colored in green, and in red otherwise), ‘Regions of Genomic Plasticity’ (the predicted genomic island is shown in the second layer of the circular representation), ‘Pan-core genome’ and ‘Resistome’. In the last case, the figure shows Acinetobacter baumannii AYE genes having BLASTP hits with proteins from the CARD database.
Figure 7.
Tools for the analysis of microbial metabolism. Metabolic data can be explored using the KEGG or MetaCyc metabolic pathway hierarchies. On the left, the figure shows, for one selected MicroScope genome, the mapping of the annotated EC numbers on a KEGG metabolic map (enzymes encoded by genes located in the current genome browser region are highlighted in yellow, and those encoded by genes located elsewhere are highlighted in green). Pathway/genome databases (PGDBs) predicted with the Pathway Tools software are available through the ‘MicroCyc’ functionality. Metabolic pathways are compared between a set of selected genomes using the ‘Metabolic profiles’ tool: for each metabolic pathway, a completion value is computed as the number of reactions found in the genome divided by the total number of reactions in the pathway. Pseudogenes can optionally be taken into account in this value, which ranges between 0 (absence of the pathway) and 1 (complete pathway). The figure also shows an example of antiSMASH output; antiSMASH predicts biosynthetic gene clusters (BGCs) in prokaryotic genomes. For the NRPS/PKS cluster types, the predicted peptide monomer composition and its corresponding SMILES formula are specified. Below the graphical representation of the predicted antiSMASH cluster, a summary of MIBiG cluster similarities, BGC gene composition and tailoring cluster similarities is given.
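The completion value used by the ‘Metabolic profiles’ tool is a simple ratio. The sketch below computes it from sets of reaction identifiers; the function name `pathway_completion`, the reaction IDs and the way pseudogene-supported reactions are passed in are illustrative assumptions, not MicroScope's implementation.

```python
def pathway_completion(pathway_reactions, genome_reactions,
                       pseudogene_reactions=(), count_pseudogenes=False):
    """Completion value = reactions of the pathway found in the genome
    divided by the total number of reactions in the pathway (0 to 1).

    Reactions supported only by pseudogenes are counted as found only
    when count_pseudogenes is True.
    """
    pathway = set(pathway_reactions)
    if not pathway:
        raise ValueError("pathway has no reactions")
    found = set(genome_reactions)
    if count_pseudogenes:
        found |= set(pseudogene_reactions)
    return len(pathway & found) / len(pathway)


# Hypothetical reaction identifiers for a four-reaction pathway.
pathway = {"R1", "R2", "R3", "R4"}
genome = {"R1", "R2", "R9"}      # R9 belongs to another pathway and is ignored
pseudo = {"R3"}                  # R3 is supported only by a pseudogene

print(pathway_completion(pathway, genome, pseudo))                          # 0.5
print(pathway_completion(pathway, genome, pseudo, count_pseudogenes=True))  # 0.75
```

Intersecting with the pathway's reaction set keeps reactions from unrelated pathways (R9 here) from inflating the value, so the result stays within the 0 to 1 range given in the caption.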
Figure 8.
Technical architecture of the MicroScope platform. The MicroScope platform is made of three components: (i) a ‘Process management’ system to organize workflow execution, (ii) a ‘Data management’ system, called PkGDB, to store information from databanks, genomes and computational results and (iii) a ‘Visualization’ system for textual and graphical representation of PkGDB data.
