Accessing the Hidden Kingdom: Fungal ITS Reference Sequences

This post is geared toward fungi researchers as well as RefSeq and BLAST users.

Fungi have unique characteristics that can make it difficult to identify and classify species by morphology alone. To address this problem, Conrad Schoch, NCBI’s fungi taxonomist, and Barbara Robbertse, NCBI’s fungi RefSeq curator, are working with outside mycology experts to curate a set of fungal sequences from the internal transcribed spacer (ITS) regions of the nuclear ribosomal RNA genes. This set of standard DNA sequences for fungal taxa not only sidesteps the difficulties of morphological identification but is also essential for analyzing environmental (metagenomic) sequencing studies. The curated ITS sequences, described in a recent article in Database (PMC Free Article), all have associated specimen data and, when possible, come from type material, which helps ensure correct species identification and makes it possible to track name changes. This post shows you how to access these ITS sequences and search them with the specialized Targeted Loci BLAST service.

The fungal ITS sequences are a RefSeq Targeted Loci BioProject (PRJNA177353). As you may know, a BioProject is a collection of biological data related to a single initiative; in this case, the goal is to collect and curate fungal sequences from targeted loci – specific molecular markers such as protein coding or ribosomal RNA genes used for phylogenetic analysis.

As of now, there are 2,813 sequences representing a diverse set of 2,720 fungal species. You can easily retrieve the entire set by following the link from the BioProject record or from the RefSeq Targeted Loci page, which also provides information about other rRNA Targeted Loci projects. To retrieve only the sequences from type material, add “sequence from type”[Filter] to the query provided by the BioProject link. You can also download the complete set from the genomes area of the NCBI FTP site.
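If you would rather script the retrieval than click through the BioProject link, NCBI’s E-utilities accept the same kind of Entrez query. The sketch below only builds an ESearch URL; it assumes the records can be reached through the `[BioProject]` search field of the Nucleotide database, which may differ from the exact query the BioProject link generates, and the `retmax` value is an arbitrary choice large enough for the current set.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_its_query(type_only: bool = False) -> str:
    """Entrez Nucleotide query for the fungal ITS RefSeq BioProject.

    Assumption: the records are reachable through the [BioProject] field;
    the query behind the BioProject link may be phrased differently.
    """
    term = "PRJNA177353[BioProject]"
    if type_only:
        # Keep only sequences from type material, as described above
        term += ' AND "sequence from type"[Filter]'
    return term


def esearch_url(term: str, retmax: int = 5000) -> str:
    """Build (but do not fetch) an ESearch URL against the nuccore database."""
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url(build_its_query(type_only=True)))
```

Opening the printed URL (for example with `urllib.request.urlopen`) should return an XML list of matching record IDs; NCBI asks scripted E-utilities users to keep their request rate low.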

The ITS reference sequences contain the 5.8S ribosomal RNA gene and the flanking internal transcribed spacer regions (ITS1 and ITS2) as well as the proximal portion of the 28S rRNA gene when available. A graphical view and feature table of the reference ITS region record (NR_111838) from Pseudogymnoascus destructans (formerly Geomyces destructans), the causative agent of white-nose syndrome in hibernating bats (PubMed), is shown in Figure 1.

Figure 1. Graphical view and feature table for the ITS Reference Sequence (NR_111838) from Pseudogymnoascus destructans, the pathogen causing white-nose syndrome in bats. The sequence includes most of the internal transcribed spacer 1, the 5.8S ribosomal RNA gene, internal transcribed spacer 2 and the 5′ end of the 28S ribosomal RNA gene.
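To read the region boundaries programmatically rather than from the graphical view, you can request the same annotation as a five-column feature table; an EFetch call with `db=nuccore`, `id=NR_111838` and `rettype=ft` should return one. The parser below is a minimal sketch that assumes the standard feature-table layout, with ITS1 and ITS2 annotated as `misc_RNA` features and the rRNA genes as `rRNA` features. The sample input is a toy with made-up coordinates, not the actual NR_111838 annotation.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    start: int
    end: int
    key: str
    product: str = ""
    partial_5prime: bool = False  # location written as "<start"
    partial_3prime: bool = False  # location written as ">end"


def parse_feature_table(text: str) -> list[Feature]:
    """Minimal parser for a five-column feature table (sketch, not exhaustive)."""
    features: list[Feature] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(">"):
            continue  # skip blank lines and the ">Feature" header
        cols = line.split("\t")
        if cols[0] and len(cols) >= 3:
            # Location line: start, end, feature key
            start, end, key = cols[0], cols[1], cols[2]
            features.append(Feature(
                start=int(start.lstrip("<>")),
                end=int(end.lstrip("<>")),
                key=key,
                partial_5prime=start.startswith("<"),
                partial_3prime=end.startswith(">"),
            ))
        elif len(cols) >= 5 and cols[3] == "product" and features:
            # Qualifier line: three empty columns, then name and value
            features[-1].product = cols[4]
    return features


# Toy input in the five-column format; coordinates are invented for illustration
SAMPLE = (
    ">Feature toy_example\n"
    "<1\t150\tmisc_RNA\n"
    "\t\t\tproduct\tinternal transcribed spacer 1\n"
    "151\t307\trRNA\n"
    "\t\t\tproduct\t5.8S ribosomal RNA\n"
    "308\t470\tmisc_RNA\n"
    "\t\t\tproduct\tinternal transcribed spacer 2\n"
    "471\t>520\trRNA\n"
    "\t\t\tproduct\t28S ribosomal RNA\n"
)

for f in parse_feature_table(SAMPLE):
    print(f.start, f.end, f.key, f.product)
```

The `<` and `>` markers are kept as flags rather than discarded, since they indicate that a region such as ITS1 is only partially present in the record.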

The fungal ITS Reference Sequences are useful BLAST targets for identifying unknown fungal ITS sequences. You can search them on the new Targeted Loci BLAST page to quickly assign a name to a query or find a closely related fungal species.

Select the Internal transcribed spacer region (ITS) database from the pull-down list. You can also check the “Sequences from type material” box to search only sequences associated with type material or cultures. Figure 2 shows the settings needed on the Targeted Loci BLAST form.

Figure 2. Settings in the “Choose Search Set” section of the Targeted Loci BLAST form for searching the RefSeq fungal ITS sequences. Checking the “Sequences from type material” box further restricts the set to only sequences associated with type material or cultures.
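If you prefer to run the search locally, the ITS set is, to our understanding, also distributed as a preformatted BLAST database; the name `ITS_RefSeq_Fungi` used below is an assumption to check against the BLAST FTP area before use. This sketch only assembles a BLAST+ `blastn` command line with standard options. It does not reproduce the “Sequences from type material” checkbox, for which we assume there is no single equivalent flag.

```python
import shlex


def its_blastn_command(query_fasta: str, db: str = "ITS_RefSeq_Fungi",
                       max_hits: int = 50) -> list[str]:
    """Build (but do not run) a BLAST+ blastn command for an ITS search.

    The default database name is an assumption, not confirmed by this post.
    """
    return [
        "blastn",
        "-query", query_fasta,   # FASTA file holding the unknown ITS sequence
        "-db", db,               # assumed name of the local ITS RefSeq database
        "-outfmt", "6",          # tabular output, one HSP per line
        "-max_target_seqs", str(max_hits),
    ]


cmd = its_blastn_command("query.fasta")
print(shlex.join(cmd))
```

To execute it, pass `cmd` to `subprocess.run(cmd, check=True, capture_output=True, text=True)` on a machine where BLAST+ is installed and the database is on the `BLASTDB` path. With `-outfmt 6` the default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore.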

The Targeted Loci BLAST page is especially helpful when a search of the default database on the main BLAST page returns best matches to environmental fungal sequences with incomplete taxonomic information. Figure 3 shows the results of a BLAST search against the RefSeq fungal ITS sequences using an uncultured fungus ITS clone sequence (INSDC Accession: DQ421263) as the query. The best hit is the Penicillium subrubescens CBS 132785 ITS sequence (NR_111863), with a single mismatch in the alignment.

Run the search against fungal ITS sequences.

Figure 3. Results of a Targeted Loci BLAST search against the RefSeq ITS fungal sequences from type. The query is DQ421263, an uncultured fungal ITS sequence from a soil sample. Top panel. The first five BLAST matches in the BLAST Descriptions section. These are all from Penicillium species. Bottom panel. The two best alignments in the output. The query sequence most likely is from the ascomycete species Penicillium subrubescens. The BLAST formatting options were set so that the alignment view is “Pairwise with dots for identities” to highlight sequence differences.
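The “Pairwise with dots for identities” view replaces every subject base that matches the query with a dot, so a single-mismatch hit such as NR_111863 stands out at a glance. The helper below is a hypothetical sketch of that rendering for two already-aligned strings of equal length; it also counts mismatches (ignoring gap columns) and computes percent identity over the alignment length. The sequences are toy examples, not the DQ421263 alignment.

```python
def dots_for_identities(query: str, subject: str) -> tuple[str, int, float]:
    """Render an aligned subject with dots for identities (illustrative helper).

    Returns the rendered subject line, the number of base mismatches
    (gap columns are not counted), and percent identity over the alignment.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    rendered = []
    mismatches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == s:
            rendered.append(".")          # identical position
        else:
            rendered.append(s)            # show the differing subject base
            if q != "-" and s != "-":
                mismatches += 1           # substitution, not an indel
    identity = 100.0 * rendered.count(".") / len(query)
    return "".join(rendered), mismatches, identity


# Toy aligned pair with one substitution at position 5
query = "ACGTTGCAAT"
subject = "ACGTCGCAAT"
line, n_mis, pct = dots_for_identities(query, subject)
print(query)
print(line)
print(n_mis, pct)
```

For this toy pair the rendered subject is `....C.....`, with one mismatch and 90.0% identity.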

The same search against the default nucleotide database, even with the “Uncultured/environmental sample sequences” exclusion box checked, finds a large number of incompletely identified records that push the Penicillium subrubescens ITS hit down to position 70 in the output (not shown), making it difficult to assign the most likely source organism for the unidentified query sequence.

Run the search against the default nucleotide database.
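To check where a particular reference lands in a long hit list, such as the Penicillium subrubescens record in the default-database search, you can rank distinct subjects in tabular BLAST output (`-outfmt 6`). The function below is a hypothetical sketch: it counts each subject once, in the order BLAST reports it, which approximates the Descriptions table, and it strips database prefixes and version suffixes from the sseqid column. The input rows are placeholders with made-up statistics.

```python
from typing import Optional


def hit_rank(tabular: str, accession: str) -> Optional[int]:
    """1-based rank of `accession` among distinct subjects in -outfmt 6 text."""
    seen: list[str] = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        sseqid = line.split("\t")[1]
        # Normalize IDs such as "ref|NR_111863.1|" or "NR_111863.1" to "NR_111863"
        acc = [part for part in sseqid.split("|") if part][-1].split(".")[0]
        if acc not in seen:
            seen.append(acc)  # later HSPs of the same subject do not add a rank
        if acc == accession:
            return seen.index(acc) + 1
    return None


# Placeholder rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore (values invented)
TOY = (
    "q1\tref|SUBJ_A.1|\t100.0\t500\t0\t0\t1\t500\t1\t500\t0.0\t924\n"
    "q1\tref|SUBJ_B.1|\t99.8\t500\t1\t0\t1\t500\t1\t500\t0.0\t918\n"
    "q1\tref|SUBJ_B.1|\t98.0\t50\t1\t0\t1\t50\t1\t50\t1e-15\t80\n"
    "q1\tref|SUBJ_C.2|\t99.6\t500\t2\t0\t1\t500\t1\t500\t0.0\t913\n"
)

print(hit_rank(TOY, "SUBJ_C"))
```

Here `SUBJ_C` ranks third because the second HSP of `SUBJ_B` is not counted as a new subject. On real output you would call, for example, `hit_rank(text, "NR_111863")`.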

The RefSeq fungal ITS sequences are an essential resource for fungal phylogenetic studies and for analyzing and identifying fungal sequences from environmental sequencing projects. Their linkage to type material makes them particularly valuable for assigning accurate names. RefSeq records currently represent most fungal orders, and NCBI curators will continue to expand the set to improve coverage at the family and genus levels.
