NCBI Minute 2023-03-29: Introduction to NCBI Pathogen Detection and antimicrobial resistance data in Google BigQuery

Resources for the NCBI Minute

Example queries used in the talk

Count the isolates from enoki mushrooms

-- How many isolates were isolated from "enoki"?
SELECT COUNT(*)
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE isolation_source LIKE '%enoki%'

Get info about a given isolate

-- Get info about a given isolate
SELECT taxgroup_name, isolate_identifiers, erd_group, isolation_source
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE biosample_acc = 'SAMN21357979'

Find the really bad contigs with the most AMR genes

-- Find the contigs with the most AMR genes
SELECT contig_acc, COUNT(*) num_genes, COUNT(DISTINCT(element_symbol)) num_unique_genes
FROM `ncbi-pathogen-detect.pdbrowser.microbigge`
WHERE subtype = 'AMR'
GROUP BY contig_acc
ORDER BY num_genes DESC
LIMIT 10

What are the genes on those contigs?

Some of these results look like they may be errors

-- What are the genes on the contigs with the most AMR genes?
SELECT mb.contig_acc, mb.start_on_contig, mb.element_symbol, mb.element_name, mb.subclass, top10.num_unique_genes, top10.num_genes
FROM `ncbi-pathogen-detect.pdbrowser.microbigge` mb
JOIN (
    SELECT contig_acc, COUNT(*) num_genes, COUNT(DISTINCT(element_symbol)) num_unique_genes
    FROM `ncbi-pathogen-detect.pdbrowser.microbigge`
    WHERE subtype = 'AMR'
    GROUP BY contig_acc
    ORDER BY num_genes DESC
    LIMIT 10
  ) top10
ON top10.contig_acc = mb.contig_acc
-- Sorting by contig_acc keeps each contig's rows together when contigs tie on num_genes
ORDER BY top10.num_genes DESC, mb.contig_acc, mb.start_on_contig
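
The query above ranks contigs in a subquery and then joins back to the full table. A minimal sketch of that pattern follows, using Python's built-in sqlite3 as a stand-in for BigQuery. The table contents and gene names are hypothetical, and LIMIT 1 replaces LIMIT 10 to keep the output short.

```python
import sqlite3

# In-memory stand-in for the microbigge table; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE microbigge (
        contig_acc TEXT, start_on_contig INTEGER,
        element_symbol TEXT, subtype TEXT
    );
    INSERT INTO microbigge VALUES
        ('contig_1', 100, 'gene_a', 'AMR'),
        ('contig_1', 500, 'gene_a', 'AMR'),
        ('contig_1', 900, 'gene_b', 'AMR'),
        ('contig_1', 300, 'gene_v', 'VIRULENCE'),
        ('contig_2',  50, 'gene_c', 'AMR');
""")

# Rank contigs by AMR gene count in a subquery, then join back to the
# unfiltered table to list every element on the top contig.
rows = conn.execute("""
    SELECT mb.contig_acc, mb.start_on_contig, mb.element_symbol,
           top1.num_genes, top1.num_unique_genes
    FROM microbigge mb
    JOIN (
        SELECT contig_acc, COUNT(*) AS num_genes,
               COUNT(DISTINCT element_symbol) AS num_unique_genes
        FROM microbigge
        WHERE subtype = 'AMR'
        GROUP BY contig_acc
        ORDER BY num_genes DESC
        LIMIT 1
    ) top1 ON top1.contig_acc = mb.contig_acc
    ORDER BY top1.num_genes DESC, mb.contig_acc, mb.start_on_contig
""").fetchall()

for row in rows:
    print(row)
```

Because the outer table is not filtered on subtype, the non-AMR element on the top contig is listed too. A num_genes value larger than num_unique_genes shows that the same symbol appears more than once on the contig.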

Isolates tested resistant to carbapenems

Here we filter on one of the complex fields (AST_phenotypes), which holds a list of antibiotic/phenotype records for each isolate, by unnesting it in a subquery

-- Find all the isolates tested resistant to carbapenems
SELECT target_acc, AST_phenotypes
FROM `ncbi-pathogen-detect.pdbrowser.isolates` isolates
WHERE
(SELECT COUNT(*)
  FROM UNNEST(isolates.AST_phenotypes)
  WHERE antibiotic LIKE '%penem' AND phenotype = 'resistant'
) >= 1

Bonus (not explained in detail in the talk): find all the isolates tested resistant to carbapenems that have no known carbapenem resistance gene or point mutation

-- Find all the isolates tested resistant to carbapenems without a known
-- carbapenem resistance gene or point mutation
SELECT isolates.target_acc,
  ARRAY(SELECT AS STRUCT antibiotic, phenotype FROM UNNEST(AST_phenotypes) WHERE antibiotic LIKE '%penem') AST,
  isolates.refgene_db_version, isolates.taxgroup_name, isolates.scientific_name
FROM `ncbi-pathogen-detect.pdbrowser.isolates` isolates
LEFT JOIN `ncbi-pathogen-detect.pdbrowser.microbigge` microbigge
  ON isolates.target_acc = microbigge.target_acc
  AND microbigge.subclass = 'CARBAPENEM' -- Only carbapenem genes / point mutations
WHERE
  (SELECT COUNT(1) FROM UNNEST(AST_phenotypes) AS ast
    WHERE antibiotic LIKE '%penem' AND phenotype = 'resistant') >= 1
  AND isolates.amrfinderplus_version IS NOT NULL -- AMRFinderPlus was run on this target
  AND isolates.asm_acc IS NOT NULL -- AMRFinderPlus results should be in MicroBIGG-E because assembly is public
  AND microbigge.subclass IS NULL -- There are no rows in MicroBIGG-E with subclass = CARBAPENEM
ORDER BY isolates.target_acc
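
The "without a known gene" part of this query relies on a LEFT JOIN followed by an IS NULL check (an anti-join). The sketch below isolates just that step, with Python's built-in sqlite3 as a stand-in for BigQuery. The tables are cut down to a few hypothetical rows, and the AST and version filters are left out.

```python
import sqlite3

# In-memory stand-ins for the two tables; all accessions and genes are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isolates (target_acc TEXT);
    CREATE TABLE microbigge (target_acc TEXT, element_symbol TEXT, subclass TEXT);
    INSERT INTO isolates VALUES ('PDT_A'), ('PDT_B'), ('PDT_C');
    INSERT INTO microbigge VALUES
        ('PDT_A', 'gene_1', 'CARBAPENEM'),
        ('PDT_B', 'gene_2', 'TETRACYCLINE');
""")

# The subclass filter sits in the ON clause, so isolates with no CARBAPENEM
# row still come through the LEFT JOIN with NULLs on the microbigge side.
# The WHERE clause then keeps only those unmatched isolates.
rows = conn.execute("""
    SELECT isolates.target_acc
    FROM isolates
    LEFT JOIN microbigge
      ON isolates.target_acc = microbigge.target_acc
      AND microbigge.subclass = 'CARBAPENEM'
    WHERE microbigge.subclass IS NULL
    ORDER BY isolates.target_acc
""").fetchall()

no_carbapenem_gene = [r[0] for r in rows]
print(no_carbapenem_gene)  # ['PDT_B', 'PDT_C']
```

PDT_B is kept even though it has a MicroBIGG-E row, because that row has a different subclass. PDT_C is kept because it has no rows at all, which is why the full query also checks that AMRFinderPlus was run and that the assembly is public.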

Get contig sequence for an element

Find the AMRFinderPlus results for an isolate

SELECT element_symbol, element_name, subclass, contig_acc, isolation_source, contig_url
FROM `ncbi-pathogen-detect.pdbrowser.microbigge`
WHERE biosample_acc = 'SAMN21357979'
ORDER BY contig_acc, start_on_contig