MicroBIGG-E report

MicroBIGG-E record accession, organism, location, and biosample information

The downloaded MicroBIGG-E package contains a MicroBIGG-E data report in JSON Lines format at the following location in the package:

ncbi_dataset/data/data_report.jsonl

Each line of the MicroBIGG-E data report file is a hierarchical JSON object that represents a single MicroBIGG-E record. The schema of the MicroBIGG-E record is defined in the tables below, where each row describes either a single field in the report or a sub-structure (a collection of fields). The outermost structure of the report is MicroBiggeReport.
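Because each line is a standalone JSON object, the report can be read one record at a time without loading the whole file. The following is a minimal sketch using only the Python standard library; the helper name iter_records is hypothetical, and the path assumes the package has been unzipped into the current directory.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines report."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


if __name__ == "__main__":
    # Assumed location after unzipping the downloaded package.
    report = Path("ncbi_dataset/data/data_report.jsonl")
    if report.exists():
        for record in iter_records(str(report)):
            # Nested structures such as "element" become nested dicts.
            symbol = record.get("element", {}).get("symbol")
            print(record.get("targetAcc"), symbol)
```

Using .get() with defaults keeps the loop robust for records where an optional field or sub-structure is absent.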

Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option. Refer to the dataformat CLI tool reference to see how you can use this tool to transform MicroBIGG-E data reports from JSON Lines to tabular formats.
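To illustrate how a Table Field Mnemonic relates to a position in the JSON hierarchy, here is a rough Python sketch that flattens records to tab-separated text. It is not the dataformat tool and does not reproduce its behavior; the FIELDS mapping, lookup, and to_tsv names are hypothetical, and only four mnemonics from the tables in this document are included.

```python
import csv
import io

# Mnemonic -> (table column name, key path into the JSON record).
# Entries are copied from the structure tables in this document.
FIELDS = {
    "target-accession": ("Target accession", ("targetAcc",)),
    "elem-symbol": ("Element symbol", ("element", "symbol")),
    "tax-name": ("Taxonomic name", ("taxonomy", "scientificName")),
    "biosample-accession": ("BioSample accession", ("biosample", "accession")),
}


def lookup(record, path):
    """Follow a key path through nested dicts; return "" if any key is missing."""
    value = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def to_tsv(records, mnemonics):
    """Render the selected fields of each record as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([FIELDS[m][0] for m in mnemonics])
    for record in records:
        writer.writerow([lookup(record, FIELDS[m][1]) for m in mnemonics])
    return out.getvalue()


if __name__ == "__main__":
    record = {
        "targetAcc": "PDT000214120.2",
        "element": {"symbol": "asr"},
        "taxonomy": {"scientificName": "Salmonella enterica subsp. enterica serovar Rissen"},
    }
    print(to_tsv([record], ["target-accession", "elem-symbol", "tax-name"]), end="")
```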

Sample report

{
  "amrFinderPlus": {
    "dbVersion": "2024-07-22.1",
    "type": "COMBINED",
    "version": "3.12.8"
  },
  "amrMethod": "HMM",
  "biosample": {
    "accession": "SAMN07179453",
    "assembly": "GCA_009287105.1",
    "geographicOrigin": "United Kingdom: United Kingdom",
    "source": "human",
    "type": "clinical"
  },
  "closestReferenceSequenceComparison": {
    "accession": "BAA15331.2",
    "alignLength": 106,
    "name": "acid resistance repetitive basic protein Asr",
    "percentCoverage": 100.0,
    "percentIdentical": 58.49
  },
  "element": {
    "length": 82,
    "name": "acid resistance repetitive basic protein Asr",
    "referenceLength": 102,
    "symbol": "asr"
  },
  "location": {
    "accessionVersion": "AAMJFE010000005.1",
    "range": [
      {
        "begin": "200216",
        "end": "200464",
        "orientation": "plus"
      }
    ]
  },
  "readToAssemblyCoverage": {
    "assembly": 52,
    "contig": 44,
    "ratio": 0.846154
  },
  "subtype": "ACID",
  "targetAcc": "PDT000214120.2",
  "taxonomy": {
    "group": "Salmonella enterica",
    "scientificName": "Salmonella enterica subsp. enterica serovar Rissen"
  },
  "type": "STRESS"
}

MicroBiggeReport Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| targetAcc | target-accession | Target accession | string | | |
| element | | | Element | | |
| location | | | SeqRangeSet | The range of the gene | |
| type | type | Type | string | | AMR; STRESS |
| subtype | subtype | Subtype | string | | AMR; METAL |
| class | class | Class | string | | GLYCOPEPTIDE; COPPER/SILVER |
| subclass | subclass | Subclass | string | | VANCOMYCIN; COPPER/SILVER |
| amrMethod | amr-method | AMR method | string | | EXACTP |
| isPlus | is-plus | Is plus | bool | | |
| closestReferenceSequenceComparison | | | ClosestReference | | |
| taxonomy | | | Taxonomy | | |
| biosample | | | Biosample | | |
| readToAssemblyCoverage | | | ReadToAssemblyCoverage | | |
| amrFinderPlus | | | AmrFinderPlus | | |
| genesOnContig (repeated) | coming soon | coming soon | string | | |
| genesOnIsolate (repeated) | coming soon | coming soon | string | | |
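The top-level type and subtype fields are plain strings, so records can be tallied or filtered with ordinary dictionary access. The sketch below assumes records have already been parsed into dicts; the function names count_by_type and select are hypothetical.

```python
from collections import Counter


def count_by_type(records):
    """Count records per (type, subtype) pair; missing values count as None."""
    return Counter((r.get("type"), r.get("subtype")) for r in records)


def select(records, type_=None, subtype=None):
    """Yield records matching the given type and/or subtype (None means 'any')."""
    for record in records:
        if type_ is not None and record.get("type") != type_:
            continue
        if subtype is not None and record.get("subtype") != subtype:
            continue
        yield record


if __name__ == "__main__":
    records = [
        {"targetAcc": "PDT000214120.2", "type": "STRESS", "subtype": "ACID"},
        {"targetAcc": "example-2", "type": "AMR", "subtype": "AMR"},  # made-up record
    ]
    print(count_by_type(records))
    print([r["targetAcc"] for r in select(records, type_="STRESS")])
```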

AmrFinderPlus Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| version | amrfinderplus-version | AMRFinderPlus version | string | | |
| type | amrfinderplus-type | AMRFinderPlus type | string | | |
| dbVersion | amrfinderplus-db-version | AMRFinderPlus database version | string | | |

Biosample Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| geographicOrigin | biosample-geo-origin | BioSample geographic origin | string | | Denmark; not determined |
| source | biosample-source | BioSample source | string | | |
| type | biosample-type | BioSample type | string | | clinical; environmental/other |
| accession | biosample-accession | BioSample accession | string | | SAMN00808999 |
| assembly | biosample-assembly | BioSample assembly accession | string | | GCA_000395725.1 |
| collectionDate | biosample-collection-date | BioSample collection date | string | | |

ClosestReference Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| accession | closest-ref-accession | Closest reference accession | string | | |
| name | closest-ref-name | Closest reference name | string | | |
| percentCoverage | closest-ref-pct-coverage | Closest reference percent coverage | float | | |
| percentIdentical | closest-ref-pct-ident | Closest reference percent identity | float | | |
| alignLength | closest-ref-align-len | Closest reference alignment length | int32 | | |

Element Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| symbol | elem-symbol | Element symbol | string | | vanS-A; copB |
| name | elem-name | Element name | string | | VanA-type vancomycin resistance histidine kinase VanS; copper/silver-translocating P-type ATPase CopB |
| length | elem-length | Element length | int32 | | |
| referenceLength | elem-ref-length | Element reference length | int32 | | |

Range Structure

A 1-based range on a sequence record.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| begin | start | Start | uint64 | Sequence start position | |
| end | stop | Stop | uint64 | Sequence stop position | |
| orientation | orientation | Orientation | Orientation | Direction relative to the genome | |
| order | order | Order | uint32 | The position of this sequence in a group of sequences | |
| ribosomalSlippage | coming soon | coming soon | int32 | When ribosomal slippage is desired, fill out slippage amount between this and previous range. | |
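In the sample report, begin and end appear as JSON strings even though their type is uint64, so they need converting before any arithmetic. The sketch below computes the span of each range, assuming that both endpoints are inclusive (the text above says only that ranges are 1-based); the function names are hypothetical.

```python
def range_length(rng):
    """Length of one range, assuming 1-based coordinates inclusive at both ends."""
    begin = int(rng["begin"])  # uint64 values arrive as strings in the JSON
    end = int(rng["end"])
    return end - begin + 1


def location_length(location):
    """Total length over all ranges in a SeqRangeSet-style dict."""
    return sum(range_length(r) for r in location.get("range", []))


if __name__ == "__main__":
    # Location copied from the sample report.
    location = {
        "accessionVersion": "AAMJFE010000005.1",
        "range": [{"begin": "200216", "end": "200464", "orientation": "plus"}],
    }
    print(location_length(location))
```

For the sample record this gives 200464 - 200216 + 1 = 249 positions.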

ReadToAssemblyCoverage Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| contig | read-assm-coverage-contig | Read-to-Assembly-Coverage contig | uint32 | | |
| assembly | read-assm-coverage-assembly | Read-to-Assembly-Coverage assembly | uint32 | | |
| ratio | read-assm-coverage-ratio | Read-to-Assembly-Coverage ratio | float | | |
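The table does not state how ratio is derived. In the sample report, however, the values are consistent with ratio being contig divided by assembly (44 / 52 ≈ 0.846154). The sketch below recomputes the value under that assumption, which is an observation from one record rather than a documented definition; the function name is hypothetical.

```python
import math


def coverage_ratio(coverage):
    """Contig coverage divided by assembly coverage (assumed definition)."""
    assembly = coverage.get("assembly")
    if not assembly:  # missing or zero assembly coverage: ratio is undefined
        return None
    return coverage.get("contig", 0) / assembly


if __name__ == "__main__":
    # Values copied from the sample report.
    sample = {"assembly": 52, "contig": 44, "ratio": 0.846154}
    computed = coverage_ratio(sample)
    print(computed, math.isclose(computed, sample["ratio"], abs_tol=1e-6))
```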

SeqRangeSet Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| accessionVersion | accession | Sequence Accession | string | NCBI Accession.version of the sequence | |
| range (repeated) | range- | | Range | Series of intervals on above accession_version | |

Taxonomy Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
|---|---|---|---|---|---|
| group | tax-group | Taxonomic group | string | | Enterococcus faecium |
| scientificName | tax-name | Taxonomic name | string | | Enterococcus faecium EnGen0172 |

Orientation Enumeration

| Name | Number | Description |
|---|---|---|
| none | 0 | |
| plus | 1 | |
| minus | 2 | |
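The enumeration can be mirrored in client code so that orientation values are validated rather than compared as raw strings. The sketch below is one possible Python representation; the class and the parse_orientation helper are illustrative and not part of any NCBI library. It accepts either the name (as in the sample report, "plus") or the number.

```python
from enum import IntEnum


class Orientation(IntEnum):
    """Mirror of the Orientation enumeration table (names and numbers as listed)."""
    none = 0
    plus = 1
    minus = 2


def parse_orientation(value):
    """Accept an enum name such as "plus" or a number such as 1."""
    if isinstance(value, int):
        return Orientation(value)  # raises ValueError for unknown numbers
    return Orientation[value]      # raises KeyError for unknown names


if __name__ == "__main__":
    print(parse_orientation("plus").name, int(parse_orientation("plus")))
    print(parse_orientation(2).name)
```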

Scalar Value Types

| Protocol buffers type | Notes | C++ | Python | Java | Go |
|---|---|---|---|---|---|
| double | | double | float | double | float64 |
| float | | float | float | float | float32 |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
| uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
| uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 |
| sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
| bool | | bool | boolean | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
| bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |
Generated February 26, 2025