MicroBIGG-E report
MicroBIGG-E record accession, organism, location, and biosample information
The downloaded MicroBIGG-E package contains a MicroBIGG-E data report in JSON Lines format at the following location in the file:
ncbi_dataset/data/data_report.jsonl
Each line of the MicroBIGG-E data report file is a hierarchical JSON object that represents a single MicroBIGG-E record. The schema of the MicroBIGG-E record is defined in the tables below, where each row describes either a single field in the report or a sub-structure, which is a collection of fields.
The outermost structure of the report is MicroBiggeReport.
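Because each line of the report is an independent JSON object, the file can be read one record at a time without loading it all into memory. The following is a minimal Python sketch of that idea, using only the standard library. The helper names `read_reports` and `count_by_type` are hypothetical and are not part of any NCBI tool.

```python
import json
from typing import Dict, Iterable, Iterator


def read_reports(path: str) -> Iterator[dict]:
    """Yield one MicroBIGG-E record (as a dict) per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)


def count_by_type(records: Iterable[dict]) -> Dict[str, int]:
    """Count records by their top-level "type" field (for example "STRESS")."""
    counts: Dict[str, int] = {}
    for record in records:
        key = record.get("type", "unknown")  # "unknown" is a placeholder chosen for this sketch
        counts[key] = counts.get(key, 0) + 1
    return counts
```

A typical call, assuming the package has been unzipped into the current directory, would be `count_by_type(read_reports("ncbi_dataset/data/data_report.jsonl"))`.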
Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option. Refer to the dataformat CLI tool reference to see how you can use this tool to transform MicroBIGG-E data reports from JSON Lines to tabular formats.
Sample report
{
  "amrFinderPlus": {
    "dbVersion": "2024-07-22.1",
    "type": "COMBINED",
    "version": "3.12.8"
  },
  "amrMethod": "HMM",
  "biosample": {
    "accession": "SAMN07179453",
    "assembly": "GCA_009287105.1",
    "geographicOrigin": "United Kingdom: United Kingdom",
    "source": "human",
    "type": "clinical"
  },
  "closestReferenceSequenceComparison": {
    "accession": "BAA15331.2",
    "alignLength": 106,
    "name": "acid resistance repetitive basic protein Asr",
    "percentCoverage": 100.0,
    "percentIdentical": 58.49
  },
  "element": {
    "length": 82,
    "name": "acid resistance repetitive basic protein Asr",
    "referenceLength": 102,
    "symbol": "asr"
  },
  "location": {
    "accessionVersion": "AAMJFE010000005.1",
    "range": [
      {
        "begin": "200216",
        "end": "200464",
        "orientation": "plus"
      }
    ]
  },
  "readToAssemblyCoverage": {
    "assembly": 52,
    "contig": 44,
    "ratio": 0.846154
  },
  "subtype": "ACID",
  "targetAcc": "PDT000214120.2",
  "taxonomy": {
    "group": "Salmonella enterica",
    "scientificName": "Salmonella enterica subsp. enterica serovar Rissen"
  },
  "type": "STRESS"
}
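The nesting in the sample report means that most values sit one or two levels below the top-level object. As an illustration, the sketch below pulls a few nested values out of a trimmed copy of the sample record and turns them into a flat row. The dotted-path convention and the helper names `get_path` and `to_row` are assumptions made for this example. They are not the Table Field Mnemonics and do not describe how the dataformat tool works internally.

```python
import json
from typing import Any, List

# Trimmed copy of the sample report, kept small for illustration.
SAMPLE = json.loads("""
{
  "biosample": {"accession": "SAMN07179453", "type": "clinical"},
  "element": {"length": 82, "symbol": "asr"},
  "subtype": "ACID",
  "targetAcc": "PDT000214120.2",
  "taxonomy": {"group": "Salmonella enterica"},
  "type": "STRESS"
}
""")


def get_path(record: dict, dotted: str, default: Any = None) -> Any:
    """Follow a dotted key path such as "biosample.accession"; return default if any key is absent."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def to_row(record: dict, paths: List[str]) -> List[str]:
    """Build a flat row of strings; fields missing from the record become empty strings."""
    row = []
    for path in paths:
        value = get_path(record, path)
        row.append("" if value is None else str(value))
    return row


PATHS = ["targetAcc", "taxonomy.group", "element.symbol", "type", "biosample.collectionDate"]
print("\t".join(to_row(SAMPLE, PATHS)))
```

The last path, `biosample.collectionDate`, is absent from the sample record, so the sketch emits an empty final column instead of raising an error. Records in the report can omit fields, so tolerant lookups of this kind are usually worth having.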
MicroBiggeReport Structure
AmrFinderPlus Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
version | amrfinderplus-version | AMRFinderPlus version | string | | |
type | amrfinderplus-type | AMRFinderPlus type | string | | |
dbVersion | amrfinderplus-db-version | AMRFinderPlus database version | string | | |
Biosample Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
geographicOrigin | biosample-geo-origin | BioSample geographic origin | string | | Denmark; not determined |
source | biosample-source | BioSample source | string | | |
type | biosample-type | BioSample type | string | | clinical; environmental/other |
accession | biosample-accession | BioSample accession | string | | SAMN00808999 |
assembly | biosample-assembly | BioSample assembly accession | string | | GCA_000395725.1 |
collectionDate | biosample-collection-date | BioSample collection date | string | | |
ClosestReference Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
accession | closest-ref-accession | Closest reference accession | string | | |
name | closest-ref-name | Closest reference name | string | | |
percentCoverage | closest-ref-pct-coverage | Closest reference percent coverage | float | | |
percentIdentical | closest-ref-pct-ident | Closest reference percent identity | float | | |
alignLength | closest-ref-align-len | Closest reference alignment length | int32 | | |
Element Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
symbol | elem-symbol | Element symbol | string | | vanS-A; copB |
name | elem-name | Element name | string | | VanA-type vancomycin resistance histidine kinase VanS; copper/silver-translocating P-type ATPase CopB |
length | elem-length | Element length | int32 | | |
referenceLength | elem-ref-length | Element reference length | int32 | | |
Range Structure
A 1-based range on a sequence record.
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
begin | start | Start | uint64 | Sequence start position | |
end | stop | Stop | uint64 | Sequence stop position | |
orientation | orientation | Orientation | Orientation | Direction relative to the genome | |
order | order | Order | uint32 | The position of this sequence in a group of sequences | |
ribosomalSlippage | coming soon | coming soon | int32 | When ribosomal slippage is desired, fill out slippage amount between this and previous range. | |
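Although begin and end are typed as uint64, the sample report shows them as quoted strings ("200216", "200464"). This matches the protocol buffers JSON mapping, which serializes 64-bit integers as strings, so the values need an explicit conversion before any arithmetic. The sketch below assumes that both ends of the 1-based range are inclusive, which the table does not state explicitly. The helper names are hypothetical.

```python
def range_length(rng: dict) -> int:
    """Length of one range, assuming 1-based coordinates with both ends inclusive."""
    begin = int(rng["begin"])  # values arrive as strings in the JSON report
    end = int(rng["end"])
    if end < begin:
        raise ValueError(f"end ({end}) is smaller than begin ({begin})")
    return end - begin + 1


def total_length(location: dict) -> int:
    """Sum the lengths of every range listed under a location (SeqRangeSet) object."""
    return sum(range_length(r) for r in location.get("range", []))


# The location object from the sample report.
sample_location = {
    "accessionVersion": "AAMJFE010000005.1",
    "range": [{"begin": "200216", "end": "200464", "orientation": "plus"}],
}
print(total_length(sample_location))  # 249
```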
ReadToAssemblyCoverage Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
contig | read-assm-coverage-contig | Read-to-Assembly-Coverage contig | uint32 | | |
assembly | read-assm-coverage-assembly | Read-to-Assembly-Coverage assembly | uint32 | | |
ratio | read-assm-coverage-ratio | Read-to-Assembly-Coverage ratio | float | | |
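The table gives no descriptions for these fields. In the sample report, however, the values (contig 44, assembly 52, ratio 0.846154) are consistent with ratio being contig divided by assembly. The sketch below checks that relationship. It is an inference from a single sample record, not a documented definition, and the function name `coverage_ratio` is hypothetical.

```python
from typing import Optional


def coverage_ratio(coverage: dict) -> Optional[float]:
    """Return contig / assembly, or None when either value is missing or assembly is zero.

    Assumption: ratio = contig coverage / assembly coverage, inferred from the sample report.
    """
    contig = coverage.get("contig")
    assembly = coverage.get("assembly")
    if contig is None or not assembly:
        return None
    return contig / assembly


# The readToAssemblyCoverage object from the sample report.
sample_coverage = {"assembly": 52, "contig": 44, "ratio": 0.846154}
computed = coverage_ratio(sample_coverage)
print(round(computed, 6) == sample_coverage["ratio"])  # True
```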
SeqRangeSet Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
accessionVersion | accession | Sequence Accession | string | NCBI Accession.version of the sequence | |
range repeated | range- | | Range | Series of intervals on above accession_version | |
Taxonomy Structure
Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
---|
group | tax-group | Taxonomic group | string | | Enterococcus faecium |
scientificName | tax-name | Taxonomic name | string | | Enterococcus faecium EnGen0172 |
Orientation Enumeration
Name | Number | Description |
---|
none | 0 | |
plus | 1 | |
minus | 2 | |
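In the sample report the orientation appears by name ("plus") and not by number, which is how the protocol buffers JSON mapping normally writes enum values. A consumer may still want to accept either form. The following sketch mirrors the enumeration above with Python's standard `IntEnum`. The `parse_orientation` helper is hypothetical.

```python
from enum import IntEnum
from typing import Union


class Orientation(IntEnum):
    """Mirror of the Orientation enumeration: names and numbers as listed in the table."""
    none = 0
    plus = 1
    minus = 2


def parse_orientation(value: Union[str, int, None]) -> Orientation:
    """Accept an enum name ("plus"), a number (1), or None (treated as the default, none)."""
    if value is None:
        return Orientation.none
    if isinstance(value, str):
        return Orientation[value]  # lookup by name; raises KeyError for unknown names
    return Orientation(value)      # lookup by number; raises ValueError for unknown numbers


print(parse_orientation("plus").name, int(parse_orientation("plus")))  # plus 1
```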
Scalar Value Types
Protocol buffers type | Notes | C++ | Python | Java | Go |
---|
double | | double | float | double | float64 |
float | | float | float | float | float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
sfixed32 | Always four bytes. | int32 | int | int | int32 |
sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
bool | | bool | bool | boolean | bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
bytes | May contain any arbitrary sequence of bytes. | string | str/bytes | ByteString | []byte |
Generated February 26, 2025