VCF INFO tag terms for dbSNP

This document describes the Variant Call Format (VCF) file tags used by NCBI dbSNP to output variant information. It includes the abbreviations used in the INFO column of dbSNP VCF files.

dbSNP VCF files are available at the FTP repository ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF. We release a new set of VCF files for each build and each reporting assembly.

The VCF was developed as a standardized format for storing large quantities of data. Each VCF file contains a header section and a data table section. Only the properties that a site has need to be listed; properties that a site does not have are simply omitted. See also VCF v4.1 for detailed information regarding the use of INFO tags in VCF.
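
Because a flag appears only when a site has the property, a reader of the INFO column should treat a missing flag as false. The following is a minimal Python sketch, assuming the usual VCF layout in which INFO entries are separated by semicolons and a value follows an equals sign; the function name `parse_info` and the sample record are hypothetical.

```python
from typing import Dict, Union

def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO column into a dict.

    key=value entries map to their raw string value; bare flags such as
    ASP or VLD (listed only when the site has the property) map to True.
    """
    fields: Dict[str, Union[str, bool]] = {}
    if info in ("", "."):
        return fields  # "." means the record carries no INFO data
    for entry in info.split(";"):
        if not entry:
            continue  # skip empty entries left by stray semicolons
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields
```

With this shape, checking a flag is `info.get("ASP", False)`: an absent flag means the site does not have that property.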

AF

VCF INFO. Provides the allele frequency for each allele in the ALT column, in the same order as listed. Please note that this tag is deprecated in the build b138 VCF files, because the AF tag is also used in 1000 Genomes VCF files with a different meaning. Starting with the b138 VCF files, allele frequencies are reported in the CAF tag.

ASS

VCF INFO. Indicates the variant is located in an acceptor splice site. The functional code (FxnCode) of such a variation = 73.

ASP

VCF INFO. Indicates the variant "is assembly specific". This flag is set if the variant maps to only one assembly.

CAF

VCF INFO. Comma-delimited list of allele frequencies based on 1000 Genomes. The first frequency refers to the reference base, and the alternate alleles follow in the same order as in the ALT column. Where a 1000 Genomes alternate allele is not in dbSNP's alternate allele set, the allele is added to the ALT column. The minor allele frequency is the second largest value in the list; it was previously reported as the GMAF in VCF files, RefSNP and EntrezSNP pages, and VariationReporter. Starting from b138, this GMAF is called the 1000G MAF.
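
The rule that the minor allele frequency is the second largest CAF value can be sketched in Python as follows. The function name is hypothetical, and the handling of surrounding brackets and "." placeholders (for alleles without a 1000 Genomes frequency) is an assumption about what some releases may contain, not something stated above.

```python
from typing import Optional

def caf_minor_allele_frequency(caf: str) -> Optional[float]:
    """Return the 1000G MAF for a CAF value: its second-largest frequency.

    The first entry is the reference allele; the rest follow ALT order.
    Assumption: optional [] brackets and "." placeholders are skipped.
    """
    entries = caf.strip().strip("[]").split(",")
    freqs = sorted((float(e) for e in entries if e not in ("", ".")), reverse=True)
    # With fewer than two known frequencies there is no minor allele to report.
    return freqs[1] if len(freqs) >= 2 else None
```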

CDA

VCF INFO. Indicates the variation is interrogated in a clinical diagnostic assay.

CFL

VCF INFO. Indicates the variant has an assembly conflict.  This flag is set for weight 1 and 2 variants that map to different chromosomes on different assemblies.

CLNACC

VCF INFO.  A string that is the accession and version number assigned by ClinVar to the genotype/phenotype relationship.

CLNALLE

VCF INFO. An integer that defines the alleles in the REF or ALT columns: 0 is REF, 1 is the first ALT allele, and so on. A value of -1 indicates that no allele was found to match a corresponding HGVS allele name. In the clinvar.vcf.gz file, data order is maintained by the CLNALLE tag, which provides an ordered list of the alleles described by the clinical (CLN*) INFO tags that follow it. A user can match each allele in the CLNALLE tag to its clinical data in the other clinical (CLN*) INFO tags, because those data are listed in the same order in which the CLNALLE tag lists the alleles. See the example in variation FAQ number 8 for more details about how the CLNALLE tag allows for data matching.
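
The matching described above can be sketched in Python as follows. It assumes that CLNALLE and the other CLN* tags separate their per-allele entries with commas, as VCF INFO lists usually do; any finer structure inside an entry is returned unchanged. The function name and the sample record are hypothetical; in the test, the CLNSIG codes 5 and 255 mean pathogenic and other, per the CLNSIG entry.

```python
from typing import List, Optional, Sequence, Tuple

def pair_clinical_values(
    ref: str, alts: Sequence[str], clnalle: str, cln_tag: str
) -> List[Tuple[Optional[str], str]]:
    """Pair each allele listed in CLNALLE with the entry at the same
    position in another CLN* tag (for example CLNSIG or CLNDBN)."""
    alleles = [ref, *alts]  # index 0 is REF, 1 is the first ALT, and so on
    indices = [int(i) for i in clnalle.split(",")]
    values = cln_tag.split(",")  # assumption: one comma-separated entry per index
    if len(indices) != len(values):
        raise ValueError("CLNALLE and the CLN* tag list different numbers of entries")
    pairs: List[Tuple[Optional[str], str]] = []
    for idx, value in zip(indices, values):
        if idx == -1:
            allele = None  # no REF/ALT allele matched the HGVS allele name
        elif 0 <= idx < len(alleles):
            allele = alleles[idx]
        else:
            raise ValueError(f"CLNALLE index {idx} is out of range")
        pairs.append((allele, value))
    return pairs
```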

CLNCUI

VCF INFO. A string that is the disease concept ID used in GTR and ClinVar for a phenotype associated with an allele.

CLNDBN

VCF INFO. A string that is the disease name used by the database specified by CLNSRC.

CLNHGVS

VCF INFO. A string that gives the HGVS names of the variants. The order of these variants corresponds to the order of the information in the other clinical (CLN*) INFO tags.

CLNORIGIN

VCF INFO. A string that describes the origin of the variant allele. The value is the sum of one or more of the following codes: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other
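
Because every non-zero CLNORIGIN code is a distinct power of two, a summed value can be split back into its components with a bitwise AND. A minimal Python sketch, assuming the caller has already converted the tag's string value to an integer; the dictionary and function names are hypothetical.

```python
from typing import List

# Codes from the CLNORIGIN entry; each non-zero code is a distinct power of two.
CLNORIGIN_CODES = {
    1: "germline", 2: "somatic", 4: "inherited", 8: "paternal",
    16: "maternal", 32: "de-novo", 64: "biparental", 128: "uniparental",
    256: "not-tested", 512: "tested-inconclusive", 1073741824: "other",
}

def decode_clnorigin(value: int) -> List[str]:
    """Split a summed CLNORIGIN value back into its component labels."""
    if value == 0:
        return ["unknown"]
    labels = [name for code, name in CLNORIGIN_CODES.items() if value & code]
    leftover = value & ~sum(CLNORIGIN_CODES)  # bits not covered by any known code
    if leftover:
        raise ValueError(f"unrecognized CLNORIGIN bits: {leftover}")
    return labels
```

For example, a value of 17 (16 + 1) decodes to germline and maternal.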

CLNSIG

VCF INFO.  A string that describes the variant's clinical significance, where  0 - unknown, 1 - untested, 2 - non-pathogenic, 3 - probable-non-pathogenic, 4 - probable-pathogenic, 5 - pathogenic, 6 - drug-response, 7 - histocompatibility, 255 - other.

CLNSRC

VCF INFO. A string that describes the variant's clinical sources.

CLNSRCID

VCF INFO. The identifier used for the allele from the source defined in CLNSRC.

dbSNPBuildID

VCF INFO. An integer that indicates the first dbSNP build in which the rs was reported.

DSS

VCF INFO. The variant is located in a donor splice site. The functional code (FxnCode) of such a variation = 75.

G5

VCF INFO. Indicates the variant has a >5% minor allele frequency in one or more  populations.

G5A

VCF INFO. Indicates the variant has a >5% minor allele frequency in each and all populations.
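
The difference between G5 (one or more populations) and G5A (each and all populations) amounts to `any` versus `all` over the per-population frequencies. This Python sketch illustrates the two definitions and is not dbSNP's own pipeline; it assumes the minor allele frequencies are already computed, and the function name and population labels are hypothetical.

```python
from typing import Dict, Tuple

def g5_flags(pop_maf: Dict[str, float], threshold: float = 0.05) -> Tuple[bool, bool]:
    """Return (G5, G5A) given each population's minor allele frequency."""
    above = [maf > threshold for maf in pop_maf.values()]
    g5 = any(above)                    # >5% MAF in one or more populations
    g5a = bool(above) and all(above)   # >5% MAF in each and all populations
    return g5, g5a
```

Every G5A site is therefore also a G5 site, but not the reverse.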

GCF

VCF INFO. Indicates the variant has a genotype conflict: the same (rs, individual) pair has different genotypes. N/N genotypes are not included.

GENEINFO

VCF INFO. Reports the gene symbol(s) and NCBI GeneID(s) at the location of the variation. The gene symbol and ID are delimited by a colon (:), and each pair is delimited by a vertical bar (|). Example: SYMBOL1:GeneID1|SYMBOL2:GeneID2|
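
Splitting on the two delimiters described above yields (symbol, GeneID) pairs. A minimal Python sketch; the function name is hypothetical, the GeneIDs in the test are placeholders, and empty segments are skipped so that a trailing vertical bar like the one in the example is tolerated.

```python
from typing import List, Tuple

def parse_geneinfo(value: str) -> List[Tuple[str, int]]:
    """Split a GENEINFO value into (gene symbol, NCBI GeneID) pairs."""
    pairs: List[Tuple[str, int]] = []
    for item in value.split("|"):
        if not item:
            continue  # tolerate a trailing "|"
        symbol, sep, gene_id = item.rpartition(":")  # GeneID follows the last colon
        if not sep:
            raise ValueError(f"missing ':' in GENEINFO entry {item!r}")
        pairs.append((symbol, int(gene_id)))
    return pairs
```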

GMAF

VCF INFO. A floating-point number that indicates the Global Minor Allele Frequency, in the range [0, 0.5]. The global population is the 1000 Genomes Project phase 1 genotype data from 629 individuals, released in the 11-23-2010 dataset.

GNO

VCF INFO. Indicates there are genotypes available for the variant and that the variant has an individual genotype (in the SubInd table).

HapMap Populations

Term used in the context of the dbSNP VCF files (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF). It refers to the initial set of 12 HapMap populations used in the files in the "ByPopulation/" and "ByChromosome/" subfolders, which are listed below.

HapMap-ASW HapMap-CEU HapMap-CHB HapMap-CHD HapMap-GIH HapMap-HCB HapMap-JPT HapMap-LWK  HapMap-MEX HapMap-MKK HapMap-TSI HapMap-YRI

HD

VCF INFO. Indicates a marker for the variation is in a high-density genotyping kit (50K density or greater). The variant may have phenotype associations present in dbGaP.

INT

VCF INFO. Indicates the variant is located in an intron based on NCBI's annotation. The functional code (FxnCode) of such a variation = 6.

KGPhase1

VCF INFO. Indicates the variation was in 1000 Genomes phase 1 (including the June Interim phase 1).

KGPilot123

VCF INFO. Indicates the variation was in the 1000 Genomes discovery data from all 2010 pilots (1, 2, 3).

KGPROD

VCF INFO. Indicates the variation was submitted as part of the 1000 Genomes Project.

KGValidated

VCF INFO. Indicates the variation was validated by 1000 Genomes.

LSD

VCF INFO. Indicates the variation was submitted from a locus-specific database (LSDB).

Maternal

The allele in the proband that originated from the mother.

MTP

VCF INFO. Indicates the variation has microattribution/third-party annotation (TPA: GWAS, PAGE).

MUT

VCF INFO. Indicates the allele has a low frequency and is cited in a journal or other reputable source.

NOC

VCF INFO. Indicates the reference sequence allele at the mapped position is not present in the variant allele list, adjusted for orientation.

NOV

VCF INFO. Indicates the rs cluster has non-overlapping allele sets. This is true when an rs has more than two alleles from different submissions and the submissions' allele sets share no alleles.

NS

VCF INFO. The number of samples with data for this variant.

NSF

VCF INFO. The consequence of the variation is a non-synonymous frameshift: a coding region variation where one allele in the set changes all downstream amino acids. The functional class (FxnClass) of such a variation = 44.

NSM

VCF INFO. The consequence of the variation is a non-synonymous (missense) change: a coding region variation where one allele in the set changes the amino acid, but translation continues. The functional class (FxnClass) of such a variation = 42.

NSN

VCF INFO. The consequence of the variation is a non-synonymous stop codon (nonsense): a coding region variation where one allele in the set changes to a STOP codon (TER or *). The functional class (FxnClass) of such a variation = 41.

OM

VCF INFO. Indicates the variation has a record in OMIM or OMIA.

OTH

VCF INFO. Indicates there is another variant with exactly the same set of mapped positions on the NCBI reference assembly.

OTHERKG

VCF INFO. Indicates the variation was not included in the 1000 Genomes submission.

Paternal

The allele in the proband that originated from the father.

PH3

VCF INFO. Indicates the variation was genotyped in HapMap Phase 3 (filtered, non-redundant).

PM

VCF INFO. Indicates that the variant is from a clinical channel or is cited in PubMed (PM).

PMC

VCF INFO. Indicates that links exist from the variant's rs record to a PubMed Central article.

POPFREQ

VCF INFO. A string that gives the frequencies and counts of the ALT alleles by population ID. The form is p(na/ns):f(c1/c2)[|f(c1/c2)]... where p is the population ID; na is the number of alleles for the population; ns is the number of samples for the population; f is the frequency; c1 is the allele count; and c2 is the sample count for that allele (c1 minus the homozygous count). The population IDs, names, and handles are given, in corresponding order, in dbSNP_POP_IDS, dbSNP_LOC_POP_IDS, and dbSNP_POP_HANDLES, respectively.
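
A minimal Python sketch that parses one population's entry in the p(na/ns):f(c1/c2)[|f(c1/c2)]... form. How several populations are separated within a full POPFREQ value is not described here, so splitting a record into entries is left to the caller. The class and function names and the sample entry in the test are hypothetical.

```python
import re
from typing import List, NamedTuple

class AlleleFreq(NamedTuple):
    freq: float        # f
    allele_count: int  # c1
    sample_count: int  # c2 = c1 minus the homozygous count

class PopFreq(NamedTuple):
    pop_id: str        # p
    n_alleles: int     # na
    n_samples: int     # ns
    alleles: List[AlleleFreq]

_POP_RE = re.compile(r"^([^(:]+)\((\d+)/(\d+)\):(.+)$")
_ALLELE_RE = re.compile(r"^([0-9.eE+-]+)\((\d+)/(\d+)\)$")

def parse_popfreq_entry(entry: str) -> PopFreq:
    """Parse one population entry such as 'p(na/ns):f(c1/c2)|f(c1/c2)'."""
    m = _POP_RE.match(entry)
    if m is None:
        raise ValueError(f"not a POPFREQ entry: {entry!r}")
    pop_id, na, ns, rest = m.groups()
    alleles = []
    for part in rest.split("|"):  # one f(c1/c2) group per ALT allele
        am = _ALLELE_RE.match(part)
        if am is None:
            raise ValueError(f"bad allele field: {part!r}")
        f, c1, c2 = am.groups()
        alleles.append(AlleleFreq(float(f), int(c1), int(c2)))
    return PopFreq(pop_id, int(na), int(ns), alleles)
```

The homozygous count for an allele can then be recovered as `allele_count - sample_count`.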

R3

VCF INFO. Indicates the variant is located in a 3' gene region. The functional code (FxnCode) of such a variation = 13.

R5

VCF INFO. Indicates the variant is located in a 5' gene region. The functional code (FxnCode) of such a variation = 15.

REF

VCF INFO. Indicates the variant "has reference". That is, it is a coding region variation where one allele in the set is identical to the reference sequence. The functional code (FxnCode) of such a variation = 8.

RSPOS

Chromosome position based on dbSNP submission. Note that for insertion or deletion variants, the value in RSPOS differs from the value in the VCF POS column. VCF POS is the leftmost position of the variant on the reference sequence identified in CHROM; for an insertion or deletion, it is the position of the base immediately before the inserted or deleted bases. VCF POS can therefore lie 5' (upstream) of both the variant location in RSPOS and the location represented by HGVS notation, which uses the rightmost (3'-most) position. For that reason, VCF POS and alleles may look different from HGVS in position and sequence, as in the example rs398124296: "1 45974693 rs398124296 CAGA C .... CLNHGVS=NC_000001.10:g.45974696_45974698delAAG...".
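
The two conventions described above can be sketched in Python as follows. `vcf_deletion_range` handles only simple deletions whose ALT is a prefix of REF (the leading anchor base) and drops that anchor; `shift_right` moves a deletion to its 3'-most equivalent position within a supplied reference window, which is the direction HGVS uses. Both function names are hypothetical, and the reference string `CAGAAGT` in the test is a toy sequence, not real genome data.

```python
from typing import Tuple

def vcf_deletion_range(pos: int, ref: str, alt: str) -> Tuple[int, int]:
    """Return the 1-based inclusive range of bases removed by a simple VCF deletion.

    Assumes ALT is a prefix of REF, i.e. the anchor base(s) VCF places first.
    """
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("expected a simple deletion with ALT as a prefix of REF")
    return pos + len(alt), pos + len(ref) - 1

def shift_right(seq: str, seq_start: int, start: int, end: int) -> Tuple[int, int]:
    """Slide a deletion [start, end] toward the 3' end while the result is unchanged.

    seq is a reference window whose first base is at coordinate seq_start.
    """
    # Deleting [start, end] leaves the same sequence as deleting [start+1, end+1]
    # whenever the base leaving the window equals the base entering it.
    while end + 1 - seq_start < len(seq) and seq[start - seq_start] == seq[end + 1 - seq_start]:
        start += 1
        end += 1
    return start, end
```

For rs398124296, `vcf_deletion_range(45974693, "CAGA", "C")` gives (45974694, 45974696). On the toy window, the left-anchored deletion of AGA at positions 2-4 shifts to a deletion of AAG at positions 4-6, the same two-base offset seen between VCF POS and CLNHGVS in that example.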

RV

VCF INFO. Indicates that the variation's "RS orientation is reversed".

S3D

VCF INFO. Indicates that the variant has 3D structure (SNP3D table). Note: "S3D" is "SNP 3 dimensional". We have changed the usage of "SNP" to the more inclusive term "variant"; the tag "S3D" remains in the VCF files.

SAO

VCF INFO. An integer that indicates variant allele origin. The accepted values for this tag are: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both. Note: "SAO" is "SNP Allele Origin". We have changed "SNP" to the more inclusive term "variant"; the tag "SAO" remains in the VCF files.

SLO

VCF INFO. Indicates that  the variant's rs has a submitter provided "LinkOut" to the submitter's web site.

SSR

VCF INFO. An integer, found in all records, that indicates the variant suspect reason code. The accepted values for this tag are: 0 - unspecified, 1 - Paralog, 2 - byEST, 3 - Para_EST, 4 - oldAlign, 5 - other. Note: "SSR" is "SNP Suspect Reason". We have changed "SNP" to the more inclusive term "variant"; the tag "SSR" remains in the VCF files.

SYN

VCF INFO. Indicates the variant "has synonymous". That is, it is a coding region variation where one allele in the set does not change the encoded amino acid. The functional code (FxnCode) of such a variation = 3.

Tested - inconclusive (origin)

An attempt was made to determine the origin of the variant allele, but results were inconclusive.

TPA

VCF INFO. The variant has provisional Third Party Annotation (TPA). This set is currently restricted to rs from PharmGKB, which provides phenotype data.

U3

VCF INFO. The variant is located in a 3' untranslated region (UTR). The functional code (FxnCode) of such a variation = 53.

U5

VCF INFO. The variant is located in a 5' untranslated region (UTR). The functional code (FxnCode) of such a variation = 55.

VLD

VCF INFO. The variant is validated. This flag is set if a variant has a minor allele count of 2 or more, based on frequency or genotype data.

VP

VCF INFO. A string that describes a "Variation Property" of the variant. 

VC

VCF INFO. A string that describes the "Variation Class" of the variant.

WGT

VCF INFO. An integer that indicates variant map weight. The accepted values for this tag are: 0 - unmapped; 1 - map weight 1; 2 - map weight 2; 3 - map weight 3 or more.

WTD

VCF INFO. Indicates that a submission of information about variation at this location was "withdrawn by submitter". The flag is set if one ss member of the rs cluster for a variant is withdrawn by its submitter. If all ss members of the rs cluster are withdrawn, the rs is deleted (moved to "SNPHistory") and not reported.

Last updated: 2017-11-14T16:52:56Z