ALFA FTP VCF
FTP Directory: https://ftp.ncbi.nih.gov/snp/population_frequency/
VCF Header
##fileformat=VCFv4.1
##build_id=20230706150541
##Population=https://www.ncbi.nlm.nih.gov/biosample/?term=GRAF-pop
##FORMAT=<ID=AN,Number=1,Type=Integer,Description="Total allele count for the population, including REF">
##FORMAT=<ID=AC,Number=A,Type=Integer,Description="Allele count for each ALT allele for the population">
##FORMAT=<ID=HWEP,Number=1,Type=Integer,Description="int(-log(HWE score test p-value)); -1 indicates that the HWE score test p-value could not be computed">
##FORMAT=<ID=GR,Number=1,Type=Integer,Description="Genotype homozygous reference allele (AA) count; in rare cases may not be the GRCh reference allele">
##FORMAT=<ID=GV,Number=1,Type=Integer,Description="Genotype heterozygous ref/alt (A/B) count; reported for the most common two alleles that may or may not include the reference allele">
##FORMAT=<ID=GA,Number=1,Type=Integer,Description="Genotype homozygous alternate allele (B/B) count; could be any of the non-biallelic variant alleles.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMN10492695 SAMN10492696 SAMN10492697 SAMN10492698 SAMN10492699 SAMN10492700 SAMN10492701 SAMN10492702 SAMN11605645 SAMN10492703 SAMN10492704 SAMN10492705
NC_000007.12 1820789 rs2115066191 A G . . . AN:AC:HWEP:GR:GV:GA 38172:8058:1:11906:6302:878 36:34:0:0:2:16 56:16:1:13:14:1 1192:920:0:30:212:354 18:5:0:5:3:1 70:27:0:13:17:5 10:3:1:2:3:0 4836:1997:1:847:1145:426 450:159:3:101:89:35 1228:954:0:30:214:370 66:19:1:15:17:1 44840:11219:32:12917:7787:1716
NC_000007.12 1821039 rs2115074930 C T . . . AN:AC:HWEP:GR:GV:GA 8580:1:0:4289:1:0 38:0:-1:19:0:0 256:0:-1:128:0:0 1422:0:-1:711:0:0 32:0:-1:16:0:0 170:0:-1:85:0:0 32:0:-1:16:0:0 18:0:-1:9:0:0 52:0:-1:26:0:0 1460:0:-1:730:0:0 288:0:-1:144:0:0 10600:1:0:5299:1:0
NC_000007.12 2239462 rs2115066276 A G . . . AN:AC:HWEP:GR:GV:GA 19286:6758:0:4086:4356:1201 54:16:0:14:10:3 280:127:1:46:61:33 2106:717:0:457:475:121 188:58:0:45:40:9 2370:913:0:444:569:172 64:20:0:15:14:3 30:12:0:6:6:3 780:300:1:141:198:51 2160:733:0:471:485:124 344:147:1:61:75:36 25158:8921:0:5254:5729:1596
NC_000007.12 12564078 rs2115075302 T C . . . AN:AC:HWEP:GR:GV:GA 9950:4:0:4971:4:0 20:1:0:9:1:0 272:0:-1:136:0:0 1004:46:1:458:42:2 98:1:0:48:1:0 314:1:0:156:1:0 64:0:-1:32:0:0 30:0:-1:15:0:0 458:2:0:227:2:0 1024:47:1:467:43:2 336:0:-1:168:0:0 12210:55:16:6052:51:2
Columns 1-9 are the standard VCF columns as described in the VCF 4.1 spec.
CHROM and POS contain the RefSeq chromosome accession and the position, based primarily on the latest human assembly (GRCh38.p14). If the rs does not map to the latest assembly, it is mapped to GRCh37 or hg18 instead.
If the variant exists in dbSNP, it has a RefSNP (rs) ID; otherwise it was a novel variant at the time the frequency was computed and its ID is specified as '.'. Novel variants will be assigned a RefSNP ID in the next dbSNP build and provided in a future release.
The QUAL, FILTER, and INFO columns have no data and are specified as '.'.
Columns 10-21: Observed total allele number (AN, which includes REF) and ALT allele count (AC) for each of the 12 populations, each identified by its BioSample ID (e.g. SAMN10492695). The BioSample IDs and population descriptions are available at the URL given in the ##Population header line.
The HWE score test p-value (HWEP) calculation is described in the readme.
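As an illustration of the column layout described above, the following is a minimal Python sketch that pairs each BioSample ID from the header line with its data column and derives an ALT allele frequency as AC/AN. The function name alt_frequencies is hypothetical, and the header and record are copied from the example above. The sketch splits on any whitespace so that it works on the example text as shown here; it is not an official parser.

```python
# Header line and one record, copied from the example above.
HEADER = (
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "SAMN10492695 SAMN10492696 SAMN10492697 SAMN10492698 "
    "SAMN10492699 SAMN10492700 SAMN10492701 SAMN10492702 "
    "SAMN11605645 SAMN10492703 SAMN10492704 SAMN10492705"
)
RECORD = (
    "NC_000007.12 12564078 rs2115075302 T C . . . AN:AC:HWEP:GR:GV:GA "
    "9950:4:0:4971:4:0 20:1:0:9:1:0 272:0:-1:136:0:0 1004:46:1:458:42:2 "
    "98:1:0:48:1:0 314:1:0:156:1:0 64:0:-1:32:0:0 30:0:-1:15:0:0 "
    "458:2:0:227:2:0 1024:47:1:467:43:2 336:0:-1:168:0:0 12210:55:16:6052:51:2"
)


def alt_frequencies(header_line, record_line):
    """Return {BioSample ID: [AC/AN for each ALT allele]} for one record.

    A frequency is None when AN is 0 (no alleles observed in that population).
    """
    sample_ids = header_line.split()[9:]   # columns 10 onward
    fields = record_line.split()
    keys = fields[8].split(":")            # FORMAT column, e.g. AN:AC:HWEP:GR:GV:GA
    result = {}
    for sample_id, cell in zip(sample_ids, fields[9:]):
        values = dict(zip(keys, cell.split(":")))
        an = int(values["AN"])
        # AC is declared Number=A: one comma-separated count per ALT allele.
        ac = [int(count) for count in values["AC"].split(",")]
        result[sample_id] = [count / an if an > 0 else None for count in ac]
    return result


freqs = alt_frequencies(HEADER, RECORD)
for sample_id, per_alt in freqs.items():
    print(sample_id, per_alt)
```

Keying the result by BioSample ID rather than by column position keeps the frequencies tied to the population descriptions even if the column order differs between releases.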
Note:
1) Most of the sites are homoallelic for the REF allele (AC is 0 in every population) and have no ALT frequency. These sites are detected as variants in other dbGaP studies that are not included in the current release. Example:
NC_000001.11 10001 . T C . . . AN:AC 192:0 0:0 0:0 0:0 0:0 0:0 0:0 0:0 2:0 0:0 0:0 194:0
NC_000001.11 10007 . T C . . . AN:AC 192:0 0:0 0:0 0:0 0:0 0:0 0:0 0:0 2:0 0:0 0:0 194:0
2) The *.tbi file in the directory is created with Tabix for use with SAMtools. See details at: https://samtools.sourceforge.net/. The command options for Tabix are located at: https://samtools.sourceforge.net/tabix.shtml
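Because of note 1, a reader may want to skip sites where no ALT allele was observed in any population. The following is a minimal Python sketch of such a check; the function name has_alt_observation is hypothetical, and the two records are copied from the examples above. It assumes AC may hold several comma-separated counts (Number=A) and treats a '.' value as missing.

```python
# A REF-only site from note 1 and a site with one ALT observation.
REF_ONLY = (
    "NC_000001.11 10001 . T C . . . AN:AC "
    "192:0 0:0 0:0 0:0 0:0 0:0 0:0 0:0 2:0 0:0 0:0 194:0"
)
WITH_ALT = (
    "NC_000007.12 1821039 rs2115074930 C T . . . AN:AC:HWEP:GR:GV:GA "
    "8580:1:0:4289:1:0 38:0:-1:19:0:0 256:0:-1:128:0:0 1422:0:-1:711:0:0 "
    "32:0:-1:16:0:0 170:0:-1:85:0:0 32:0:-1:16:0:0 18:0:-1:9:0:0 "
    "52:0:-1:26:0:0 1460:0:-1:730:0:0 288:0:-1:144:0:0 10600:1:0:5299:1:0"
)


def has_alt_observation(record_line):
    """Return True if any population column reports AC > 0 for any ALT allele."""
    fields = record_line.split()
    ac_index = fields[8].split(":").index("AC")   # position of AC in FORMAT
    for cell in fields[9:]:
        ac_field = cell.split(":")[ac_index]
        for count in ac_field.split(","):
            if count != "." and int(count) > 0:
                return True
    return False


for record in (REF_ONLY, WITH_ALT):
    site = " ".join(record.split()[:3])
    print(site, has_alt_observation(record))
```

Looking up AC by its position in the FORMAT column, rather than assuming it is the second key, lets the same check handle both the two-field (AN:AC) and six-field records shown in this document.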