A variation call set obtained from the analysis of Gambian Genome Variation Project samples on GRCh38

2021-09-30 00:00:00 +0100

We have recently published a Data Note describing our analysis, on GRCh38, of 505 samples from four Gambian populations in the Gambian Genome Variation Project (GGVP).

For the analysis we used a multi-caller site discovery approach, together with imputation and phasing, to produce a phased call set of biallelic single nucleotide variants (SNVs) and insertions/deletions (INDELs). For 387 of the samples, variation had not previously been explored on the GRCh38 human genome assembly. Compared with our previous work on the 1000 Genomes Project data on GRCh38, we identified over nine million novel SNVs and over 870,000 novel INDELs.

The files generated in this analysis can be accessed from our FTP site. These include the alignment files used for variant identification (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/) and the call set itself (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/release/20200217_biallelic_SNV/).
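As an illustration only, a VCF downloaded from the release directory could be summarised with a short Python sketch like the one below. It counts biallelic SNV and INDEL records and tallies phased ("|") versus unphased genotype calls. It assumes the files follow the standard VCF layout (bgzip-compressed, with GT as the first FORMAT field, as the VCF specification requires when GT is present). The function names `classify_variant` and `tally_vcf`, and any local file path you pass in, are hypothetical. This is not the pipeline used in the Data Note.

```python
import gzip
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic record as 'SNV', 'INDEL' or 'OTHER' from its REF/ALT alleles."""
    if "," in alt:
        # The call set is described as biallelic, so a comma-separated ALT is unexpected.
        raise ValueError("expected a biallelic record (single ALT allele)")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. equal-length multi-nucleotide substitutions


def tally_vcf(path: str) -> Counter:
    """Count variant types and phased/unphased genotype calls in a (b)gzipped VCF."""
    counts = Counter()
    # gzip.open reads multi-member gzip streams, which covers bgzip (BGZF) output.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            ref, alt = fields[3], fields[4]
            counts[classify_variant(ref, alt)] += 1
            # Sample columns start at index 9; assume GT is the first FORMAT key.
            for sample in fields[9:]:
                gt = sample.split(":", 1)[0]
                counts["phased_gt" if "|" in gt else "unphased_gt"] += 1
    return counts
```

Calling `tally_vcf("some_downloaded_file.vcf.gz")` (a placeholder name) returns a `Counter`. A dedicated VCF library would be faster on whole-genome files, but the standard library keeps the sketch dependency-free.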

More information on the samples analysed in this work can be found in the IGSR portal.

Frequency distributions and genotypes are available in Ensembl.