Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Genotyping and loci filtering

After sequencing, all fastq files were quality checked using FastQC (ref. 65) before being mapped to the Coregonus sp. “Balchen” Alpine whitefish reference genome (ENA accession: GCA_902810595.1; ref. 47; with additional un-scaffolded contigs (https://datadryad.org/stash/dataset/doi:10.5061/dryad.xd2547ddf; ref. 66) to ensure accurate mapping) using bwa-mem v.0.7.17 (ref. 67), changing the ‘r’ setting to 1 to allow more accurate, albeit more time-consuming, alignment. Mosdepth v.0.2.8 (ref. 68) was used to calculate mean sequencing coverage from the BAM files for each of the 97 individuals, which ranged from 15.32× to 41.69× (a further two individuals were added to this dataset after genotype calling, as discussed below). Picard-tools (Version 2.20.2; http://broadinstitute.github.io/picard/) was then used to mark duplicate reads (MarkDuplicates), fix mate information (FixMateInformation) and replace read groups (AddOrReplaceReadGroups). Genotypes were then called across the 40 chromosome-scale scaffolds included in the Coregonus sp. “Balchen” Alpine whitefish assembly (ENA accession: GCA_902810595.1; ref. 47) using HaplotypeCaller in GATK v.4.0.8.1 (ref. 69) with a minimum mapping quality filter of 30. The resulting VCF file was then filtered using VCFtools v.0.1.14 (ref. 70) to remove indels (--remove-indels) and include biallelic loci (--min-alleles 2 --max-alleles 2) that have a minor allele count > 3 (--mac 3), no missing data (--max-missing 1), a minimum depth > 3 (--min-meanDP 3 --minDP 3), a maximum depth < 50 (--max-meanDP 50 --maxDP 50), and a minimum quality of 30 (--minQ 30), leaving 16,926,710 SNPs. Loci that fell within potentially collapsed regions of the genome assembly (as identified in ref. 47) were removed using BEDTools v.2.28.0 (ref. 71; bedtools subtract), and any loci with duplicate IDs, identified with PLINK v.1.90 (ref. 72), were removed with VCFtools (ref. 70), resulting in 15,841,979 SNPs.
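The site-filtering step can be sketched as a small Python helper that assembles the VCFtools call from the flags quoted in the text. This is a minimal sketch, not the authors' script: the function name, the input and output file names, and the choice of `--gzvcf`, `--recode` and `--recode-INFO-all` for reading and writing are assumptions added for illustration. The command is only built and printed, not executed.

```python
import shlex


def build_vcftools_filter_cmd(vcf_in, out_prefix):
    """Return the vcftools argument list for the site filters listed in the text.

    vcf_in and out_prefix are hypothetical file names chosen for illustration.
    """
    return [
        "vcftools",
        "--gzvcf", vcf_in,        # assumption: input is a bgzipped VCF
        "--remove-indels",        # drop indels, keep SNPs
        "--min-alleles", "2",     # biallelic loci only
        "--max-alleles", "2",
        "--mac", "3",             # minor allele count threshold
        "--max-missing", "1",     # no missing genotypes allowed
        "--min-meanDP", "3",      # mean depth across individuals
        "--minDP", "3",           # per-genotype depth
        "--max-meanDP", "50",
        "--maxDP", "50",
        "--minQ", "30",           # site quality threshold
        "--recode",               # assumption: write a filtered VCF
        "--recode-INFO-all",      # assumption: keep INFO fields in the output
        "--out", out_prefix,
    ]


cmd = build_vcftools_filter_cmd("raw_calls.vcf.gz", "filtered_snps")
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value separate, so it could be passed to `subprocess.run(cmd, check=True)` without shell quoting issues. Note that VCFtools documents its count and depth thresholds as inclusive, so `--mac 3` retains sites with a minor allele count of 3 or more.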
To increase our sampling of the species C. macrophthalmus from Lake Constance from one individual to three, we added sequencing data from a further two individuals (previously sequenced by Frei et al. (ref. 73); Supplementary Data 1). To avoid the downstream effects of combining sequencing data from different runs (which can result from different biased nucleotide calls and introduce erroneous signals of genetic differentiation; as outlined in ref. 74), we mapped these two samples as above (resulting in mean genome-wide coverages of 9.32× and 16.58×) and called genotypes again for all samples (including the two additional C. macrophthalmus individuals) at each of the original 15,841,979 SNP positions. Following this genotype calling, which resulted in 15,521,925 SNPs, SNP filtering was repeated as before, leaving 14,313,952 SNPs with no missing data across the dataset of 99 individuals.
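To make the filtering criteria concrete, the logic applied to each site can be illustrated in plain Python on toy data. This is a simplified sketch of the stated criteria, not VCFtools' implementation: the `Site` class, the function names and the example genotypes are all hypothetical, and the thresholds are treated as inclusive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Site:
    ref: str
    alts: List[str]
    qual: float
    genotypes: List[Optional[Tuple[int, int]]]  # None marks a missing call
    depths: List[int]                           # per-individual read depth


def minor_allele_count(site):
    """Count the rarer of the two alleles across all called diploid genotypes."""
    alleles = [a for gt in site.genotypes if gt is not None for a in gt]
    alt = sum(1 for a in alleles if a == 1)
    return min(alt, len(alleles) - alt)


def passes_filters(site, mac=3, min_dp=3, max_dp=50, min_qual=30.0):
    if len(site.alts) != 1:                       # biallelic only
        return False
    if len(site.ref) != 1 or len(site.alts[0]) != 1:
        return False                              # indel, not a SNP
    if any(gt is None for gt in site.genotypes):  # no missing data allowed
        return False
    # If every genotype depth lies within [min_dp, max_dp], the mean does too.
    if any(d < min_dp or d > max_dp for d in site.depths):
        return False
    if site.qual < min_qual:
        return False
    return minor_allele_count(site) >= mac


# Toy sites with four individuals each (hypothetical values).
good = Site("A", ["G"], 50.0, [(0, 1), (1, 1), (0, 0), (0, 1)], [20, 18, 25, 30])
with_missing = Site("A", ["G"], 50.0, [(0, 1), None, (0, 0), (0, 1)], [20, 0, 25, 30])
indel = Site("AT", ["A"], 50.0, [(0, 1), (1, 1), (0, 0), (0, 1)], [20, 18, 25, 30])
rare = Site("A", ["G"], 50.0, [(0, 1), (0, 0), (0, 0), (0, 0)], [20, 18, 25, 30])

for name, site in [("good", good), ("with_missing", with_missing),
                   ("indel", indel), ("rare", rare)]:
    print(name, passes_filters(site))
```

Because a single missing genotype excludes a site, adding individuals with lower coverage can only shrink the retained set, which is consistent with the drop in SNP count reported after the two additional samples were included.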
