Pruning Process for GWAS Summary Statistics File in scPagwas
The GWAS summary statistics file needs to be provided by users. However, in many cases, the number of SNPs in such files can reach ten millions, and this can significantly impact computing time. Therefore, it is recommended to preprocess the file using Plink software’s prune function before inputting it. The prune function is a commonly used tool in Plink software that selects representative SNPs between LD loci, reduces the number of highly similar loci, and lowers the false-positive rate, thereby improving result reliability. Furthermore, this process does not significantly impact the final result. Hence, it is advisable to use this method if the GWAS file is too large.
Reprocess the gwas data
To calculate plink, important information from the GWAS data needs to be extracted.
GWAS_raw <-read_table("./monocytecount.gz")
GWAS_raw<-GWAS_raw[,c(1,2,3,4,5,6,9,10)]
colnames(GWAS_raw)<-c("chrom","pos","REF","ALT","rsid","nearest_genes","beta","se")
write.table(GWAS_raw,file="./monocytecount.txt",row.names=F,quote=F)
Extract SNPs
mkdir tempfile
awk '{print $5 }' monocytecount.txt > ./tempfile/monocytecount_SNP_list.txt
plink
for i in $(seq 1 22)
do
echo $i
plink
--bfile ./02_partitioned_LD_score_estimation/1000G_EUR_Phase3_plink/1000G.EUR.QC.$i
--extract ./tempfile/monocytecount_SNP_list.txt
--noweb --make-bed --out ./tempfile/1000G.EUR.QC.monocytecount_${i}_filtered
done
filter LD information
for i in $(seq 1 22)
do
echo $i
plink
--bfile ./tempfile/1000G.EUR.QC.monocytecount_${i}_filtered
--indep-pairwise 50 5 0.8
--out ./tempfile/monocytecount_${i}_plink_prune_EUR_filtered_LD0.8
done
Integrate the result files for 1-22 chrome
cat [monocytecount]*.prune.in > monocytecount_EUR_LD0.8.prune
Intersect the snp for all gwas file
library(readr)
library(dplyr)
gwas<-read_table("./monocytecount.txt")
SNP_prune<- read_table("./tempfile/AD_EUR_LD0.8.prune")
SNP_prune<-SNP_prune[!duplicated(unlist(SNP_prune)),]
colnames(SNP_prune)<-"rsid"
#### Left Join using inner_join function
gwas= gwas %>% inner_join(SNP_prune,by="rsid")
print(nrow(gwas))
write.table(gwas,file="./monocytecount_prune_gwas_data.txt",row.names=F,quote=F)