R语言画全基因组关联分析中的曼哈顿图（manhattan plot）

1、在linux中安装好R

2、准备好画曼哈顿图的R脚本即manhattan.r，manhattan.r内容如下：

#!/usr/bin/Rscript
#example : Rscript plot_manhatom.r XXX.assoc XXX.pdf
argv <- commandArgs()
#define the function to plot the manhatton and quantitle-quantitle plot
plot_manhatton<-function(site_file,save_file){
        read.table(site_file,header=T)->data
    data<-data[which(data[,5]=="ADD"),]
    logp=-log10(data[,9]) #value
        chr=data[,1] #chr
        position<-data[,3] #position
#       xaxis=0
        sig=numeric()
        lab=numeric()
    flogp=numeric();
        the_col=c("darkblue","gray"); #chr class
        whole_col=c()
    xlabel=numeric();
        length_add=0
        label=numeric();
        for(i in 1:22){
                position[chr==i]->chr_pos
        chr_pos<-chr_pos-chr_pos[1]+17000000
        chr_num=length(chr_pos)
                cat("For chrosome",i,",Num of SNPs:",chr_num,"\n")
                if(length(chr_pos)==0){
                        next
                }
                flogp=c(flogp,logp[chr==i])
                label=c(label,i)
                whole_col=c(whole_col,rep(the_col[i%%2+1],chr_num))
                chr_pos=length_add+chr_pos
                xlabel=c(xlabel,chr_pos)
                length_add=sort(chr_pos,decreasing=T)[1]
        lab=c(lab,(chr_pos[1]+length_add)/2)
        }
    png(save_file,height=500,width=600)
    par(mar=c(5,6,4,2))
        plot(xlabel,flogp,col=whole_col,axes=F,xlab="",ylab="",ylim=c(0,12),pch=20,cex=0.5,cex.lab=1.2,cex.axis=1.4)#ylim=c(0,6)
    significance=-log10(0.05/length(data[,1]))
    sig_pos<-xlabel[which(flogp>significance)]
    for(i in 1:length(sig_pos)){
        sig<-c(sig,which(xlabel>sig_pos[i]-60000 & xlabel<sig_pos[i]+60000))
    }
    sig<-unique(sig)
    cat("significant signal:",sig[10:20],"\n")
    points(xlabel[sig],flogp[sig],col="red",pch=20,cex=0.5)
    title(xlab="Chromosome",ylab=expression(-log[10]*(p)),cex.lab=1.4)
#   title(xlab="Chromosome",ylab="Score",cex.lab=1.8)
#       title("GWAS scan for Hair curl", cex.main=2.5)
#       yaxis=seq(0,1,by=0.1)
#       axis(2,yaxis,yaxis,las=1,cex.axis=1.5,line=-2)
        axis(2,las=2,cex.axis=1.4)
        #las can change the directory of the axis character
        #las=0 always parallel to the axis
        #las=2 always horizontal

        for(i in 1:22){
                mtext(label[i],side=1,at=lab[i],las=0,font=1,cex=0.8)
                #cex magnified relative to the to the default
        }
#    mtext("X",side=1,at=x[23],las=0,font=1,cex=1.4)
#    mtext("Y",side=1,at=x[24],las=0,font=1,cex=1.4)

#       axis(1,x,as.character(label),tick=TRUE,font=1)
        par(lty=2)
        abline(h=significance,cex=1.5,col="red")
#plot the QQ plot
#    par(fig=c(0.58,0.92,0.43,0.95),new=TRUE)
#    observed <- sort(data[,12])
#    logp=-log10(observed)
#    expected <- c(1:length(observed))
#    lexp <- -(log10(expected / (length(expected)+1)))
#    plot(x=lexp,y=logp,pch=19,cex=0.6, xlab="Expected (-logP)", ylab="Observed (-logP)",cex.lab=1.2,cex.axis=1.2)
#    abline(0,1,col="red",lwd=2,lty=1)

    dev.off()
}

site_file<-argv[6];print(site_file)
save_file<-argv[7];print(save_file)
plot_manhatton(site_file,save_file)

#round(x,digits=3) keep the length of the digit

3、准备好plink跑完后的.assoc.linear文件，比如mydata.assoc.linear

4、在linux中输入以下一个命令：

其中，mydata.png即为我们想要的曼哈顿图（manhattan plot）

Rscript manhattan.r mydata.assoc.linear mydata.png

时间： 2024-10-14 06:39:14

R语言画全基因组关联分析中的曼哈顿图（manhattan plot）的相关文章

全基因组关联分析（Genome-Wide Association Study，GWAS）流程

全基因组关联分析流程: 一.准备plink文件 1.准备PED文件 PED文件有六列,六列内容如下: Family ID Individual ID Paternal ID Maternal ID Sex (1=male; 2=female; other=unknown) Phenotype PED文件是空格(空格或制表符)分隔的文件. PED文件长这个样: 2.准备MAP文件 MAP文件有四列,四列内容如下: chromosome (1-22, X, Y or 0 if unplaced) r

一行命令学会全基因组关联分析(GWAS)的meta分析

为什么需要做meta分析群体分层是GWAS研究中一个比较常见的假阳性来源. 也就是说,如果数据存在群体分层,却不加以控制,那么很容易得到一堆假阳性位点. 当群体出现分层时,常规手段就是将分层的群体独立分析,最后再做meta分析. 1.如何判断群体是否分层先用plink计算PCA,具体方法详见链接:GWAS群体分层 (Population stratification):利用plink对基因型进行PCA 随后画出PC1和PC2在不同群体的散点图,观察群体之间是否明显分开,如果明显分开,说明群体

关联分析中寻找频繁项集的FP-growth方法

关联分析是数据挖掘中常用的分析方法.一个常见的需求比如说寻找出经常一起出现的项目集合. 引入一个定义,项集的支持度(support),是指所有包含这个项集的集合在所有数据集中出现的比例. 规定一个最小支持度,那么不小于这个最小支持度的项集称为频繁项集(frequent item set). 如何找到数据集中所有的频繁项集呢? 最简单的方法是对所有项集进行统计,可以通过逐渐增大项集大小的方式来遍历所有项集.比如说下面的数据集,先统计所有单个元素集合的支持度,{z} 的支持度为5 (这里把项目出现次

R语言画点状误差线

现在项目需要R语言做几个线性拟合,画一些点图,突然需要画误差线,网上找了下,可以用代码实现..效果如下 xx1<-c(xxxxxx,xxxx,xxxxx) yy1<-c(xxxxxx,xxxx,xxxxx) std1<-c(xxxxxx,xxxx,xxxxx) std2<-c(xxxxxx,xxxx,xxxxx) plot_stdy <- function(x, y, sd, len = 1, col = "black") { len <- len

R语言画曲线图

本文以1950年到2010年期间我国的火灾统计数据为例,数据如下所示: (0)加载数据 data<-read.csv("E:\\MyDocument\\p\\Data\\1950~2010火灾情况.csv") x=t(data[1]) y=t(data[2]) z=t(data[3]) w=t(data[4]) maxy=max(y) maxz=max(z) maxw=max(w) (1)将火灾数.直接损失.死伤人数,分别按年份作图 plot(x,y,type="o&q

用R语言画条形图（基于ggplot2包）

1.用qplot(x,data=data,geom.=”bar”,weight=y)+scale_y_continuous("y")画出y关于x的条形. 图中提示binwidth这里是指矩形的宽度,指定之后如下 qplot(x,data=data,geom="bar",weight=y,binwidth=0.2)+scale_y_continuous("y") 2.用qplot(x,data=data,geom.=”bar”)画出来的是频率直方图

R语言对苏州天气的分析及预测天气篇

坐标苏州,来这边刚好一年的时间,又到四月,梅雨季节(?)最能感受到烟雨江南的朦胧美,才怪!实际上的心情是,"清明时节雨纷纷,放假宅家欲断魂",已经无力吐槽这春夏交战冬天突围的诡异天气变化了,正好有时间,所以想用高大上的技术语言来解读一下苏州的天气特点. 历史天气数据来源:http://tianqi.2345.com/wea_history/54511.htm,这是北京的历史数据,采样城市北京.上海.苏州.长沙.广州.一共采集了2011-1-1到2015-4-2这四年三个月共1542(3

R语言做地图上的分析

R和ggplot可视化功能非常强大,了解了一下其中的地图做法,发现R做世界地图.美国地图非常容易,但做中国地图就太麻烦了,需要自己DIY. DIY也有多种方式,但网络上各种帖子教程的出图效果都不太理想,达不到工作用要求.下面是我的摸索过程,记录如下备忘,也请教于R老师们. 参考书目:ggplot2,R graphics cookbook,参考贴:http://site.douban.com/182577/widget/notes/10568279/note/257898418/ 0.引子 R里有

R语言对苏州天气的分析及预测温度篇

温度篇前面已经讲了苏州的天气特点,还是用相同的数据,做接下来的苏州气温特点的分析预测,是的预测在这里! 首先看下2011年到2015年苏州整体的温度表现是什么样的. plot(suzhou$highestTemp,type="l",col="red",main="苏州2011-2015年气温图",xlab="时间轴",ylab="温度℃") lines(suzhou$lowestTemp,type