九、功能性分析(生物学意义不能少,editor拒稿几大套路之一)
peak的功能属性不同于统计属性,生物学意义是生物信息学分析提档的关键
1、Relative gene location distribution of peaks
# Step1 - intersect with different part of genes
work_dir=/home4/sjshen/project/m6a/GSE29714_Cell
peak_distribution_dir=$work_dir/1.1_hisat2/mouse/position_sorted_bam/peak_distribution
cd $peak_distribution_dir/result
for i in $peak_distribution_dir/peak_bed_home/SYSY_rep1_summits.bed
do
t1=${i##*/}
bed_name_1=${t1%%.bed}
for j in $peak_distribution_dir/ann_bed_home/*bed
do
t2=${j##*/}
bed_name_2=${t2%%.bed}
bedtools intersect -wa -wb -a $i -b $j > $peak_distribution_dir/result/${bed_name_1}.${bed_name_2}.intersect.tab &
done
done
#--------------------------------------------------------------------
# Step2 - plot in R
# 详见代码
《=== 绘图代码 R ===》
###请打开 Rcode - part9_1
2、Which gene(s) does peak(s) belong to ? —— gene mapping
# 方法1 - homer - intersect with different part of genes
cd anno_peak
for i in /home4/sjshen/project/m6a/GSE29714_Cell/1.1_hisat2/mouse/position_sorted_bam/homer/bed_home/*.narrowPeak
do
echo $i
name=${i##*/}
mkdir ${name}
#mkdir ${name}/go
annotatePeaks.pl $i mm10 -annStats ${name}/${name}.stat 2>${name}/${name}.log 1>${name}/${name}.gene &
done
# 方法2 - ChIPseeker
# 详见R代码
>
《=== Rcode - part9_2 - ChIPSeeker ===》
# 方法3 - bedtools
for i in *bed;do
echo $i
awk '{printf "%s\t%d\t%d\t%s\t%d\t%s\n",$1,($2+$3)/2,($2+$3)/2,$4,$5,"+"}' $i | bedtools closest -a - -b mm10_transcript_uniqGene.bed -D b > ${i/.bed/.myann.txt}
done
# 1.peak用中心坐标来找
# 2.两个文件均包含六列(第六列为strand--重要)
# 3.根据输出结果最后一列判断:在基因上游为负值,基因下游为正值, 基因body为0
3、Which functional region(s) does peak(s) belong to ?
# 方法1 - homer -
# 方法2 - ChIPseeker
《=== Rcode - part9_3 - ChIPSeeker ===》
4、Chrom profile of peaks
《=== Rcode - part9_4 - ChIPSeeker ===》
5、GO term & KEGG
《=== Rcode - part9_5 - ChIPSeeker ===》
Dependencies
- Softwares