九、功能性分析(生物学意义不能少,editor拒稿几大套路之一)

peak的功能属性不同于统计属性,生物学意义是生物信息学分析提档的关键

1、Relative gene location distribution of peaks

# Step1 - intersect with different part of genes
work_dir=/home4/sjshen/project/m6a/GSE29714_Cell
peak_distribution_dir=$work_dir/1.1_hisat2/mouse/position_sorted_bam/peak_distribution

cd $peak_distribution_dir/result

for i in $peak_distribution_dir/peak_bed_home/SYSY_rep1_summits.bed 
do 
t1=${i##*/}
bed_name_1=${t1%%.bed}
    for j in $peak_distribution_dir/ann_bed_home/*bed 
    do
    t2=${j##*/}
    bed_name_2=${t2%%.bed}
    bedtools intersect -wa -wb -a $i -b $j >  $peak_distribution_dir/result/${bed_name_1}.${bed_name_2}.intersect.tab &
    done
done

#--------------------------------------------------------------------

# Step2 - plot in R
# 详见代码

  《=== 绘图代码 R ===》 ###请打开 Rcode - part9_1

2、Which gene(s) does peak(s) belong to ? —— gene mapping

# 方法1 - homer - intersect with different part of genes
cd anno_peak

for i in /home4/sjshen/project/m6a/GSE29714_Cell/1.1_hisat2/mouse/position_sorted_bam/homer/bed_home/*.narrowPeak
do
echo $i
name=${i##*/}
mkdir ${name}
#mkdir ${name}/go
annotatePeaks.pl  $i mm10 -annStats ${name}/${name}.stat 2>${name}/${name}.log 1>${name}/${name}.gene &
done 

《=== Homer - 临近基因注释 ===》

# 方法2 - ChIPseeker
# 详见R代码
>

《=== Rcode - part9_2 - ChIPSeeker ===》

# 方法3 - bedtools
for i in *bed;do
echo $i
awk '{printf "%s\t%d\t%d\t%s\t%d\t%s\n",$1,($2+$3)/2,($2+$3)/2,$4,$5,"+"}' $i | bedtools closest -a - -b mm10_transcript_uniqGene.bed -D b > ${i/.bed/.myann.txt}
done
# 1.peak用中心坐标来找
# 2.两个文件均包含六列(第六列为strand--重要)
# 3.根据输出结果最后一列判断:在基因上游为负值,基因下游为正值, 基因body为0

3、Which functional region(s) does peak(s) belong to ?

# 方法1 - homer - 

《=== Homer - peak分布注释 ===》

# 方法2 - ChIPseeker

《=== Rcode - part9_3 - ChIPSeeker ===》

4、Chrom profile of peaks

《=== Rcode - part9_4 - ChIPSeeker ===》

5、GO term & KEGG

《=== Rcode - part9_5 - ChIPSeeker ===》

 

Dependencies

上一页
下一页