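8. Motifs (the entry point for downstream validation)

Motif analysis is a key step in peak analysis, and it is also the core of the follow-up experimental validation (mutation). The reason binding is specific at all comes down to you: the motif.

1. Homer (not the Homer who wrote the epics; it's actually the one from The Simpsons)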

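# Step1 - filter (maybe)
# Keep peaks whose score (column 5 of the narrowPeak file) is above 50.
# Note: the "_m100" suffix in the output names does not match the cutoff of 50 used here; change one of them so they agree.

awk 'BEGIN{OFS="\t"; FS="\t"} $5 > 50 {print $0}' Rrp6_ChIRP_1_peaks.narrowPeak > Rrp6_ChIRP_1_peaks_m100.narrowPeak
awk 'BEGIN{OFS="\t"; FS="\t"} $5 > 50 {print $0}' Rrp6_ChIRP_2_peaks.narrowPeak > Rrp6_ChIRP_2_peaks_m100.narrowPeak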
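
Before settling on a cutoff, it can help to see how many peaks survive at a few candidate scores. The following is only a minimal sketch, assuming column 5 holds the narrowPeak score; the function name `count_peaks_above` and the three-line demo file are made up for illustration and are not part of the original pipeline.

```shell
# count_peaks_above FILE CUTOFF
# Prints the number of peaks whose score (column 5) is strictly above CUTOFF.
count_peaks_above() {
    awk -F '\t' -v cutoff="$2" '$5 > cutoff { n++ } END { print n + 0 }' "$1"
}

# Tiny made-up narrowPeak file, for demonstration only.
demo=$(mktemp)
printf 'chr1\t100\t200\tpeak_a\t30\t.\t5.1\t8.2\t6.0\t50\n'    > "$demo"
printf 'chr1\t500\t650\tpeak_b\t75\t.\t7.3\t12.4\t9.8\t60\n'  >> "$demo"
printf 'chr2\t900\t990\tpeak_c\t120\t.\t9.9\t20.1\t17.5\t40\n' >> "$demo"

# Try a few candidate cutoffs and report how many peaks each one keeps.
for cutoff in 25 50 100; do
    printf 'score > %s: %s peaks\n' "$cutoff" "$(count_peaks_above "$demo" "$cutoff")"
done
```

On real data, point the function at the `.narrowPeak` file instead of the demo file and pick the cutoff that leaves a workable number of peaks.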

#--------------------------------------------------------------------
# Step2 - clean
# Keep only the first four columns (chr, start, end, peak name) and write <name>.new.bed next to each input file.

for i in /home4/sjshen/project/m6a/GSE29714_Cell/1.1_hisat2/mouse/position_sorted_bam/homer/bed_home/*.narrowPeak
do
echo "$i"
# Alternative column layouts, kept for reference (disabled):
#awk 'BEGIN{OFS="\t";FS="\t";count=1} {print "peak_"count,$1,$2,$3,0,0;count=count+1}' $i > ${i/.bed/.new.bed}
#awk 'BEGIN{OFS="\t";FS="\t";count=1} {print $1,$2,$3,"peak_"count,$4,0;count=count+1}' $i > ${i/.narrowPeak/.new.bed}
awk 'BEGIN{OFS="\t";FS="\t"} {print $1,$2,$3,$4}' "$i" > "${i/.narrowPeak/.new.bed}"
done
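
HOMER's documentation describes its preferred BED input as six columns: chromosome, start, end, unique peak ID, an unused column, and strand. As a hedged alternative to the four-column output above, the sketch below builds that layout from a narrowPeak file. The function name `narrowpeak_to_homer_bed` and the demo data are hypothetical, and turning an unstranded "." into "+" is an assumption made here, not something the original workflow states.

```shell
# narrowpeak_to_homer_bed IN OUT
# Writes chr, start, end, a unique ID (peak_<line number>), the score, and the strand.
narrowpeak_to_homer_bed() {
    awk 'BEGIN { OFS = "\t"; FS = "\t" }
         {
             strand = ($6 == "-") ? "-" : "+"   # assumption: unstranded "." becomes "+"
             print $1, $2, $3, "peak_" NR, $5, strand
         }' "$1" > "$2"
}

# Two made-up narrowPeak lines, for demonstration only.
demo_in=$(mktemp)
demo_out=$(mktemp)
printf 'chr1\t100\t200\tsampleA_peak_1\t75\t.\t7.3\t12.4\t9.8\t60\n'    > "$demo_in"
printf 'chr2\t900\t990\tsampleA_peak_2\t120\t-\t9.9\t20.1\t17.5\t40\n' >> "$demo_in"

narrowpeak_to_homer_bed "$demo_in" "$demo_out"
cat "$demo_out"
```

Since `-rna` makes the search strand-specific, it is worth checking whether the strand column of your peak caller's output is meaningful before relying on it.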

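#--------------------------------------------------------------------
# Step3 - find motif
# One output directory per sample; each run is sent to the background and logs to <name>/<name>.log.

for i in ./*.new.bed
do
echo "$i"
t=${i##*/}              # strip the directory part
name=${t%%.new.bed}     # strip the .new.bed suffix
mkdir -p "${name}"
findMotifsGenome.pl "$i" mm10 "${name}" -size given -mask -p 20 -len 5,6,8,10,12 -S 30 -rna -preparsedDir "${name}" > "${name}/${name}.log" 2>&1 &
done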
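
The Step3 loop starts every sample in the background at once, and each run asks for 20 threads via `-p 20`. On a machine with fewer cores it may be gentler to run the samples in small batches. Below is a minimal sketch of that idea; `run_one`, `max_jobs`, `done_log` and the sample names are all hypothetical stand-ins, and the `sleep` merely takes the place of the real `findMotifsGenome.pl` call.

```shell
# Stand-in for one motif run; replace the body with the real Step3 command.
run_one() {
    sleep 1                      # placeholder for the actual work
    echo "$1" >> "$done_log"     # record that this sample finished
}

max_jobs=2                       # assumption: at most 2 runs at the same time
done_log=$(mktemp)
running=0

for name in sampleA sampleB sampleC sampleD sampleE; do
    run_one "$name" &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait                     # block until the current batch has finished
        running=0
    fi
done
wait                             # catch the last, possibly partial, batch

sort "$done_log"
```

Batching like this is simple but not optimal, since a batch waits for its slowest member; it trades some throughput for not oversubscribing the CPU.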

[Figure: example input BED file]   [Figure: example output files]

 

2. Parameter tips

Parameter       Explanation
-size           fragment size to use for motif finding ("given" uses the exact peak regions)
-mask           mask repeats/lower-case sequence
-len            motif length(s), comma-separated
-S              number of motifs to optimize
-rna            output RNA motif logos and compare to the RNA motif database
-gc             use GC% for sequence content normalization
-preparsedDir   location to search for preparsed files and/or place new files

 

3. Other methods (to be expanded)

A - HOMER on FASTA input
bedtools getfasta -fi ~/Database/genome/hg38/hg38_UCSC.fa -bed Joint_peak.bed -fo Joint_peak.fa -split -s
findMotifs.pl Joint_peak.fa fasta /home/xxx/project1/fisherPeak/homer_motif -fasta ~/Database/transcriptome/backgroud_peaks/hg38_200bp_randomPeak.fa -rna -p 10 -len 5,6,7

B - MEME (not the internet kind of meme)
C - Based on BAM sequences
D - Based on known motif matrices
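
For method A, a quick look at the extracted FASTA before running findMotifs.pl can catch empty or very short sequences, which matters when searching for motifs of length 5 to 7. This is only a rough sketch: the helper name `fasta_summary` and the demo sequences are invented for illustration.

```shell
# fasta_summary FILE
# Prints the number of sequences and the shortest and longest sequence length.
fasta_summary() {
    awk '/^>/ { if (seen) emit(); seen = 1; len = 0; next }   # header: close the previous record
         { len += length($0) }                                 # sequence line: add to current length
         END { if (seen) emit(); printf "%d sequences, min %d, max %d\n", n, min, max }
         function emit() { n++; if (n == 1 || len < min) min = len; if (len > max) max = len }' "$1"
}

# Three made-up records (the last one wrapped over two lines), for demonstration only.
demo_fa=$(mktemp)
printf '>chr1:100-108(+)\nACGTACGT\n>chr2:5-17(-)\nGGGCCCAAATTT\n>chr3:1-5(+)\nACG\nTA\n' > "$demo_fa"

fasta_summary "$demo_fa"
```

On real data, run it on the file written by `bedtools getfasta` and on the background FASTA as well, so both sets can be compared at a glance.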

