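8. Motif (the entry point for downstream validation)
Motif discovery is a key part of peak analysis, and it is also the core of follow-up experimental validation (e.g. mutation experiments): binding is specific precisely because of the motif.
1. Homer (not the Homer who wrote the epics; this one is from The Simpsons)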
# Step1 - filter (optional): keep peaks whose score (column 5) is above 50
# Note: the output names say "m100" although the threshold used here is 50
awk 'BEGIN{OFS="\t"; FS="\t"} $5 > 50 {print $0}' Rrp6_ChIRP_1_peaks.narrowPeak > Rrp6_ChIRP_1_peaks_m100.narrowPeak
awk 'BEGIN{OFS="\t"; FS="\t"} $5 > 50 {print $0}' Rrp6_ChIRP_2_peaks.narrowPeak > Rrp6_ChIRP_2_peaks_m100.narrowPeak
#--------------------------------------------------------------------
# Step2 - clean: reduce each narrowPeak file to a 4-column BED (chr, start, end, peak name)
for i in /home4/sjshen/project/m6a/GSE29714_Cell/1.1_hisat2/mouse/position_sorted_bam/homer/bed_home/*.narrowPeak
do
    echo "$i"
    # Alternative column layouts, kept for reference:
    #awk 'BEGIN{OFS="\t";FS="\t";count=1} {print "peak_"count,$1,$2,$3,0,0;count=count+1}' $i > ${i/.bed/.new.bed}
    #awk 'BEGIN{OFS="\t";FS="\t";count=1} {print $1,$2,$3,"peak_"count,$4,0;count=count+1}' $i > ${i/.narrowPeak/.new.bed}
    awk 'BEGIN{OFS="\t";FS="\t"} {print $1,$2,$3,$4}' "$i" > "${i%.narrowPeak}.new.bed"
done
#--------------------------------------------------------------------
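# Step3 - find motif (run from the directory that holds the .new.bed files)
for i in ./*.new.bed
do
    echo "$i"
    t=${i##*/}             # strip the directory part
    name=${t%%.new.bed}    # strip the .new.bed suffix to get the sample name
    mkdir -p "${name}"
    # One background job per BED file, each using 20 threads (-p 20)
    findMotifsGenome.pl "$i" mm10 "${name}" -size given -mask -p 20 -len 5,6,8,10,12 -S 30 -rna -preparsedDir "${name}" > "${name}/${name}.log" 2>&1 &
done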
《=== Example input BED file ===》 《=== Example output files ===》
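The filter and clean steps above can be tried on a toy file before touching real data. The sketch below is only an illustration: the three peaks, their coordinates and scores are made up, and Step 1 and Step 2 are merged into a single awk call for brevity.

```shell
# Toy narrowPeak file with made-up coordinates and scores (illustration only).
tmpdir=$(mktemp -d)
printf 'chr1\t100\t300\tpeak_a\t120\t.\t5.1\t12.0\t9.8\t90\n'   > "$tmpdir/toy.narrowPeak"
printf 'chr2\t500\t650\tpeak_b\t30\t.\t2.0\t4.1\t2.2\t60\n'    >> "$tmpdir/toy.narrowPeak"
printf 'chr3\t900\t1200\tpeak_c\t75\t.\t4.4\t8.3\t6.0\t150\n'  >> "$tmpdir/toy.narrowPeak"

# Step 1 + Step 2 in one pass: keep peaks with score > 50,
# then keep only chr, start, end and peak name.
awk 'BEGIN{OFS="\t"; FS="\t"} $5 > 50 {print $1,$2,$3,$4}' \
    "$tmpdir/toy.narrowPeak" > "$tmpdir/toy.new.bed"

# Show the 4-column BED that would be handed to findMotifsGenome.pl.
cat "$tmpdir/toy.new.bed"
```

Only peak_a and peak_c pass the filter, so the resulting BED has two tab-separated lines.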
2. Parameter tips
Parameter | Description
---|---
-size | fragment size to use for motif finding (`given` uses each peak's own coordinates)
-mask | mask repeats/lower case sequence
-len | motif length(s) to search for
-S | number of motifs to optimize
-rna | output RNA motif logos and compare to the RNA motif database
-gc | use GC% for sequence content normalization
-preparsedDir | location to search for preparsed files and/or place new files
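To see how these flags fit together with the sample name derived in Step 3, a dry run can help. The sketch below only prints the command instead of running it; the input path is illustrative, and collecting the flags in an `opts` variable is an assumption made here for readability, not part of the original script.

```shell
# Illustrative path, used only to show how Step 3 derives the sample name.
i="./Rrp6_ChIRP_1_peaks_m100.new.bed"
t=${i##*/}             # drop everything up to the last "/"
name=${t%%.new.bed}    # drop the ".new.bed" suffix

# Flags from the table, collected in one place so they are easy to toggle.
opts="-size given -mask -p 20 -len 5,6,8,10,12 -S 30 -rna"

# Dry run: build and print the command rather than executing it.
cmd="findMotifsGenome.pl $i mm10 $name $opts -preparsedDir $name"
echo "$cmd"
```

Once the printed command looks right, the same line can be run for real, with its output redirected to a log file as in Step 3.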
3. Other methods (to be extended)
A - Homer on FASTA input
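# Extract strand-aware (-s), block-aware (-split) sequences for each peak
bedtools getfasta -fi ~/Database/genome/hg38/hg38_UCSC.fa -bed Joint_peak.bed -fo Joint_peak.fa -split -s
# Target FASTA first, then the keyword "fasta", the output directory, and the background FASTA via -fasta
findMotifs.pl Joint_peak.fa fasta /home/xxx/project1/fisherPeak/homer_motif -fasta ~/Database/transcriptome/backgroud_peaks/hg38_200bp_randomPeak.fa -rna -p 10 -len 5,6,7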
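Before handing a FASTA file to findMotifs.pl, it may be worth checking how many sequences it holds and how long they are, since sequences shorter than the requested motif lengths cannot contribute. The following is a minimal sketch on a made-up two-sequence file; it assumes each sequence sits on a single line, as in bedtools getfasta output.

```shell
# Toy FASTA with made-up sequences, mimicking "bedtools getfasta -s" headers.
tmpdir=$(mktemp -d)
printf '>chr1:100-110(+)\nACGTACGTAC\n>chr2:5-11(-)\nGGGCCC\n' > "$tmpdir/toy.fa"

# Count sequences and report the shortest and longest length.
stats=$(awk '/^>/ {n++; next}
             {len = length($0)
              if (min == "" || len < min) min = len
              if (len > max) max = len}
             END {print n, min, max}' "$tmpdir/toy.fa")

# Prints: number of sequences, shortest length, longest length
echo "$stats"
```

For the toy file this prints `2 6 10`.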
B - MEME (not the internet meme)
C - Starting from BAM reads
D - Starting from known motif matrices