II. Removing unwanted sequences from the data (one pair of scissors is all you need)

1. The specialist tool: cutadapt

work_dir=/home4/sjshen/project/m6a/GSE29714_Cell
fastq_dir=$work_dir/0.0_fastq
fastq_cut_dir=$work_dir/0.2_fastq_cut

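# The output directory must exist before trimmed reads and logs are written into it
mkdir -p "$fastq_cut_dir"

echo "$fastq_dir"
cd "$fastq_dir"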

# For single-end data

for i in *.fastq.gz
do
echo $i

(nohup cutadapt -a AGATCGGAAGAGC -m 20 -q 20,20 --trim-n -o "$fastq_cut_dir/${i%%.fastq.gz}_cut.fastq.gz" "$i" > "$fastq_cut_dir/${i%%.fastq.gz}.cutadpt.log") &

done

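Tip 1: In shell scripting, ${i/ABC/abc} and ${i%%ABC}abc are two common ways to derive a new file name from an existing one. The first replaces the first occurrence of ABC with abc; the second strips ABC from the end of the name and appends abc.

Tip 2: In cutadapt, the most commonly used options include -a/-A (3' adapter for read 1 / read 2), -m (minimum read length kept after trimming), -q (quality cutoff; with two comma-separated values, the first applies to the 5' end and the second to the 3' end) and --trim-n (trim N bases from read ends).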
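The two expansions from Tip 1 can be tried on their own before putting them in a loop. A minimal sketch, assuming bash and a made-up file name (sample_1.fq.gz is hypothetical, not one of the files in this project):

```shell
#!/usr/bin/env bash
# Hypothetical file name, used only for illustration.
i=sample_1.fq.gz

# ${i/PATTERN/REPLACEMENT}: replace the first match of PATTERN in $i.
# Here it derives the read-2 file name from the read-1 file name.
mate=${i/_1.fq.gz/_2.fq.gz}

# ${i%%PATTERN}: strip the longest match of PATTERN from the END of $i,
# then append a new suffix.
cut=${i%%.fq.gz}_cut.fastq.gz

echo "$mate"
echo "$cut"
```

With a literal suffix like this, ${i%.fq.gz} (shortest match) and ${i%%.fq.gz} (longest match) give the same result; they only differ when the pattern contains wildcards.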

# For paired-end data

for i in *_1.fq.gz
do
echo $i
t=${i/_1.fq.gz/_2.fq.gz}
echo $t

(nohup cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC --trim-n -m 50 -q 20,20 -o "$fastq_cut_dir/${i%%.fq.gz}_cut.fastq.gz" -p "$fastq_cut_dir/${t%%.fq.gz}_cut.fastq.gz" "$i" "$t" > "$fastq_cut_dir/${i%%_1.fq.gz}.cutadpt.log") &

done
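The paired-end loop assumes that every *_1.fq.gz file has a matching *_2.fq.gz mate. A small pre-flight check can catch a missing mate before any cutadapt job is started. The function below is only a sketch; check_pairs is a hypothetical helper name, not part of cutadapt:

```shell
#!/usr/bin/env bash
# check_pairs DIR (hypothetical helper): print "R1<TAB>R2" for every complete
# pair found in DIR, and return non-zero if any *_1.fq.gz lacks its mate.
check_pairs() {
    local dir=$1 r1 r2 status=0
    for r1 in "$dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2=${r1%_1.fq.gz}_2.fq.gz           # derive the mate's file name
        if [ -e "$r2" ]; then
            printf '%s\t%s\n' "${r1##*/}" "${r2##*/}"
        else
            echo "missing mate for ${r1##*/}" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be called as check_pairs "$fastq_dir" right before the loop, and the loop skipped if it returns a non-zero status.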

 

2. Summarize with multiqc

fastq_cut_dir=$work_dir/0.2_fastq_cut
cd $fastq_cut_dir
multiqc -d ./ -dd 5 -n 2.cutadapt_qc
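Because the cutadapt jobs above are started in the background with &, multiqc should only be run once they have all finished. When the commands are pasted interactively this happens naturally; if they are put into a single script, the shell builtin wait can be placed between the loop and the summary step. A minimal sketch of the pattern, in which the sleep/echo pair stands in for a real cutadapt call and the sample names and scratch directory are made up:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory for this demo only.
log_dir=$(mktemp -d)

# Start several background jobs, as the trimming loops do.
for name in sampleA sampleB sampleC; do
    (sleep 1; echo "finished $name" > "$log_dir/$name.log") &
done

# Block until every background job started by this shell has exited.
wait

# Only now is it safe to run a summary step such as multiqc.
logs=("$log_dir"/*.log)
echo "logs written: ${#logs[@]}"
```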

  《=== cutadapt report example ===》

 

3. Check whether the reads are clean now

fastq_cut_dir=$work_dir/0.2_fastq_cut
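cd "$fastq_cut_dir"
# fastqc does not create the output directory, so make sure it exists
mkdir -p "$work_dir/0.3_fastqc_cut"
for i in *.fastq.gz
do
echo "$i"
(nohup fastqc -o "$work_dir/0.3_fastqc_cut" -f fastq "./$i" > "$work_dir/0.3_fastqc_cut/${i}.fastqc.log") &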
done
#--------------------------------------------------------------------
# Summarize the fastqc reports of the trimmed reads (written to 0.3_fastqc_cut above)
fastqc_dir=$work_dir/0.3_fastqc_cut
cd $fastqc_dir
multiqc -d ./ -dd 5 -n 3.fastqc_cut

  《=== fastqc_cut report example ===》


   

