Denoising Methods:
454 sequences were denoised using the following script, which was called rerun.sh.
# Clear the output of any previous run
rm -Rf out/
rm -f nohup.out
echo "Start time: $(date)"
denoise_wrapper.py -v -i GSTY.sff.txt \
    -f GSTY_s20_seqs.fna \
    -m GSTY_mapping.txt \
    -o out/ -n 8 --titanium
echo "End time: $(date)"
This script was run with
nohup ./rerun.sh &
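The script only prints wall-clock timestamps, so the runtime has to be worked out by hand. A minimal sketch of letting the shell do the subtraction with epoch seconds is shown here; `format_elapsed` is a hypothetical helper and is not part of QIIME or of the original script.

```shell
#!/bin/bash
# Hypothetical helper: turn a number of seconds into "Xh Ym Zs".
format_elapsed() {
    local secs="$1"
    printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

start=$(date +%s)   # seconds since the epoch
# ... the long-running denoise_wrapper.py call would go here ...
end=$(date +%s)
echo "Elapsed: $(format_elapsed $((end - start)))"
```

Dropping these lines around the denoise_wrapper.py call would put the total runtime next to the start and end times in the log.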
On our Cluster
We removed completed files from the EC2 instances, used cat to combine the sequences (.fna files) into combined_seqs.fna, and uploaded this to our cluster. The file 97_otus.fasta from Greengenes gg_13_5 was used as the reference to pick OTUs with the following script:
pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_
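The input to the command above was produced by concatenating the per-instance .fna files with cat. A minimal sketch of that step follows, with a check that no sequences were lost along the way. The helper names `count_seqs` and `combine_fna` and the example paths are hypothetical; only cat and the .fna files come from the notes.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (header lines start with ">").
# grep reads each file separately, so the count is per-file accurate.
count_seqs() {
    grep -h '^>' "$@" | wc -l | tr -d ' '
}

# Hypothetical helper: concatenate FASTA files into $1 and verify that the
# combined file holds as many records as the inputs did.
combine_fna() {
    local out="$1"
    shift
    cat "$@" > "$out"
    [ "$(count_seqs "$@")" -eq "$(count_seqs "$out")" ]
}

# Example (hypothetical paths):
# combine_fna combined_seqs.fna denoised/*.fna && echo "all sequences kept"
```

The count check also catches an input file that lacks a trailing newline, since cat would then glue the next header onto the previous sequence line.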
We then picked the representative set with
pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set
Then we assigned taxonomy with RDP:
parallel_assign_taxonomy_rdp.py -i pick_rep_set.fasta -o rdp_assigned_taxonomy/ -O 32
Results: OTUs were picked very quickly (15 minutes). A total of 12528 OTUs were found, 8638 of which were new.
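The two counts above can be read straight off the OTU map written by pick_otus.py. A minimal sketch follows, assuming the map is tab-separated with one OTU per line and the OTU ID in the first column, and assuming new clusters carry the prefix given to --uclust_otu_id_prefix (qiime_otu here). `count_otus` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print "<total> <new>" OTU counts for an OTU map.
# $1 = OTU map file, $2 = ID prefix used for new (non-reference) clusters.
count_otus() {
    local otu_map="$1"
    local prefix="$2"
    awk -F '\t' -v p="$prefix" '
        NF { total++ }                        # every non-empty line is one OTU
        NF && index($1, p) == 1 { new++ }     # OTU ID starts with the prefix
        END { print total + 0, new + 0 }
    ' "$otu_map"
}

# Example, using the paths from the commands above:
# count_otus uclust_ref_gg12_/combined_denoised_seqs_otus.txt qiime_otu
```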
Picking the rep set was also very fast.
Assigning taxonomy with RDP hangs on QIIME 1.6.0. This is a known issue, which has been fixed in 1.7.0. We could get QIIME 1.7.0 running on our cluster or use an EC2 instance.
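Since the hang depends on the QIIME version, a small guard before the RDP step could save a wasted run. This is only a sketch: `version_ge` is a hypothetical helper, the `QIIME_VERSION` variable is an assumption (the version could, for example, be copied by hand from the output of print_qiime_config.py), and `sort -V` assumes a sort that supports version ordering, such as GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is at least version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# QIIME_VERSION is an assumed, user-supplied variable; default to 1.6.0.
qiime_version="${QIIME_VERSION:-1.6.0}"
if version_ge "$qiime_version" 1.7.0; then
    echo "QIIME $qiime_version: the RDP hang is reported as fixed"
else
    echo "QIIME $qiime_version: RDP assignment may hang; use 1.7.0 or later"
fi
```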
4 hours with great happiness and sadness.