Sep 8, 2013

denoising done! OTU picking on our Cluster

Goals: pick OTUs on the HHMI Cluster

Denoising Methods:

The 454 sequences were denoised with the following script, rerun.sh:
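#!/bin/bash
# Clear output from any previous run (-f: no error if it is missing)
rm -rf out/
# nohup.out is not deleted here: when the script is started with nohup,
# its output is already being written to that file.

echo "Start time: $(date)"

# Denoise the 454 Titanium flowgram data on 8 CPUs
denoise_wrapper.py -v -i GSTY.sff.txt \
-f GSTY_s20_seqs.fna \
-m GSTY_mapping.txt \
-o out/ -n 8 --titanium

echo "End time: $(date)"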


This script was run with nohup ./rerun.sh &

On our Cluster

We removed the completed files from the EC2 instances, used cat to combine the denoised sequences (.fna files) into combined_denoised_seqs.fna, and uploaded this to our cluster. We then picked OTUs against the Greengenes 97% reference set (97_otus.fasta) with the following command:
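A minimal sketch of the combining step, assuming bash and one denoised FASTA file per EC2 instance; the function name combine_fasta and the per-run file names in the example are hypothetical. It concatenates the inputs with cat and then checks that the number of ">" header lines in the output equals the total across the inputs. That check catches an input whose last line is missing its newline, since cat would glue the next file's first header onto that line.

```bash
#!/bin/bash
# combine_fasta OUT IN1 [IN2 ...]
# Hypothetical helper: concatenate FASTA files and verify no records were lost.
combine_fasta() {
    local out="$1"
    shift
    cat "$@" > "$out" || return 1

    # Sum the header lines (">") across the input files
    local expected=0 n f
    for f in "$@"; do
        n=$(grep -c '^>' "$f")
        expected=$((expected + n))
    done

    # A missing trailing newline in an input makes these counts disagree
    local actual
    actual=$(grep -c '^>' "$out")
    if [ "$actual" -ne "$expected" ]; then
        echo "record count mismatch: $actual in $out, $expected in inputs" >&2
        return 1
    fi
    echo "$actual"
}

# Example (hypothetical per-instance file names):
# combine_fasta combined_denoised_seqs.fna run1_denoised.fna run2_denoised.fna
```

The printed number is the sequence count of the combined file, which is worth noting before uploading it to the cluster.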

pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_

We then ran pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set
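Since pick_rep_set.py should write one representative sequence per OTU, a quick sanity check is to compare the number of OTUs in the map with the number of FASTA headers in the rep set. A minimal bash sketch; check_rep_set is a hypothetical name, and it assumes the OTU map has one OTU per non-empty line (OTU ID followed by tab-separated sequence IDs).

```bash
#!/bin/bash
# check_rep_set OTU_MAP REP_SET_FASTA
# Hypothetical helper: confirm there is one representative sequence per OTU.
check_rep_set() {
    local otu_map="$1" rep_set="$2"
    [ -r "$otu_map" ] || { echo "cannot read $otu_map" >&2; return 1; }
    [ -r "$rep_set" ] || { echo "cannot read $rep_set" >&2; return 1; }

    local n_otus n_reps
    # One OTU per non-empty line of the OTU map
    n_otus=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$otu_map")
    # One representative per FASTA header line
    n_reps=$(grep -c '^>' "$rep_set")
    if [ "$n_otus" -ne "$n_reps" ]; then
        echo "mismatch: $n_otus OTUs but $n_reps representative sequences" >&2
        return 1
    fi
    echo "$n_reps"
}

# Example, using the paths from the commands above:
# check_rep_set uclust_ref_gg12_/combined_denoised_seqs_otus.txt pick_rep_set
```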

Then we ran parallel_assign_taxonomy_rdp.py -i pick_rep_set.fasta -o rdp_assigned_taxonomy/ -O 32
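Because this step can hang, one option is to wrap it in GNU coreutils timeout, which exits with status 124 when the time limit is reached. A minimal bash sketch; run_with_limit is a hypothetical name and the 4h limit in the example is arbitrary. timeout only signals the process it started, so jobs a parallel QIIME script has already launched may still need to be cleaned up by hand.

```bash
#!/bin/bash
# run_with_limit DURATION COMMAND [ARGS...]
# Hypothetical helper: run COMMAND under GNU coreutils `timeout` and report
# when it was killed for exceeding DURATION (e.g. 30s, 4h).
run_with_limit() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local status=$?
    # timeout exits with 124 when the limit was reached
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit: $*" >&2
    fi
    return "$status"
}

# Example (arbitrary 4-hour limit):
# run_with_limit 4h parallel_assign_taxonomy_rdp.py -i pick_rep_set.fasta \
#     -o rdp_assigned_taxonomy/ -O 32
```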

Results: OTUs were picked very quickly (15 minutes). A total of 12528 OTUs were found, 8638 of which were new (no match in the reference set).
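The OTU counts can be read straight off the OTU map written by pick_otus.py. A minimal sketch, assuming the map has one OTU per line with the OTU ID in the first tab-separated column, and that new clusters are the ones whose IDs begin with the prefix passed to --uclust_otu_id_prefix (qiime_otu here) while reference hits keep their Greengenes IDs. count_otus is a hypothetical name. For the numbers above, 8638 of 12528 is about 69% new OTUs.

```bash
#!/bin/bash
# count_otus OTU_MAP [PREFIX]
# Hypothetical helper: print "<total OTUs><TAB><new OTUs>", where an OTU is
# counted as new if its ID (first column) starts with PREFIX.
count_otus() {
    local otu_map="$1" prefix="${2:-qiime_otu}"
    awk -F'\t' -v p="$prefix" '
        NF > 0 { total++; if (index($1, p) == 1) new++ }
        END { printf "%d\t%d\n", total, new }
    ' "$otu_map"
}

# Example, using the OTU map from the pick_otus.py run above:
# count_otus uclust_ref_gg12_/combined_denoised_seqs_otus.txt qiime_otu
```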
Picking the rep set was also very fast.
Assigning taxonomy with RDP hangs on QIIME 1.6.0. This is a known issue that has been fixed in 1.7.0. We could get QIIME 1.7.0 running on our cluster or use an EC2 instance.
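Since the hang depends on the QIIME version, a small guard before the taxonomy step could stop a run on 1.6.0. A minimal bash sketch, assuming a sort that supports -V (version sort, as in GNU coreutils); version_at_least is a hypothetical name, and the version string has to be supplied, for example copied from the QIIME library version that print_qiime_config.py reports (the exact output format is an assumption here).

```bash
#!/bin/bash
# version_at_least HAVE NEED
# Hypothetical helper: succeed if version HAVE >= NEED, using version sort.
version_at_least() {
    local have="$1" need="$2"
    # The smaller of the two versions sorts first; HAVE is new enough
    # exactly when NEED is that smallest entry.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example with a hand-supplied version string:
# qiime_version=1.6.0
# version_at_least "$qiime_version" 1.7.0 || echo "QIIME $qiime_version: RDP assignment may hang" >&2
```

Version sort matters here because a plain string comparison would rank 1.10.0 below 1.7.0.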


4 hours with great happiness and sadness.
