Oct 14, 2013

Denoising on the Cluster

Goals: Denoise 454 data on the HHMI cluster.

Methods: The following script was submitted:
denoise_wrapper.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output/ -n 96 --titanium


Results: The database was built successfully, and the filtering step runs FlowgramAli_4fr 96 times. However, all 96 of these threads run on one node instead of being spread across three.
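A quick way to confirm where the workers end up is to tally them per node. This is a minimal sketch, assuming I can produce one hostname per running FlowgramAli_4fr process; the helper name count_per_node and the node name node1 are placeholders of mine, not part of QIIME or the cluster setup.

```shell
# count_per_node: read one hostname per line on stdin and
# print "hostname count" for each distinct node.
count_per_node() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Simulated input reproducing what I observed: 96 workers, all on one node.
# "node1" is a placeholder hostname.
i=0
while [ "$i" -lt 96 ]; do
    echo node1
    i=$((i + 1))
done | count_per_node
# prints: node1 96
```

With balanced scheduling I would expect three lines of output with roughly 32 workers each, rather than a single line with all 96.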


New Goals: Fully use all three nodes while denoising.

Methods: Cancel the running denoising job, but only AFTER I confirm with the QIIME developers that I am using the script correctly to resume denoising.

To resume denoising, I should be able to run:
mpirun denoiser.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output_resumed/ -p combined_denoising_output/ \
--checkpoint_fp combined_denoising_output/checkpoints/checkpoint50.pickle \
-c -n 96 --titanium


Kyle suggested using mpirun to balance these threads across all three nodes.

10 hours over 3 days
