Sep 21, 2013

VM image, qiime 1.7.0, and wash bottles

Miscellaneous day in lab.

The qiime 1.6.0 VM image provided by the Knight Lab was fully updated following my previous guide. This should reduce the configuration time needed for other students in lab.

After the cluster was backed up, I deployed qiime 1.7.0 to /share/apps/qiime-1.7.0/. After some manual fixes (like adding pplacer by hand), I made a new module called qiime-1.7.0 based on the new activate.sh. Running module load qiime-1.7.0 and then print_qiime_config.py -t confirms that the newer qiime is functioning on the cluster. Once we confirm everything is still stable after a cluster restart, we should have all the qiimes we need.
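
For the record, the whole check is just two commands:

module load qiime-1.7.0
print_qiime_config.py -t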

I refilled the wash bottles used for sterilization. We primarily use 70% denatured ethanol and 10% bleach solutions (both percent by volume).

3-4 hours

Sep 12, 2013

we have an OTU table!

Goal: get an OTU table using denoised data

Methods:
On our cluster, with qiime 1.6.0:
pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_

then
pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set

On EC2 running qiime 1.7.0:
parallel_assign_taxonomy_rdp.py -i /home/ubuntu/data/soil/pick_rep_set.fasta -O 8 --rdp_max_memory 4000 -o /home/ubuntu/data/soil/tax_assign_out2

Back on our cluster with qiime 1.6.0:
make_otu_table.py -i combined_denoised_seqs_otus.txt -t pick_rep_set_tax_assignments.txt -o soil_otu_table.biom

Result: We have an OTU table called soil_otu_table.biom! More info about it:
Num samples: 61
Num otus: 12528
Num observations (sequences): 646884.0
Table density (fraction of non-zero values): 0.1284

Seqs/sample summary:
Min: 3279.0
Max: 33718.0
Median: 9823.0
Mean: 10604.6557377
Std. dev.: 5310.3842468
Median Absolute Deviation: 3709.0
Default even sampling depth in core_qiime_analyses.py (just a suggestion): 3279.0
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy
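
These stats came from QIIME's per-library summary script. I didn't record the exact call, but in qiime 1.6.0 it would have been something like:

per_library_stats.py -i soil_otu_table.biom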


Sep 10, 2013

assign_taxonomy on EC2

Goal: use qiime 1.7.0 on EC2 to assign taxonomy to soil OTUs.

Methods: This script was used by Ryan for the fracking project, and we used it again.

The file run_soil.sh:
#!/bin/bash
# nohup keeps each command running (and logging to nohup.out) if the ssh session drops.

nohup echo "start time: $(date)"

# Assign taxonomy with RDP: 8 parallel jobs, up to 4000 MB of memory for RDP
nohup time \
parallel_assign_taxonomy_rdp.py \
-i /home/ubuntu/data/soil/pick_rep_set.fasta \
-O 8 \
--rdp_max_memory 4000 \
-o /home/ubuntu/data/soil/tax_assign_out/

nohup echo "end time: $(date)"


We then launched the script with ./run_soil.sh &.
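
Because every long-running command inside the script already has nohup, it survives an ssh disconnect even when launched this way. A tidier alternative (a suggestion, not what we actually ran) is to nohup the whole script and collect all output in one log:

nohup ./run_soil.sh > run_soil.log 2>&1 &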

1.5 hours

Sep 8, 2013

denoising done! OTU picking on our Cluster

Goal: pick OTUs on the HHMI Cluster

Denoising Methods:

454 sequences were denoised using the following script, which was called rerun.sh.
# Clear output from any previous run
# (we don't rm nohup.out here: when run under nohup, that would unlink the live log)
rm -Rf out/

echo "Start time: $(date)"

# Denoise the 454 Titanium run on 8 cores (-n 8)
denoise_wrapper.py -v -i GSTY.sff.txt \
-f GSTY_s20_seqs.fna \
-m GSTY_mapping.txt \
-o out/ -n 8 --titanium

echo "End time: $(date)"


This script was run with nohup ./rerun.sh &
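
While it runs, progress can be followed in nohup's log file:

tail -f nohup.out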

On our Cluster

We retrieved the completed files from the EC2 instances, used cat to combine the per-run sequences (.fna files) into combined_denoised_seqs.fna (a minimal sketch is below), and uploaded the result to our cluster.
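
Combining is plain concatenation; these file names are placeholders for the per-instance outputs:

cat GSTY_denoised_seqs.fasta GSTX_denoised_seqs.fasta > combined_denoised_seqs.fna
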
The file 97_otus.fasta from the Greengenes reference OTUs was then used to pick OTUs with the following script:

pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_

We then ran pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set

Then we ran parallel_assign_taxonomy_rdp.py -i pick_rep_set.fasta -o rdp_assigned_taxonomy/ -O 32

Results: OTUs were picked very quickly (15 minutes). A total of 12528 OTUs were found, 8638 of which were new.
Picking the rep set was also very fast.
Assigning taxonomy with RDP hangs on qiime 1.6.0. This is a known issue, which has been fixed in 1.7.0. We could get qiime 1.7.0 running on our cluster or use an EC2 instance.


4 hours with great happiness and sadness.

Sep 6, 2013

Informatic tools in our lab

Remote Machines

On your computer, search for and then launch Remote Desktop.
Enter the server name or IP address. Some of our servers and their IPs are listed below:
  • Basement lab PC: 10.39.4.1
  • VLCS 1062 PC: 
  • GCAT-SEEK server: gcatseek
After logging in, check whether the server is already being used (open Task Manager and check CPU and RAM). Your data and VM must already be on the machine, or your hard drive must be attached.

Your Virtual Machine running QIIME

Get one of the installation DVDs from Dr. L and follow the instructions in the readme file. You can also follow the official documentation.
Before starting the VM, check the resource load on the system and adjust your settings accordingly. You are now technically ready to use Qiime, but I recommend the additional adjustments described below.

The HHMI Cluster

Send an email to Dr. Lamendella or Colin Brislawn. Or speak with us in lab.
If you already have access, you can use ssh to connect. The IP address is 10.39.6.10.
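
A typical connection looks like this (substitute your own user name):

ssh yourusername@10.39.6.10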

Setting up the Qiime VM

Goal: Get Qiime running in VirtualBox and fine-tune its settings.

Methods: The Qiime pipeline is distributed as a Virtual Machine (VM). This way, the complex pipeline 'just works.' At least that's the hope. These are the steps I took while setting up Qiime 1.7.0.

Installing the Qiime Virtual Box is very well documented. I followed those steps. Because I'm on Windows, I used 7-zip to open the compressed .gz file. Everything else is the same.

After following these instructions, I opened the Virtual Machine (VM) to make sure it was working. I also did the following things.
  1. Installed the VirtualBox Guest Additions using the disk icon on the left side of the screen.
  2. Installed the Synaptic Package Manager using the Ubuntu Software Center.*
  3. Installed the package 'nautilus-open-terminal'*
  4. Installed these packages: ipython, ipython-notebook-common, and ipython-notebook
  5. Did updates (all of them, I think... In Qiime 1.6.0, some updates caused problems. I don't remember any problems in Qiime 1.7.0)*
  6. Changed some Settings: (first, I shut down the VM)
    1. Gave my VM access to more memory and processor cores. (in settings>system)
    2. Made Shared Clipboard and Drag'n'Drop bidirectional (settings>general>advanced)*
    3. Connected the Shared_Folder on my virtual desktop to a folder on my real computer. (settings>Shared Folders>Add Shared Folder)
  7. Pinned Synaptic Package Manager, System Monitor, and Files to my Dashboard.*
  8. Opened Terminal and ran the script 'print_qiime_config.py' (And it worked!)
*Yeah, you don't really need to do these steps. I find these programs useful and convenient, but they are not strictly needed.

4 hours

Sep 2, 2013

Get files from a finished EC2 Instance

Objective: Download our files from an EC2 instance on which denoising has finished.

Method: log into AWS and go to your instances.
Start the instance: Right-click > Start. (For long runs, we usually set an alarm to shut down the instance when CPU use drops to zero, so we have to start it back up again to download our data.)
Remove any alarms which may Stop your instance: in the column titled 'Alarm Status,' click the link, then the name of the alarm, and make sure 'Take this action' is unchecked.
Connect to the instance with ssh: Right-click > Connect. (You can also use Terminal on Mac or Putty on Windows.) Change the user name to 'ubuntu' and select the path to the .pem key file.
Remount volumes. (If you need to, you can check which volumes are attached; see the sketch just after these steps.) Run sudo mount /dev/xvdf $HOME/data/, then check that the data folder contains the files you need.
Connect to the instance with Cyberduck.
Download ALL the files!
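
A minimal check-and-mount sequence, assuming the data volume shows up as xvdf the way ours did:

lsblk                              # list block devices; find the data volume
sudo mount /dev/xvdf $HOME/data/   # mount it at the expected location
ls $HOME/data/                     # confirm your files are there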

You may consider compressing your files to save download time. In the directory, pick a file and type gzip YourFileName.fna.
You can compress an entire folder with tar -czvf YourFolderName.tar.gz YourFolderName/.
Compression is particularly good with large repetitive files, so it's perfect for sequence data.
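
On the receiving end, the standard inverses unpack them:

gunzip YourFileName.fna.gz
tar -xzvf YourFolderName.tar.gz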

Results: The denoiser.log, centroids.fasta, singletons.fasta, denoiser_mapping.txt, denoised_clusters.txt, and denoised_seqs.fasta were downloaded from the two finished Instances.

3 hours over 5 hours