Goal: given two lists of sequences, find the things that appear in both lists.
Methods: Primer Prospector filtered a list of potential primers down to the ones least likely to form secondary structure. From this list of 217 'good' primers, we needed to work out which wells those primers are in. So how do we compare these two lists?
In Excel 2013, the list of 'good' primers was appended to the column containing all primers, so that every 'good' primer now appears twice in that column.
This column was selected, then:
home > conditional formatting > highlight cell rules > duplicate values
All columns with data were selected, then:
data > filter
In the column with barcodes, the drop-down box in the top cell was clicked, then:
filter > by color > cell color > (choose color from)
Results: This produces a list of the barcodes identified as 'good' by Primer Prospector, along with their plate number and well position for easy access.
See also: Find duplicate values fast by applying conditional formatting
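The same comparison can also be done outside Excel. The following is a minimal sketch, assuming the 'good' primers are in a plain-text file with one sequence per line, and the full primer list is a tab-separated file whose first column is the sequence, followed by plate and well columns. Both file layouts and the function name find_good_primers are hypothetical, not taken from the original spreadsheet.

```shell
#!/bin/bash
# find_good_primers GOOD_LIST ALL_PRIMERS_TSV
# Prints every row of ALL_PRIMERS_TSV whose first column also appears
# as a line in GOOD_LIST. (Hypothetical helper; file layouts are assumed.)
find_good_primers() {
    awk -F'\t' '
        NR == FNR { good[$1] = 1; next }   # first file: remember each good sequence
        $1 in good                          # second file: print rows that match
    ' "$1" "$2"
}
```

Calling find_good_primers good.txt all_primers.tsv would then print the matching rows with their plate and well columns intact, in the order they appear in the full list.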
1 hour
Research with Lamendella Labs
"If it worked the first time, it wouldn't be called research."
Oct 21, 2013
Oct 14, 2013
Denoising on the Cluster
Goals: Denoise 454 data on the HHMI cluster.
Methods: The following script was submitted:
denoise_wrapper.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output/ -n 96 --titanium
Results: the database was built successfully and the filtering step runs FlowgramAli_4fr 96 times. However, all 96 of these threads run on one node, instead of three.
New Goals: fully use three nodes while denoising
Methods: cancel denoising, AFTER I confirm with the Qiime developers that I am correctly using the script to resume denoising.
To resume denoising, I should be able to run:
mpirun denoiser.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output_resumed/ -p combined_denoising_output/ --checkpoint_fp combined_denoising_output/checkpoints/checkpoint50.pickle \
-c -n 96 --titanium
Kyle suggested mpirun, to balance these threads between all nodes.
10 hours over 3 days
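Resuming depends on pointing --checkpoint_fp at the most recent checkpoint. The following is a minimal sketch for finding it, assuming checkpoints are named checkpointN.pickle as in the command above (checkpoint50.pickle) and that a higher N means a later checkpoint. The helper name latest_checkpoint is hypothetical and is not part of qiime.

```shell
#!/bin/bash
# latest_checkpoint DIR
# Prints the path of the highest-numbered checkpointN.pickle in DIR.
# Returns non-zero and prints nothing if no such file exists.
# (Hypothetical helper; the naming scheme is assumed from checkpoint50.pickle.)
latest_checkpoint() {
    local dir=$1 best="" best_n=-1 f n
    for f in "$dir"/checkpoint*.pickle; do
        [ -e "$f" ] || continue          # the glob matched nothing
        n=${f##*/checkpoint}             # strip directory and "checkpoint" prefix
        n=${n%.pickle}                   # strip ".pickle" suffix
        case $n in
            ''|*[!0-9]*) continue ;;     # skip names without a plain number
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

For example, latest_checkpoint combined_denoising_output/checkpoints would print the path to pass to --checkpoint_fp.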
Oct 11, 2013
Additional sample collection and DNA extraction
Goals: take additional samples from Otto's and from commercial vendors.
Methods: our group had lunch at Otto's while Chris took five additional samples from completed beer. Afterwards, we purchased two of these beers in bottles from a local store. Back in lab, up to 120 ml of each sample was filtered through Millipore Sterivex filters. This makes for 7 samples today and 11 samples overall.
Goals: extract DNA
Methods: the Mo Bio kit was used to extract DNA from the filters after filters were sliced into thin strips. A Qubit® 2.0 Fluorometer was used to measure DNA concentrations in extracted samples.
Results: Of the 11 samples, only 3 produced detectable quantities of DNA (>5 ng/ml).
Next: get all possible DNA from existing samples
15 hours over two days
Sep 21, 2013
VM image, qiime 1.7.0, and wash bottles
Miscellaneous day in lab.
The VM image of qiime 1.6.0 which is provided by the Knight Lab was fully updated along the lines of my previous guide. This should reduce the configuration time needed for other students in lab.
After the cluster was backed up, I deployed qiime 1.7.0 to /share/apps/qiime-1.7.0/. After some manual fixes (like the manual addition of pplacer), I made a new module called qiime-1.7.0 based on the new activate.sh. Running
module load qiime-1.7.0
then
print_qiime_config.py -t
confirms that the newer qiime is functioning on the cluster. We need to make sure everything is stable by restarting our cluster, then we should have all the qiimes we need.
I refilled the wash bottles used for sterilization. We primarily use 70% denatured ethanol and 10% bleach solutions (both are percents by volume).
3-4 hours
Sep 12, 2013
we have an OTU table!
Goal: get an OTU table using denoised data
Methods:
On our cluster, with qiime 1.6.0:
pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_
then
pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set
On EC2 running qiime 1.7.0:
parallel_assign_taxonomy_rdp.py -i /home/ubuntu/data/soil/pick_rep_set.fasta -O 8 --rdp_max_memory 4000 -o /home/ubuntu/data/soil/tax_assign_out2
Back on our cluster with qiime 1.6.0:
make_otu_table.py -i combined_denoised_seqs_otus.txt -t pick_rep_set_tax_assignments.txt -o soil_otu_table.biom
Result: We have an OTU table called soil_otu_table.biom! More info about it:
Num samples: 61
Num otus: 12528
Num observations (sequences): 646884.0
Table density (fraction of non-zero values): 0.1284
Seqs/sample summary:
Min: 3279.0
Max: 33718.0
Median: 9823.0
Mean: 10604.6557377
Std. dev.: 5310.3842468
Median Absolute Deviation: 3709.0
Default even sampling depth in core_qiime_analyses.py (just a suggestion): 3279.0
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy
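As a quick sanity check on the summary above, the mean seqs/sample should equal the total number of observations divided by the number of samples. The following is a minimal sketch using awk; the function name mean_per_sample is hypothetical.

```shell
#!/bin/bash
# mean_per_sample TOTAL_SEQS NUM_SAMPLES
# Prints TOTAL_SEQS / NUM_SAMPLES to seven decimal places.
# (Hypothetical helper for checking the table summary by hand.)
mean_per_sample() {
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.7f\n", total / n }'
}

mean_per_sample 646884 61   # prints 10604.6557377, matching the reported mean
```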
Sep 10, 2013
assign_taxonomy on EC2
Goal: use qiime 1.7.0 on EC2 to assign taxonomy to soil OTUs.
Methods: This script was used by Ryan for the fracking project, and we used it again.
The file run_soil.sh:
#!/bin/bash
nohup echo "start time: $(date)"
nohup time \
parallel_assign_taxonomy_rdp.py \
-i /home/ubuntu/data/soil/pick_rep_set.fasta \
-O 8 \
--rdp_max_memory 4000 \
-o /home/ubuntu/data/soil/tax_assign_out/
nohup echo "end time: $(date)"
We then ran
./run_soil.sh &
to use this script.
1.5 hours
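The pattern in run_soil.sh (print a start time, run a long command, print an end time) can be wrapped in a reusable function. The following is a minimal sketch, not the script Ryan used; the name run_timed and the log-file argument are hypothetical.

```shell
#!/bin/bash
# run_timed LOGFILE COMMAND [ARGS...]
# Appends a start timestamp, the command's output (stdout and stderr),
# and an end timestamp to LOGFILE, then returns the command's exit status.
# (Hypothetical helper; not part of qiime.)
run_timed() {
    local log=$1 status=0
    shift
    echo "start time: $(date)" >> "$log"
    "$@" >> "$log" 2>&1 || status=$?     # keep going so the end time is still logged
    echo "end time: $(date)" >> "$log"
    return "$status"
}
```

A script that calls run_timed can be launched once with nohup, in the same way as nohup ./rerun.sh & in the Sep 8 post, so the individual lines inside the script would not each need their own nohup.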
Sep 8, 2013
denoising done! OTU picking on our Cluster
Goals: pick OTUs on the HHMI Cluster
Denoising Methods:
454 sequences were denoised using the following script, which was called rerun.sh.
rm out/ -Rf
rm nohup.out
echo "Start time: $(date)"
denoise_wrapper.py -v -i GSTY.sff.txt \
-f GSTY_s20_seqs.fna \
-m GSTY_mapping.txt \
-o out/ -n 8 --titanium
echo "End time: $(date)"
This script was run with
nohup ./rerun.sh &
On our Cluster
We removed completed files from the EC2 instances, used cat to combine the sequences (.fna files) into combined_seqs.fna, and uploaded this to our cluster. The file 97_otus.fasta from GreenGenes gg_13_5 was used to pick OTUs with the following script:
pick_otus.py -i combined_denoised_seqs.fna -z -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -m uclust_ref --uclust_otu_id_prefix qiime_otu -o uclust_ref_gg12_
We then ran
pick_rep_set.py -i uclust_ref_gg12_/combined_denoised_seqs_otus.txt -f combined_denoised_seqs.fna -r /share/apps/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -o pick_rep_set
Then we ran
parallel_assign_taxonomy_rdp.py -i pick_rep_set.fasta -o rdp_assigned_taxonomy/ -O 32
Results: OTUs were picked very quickly (15 minutes). A total of 12528 OTUs were found, 8638 of which were new.
Picking the rep set was also very fast.
Assigning taxonomy with RDP hangs on qiime 1.6.0. This is a known issue, which has been fixed in 1.7.0. We could get qiime 1.7.0 running on our cluster or use an EC2 instance.
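The combining step described above (using cat on the .fna files from each EC2 instance) can be sketched as follows, together with a count of the resulting sequences as a check before uploading. The function names combine_fna and count_seqs are hypothetical.

```shell
#!/bin/bash
# combine_fna OUTPUT INPUT...
# Concatenates the INPUT FASTA files into OUTPUT, in the order given.
# (Hypothetical helper; it simply wraps cat.)
combine_fna() {
    local out=$1
    shift
    cat "$@" > "$out"
}

# count_seqs FILE
# Prints the number of FASTA records, counted as lines starting with ">".
count_seqs() {
    grep -c '^>' "$1"
}
```

The count for the combined file should equal the sum of the counts for the inputs. This assumes every input file ends with a newline; otherwise the last sequence line of one file would run into the first header of the next.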
4 hours with great happiness and sadness.