Oct 21, 2013

Identify duplicates in lists using Excel

Goal: given two lists of sequences, find the things that appear in both lists.

Methods: Primer Prospector filtered a list of potential primers down to the ones least likely to form secondary structure. From this list of 217 'good' primers, we needed to work out which wells those primers are in. So how do we compare these two lists?

In Excel 2013, the list of 'good' primers was appended to the column containing all primers.
This column was selected, then:
home > conditional formatting > highlight cell rules > duplicate values
All columns with data were selected, then:
data > filter
In the column with barcodes, the drop-down box in the top cell was clicked, then:
filter > by color > cell color > (choose color from)
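The same comparison can also be sketched outside Excel. The following is a minimal Python sketch, not the method used above: the function name, the (plate, well, sequence) row layout, and the example sequences are all hypothetical, and converting sequences to upper case before comparing is an assumption.

```python
def wells_for_good_primers(all_primers, good_primers):
    """Return the (plate, well, sequence) rows whose sequence is in good_primers.

    all_primers:  iterable of (plate, well, sequence) tuples (hypothetical layout)
    good_primers: iterable of sequences kept by the filtering step
    """
    # A set gives constant-time membership tests; sequences are
    # normalized (whitespace stripped, upper-cased) before comparing.
    good = {seq.strip().upper() for seq in good_primers}
    return [row for row in all_primers if row[2].strip().upper() in good]


# Hypothetical example data, for illustration only.
all_primers = [
    ("Plate1", "A1", "ACGTACGT"),
    ("Plate1", "A2", "TTGGCCAA"),
    ("Plate2", "B7", "GGATCCAA"),
]
good_primers = ["ttggccaa", "GGATCCAA"]

matches = wells_for_good_primers(all_primers, good_primers)
for plate, well, seq in matches:
    print(plate, well, seq)
```

Unlike the duplicate-values highlight, this sketch only reports sequences present in both lists; a sequence repeated within the full primer list alone is not flagged.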

Results: This produces a list of barcodes which are identified as 'good' by primer prospector with their identifying plate number and position for easy access.

See also: Find duplicate values fast by applying conditional formatting

1 hour

Oct 14, 2013

Denoising on the Cluster

Goals: Denoise 454 data on the HHMI cluster.

Methods: The following script was submitted:
denoise_wrapper.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output/ -n 96 --titanium


Results: the database was built successfully and the filtering step runs FlowgramAli_4fr 96 times. However, all 96 of these threads run on one node, instead of three.


New Goals: fully use three nodes while denoising

Methods: cancel denoising, but only AFTER I confirm with the QIIME developers that I am correctly using the script to resume denoising.

To resume denoising, I should be able to run:
mpirun denoiser.py -v -i GPWS.sff.txt,GSTY.sff.txt,GSAN.sff.txt,GSAI.sff.txt \
-f combined_seqs_100213.fna -m soil_master_metadata_072513.txt \
-o combined_denoising_output_resumed/ -p combined_denoising_output/ --checkpoint_fp combined_denoising_output/checkpoints/checkpoint50.pickle \
-c -n 96 --titanium


Kyle suggested mpirun to balance these threads across all nodes.

10 hours over 3 days

Oct 11, 2013

Additional sample collection and DNA extraction

Goals: take additional samples from Otto's and from commercial vendors.
Methods: our group had lunch at Otto's while Chris took five additional samples from completed beer. Afterward, we purchased two of these beers in bottles from a local store. Back in lab, up to 120 ml of each sample was filtered through Millipore Sterivex filters. This makes for 7 samples today and 11 samples overall.

Goals: extract DNA
Methods: the Mo Bio kit was used to extract DNA from the filters after filters were sliced into thin strips. A Qubit® 2.0 Fluorometer was used to measure DNA concentrations in extracted samples.
Results: Of the 11 samples, only 3 produced detectable quantities of DNA (>5 ng/ml).

Next: get all possible DNA from existing samples

15 hours over two days