Feb 17, 2013

Normalizing reads

Can QIIME analyze paired-end reads?
Some parts of the workflow can handle them, but we will analyze reads from a single end because that workflow has been established and used before.

What are the sequences of the 515F and 806R primers?
In our sequences, will we see these primers or their reverse complements?
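
To check a read for a primer or its reverse complement, a small helper is useful. This is a minimal sketch, not part of QIIME: `revcomp` is a hypothetical function name, and the sequence in the example is the commonly cited 515F primer, which is an assumption here and should be checked against the sequencing protocol actually used.

```shell
# Reverse-complement a DNA sequence, including IUPAC ambiguity codes.
# revcomp is a hypothetical helper, not a QIIME script.
revcomp() {
    printf '%s\n' "$1" |
        # Reverse the line character by character.
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        # Complement each base; ambiguity codes map to their complements (M<->K, R<->Y, ...).
        tr 'ACGTMRWSYKVHDBN' 'TGCAKYWSRMBDHVN'
}

# Assumed 515F sequence; verify against your own protocol before relying on it.
revcomp GTGCCAGCMGCCGCGGTAA
```

The printed sequence can then be searched for in a few reads with `grep` to see which orientation shows up.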

In 16S, are the regions of high variability closer to the 515F or the 806R primer?

Method to normalize reads:
Our data have already been quality filtered. Because we do not have to worry about quality, we will use the following process, which relies on existing QIIME scripts (and Erin's expert knowledge).
Use convert_fastaqual_fastq.py to get .fna and .qual files from the .fastq files we have.

convert_fastaqual_fastq.py -f file_you_want_converted.fastq -o output_directory -c fastq_to_fastaqual

To run the files in bulk, I added all the file names to a file called list.txt and ran the following bash script overnight:

while read -r line
do time convert_fastaqual_fastq.py -f "$line" -o R1_fasta -c fastq_to_fastaqual
done < list.txt
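
The list of file names does not have to be typed by hand. The sketch below is one way to build it. `make_fastq_list` is a hypothetical helper, and the `*_R1_*.fastq` pattern is an assumption based on the sample names used in this post.

```shell
# Write the path of every R1 .fastq file in a directory to a list file, one per line.
# make_fastq_list is a hypothetical helper; the _R1_ pattern is an assumption.
make_fastq_list() {
    dir="$1"
    list="$2"
    : > "$list"                      # start with an empty list file
    for f in "$dir"/*_R1_*.fastq; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        printf '%s\n' "$f" >> "$list"
    done
}

# Example (run from the directory holding the .fastq files):
#   make_fastq_list . list.txt
```

The resulting list.txt can be fed straight into the conversion loop.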


I tried running this from an internal hard drive, but it was not faster at all.

I ran truncate_fasta_qual_files.py on the resulting .fna and .qual files, using the following script.

while read -r line; do echo -e "\n\n running $line"; time truncate_fasta_qual_files.py -f "$line.fna" -q "$line.qual" -b 150 -o fasta_filtered/; done < files.txt

This runs the truncation script on every file listed in files.txt, which holds one base name per line (no extension) and looks like this:

ALXM_S15_L001_R1_001
BCWL_S16_L001_R1_001
CaRM_S17_L001_R1_001
CroRM_S18_L001_R1_001
...
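
A files.txt like this can be generated from the .fna files themselves. This is a minimal sketch: `make_basename_list` is a hypothetical helper, and it assumes every .fna file has a matching .qual file with the same base name.

```shell
# Write the base name (no directory, no .fna extension) of every .fna file to a list file.
# make_basename_list is a hypothetical helper; it assumes each .fna has a matching .qual.
make_basename_list() {
    dir="$1"
    list="$2"
    : > "$list"                      # start with an empty list file
    for f in "$dir"/*.fna; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        basename "$f" .fna >> "$list"
    done
}

# Example (run from the directory holding the .fna and .qual files):
#   make_basename_list . files.txt
```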


All files are now the right length for the next stage in the pipeline!
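
A quick way to confirm the read lengths is to tally them per file. The sketch below is not a QIIME script: `fasta_length_counts` is a hypothetical helper that prints each sequence length found in a FASTA file along with how many reads have that length. Any reads that were already shorter than the cutoff would be expected to show up as extra, shorter rows.

```shell
# Print "length count" for every distinct sequence length in a FASTA file.
# fasta_length_counts is a hypothetical helper, not part of QIIME.
fasta_length_counts() {
    awk '
        # A header line closes the previous record (if any) and starts a new one.
        /^>/ { if (seen) counts[len]++; len = 0; seen = 1; next }
        # Sequence lines may be wrapped, so add up their lengths.
        { len += length($0) }
        END { if (seen) counts[len]++; for (l in counts) print l, counts[l] }
    ' "$1" | sort -n
}

# Example: run it on any one of the truncated .fna files in fasta_filtered/
#   fasta_length_counts path/to/truncated_file.fna
```

If every read was at least 150 bases long before truncation, the output for each file should be a single row starting with 150.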
