As the end of the semester approaches, I will summarize the program I developed that has generated all of these results.
First, it instantiates 19 variables and opens 3 output files: one for the number of nucleotides above a given threshold (per iteration), one for the local energy (per iteration), and one for the global energy (per 10 iterations). Next, it calls importGlobalSequence(), which prompts the user for a FASTA filename and imports the sequence as a string. It then passes the global sequence to another subroutine, calculateFoldingEnergy(STRING), which calculates the folding energy using UNAFold. Next, it enters a loop that imports the local sequence with importLocalSequence(), which prompts the user for a window size, creates an array of the possible pieces of that size in the sequence, and uses "rand" to select one window at random. It passes this local sequence to generateFASTAFile() to generate a FASTA file for it, and calculates the folding energy of the local sequence using calcFoldingEnergy() again. Then it passes the local FASTA file and the name of the desired fold output file to Rfold, which calculates the local base-pairing probabilities. Next, it calls createArrOfPosAboveThreshold(), which creates an array of the positions whose base-pairing probability is above (for minimization) or below (for maximization) a user-specified threshold.
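The window selection and threshold filtering described above can be sketched as follows. This is a minimal Python illustration, not the program itself: the function names, the `step` parameter, and the representation of the Rfold output as a plain list of per-position probabilities are all assumptions made for the example.

```python
import random

def list_windows(sequence, window_size, step=1):
    """Return (start, piece) for every window of window_size in the sequence.

    step=1 is an assumption; a codon-aligned program might use step=3.
    """
    last_start = len(sequence) - window_size
    return [(i, sequence[i:i + window_size]) for i in range(0, last_start + 1, step)]

def pick_random_window(sequence, window_size, rng=random):
    """Choose one window uniformly at random, as the text does with "rand"."""
    return rng.choice(list_windows(sequence, window_size))

def positions_past_threshold(pair_probs, threshold, minimize=True):
    """Indices whose pairing probability is above the threshold (minimization)
    or below it (maximization).

    pair_probs is a hypothetical list of per-position base-pairing
    probabilities, standing in for values parsed from an Rfold output file.
    """
    if minimize:
        return [i for i, p in enumerate(pair_probs) if p > threshold]
    return [i for i, p in enumerate(pair_probs) if p < threshold]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing output files between runs.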
It then enters another loop (which iterates 10 times) that modifies a nucleotide by calling modifyOneNucleotide(), which makes a single synonymous change at one of the positions above/below the threshold by calling synonymousChange(). Then it generates a FASTA file in order to calculate the folding energy, and generates an Rfold file to determine which nucleotides are above the threshold. The loop repeats, making further synonymous changes at positions above/below the threshold. After this loop, it writes the local energies and the number of nucleotides above/below the threshold to the first 2 output files.
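To make "a single synonymous change" concrete, here is a hedged Python sketch of one way such a step could work. It assumes an uppercase DNA sequence whose reading frame starts at index 0 and the standard genetic code; the function names are illustrative and are not the program's modifyOneNucleotide() or synonymousChange().

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def synonymous_options(sequence, pos):
    """Bases that can replace sequence[pos] without changing the amino acid."""
    start = pos - pos % 3              # assumption: frame starts at index 0
    codon = sequence[start:start + 3]
    if codon not in CODON_TABLE:       # incomplete codon or ambiguous base
        return []
    offset = pos - start
    options = []
    for base in BASES:
        if base == codon[offset]:
            continue
        new_codon = codon[:offset] + base + codon[offset + 1:]
        if CODON_TABLE[new_codon] == CODON_TABLE[codon]:
            options.append(base)
    return options

def modify_one_nucleotide(sequence, candidate_positions, rng=random):
    """Apply one synonymous single-base change at a random candidate position.

    Returns the sequence unchanged if no candidate allows a synonymous change.
    """
    positions = list(candidate_positions)
    rng.shuffle(positions)
    for pos in positions:
        options = synonymous_options(sequence, pos)
        if options:
            return sequence[:pos] + rng.choice(options) + sequence[pos + 1:]
    return sequence
```

Most synonymous options sit at the third codon position, so a candidate list made up mostly of first and second positions will often yield few possible changes.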
Finally, it puts the modified piece back into the global sequence, keeps the changes if they work towards the desired goal (maximizing or minimizing), and writes the global energy to the third output file. This repeats for however many iterations are desired, and the program outputs the final optimized sequence at the end.
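The outer accept-or-reject step can be sketched as a simple greedy loop. This Python example is an illustration under assumptions: `score_fn` stands in for the global folding energy that the program obtains from UNAFold, `mutate_fn` stands in for the inner loop of synonymous changes, and the direction of the comparison for "minimizing" versus "maximizing" is a guess at how the goal is scored.

```python
import random

def splice(global_seq, start, local_seq):
    """Put a (modified) local piece back into the global sequence."""
    return global_seq[:start] + local_seq + global_seq[start + len(local_seq):]

def optimize(global_seq, window_size, iterations, score_fn, mutate_fn,
             minimize=True, rng=random):
    """Greedy loop: keep a modified window only if the global score improves.

    score_fn(sequence) -> float and mutate_fn(local_seq, rng) -> str are
    hypothetical callables supplied by the caller.
    """
    best_score = score_fn(global_seq)
    history = [best_score]
    for _ in range(iterations):
        start = rng.randrange(len(global_seq) - window_size + 1)
        local = global_seq[start:start + window_size]
        candidate = splice(global_seq, start, mutate_fn(local, rng))
        score = score_fn(candidate)
        improved = score < best_score if minimize else score > best_score
        if improved:
            global_seq, best_score = candidate, score
        history.append(best_score)     # one entry per outer iteration
    return global_seq, history
```

Because a change is kept only when it improves the score, the recorded history never moves away from the goal. The trade-off is that a greedy rule like this can stall at a local optimum.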