Currently, I have developed a Perl program that prompts the user for a GenBank, EMBL, FASTA, or IntelliGenetics file name and then invokes Mfold on the sequence to predict the global folding energy of the resulting RNA molecule. To continue my research, I plan to determine the global folding energy of a sequence, then take a 100-nucleotide-long section of that sequence and use a different program called Rfold to calculate the local base-pairing probabilities. Next, I will make synonymous substitutions based on a designated threshold, replacing high-probability positions with low-probability ones to maximize the folding energy. Finally, I will put this "improved" section back into the sequence and run Mfold again to determine its effect on the global folding energy.
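To make the cut, substitute, and splice step concrete, here is a minimal sketch in Python (not the Perl program described above). It makes several assumptions that are not part of the plan as stated: the window starts on a codon boundary and has a length divisible by 3 (so 99 rather than 100 nucleotides), the sequence is uppercase DNA in the standard genetic code, and the codon positions to change are passed in by hand. In the real workflow those positions would come from Rfold's base-pairing probabilities and the chosen threshold, which are not modeled here. The names `synonymous_codons` and `substitute_window` are hypothetical, and picking the first synonymous alternative is an arbitrary placeholder for a probability-driven choice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """Return the other codons that encode the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    return [c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon]


def substitute_window(seq, start, length, codon_positions):
    """Swap selected codons inside seq[start:start+length] for synonyms.

    codon_positions are 0-based codon indices within the window
    (hypothetical input; it stands in for the Rfold-based selection).
    """
    if start % 3 or length % 3:
        raise ValueError("window must be aligned to codon boundaries")
    if start + length > len(seq):
        raise ValueError("window extends past the end of the sequence")

    window = seq[start:start + length]
    codons = [window[i:i + 3] for i in range(0, length, 3)]

    for pos in codon_positions:
        alternatives = synonymous_codons(codons[pos])
        if alternatives:  # ATG and TGG have no synonyms and stay unchanged
            codons[pos] = alternatives[0]  # arbitrary choice for illustration

    # Splice the modified window back into the full sequence.
    return seq[:start] + "".join(codons) + seq[start + length:]
```

For example, `substitute_window("ATGGCTCTGAAATAA", 3, 6, [0, 1])` rewrites the two codons in the window while leaving the sequence length and the encoded protein unchanged. Both the original and the modified sequence would then be passed to Mfold to compare global folding energies.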
This week I began developing a Perl script designed to automatically calculate the local and global folding energies of RNA. I have installed and configured both CPAN and BioPerl. However, I have encountered several problems trying to incorporate the Bio::Tools::Run::PiseApplication::mfold module: it is an obsolete part of the BioPerl package and is no longer included or supported. After several failed installation attempts and unsuccessful Perl scripts, I have decided to explore alternative options and am currently researching other BioPerl modules in order to create a script that can simulate RNA folding.
This week I have been working on installing and understanding a collection of Perl modules called "BioPerl" to help me design a program that can calculate the local and global folding energies of RNA. One class of particular interest to me is "Bio::Tools::Run::PiseApplication::mfold", which will hopefully facilitate running Mfold through the Perl script. Another module I have been looking at is "Bio::Align::DNAStatistics", which can calculate several useful statistics about an input sequence, such as the number of synonymous and non-synonymous mutations. This module has the potential to be very useful in developing a function that can generate synonymous mutations.
I plan to continue researching BioPerl in search of additional useful modules, and to begin the actual construction of the program to determine exactly what processes need to be implemented.
This past week, I have been examining how well we can predict the total folding energy of RNA. I have been using a program called Mfold, which predicts the secondary structure of RNA using thermodynamic methods. I began by folding an entire RNA molecule (Rattus rattus CCR5 gene, CCR5-G allele cds) and recording the total folding energy. Then, I broke the sequence into smaller sequences, each 30 bp long (30-mers), and determined their individual folding energies. I recorded my results in a Microsoft Excel spreadsheet to help identify how well the global energy correlates with the local energy.
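The 30-mer step can be sketched as follows, in Python rather than Perl. The text does not say whether the 30-mers overlap, so this sketch assumes consecutive, non-overlapping windows by default and drops a trailing fragment shorter than `k`. The function name `split_into_kmers` and its `step` parameter are hypothetical. Folding energies are not computed here; each window would still have to be submitted to Mfold separately.

```python
def split_into_kmers(sequence, k=30, step=None):
    """Return the windows of length k found in sequence.

    step defaults to k, giving non-overlapping windows (an assumption);
    step=1 gives every overlapping k-mer. A trailing fragment shorter
    than k is dropped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if step is None:
        step = k
    if step <= 0:
        raise ValueError("step must be positive")

    return [
        sequence[i:i + k]
        for i in range(0, len(sequence) - k + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence for illustration only, not the CCR5 data.
    toy = "ACGU" * 20  # 80 nucleotides
    for index, kmer in enumerate(split_into_kmers(toy, k=30)):
        print(index * 30, kmer)
```

Pairing each window with its start offset, as in the loop above, makes it easier to line up local energies against positions in the full sequence when comparing them with the global energy.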
Because these calculations are tedious and cumbersome, I plan to develop a Perl program that can automate this procedure and output the local energies of the k-mers of a given RNA sequence. Moreover, I plan to modify the RNA sequences by making synonymous changes and examining how these changes affect both the local and global folding energies. I have also continued practicing programming in Perl by creating a program that takes either a DNA or an RNA sequence and converts it to its amino acid translation. However, I have found that there is a collection of Perl tools called "BioPerl" designed to aid in the development of biological programs like these. In the following weeks I plan to better familiarize myself with BioPerl in the hope that it can be of some use in our research.
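As an illustration of the translation exercise, here is a minimal Python sketch (the practice program itself was written in Perl and is not shown in this post). It assumes the standard genetic code, reads from the first base in frame 0, treats U and T as equivalent so that both DNA and RNA input work, writes stop codons as `*`, and marks any codon containing an unrecognized character as `X`. The function name `translate` is a hypothetical choice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or RNA string into one-letter amino acid codes."""
    # Normalize case and map RNA (U) onto the DNA alphabet (T).
    dna = sequence.upper().replace("U", "T")

    protein = []
    # Step through complete codons only; leftover bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Comparing `translate` output before and after a set of substitutions is a simple check that the changes are in fact synonymous.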
Jay Villari
Undergraduate student studying Computer Science and Biology at The College of New Jersey
Archives: May 2016