Tutorial:Using regular expression: Selecting sequence motifs of a Chain
Latest revision as of 22:52, 1 April 2010
This is an example of how to select sequence motifs from Chain objects. A Chain object, rather than a System object, is used because regular expressions cannot span across chains.
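Complete source of example_regular_expressions.cpp: http://mslib.svn.sourceforge.net/viewvc/mslib/trunk/examples/example_regular_expressions.cpp?view=markup
In MSL, the program grepSequence (http://mslib.svn.sourceforge.net/viewvc/mslib/trunk/programs/grepSequence.cpp?view=markup) uses the type of code shown in this tutorial to search for sequences in a list of PDB files and then structurally align them.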
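To illustrate why the search is done one chain at a time, consider what happens if the one-letter sequences of two chains are simply joined: a motif can appear to match across the chain boundary, even though those residues belong to different chains. The following is a minimal sketch using std::regex on plain strings, not MSL objects; the helper name countMotif and the sequences in the note below are made up for illustration.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper (not part of MSL): count the non-overlapping matches of
// a regular expression in a one-letter amino acid sequence.
int countMotif(const std::string &sequence, const std::string &pattern) {
    const std::regex re(pattern);
    auto first = std::sregex_iterator(sequence.begin(), sequence.end(), re);
    auto last = std::sregex_iterator();
    return static_cast<int>(std::distance(first, last));
}
```

With an invented chain A of "GAVV" and chain B of "ILKE", countMotif finds no "V{2}IL" match in either chain on its own, but it does find one in the joined string "GAVVILKE". That match straddles the boundary and is exactly the kind of hit a per-chain search avoids.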
To compile
% make bin/example_regular_expressions
To run the program
Go to the main directory and run the command (note, the location of the exampleFiles subdirectory needs to be provided as an argument)
% bin/example_regular_expressions exampleFiles
Program description
Read the structure into a System object and check that chain "A" exists.
    // Build the path to the PDB file from the directory given as the first argument
    string file = "example0004.pdb";
    file = (string)argv[1] + "/" + file;
    cout << "Create a System and read the atoms from " << file << endl;
    System sys;
    if (!sys.readPdb(file)) {
        // Reading failed: report the error and exit with a non-zero status
        cerr << "ERROR could not read in " << file << endl;
        exit(1);
    }
    // Check to make sure chain A exists in sys
    if (!sys.chainExists("A")) {
        // Chain is missing: report the error and exit with a non-zero status
        cerr << "ERROR chain A does not exist in file " << file << endl;
        exit(1);
    }
    // Get a Chain object
    Chain &ch = sys.getChain("A");
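The snippet above reads argv[1] without first checking that an argument was supplied. One possible guard is sketched below, assuming the same "directory plus file name" convention as the example; the helper buildPdbPath is hypothetical and not part of MSL.

```cpp
#include <string>

// Hypothetical helper (not part of MSL): build the PDB path the same way the
// example does, but return an empty string when no directory argument is given.
std::string buildPdbPath(int argc, const char *const argv[], const std::string &fileName) {
    if (argc < 2) {
        return "";  // the caller should print a usage message and exit non-zero
    }
    return std::string(argv[1]) + "/" + fileName;
}
```

A caller would test the result for emptiness before passing it to readPdb, which keeps the usage error separate from a genuine failure to read the file.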
Set up a regular expression object (RegEx) and a regular expression string that matches two Valines followed by an Isoleucine and then a Leucine. The RegEx match returns ranges of residue (or position) indices into the parent System object.
    // Regular Expression Object
    RegEx re;
    // Find two Valines followed by an Isoleucine and then a Leucine
    string regex = "V{2}IL";
    // Now do a sequence search...
    vector<pair<int,int> > matchingResidueIndices = re.getResidueRanges(ch,regex);
    // Loop over each match.
    for (uint m = 0; m < matchingResidueIndices.size(); m++) {
        // 1-based number of the current match, used only for printing
        int match = m + 1;
        // Loop over each residue for this match (the range is inclusive)
        for (uint r = matchingResidueIndices[m].first; r <= matchingResidueIndices[m].second; r++) {
            // Get the residue
            Residue &res = ch.getResidue(r);
            // .. do something cool with matched residues ...
            cout << "MATCH(" << match << "): RESIDUE: " << res.toString() << endl;
        }
    }
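For readers without MSL at hand, the idea of turning regular expression matches into inclusive (first, last) index pairs can be sketched with std::regex on a plain one-letter sequence string. This is only an illustration under stated assumptions: how MSL builds the sequence string and maps positions back to residues is not shown in this tutorial, and the function findMotifRanges is a hypothetical stand-in, not the MSL implementation of getResidueRanges.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a residue-range search (not MSL code): return the
// inclusive (first, last) string positions of every match of `pattern`.
std::vector<std::pair<int, int> > findMotifRanges(const std::string &sequence,
                                                  const std::string &pattern) {
    std::vector<std::pair<int, int> > ranges;
    const std::regex re(pattern);
    for (std::sregex_iterator it(sequence.begin(), sequence.end(), re), end; it != end; ++it) {
        if (it->length() == 0) continue;  // an empty match covers no residue
        const int first = static_cast<int>(it->position());
        const int last = first + static_cast<int>(it->length()) - 1;
        ranges.push_back(std::make_pair(first, last));
    }
    return ranges;
}
```

The end position is inclusive so that the result can be walked with the same `r <= second` loop used in the tutorial code. For the invented sequence "GAVVILKEVVILA" and the pattern "V{2}IL", the sketch yields the pairs (2, 5) and (8, 11).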