MANUAL ON THE PROGRAM 'VOLUME' The program is written in VAX-11 FORTRAN (based on FORTRAN 77) Copywrite (c) Apr 1983 Yale University (Please check the header of the VOLUME source code for the full copywrite statement.) Author: F.M.Richards Version 1: DEC 1974 Revision: 14 Apr 83 References: 1) RICHARDS, F.M. J. Mol. Biol. 82: 1 - 14 (1974) The Interpretation of Protein Stuctures: Total Volume, Group Volume Distributions and Packing Density 2) RICHARDS, F.M. Methods in Enzymology 115: 440-464 (1985) Calculation of Molecular Volumes and Areas for Structures of known Geometry. In its present version the VOLUME program determines the volume of a polyhedron surrounding each selected atom or atoms in a protein when the polyhedral faces are determined by one of three procedures based on the Voronoi construction. The reader should refer to the references listed below for a general description of these procedures, the differences between them and which one is likely to be of use in a particular situation. ********************************************************************* OVERCOMING COMPILING PROBLEM WITH VOLUME PROGRAM WHEN COMPILING THE VOLUME PROGRAM YOU MAY GET AN ERROR MESSAGE RELATING TO THE NUMBER OF CONTINUATION LINES ALLOWED. TO OVERCOME THIS PROBLEM YOU MAY HAVE TO SPECIFY A SPECIAL COMPILING OPTIION. ON THE "VAX VMS" YOU WOULD COMPILE BY TYPING: $FORTRAN VOLUME.FOR/CONTINUATIONS=50 50 IS THE NUMBER OF CONTINUATION LINES ALLOWED BY THE COMPILER. THE DEFAULT VALUE FOR OUR SYSTEM WAS 10. *********************************************************************** 1) INPUT: The program requires a formatted file as input. As written, it will accept the output file from the area program ACCESS directly. This will be the normal input because of the KEY selection procedure described below. The file is processed as follows: All header material is disregarded. The program searches each line for the starting characters 'BEGIN '. If found, the program then reads into the appropriate arrays the atom data starting on the following lines in the format: (1X,I2,7X,I2,1X,I2,1X,2A4,1X,I3,3F8.3,2F5.2,2F8.3,1X,F5.3) The arrays filled are: KEY(I) = atom processing flag. see below NUMCHN(I) = chain number if more than one peptide in input file (not used in this program) NUMFIL(I) = file number if more than one protein is listed in input file (not used in this program) ATM(I) = atom designation. 1 to 4 characters (use BNL system) RES3(I) = residue designation (three letter code) SEQNUM(I) = sequence number of residue X(I) = X coordinate of atom Y(I) = Y coordinate of atom Z(I) = Z coordinate of atom RVDW(I) = vanderWaals radius of atom RCOV(I) = covalent radius of atom AAREA = accessible area of atom in file from ACCESS, or dummy values. CAREA = contact area of atom in file from ACCESS, or dummy values. FRCACC = fractional accessibility of atom in file from ACCESS, or dummy values. (These three dummy variables are not used in this program. Kept in list as reminder) NOTE: The maximum value for I must be larger than the number of atoms in the input list. (These may be protein atoms, 'heteroatoms',and/or water molecules depending on the file used.) You must ensure that the array dimensions in the VOLUME program are large enough not only to accomodate the input list but also to hold the output of SHELL (see below). The relevant arrays are dimensioned through the parameter MAXATM which appears in the file VOL.CMN. Check this value and edit as required. 2) SUMMARY OF OPERATION: ( NOTE: RESPOND TO ALL PROMPTS WITH CAPITAL LETTERS ) A) The program reads in the input file (whose name must be provided at the prompt). B) The next step is to surround the protein with a hypothetical 'solvent' shell which serves to define the volume of the surface atoms on the protein. The positions assigned to these 'molecules' are loaded onto the end of the protein atom file. They are used to define the limiting polyhedron, but, except in the case of cavity calculations, they are not used as centers for volume calculations themselves. This layer is set up by the subroutine SHELL. C) The program then cycles through each atom in turn whose volume is to be calculated (as designated by the KEY array). a) The Subroutine FILBOX selects all atoms within a defined radius which might contribute to the limiting polyhedron surrounding the target atom. Potential defining planes of this polyhedron are perpendicular to the interatomic vectors between the target atom and the atoms in this list. The position of these planes along the vectors will depend on the procedure for calculation chosen at the time of program initiation. FILBOX sets up these planes and loads then into the array BOX in the order of increasing distance from the target atom. b) The Subroutine ATMPOL examines the list BOX and selects those planes which actually make up the limiting polyhedron. It computes the coordinates of all the vertices of the polyhedron and stores these coordinates, and the vertex identification numbers, in the arrays POLYHN AND IPOLYH. This is the heart of the program and its most complex component. c) The Subroutine POLVOL examines the arrays POLYHN and IPOLYH and computes the volume of this limiting polyhedron. d) The program then cycles through a), b) and c) until the atom list has been completed. 3) OUTPUT The OUTPUT of the program is provided in two files. The successful calculations are put into a file designated as local unit number 2 with a name specified at the time of program initiation. The format of this file is: BEGIN (followed by data records in the following format) (2I5,2I3,2X,A4,2X,A4,I5,3F8.3,2F5.2,F8.3,I5) NCALC = Number of atom in this output list I = Serial number of atom in full coordinate list NUMCHN(I) = Chain number (see INPUT) NUMFIL(I) = File number (see INPUT) ATM(I) = Atom designation - 1 to 4 characters RES3(I) = Residue designation - 3 letter code SEQNUM(I) = Residue sequence number X(I) = X coordinate of atom Y(I) = Y coordinate of atom Z(I) = Z coordinate of atom RVDW(I) = vanderWaal's radius of atom RCOV(I) = Covalent radius of atom VOLUME = Volume of atom as calculated by specified method NS = Number of solvent shell positions used in defining the limiting polyhedron The second file, local unit number 3, also named at initiation, contains any error messages that may have been generated during the operation of the program. If serious errors are encountered the calculation for that atom is aborted (VOLUME and NS are both set equal to 1000),error messages listed, and the calculations continued with the next atom in the list. Occasionally less serious errors are found, the messages are listed, but the calculation is continued. The decision on whether or not to keep the result is made by the investigator. Normally, the cause of the problem can be identified, corrected and the calculation rerun. 4) TERMINAL SESSION Enter: RUN VOLUME Prompt: "WHAT IS THE NAME OF THE INPUT FILE ?" Enter: X(X)n.XXX (i.e. CYTC.INP) Prompt: "WHAT IS THE NAME OF THE OUTPUT FILE FOR DATA? Enter: X(X)n.XXX (i.e. CYTC.VOL) Prompt: "WHAT IS THE NAME OF THE OUTPUT FILE FOR ERRORS? Enter: X(X)n.XXX (i.e. CYTC.ERR) Prompt: "CURRENT VALUE FOR THE LENGTH OF THE "SHELL" LATTICE, CE, IS : DO YOU WANT TO CHANGE THIS VALUE? ENTER Y/N Enter: N or Y IF Y Prompt: "ENTER NEW VALUE (X.XX)" Enter: 3.05 (for example) (note default is 2.80) Prompt: "CURRENT VALUE OF SEARCH RADIUS, ARANGE, IS:" DO YOU WANT TO CHANGE THIS VALUE? ENTER Y/N IF Y Prompt: "ENTER NEW VALUE (X.XX)" Enter: 6.10 (for example) ( note default is 6.50) Prompt: "DECIDE METHOD TO BE USED FOR VOLUME CALCULATION. ENTER: 1,2,OR 3. 1) STANDARD VORONOI PROCEDURE 2) RICHARDS' METHOD B 3) RADICAL PLANE PROCEDURE Enter: 1 (for example) Program now runs to completion through the entire list of flagged atoms. Output is loaded into the two output files designated at the start of the run. FORTRAN STOP 5) PROGRAM DETAILS A) INPUT ARRAY KEY(I): The KEY array is normally set up in the area program ACCESS, but it can be hand edited or supplied otherwise. KEY(I) = -1 Atom I is omitted entirely from all calculations. KEY(I) = 0 Atom I is included as part of the protein and its volume is calculated. KEY(I) = 1 Atom I is included as part of the protein and is thus used in defining the polyhedra, but its volume is not calculated. (The (-1) value can be used to asses the affect of a particular atom or atom group on the volume available to surrounding atoms without any other change in the program, for example.) Only the atoms of interest need have their volumes calculated. If the program ACCESS is used to generate the input file, these atoms can be selected on the basis of atom name, residue name, residue sequence number, or individually by serial number in th coordinate list. B) PROGRAM SECTIONS The principal COMMON BLOCKS and DIMENSION parameters are contained in the files: 'VOL.CMN' and 'TEST.CMN'. The arrays INDX1, INDX2, IBND1, RESTYP, ATNAMA are filled through a BLOCK DATA statement. (Normal Fortran subroutines or functions are not included in the following list.) FUNCTION OR FUNCTIONS OR SUBROUTINE Remarks SUBROUTINES CALLED Principal Components: VOLUME Main calling program. Sets up ASKFIL,GETNUM, input and output files, queries SPRAY,SHELL,FILBOX, for changes in default parameters ATMPOL,POLVOL and method of calculation. Cycles (only if compiled through atom list and writes with D_LINES: output file. PRNTBX,PRNTPN) SHELL Sets up hypothetical solvent None boundary to 'define' protein surface. FILBOX Selects atoms from coordinate SPRAY,BXLOAD, list that may define the GETNUM limiting polyhedron. Constructs all potential face planes. ATMPOL Establishes the limiting poly- ERROR,VERTEX, hedron from the list supplied SAMSID by FILBOX. Provides list of coordinates of all vertices and their identities. POLVOL Calculates volume of the limiting VOLTET polyhedron. ERROR Puts out error messages in None response to certain types of geometrical errors or array dimension problems found in program run. (i.e. three points defining a plane are colinear; three planes do not intersect at a point etc.) Utility Routines: ASKFIL Prompts for file name and opens None input or output file. GETNUM Identifies residue type number None and atom type number for an atom specified by serial number in the coordinate list. SPRAY Initiates array with specified None integer or integer series. BXLOAD From the normal components and a None point P associated with atom J, the constant factor in the equation for the plane is calc and loaded in BOX(5,J) (The following two subroutines are used only when VOLUME has been compiled with D_LINES for debugging purposes) PRNTBX Prints out contents of arrays None IBOX and BOX after return from FILBOX PRNTPN Prints out contents of arrays None POLYHN and IPOLYH after return from ATMPOL. Mathematical Subroutines: VERTEX Calculates the point of inter- POINT section of three planes defined by four specified atoms. INSIDE Logical function set to .TRUE. SAMSID if a point P is inside a tetrahedron formed by four specified planes. SAMSID Logical funtion set to .TRUE. None if two points P1 and P2 are on the same side of a specified plane. VOLTET Returns the volume of a tetra- DET3C hedron specified by its four vertices. PLANE Returns the equation of the plane DET3R,ERROR determined by three given points. (error if three points colinear) POINT Returns the point of intersection DET3R,ERROR of three given planes.(error if point is indeterminate) CENTRD Computes the centroid of a tetra- None hedron identified by its 4 vertices (not used in present version of program) DET3C For the 3x4 array A, the values None of the four 3x3 unique determinants are returned. DET3R For the 4x3 array A, the values None of the four 3x3 unique determinants are returned. C) SUBROUTINE SHELL Positions in a cubic lattice, ICL, just outside of, but adjacent to, the protein atoms are identified. These will serve to define the protein surface for the purpose of the volume calculations as described later. In principle, there is no particular restriction on the size of the lattice, although the calculated volumes of the surface atom will depend to some extent on the lattice dimensions. In the current version, the default value of the lattice edge is 2.8 A giving a cube which is about 3/4 of the volume occupied by a water molecule. The uncertainty in the eventual Voronoi volume increases with lattice size while the computation time increases rapidly as the lattice size is decreased. The size of the lattice, ICL, in the three dimensions is made larger than the protein so that there are at least two empty cubes in each direction outside of the protein. The protein atoms are then loaded into the lattice positions. With the default size about half of the lattice positions that contain any protein atom contain only a single atom. A few cubes contain as many as five atoms. When the loading is complete each entry in the array ICL contains either 0 or a positive integer. The latter is an index to a location in the array JSERN. That entry in JSERN gives the number of following positions in JSERN that contain the serial numbers of the atoms whose centers are located in the lattice cube identified by the ICL array index. The entire array ICL is now checked. All positions whose 26 nearest neighbors are not protein-containing locations are loaded with the flag -1. These positions represent bulk solvent remote from the protein. Of the remaining positions containing 0s some may be inside the protein and represent potential cavities. These are identified as positions which have among their 26 neighbors one or more pairs of opposing positions both of which contain a protein atom(s). Such positions are loaded with the flag -2. At this point both the 0 ad -2 lattice positions, while not containing any protein atoms, may have the cube center inside the van der Waals envelope of a protein atom. This is checked and any solvent (i.e. 0) positions so found are flagged -3, and any such -2 positions are shifted to -4. The -3 and -4 positions are not considered further in this program but are flagged should the need arise to refer to them. The remaining 0 and -2 positions are now both empty and outside the van der Waals envelope of the protein. In most calculations the -2 positions will be disregarded and the space that they represent is assigned to the protein. The positions labeled 0 are all face, edge or corner connected to themselves and to cubes containing protein atoms. They thus form a connected envelope around the protein. If SHELL is compiled with D_LINES the final contents of ICL will be listed in the output file on local unit 2. The three dimensional array is sectioned perpendicular to Z (K index). The X axis (I index) is horizontal and Y (J index) vertical. Each lattice position is labeled with -4, -3, -2, -1, 0 or **, the last for protein containing positions. D) SUBROUTINE FILBOX This subroutine is entered with the identity and coordinate of the target atom whose volume is to be calculated. Its location in the ICL lattice is established as are the 26 surroundng cubes which would contain the relevant 'solvent' positions, if any. The coordinate file is then checked and all atoms whose centers are within the selected distance, ARANGE, of the target atom are loaded into the arrays TEMP and ITEMP. Any of these atoms which are 'solvent' positions have their distance adjusted such that the Voronoi plane to be drawn will be tangent to the van der Waals sphere. This group of atoms is now loaded into the arrays BOX and IBOX in order of their increasing distance from the target atom. At this point the serial number in the coordinate lists of the atom and a code to indicate whether it is a 'solvent' shell position or not is contained in IBOX. In BOX the distance to the target atom is located in BOX(1,I) and the X, Y and Z components of this distance in BOX(2-4, I). The latter represent the interatomic vector and are the direction components of the normal to the Voronoi plane. This plane is completely defined by specifying the point where it intersects the interatomic vector. One of three procedures is chosen to define this point of intersection. In the original Voronoi procedure the plane bisects the interatomic vector. In Richards' method B the ratio of van der Waals or covalent radii of the atoms are used. In the radical plane procedure only the van der Waals radii of the atoms are used whether they are covalently connected or not. The details of these procedures are discussed in reference 2. The final section of FILBOX loads into the highest index positions of BOX, four hypothetical planes surrounding the target atom. These represent a very large tetrahedron which is simply used as a starting point for the determination of the limiting polyhedron by the next subroutine. E) SUBROUTINE ATMPOL This subroutine operates on the contents of the array BOX and loads the arrays POLYHN and IPOLYH with the vertex positions and identifications of the developing polyhedron. Each vertex represents the point of intersection of three planes. The X, Y, Z coordinates of this point are placed in POLYHN. The vertex is identified by the three planes, each of which, in turn, is associated with a particular atom surrounding the target atom. For the purpose of this subroutine, these atoms are identified by their index in the array BOX. The vertex can thus be identified by three such indices. These indices are stored in IPOLYH. The initial set of 4 vertices are produced from the large tetrahedron loaded in positions 97-100 in BOX. ATMPOL then starts checking each plane in BOX to see if it forms part of the limiting polyhedron. This is accomplished by examining each vertex with respect to the plane. If the vertex is on the same side of the plane as the target atom, then it remains part of the polyhedron. If the plane separates no vertices from the target atom, then it is not part of the polyhedron and is disregarded. If the plane does separate one or more vertices, then this vertex or vertices are discarded and the new vertices produced by the plane are added to the list. This last operation is the most complex. For this purpose, the vertices are identified as the intersection of three lines (the array LINES), rather than three planes. Each line represents the intersection of two of the planes that form the vertex. Each line is represented by an index pair. When a vertex is eliminated, the three lines that identify that vertex form 3 new vertices through their intersection with the new plane. When more than one vertex is removed by a single plane, it is necessary to keep track of which lines are kept and thus form new vertices with the plane. Some lines will be eliminated completely, that is they do not intersect the plane within the region of interest. As each plane is tested, the polyhedron is unaffected or made smaller. The total number of vertices generally increases but varies erratically during the computation. Normally by the end of the calculation less than one half of the entries in BOX actually are used to define the limiting polyhedron. This indicates that the limit has indeed been reached, and that no other atoms in the structure, which have not been explicity examined, would contribute to the polyhedron if tested. F) SUBROUTINE POLVOL This subroutine takes the contents of POLYHN and IPOLYH and calculates the volume of the limiting polyhedron. The vertices of each face have one index in common, the index that identifies the atom associated with the face normal. All index pairs in IPOLYH that contain that common index are collected. The other pair of indices associated with each vertex are then arranged in sequence to define the perimeter of the face. The face is then divided into triangles with a common apex at one vertex. The volume of the tetrahedra formed by these triangles and the target atom at the center of the polyhedron are then calculated and summed. This procedure is repeated for each face of the polyhedron yielding,finally, the total volume. It might be noted here that if one wishes to draw one or more of the polyhedra with a graphics program, the necessary data can be obtained from this subroutine. The coordinates of all the vertices are contained in POLYHN and the subroutine orders the vertices in the sequence in which the vectors would be drawn. ADDENDUM The volume of the -2 positions provided by SHELL may be calculated separately, treating these positions as additional "protein atoms". Note that such calculations, of course, will change the volume calculated for adjacent protein atoms. No van der Waals radius is assigned to these positions. The defining planes are drawn tangent to the surrounding protein atoms. The "cavity" volume so calculated is only a crude approximation to the real void space because of the coarseness of the lattice used. (See: F.M.Richards, Carlsberg Research Communications 44: 47-63 (1979)). This option has NOT been activated in this version of the VOLUME program. END ======================================================================== SAMPLE OUTPUT FROM PROGRAM 'VOLUME' ---------------------------------------------------------------------- INPUT FILE NAME IS :1PCY.INP DATA OUTPUT FILE NAME IS :1PCY.VOL ERROR OUTPUT FILE NAME IS :1PCY.ERR THE VALUE OF CE IS: 2.80 THE VALUE OF ARANGE IS: 6.50 THE METHOD OF CALCULATION IS: 1 THE VALUE OF RDELTA IS: 0.00 TOTAL NUMBER OF PROTEIN ATOMS IN INPUT LIST IS: 100 NUMBER OF SOLVENT MOLECULES REMAINING = 255 NUMBER OF SOLVENT MOLECULES REMOVED = 5 NUMBER OF INTERIOR CAVITIES REMAINING = 17 NUMBER OF INTERIOR CAVITIES REMOVED = 6 NUMBER OF CELLS WITH PROTEIN ATOMS = 58 NUMBER OF CELLS WITH 1,2,3,ETC PROTEIN ATOMS = 30 16 10 2 0 0 0 0 0 0 # SER# CH FI ATM RES RES# X Y Z RVDW RCOV VOLUME NSOL BEGIN 1 1 0 1 N ILE 1 -1.408 19.495 26.885 1.70 0.70 21.589 15 2 2 0 1 CA ILE 1 -1.298 20.820 27.539 2.00 0.77 11.922 10 3 3 0 1 C ILE 1 -1.427 21.928 26.511 1.70 0.77 8.185 7 4 4 0 1 O ILE 1 -0.968 21.774 25.358 1.40 0.66 13.016 15 5 5 0 1 CB ILE 1 0.061 20.882 28.334 2.00 0.77 12.444 7 6 6 0 1 CG1 ILE 1 0.183 22.228 29.052 2.00 0.77 20.846 7 7 7 0 1 CG2 ILE 1 1.297 20.593 27.410 2.00 0.77 27.535 17 8 8 0 1 CD1 ILE 1 1.425 22.330 30.014 2.00 0.77 31.897 14 9 9 0 1 N ASP 2 -2.018 23.057 26.937 1.70 0.70 13.894 6 10 10 0 1 CA ASP 2 -2.055 24.263 26.111 2.00 0.77 10.429 6 11 11 0 1 C ASP 2 -1.011 25.312 26.565 1.70 0.77 8.469 6 12 12 0 1 O ASP 2 -0.789 25.540 27.751 1.40 0.66 14.058 14 --------------------- TOTAL VOLUME OF CALCULATED ATOMS = 1642.12 ============================================================================== SAMPLE OUTPUT FROM PROGRAM 'VOLFMT' ------------------------------------------------------------------------------ ************************************************** RUN : 16-JUN-88 16:22:08 ************************************************** PROGRAM VOLFMT FORMATTED OUTPUT BY RESIDUE TYPE OF DATA FILE FROM PROGRAM VOLUME ************************************************** INPUT FILE NAME = 1PCY.VOL OUTPUT FILE NAME = 1PCY.FMT ************************************************** GLY N CA C O MAIN SIDE TOTAL 6 15.3 21.8 9.8 38.4 85.3 0.0 85.3 10 12.4 24.5 9.2 11.6 57.8 0.0 57.8 AVE 13.9 23.2 9.5 25.0 71.5 0.0 71.5 ------------------ LEU N CA C O CB CG CD1 CD2 MAIN SIDE TOTAL 4 14.9 10.8 8.8 19.1 21.3 13.9 42.8 32.2 107.8 110.1 217.9 5 15.1 12.8 8.2 14.2 19.1 12.6 33.7 27.3 50.2 92.7 142.9 12 14.3 11.0 8.0 12.2 18.6 11.9 29.5 36.8 45.5 96.8 142.3 AVE 14.8 11.5 8.4 15.2 19.7 12.8 35.3 32.1 67.9 99.9 167.7 ------------------ SER N CA C O CB OG MAIN SIDE TOTAL 11 13.1 11.0 7.8 20.7 20.8 15.6 52.5 36.4 88.9 AVE 13.1 11.0 7.8 20.7 20.8 15.6 52.5 36.4 88.9 ------------------ ASP N CA C O CB CG OD1 OD2 MAIN SIDE TOTAL 2 13.9 10.4 8.5 14.1 23.9 7.5 14.1 13.8 46.8 59.3 106.1 8 13.9 11.0 8.4 11.0 18.4 7.7 16.0 13.8 44.3 55.9 100.2 9 11.5 12.2 7.4 11.3 18.7 7.5 14.9 13.2 42.4 54.3 96.8 AVE 13.1 11.2 8.1 12.1 20.4 7.6 15.0 13.6 44.5 56.5 101.0 ------------------ ************************************************** GRAND TOTALS: MAIN CHAIN = 793.3 SIDE CHAIN = 848.9 ALL = 1642.1 **************************************************