Step 1: Generate Gromacs input files from a PDB structure

Version 1.1.1 (Updated 1/14/12: OUTDATED)

Version Notes
Thank you for using the Onuchic Group's structure-based potential software. This module takes your pdb and contact file as input and produces a structure-based Hamiltonian with your specifications. The output TOP, GRO and index files can be used directly with GROMACS, version 4. Unfortunately, the files cannot be used with NAMD, which does not currently support one necessary field in the GROMACS topology file. Depending on your parameter selection, we kindly ask that you cite the applicable primary references.

Fill in fields 1-9 and hit "Submit Query" to generate your input files for GROMACS.

1. Upload PDB file (may be gzipped or bzip2-compressed)

format requirements and sample pdb/gro/top

2. What contact map should be used?  Which map should I use?

3. What level of graining would you like to use?  help

4. Would you like to use the default forcefield, or customize the parameters?  what are default?

5. How much empty space (in Å) would you like between the molecular system and the limits of the box?  Why is this important?
    X±     Y±     Z±
6. If you don't plan to use periodic boundary conditions, then you can check this box and your system will not be placed in a box (this will override the above spacing definitions)

7. By default, the webtool will assign unit mass to each atom. If you would like to use heterogeneous masses that are roughly based on atomic masses (described here) check this box.

8. By default, if an atom name is not recognized by the webtool, the tool will stop and return an error message (Please explain). If you want the tool to ignore any unrecognized atoms (such as hydrogen) and continue to process the pdb, check this box.

9. What nickname would you like to give this system?

After selecting the desired options, click "Submit Query". A link will be provided on the next page if the submission is successful.

Helpful Information

Functional form of the Hamiltonian


All-atom: The bond, angle, improper and planar terms maintain backbone geometry. Flexible dihedrals are given cosine terms. Non-local native interactions are given attractive 6-12 interactions and non-native interactions are given repulsive terms. A complete description of the All-atom model can be found here for proteins and RNA/DNA. Note: If you use a shadow map for nucleic acids, contacts between adjacent sidechains will not be rescaled by 1/3, as described in the papers. Rather, when using a shadow map, all contacts are given equal weights.

C-alpha: (for proteins only) The C-alpha model only has the bonds and angles terms to maintain backbone geometry. Dihedral angles are formed between 4 adjacent CA atoms and non-local contacts are included via a 12-10 potential, unlike the All-atom model which uses a 12-6 potential. For a complete description, see Clementi et al. (2000) J. Mol. Biol., 298, 937-953.
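
As an illustration of the two contact terms named above, here is a minimal Python sketch of a 12-6 and a 12-10 potential, each written so that its minimum sits at the native distance r0 with depth epsilon. The function names and the prefactors are assumptions based on the commonly used forms of these potentials. They are not taken from the webtool's source, so check the cited references before relying on them.

```python
def contact_12_6(r, r0, epsilon=1.0):
    """12-6 contact energy with its minimum (-epsilon) at r = r0."""
    x = r0 / r
    return epsilon * (x**12 - 2.0 * x**6)


def contact_12_10(r, r0, epsilon=1.0):
    """12-10 contact energy with its minimum (-epsilon) at r = r0."""
    x = r0 / r
    return epsilon * (5.0 * x**12 - 6.0 * x**10)
```

Both functions return -epsilon at r = r0 and rise on either side of it, which is the property the text describes for native contacts.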

PDB file

Basically, follow standard PDB formatting. Here is a sample pdb file, so you can see the format expected by the webtool.
Your pdb file MUST conform to the following standards.
  • No hidden characters. They can lead to unpredictable results. To avoid accidentally inserting them, use a text editor such as vi or emacs.
  • If your file does not work with the page, only include lines that start with "ATOM" (to specify each atom), "TER" (to indicate a break between 2 chains) and "END" at the end of the file. Ligand atoms are also expected to be called ATOM, though they often appear as HETATMs in pdb files (leaving HETATM lines in your file will often lead to problems with the webtool).
  • Chain identifiers are not used. If you have multiple chains, insert "TER" (left justified) between chains. The webtool will internally index the chains sequentially, starting with 1.
  • Terminal oxygens (in proteins) are called OXT and O (not O1 and O2).
  • The file is not read past an "END" statement (ALL CAPS, left justified). If atoms appear after an END line, these atoms will not be included.
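
The formatting rules above can be applied to a file before uploading it. The following is a minimal Python sketch, not part of the webtool: the function name `clean_pdb_lines` and the `rename_hetatm` option are hypothetical. It keeps only ATOM, TER and END records, stops at the first END, and can optionally relabel HETATM records as ATOM. Whether relabelling is appropriate depends on your system (for example, you may not want waters), so that option is off by default.

```python
def clean_pdb_lines(lines, rename_hetatm=False):
    """Keep only ATOM, TER and END records, stopping at the first END.

    If rename_hetatm is True, HETATM records are kept and relabelled as
    ATOM (padded to the same six-column width); otherwise they are dropped.
    """
    cleaned = []
    for line in lines:
        # Strip line endings, including stray carriage returns.
        line = line.rstrip("\r\n")
        record = line[:6].strip()
        if record == "END":
            cleaned.append("END")
            break  # nothing after END is read
        if record == "HETATM" and rename_hetatm:
            line = "ATOM  " + line[6:]
            record = "ATOM"
        if record in ("ATOM", "TER"):
            cleaned.append(line)
    return cleaned
```

A typical use would be `clean_pdb_lines(open("input.pdb").readlines())`, followed by writing the returned lines to a new file with one record per line.
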
Recognized residues include:
  • Protein residues : All 20 amino acids (3 letter codes are used).
  • RNA residues: CYT or C, GUA or G, URA or U and ADE or A.
  • Modified RNA residue: MIA (2-methylthio-N6 isopentenyl adenosine).
  • DNA residues: DG, DC, DA and DT.
  • Ligands: SAM (S-Adenosylmethionine), GNP (Gpp(NH)p), ATP, ADP, AMP, FUA (Fusidic Acid), GTP and GDP. All flexible dihedrals in each ligand are given equal strength, and this weight can be set by the user.
  • Ions: BMG (Bound MaGnesium ions) and ZN. Ions are given excluded volume and they interact through harmonic potential $k(x-x_0)^2$, where $k=1.0$ (units of Å$^{-2}$). For calculating which interactions are included with ions, the same contact rules are used as for ligands. Note: if using a cut-off contact map, then ion-ion contacts are included between pairs that are separated by 4 Å, or less.
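
To make the ion rules above concrete, here is a small Python sketch that lists ion pairs separated by 4 Å or less and evaluates the harmonic term as written in the text. The function names are hypothetical, and the code only mirrors the stated formula and cut-off. It is not the webtool's implementation.

```python
import math


def ion_pairs_within_cutoff(ion_coords, cutoff=4.0):
    """Return (i, j, distance) for ion pairs separated by `cutoff` Å or less."""
    pairs = []
    for i in range(len(ion_coords)):
        for j in range(i + 1, len(ion_coords)):
            d = math.dist(ion_coords[i], ion_coords[j])
            if d <= cutoff:
                pairs.append((i, j, d))
    return pairs


def harmonic_energy(x, x0, k=1.0):
    """Harmonic term k*(x - x0)**2, as written in the text (no factor of 1/2)."""
    return k * (x - x0) ** 2
```
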

Contact maps

A contact map is a list of atom-atom pairs that are "in contact" in the native structure (the pdb structure). These pairs interact via 12-6 (All-atom model) or 12-10 (C-alpha model) interactions, where the energetic minima are at the distances found in the pdb structure. There are three supported ways of defining a contact map.
  • Shadow map (Recommended for Proteins) A shadow map includes contacts that are within a cutoff distance, are separated in sequence, and do not have an atom in between them. A full description can be found here. If you select this option, a shadow map will be generated and used in the Hamiltonian, with default values. Note: If you select a shadow map for nucleic acids, contacts between adjacent sidechains will NOT be rescaled by 1/3, as is done for the cut-off map.
  • Cut-off map (Recommended for RNA/DNA or mixed nucleic-amino acid systems) This will generate a list of contacts as determined by your specified distances and sequence differences. Recommended values are the defaults. This option is NOT enabled for CA model.
  • Upload file Upload your own contact map. The contact file requires the following format: Each line identifies a single contact. Each line has 4 fields (chain i, atom number i , chain j, atom number j). For example, to include a contact between the atom 10 (pdb numbering) of the first chain (internally indexed as 1) and atom 20 of the third chain, the line would read:
    1 10 3 20
    Blank lines, even at the end of the file, can cause trouble.
If you are using the CA model, then use residue numbers, not atom numbers.
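
Before uploading your own contact map, it can help to check it against the format described above. The sketch below is a hypothetical helper (`parse_contact_file` is not part of the webtool) that assumes all four fields are integers and rejects blank or malformed lines.

```python
def parse_contact_file(text):
    """Parse contact lines of the form 'chain_i atom_i chain_j atom_j'.

    Raises ValueError on blank lines or on lines that do not contain
    exactly four integer fields.
    """
    contacts = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 fields, got {len(fields)}"
            )
        try:
            chain_i, atom_i, chain_j, atom_j = (int(f) for f in fields)
        except ValueError:
            raise ValueError(f"line {number}: fields must be integers") from None
        contacts.append((chain_i, atom_i, chain_j, atom_j))
    return contacts
```

For the example line given above, `parse_contact_file("1 10 3 20\n")` returns a single tuple `(1, 10, 3, 20)`, while a file ending in an extra empty line raises an error.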


There are currently 2 levels of graining available, All-Atom and C-alpha:
  • All-Atom All non-hydrogen atoms will be included. If you have hydrogens in your pdb file, they SHOULD be ignored by this program. But, if you have given hydrogens non-standard names, then the program may complain. A complete description of the All-atom model can be found here for proteins and RNA/DNA. NOTE: the exact choice of parameters is up to you. The default values on this page are suggested values and are not necessarily identical to these references. Always double check your choice of parameters.
  • C-alpha The C-alpha model is described in Clementi C, Nymeyer H & Onuchic JN (2000) J. Mol. Biol., 298, 937-953.

What are the default values?

The default values are the values used in the initial protein and RNA papers, with the following modifications:
  • The harmonic dihedral angle constant that maintains planarity of rings has been increased from 10 to 40. This makes the rings more rigid.
  • If you use a shadow contact map for nucleic acids, the stacking contacts will NOT be rescaled by a factor of 1/3, as is done when using a cut-off contact map.

Contact to dihedral energy ratio

This is the ratio of the total stabilizing energy in ALL contacts to the total stabilizing energy in ALL flexible dihedrals (i.e. not ring, improper, or fixed dihedral angles). It does not include ligand or ion contacts. This quantity is fully described elsewhere. The sum of the strengths of all contacts and all dihedrals is then normalized to the number of atoms in the system, minus the number of ligand and ion atoms.
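
As a worked illustration of this normalization, the sketch below splits the total stabilizing energy between contacts and flexible dihedrals for a given ratio. It assumes, as the text suggests, that the two totals sum to the number of atoms (excluding ligand and ion atoms). The function name is hypothetical.

```python
def split_stabilizing_energy(n_atoms, contact_to_dihedral_ratio):
    """Return (contact_total, dihedral_total) in reduced energy units.

    n_atoms should exclude ligand and ion atoms. The two totals sum to
    n_atoms and their quotient equals the requested ratio.
    """
    total = float(n_atoms)
    dihedral_total = total / (1.0 + contact_to_dihedral_ratio)
    contact_total = total - dihedral_total
    return contact_total, dihedral_total
```

For example, a 300-atom system with a ratio of 2 gives 200 units of contact energy and 100 units of dihedral energy.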

PROTEINS ONLY: Backbone to sidechain dihedral ratio

In this model, all protein backbone dihedral angles are given identical energetic weighting. All protein side chain dihedrals are also given identical energetic weighting. This quantity sets the ratio of the strength of a single protein backbone dihedral to a single protein side chain dihedral.

RNA/DNA ONLY: Sidechain dihedral to backbone dihedral ratio

In this model, all RNA/DNA backbone dihedral angles are given identical energetic weighting. All RNA/DNA side chain dihedrals are also given identical energetic weighting. This quantity sets the ratio of the strength of a single RNA/DNA side chain dihedral to a single RNA/DNA backbone dihedral.

Relative strengths of dihedral angles

The relative strengths of backbone and sidechain dihedrals, combined with the requirement that the total stabilizing energy is equal to the number of atoms, are not sufficient to unambiguously define the energy of every dihedral angle in the system. In order to unambiguously define the values, one must specify the relative strength of protein, NA and ligand dihedral angles. The values of the relative strengths only matter if you have a system that is a mixed protein/NA/ligand system (or has any two of those). Since these are relative weights, indicating 1, 2 and 3 is equivalent to 10, 20 and 30.
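
The scale invariance of the relative weights can be seen with a short Python sketch. The function and the dictionary keys are hypothetical, and the sketch says nothing about how the webtool uses the weights internally. It only shows that weights differing by a common factor give the same fractions.

```python
def normalize_weights(weights):
    """Convert relative weights into fractions that sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}
```

Calling it with `{"protein": 1, "na": 2, "ligand": 3}` and with `{"protein": 10, "na": 20, "ligand": 30}` gives the same fractions.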

Excluded volume

One important feature of the model is the size of the atoms. Two parameters determine the excluded distance between non-native pairs and the strength of the repulsive term. We used 2.5 and 0.01 in our initial papers, but subsequent studies have varied these values, so we leave these parameters adjustable for further investigation.

Contact interaction potential

When using SMOG to generate an SBM, there are multiple supported energy functions for the native contacts. The conventional potential energy function is a 12-6 interaction for the AA model, and a 12-10 interaction in the CA model. The second option that is currently supported is a Gaussian-style interaction. Here are some specifics on how the SMOG webtool prepares a potential for use with Gaussian contacts:
  • The native distance ($r_0$) is calculated from the structure.
  • For the AA model, the width of the Gaussian function ($\sigma_G$, or $\sigma$ in Lammert discussion) is set according to $\sigma_G^2 = r_0^2/(50\ln 2)$.
  • In the CA model, $\sigma_G = 0.5$ Å.
  • In AA and CA models, Lammert et al. defined the excluded volume potential as $(\sigma_{NC}/r_{ij})^{12}$. In the SMOG webtool, there are options for $\sigma_{NC}$ and $\epsilon_{NC}$, since the originally-implemented potential is of the form $\epsilon_{NC}(\sigma_{NC}/r_{ij})^{12}$. When using the Gaussian option, the webtool absorbs the $\epsilon_{NC}$ into $\sigma_{NC}$, such that the excluded volume terms will be identical between the Gaussian and LJ potentials, when using identical settings on the SMOG tool. Specifically, for both functional forms, the excluded volume potential is defined as $a/r^{12}$, where $a=\epsilon_{NC}(\sigma_{NC})^{12}$. $\sigma_{NC}^{\mathrm{Gaussian}}$ (in the Lammert formulation) is then equivalent to $(\epsilon_{NC})^{1/12}\sigma_{NC}$ (Whitford representation).
  • The Gaussian potential requires a modified version of the source code, which is available here.
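
The relations in the list above can be checked numerically. The following Python sketch uses hypothetical function names and simply evaluates the stated formulas: the all-atom Gaussian width, the excluded volume coefficient a, and the equivalent Gaussian-formulation σNC. It makes no assumption about the rest of the Gaussian contact potential.

```python
import math


def gaussian_width_aa(r0):
    """All-atom Gaussian width: sigma_G^2 = r0^2 / (50 ln 2)."""
    return math.sqrt(r0**2 / (50.0 * math.log(2.0)))


def excluded_volume_coefficient(epsilon_nc, sigma_nc):
    """Coefficient a in the excluded volume term a / r^12."""
    return epsilon_nc * sigma_nc**12


def equivalent_gaussian_sigma_nc(epsilon_nc, sigma_nc):
    """sigma_NC in the Lammert form that reproduces the same a / r^12 term."""
    return epsilon_nc ** (1.0 / 12.0) * sigma_nc
```

Raising the value returned by `equivalent_gaussian_sigma_nc` to the 12th power recovers the coefficient from `excluded_volume_coefficient`, which is why the two formulations give identical excluded volume terms.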

Do I need space between the system and the boundaries?

This software can recenter the system in a box with space between the system and the boundary, or leave the coordinates from the submitted pdb file unshifted. If you choose to center the system in a box, then the box will start at the origin. This is useful when using periodic boundary conditions (which are required for grid neighbor searching in GROMACS V4.0.X, but not for V4.5.X). If you use PBCs, make sure your box is large enough for the dynamics of interest. For example, if you are going to look at folding/unfolding, make sure you have a large buffer so that the molecule will not extend out of the box and interact with its own image. Pay particular attention to this setting if you are using a CA model, where the non-bonded cut-off distance is large. Additionally, if you are running parallel simulations, load balancing performs poorly (highly reduced performance) when there is a lot of empty space, so it is a good idea to use the smallest box size that is practical.
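
To estimate the box that results from a given spacing, the sketch below recenters a set of coordinates so that the box starts at the origin. It is a hypothetical illustration, not the webtool's code, and it assumes the spacing is added on both sides of the system along each axis.

```python
def center_in_box(coords, padding):
    """Shift coordinates into a box whose corner is at the origin.

    coords:  list of (x, y, z) tuples in Å
    padding: (px, py, pz) empty space in Å, added on each side (assumed)
    Returns (shifted_coords, box_lengths).
    """
    mins = [min(c[a] for c in coords) for a in range(3)]
    maxs = [max(c[a] for c in coords) for a in range(3)]
    box = tuple(maxs[a] - mins[a] + 2.0 * padding[a] for a in range(3))
    shifted = [
        tuple(c[a] - mins[a] + padding[a] for a in range(3)) for c in coords
    ]
    return shifted, box
```

For two atoms at (1, 2, 3) and (4, 6, 8) with 10 Å of spacing on every side, this gives a box of 23 × 24 × 25 Å with the first atom moved to (10, 10, 10).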

What atoms should be included?

For each model, different atoms are expected, in order to reliably generate a SMOG model. Here is a summary of the possibilities.
  • All-atom models: When using an All-atom model, all heavy atoms are expected in the PDB file. If you have missing atoms, then the energetic ratios may not be exactly the specified values.
  • CA models: If you are using a CA model and you select "Shadow" for the contact map, all heavy atoms must be included. If all heavy atoms are not included (or, they are misnamed), then the resulting contact map will be distorted. If you are uploading your own CA contact map, then the PDB file should only have CA atoms in it. There are a number of internal consistency checks that are performed on the PDB file. Extra atoms have the potential of tricking the tool into thinking the structure is complete, when it may have missing information.
If you are not concerned with these issues, then you have the option of ignoring unrecognized atoms, by checking the box.
This webtool has been used to generate 19932 topology files since 10/21/08.

Page created and maintained by Jeff Noel and Paul Whitford