53. GROMACS -- carbon dioxide in water. Example

I'm new to GROMACS, so don't follow what I've done blindly.

To get started you need a .top and a .gro file. The .gro file is not dissimilar to a .pdb file and contains xyz coordinates and atom names. The .gro format is inflxible in that specific numbers of chars are dedicated to each field -- if that's Greek to you, then take this advice: don't edit it by hand. If you do, only substitute x number of characters with the same number of characters.

The .top file contains information about the atoms in each molecule and defines their properties, including partial charges, Lennar-Jones or Buckingham params, constraints etc.

 Making the .top file is probably the most challenging step when you are new to GROMACS. Once you've done it a few times it becomes easy, although perhaps somewhat tedious at times. Using a zmat file as your starting point may help (we'll do that here)

For each 'experiment' you need an .mdp file. The .mdp file defines what methods are used during the run, the type of experiment, cutoffs, etc.

In the example I use the gromacs binaries I compiled in an earlier post. To avoid confusion I'll use the double-precision (_dd) binaries the entire time. If you are using the debian gromacs binaries, use _d instead.

To start you need co2.gro, step1.top, em.mdp, eq.mdp, eq2.mdp and production.mdp. The rest of the files are generated

Comments are added to gromacs files by adding a ; in front.

Summary of commands that we'll be using after we have our .gro and .top files:

genbox_dd -cp co2.gro -o step1.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -p step1.top
# energy minimisation
grompp_dd -f em.mdp -po step3.mdp -p step1.top -pp step2.top -c step1.gro -o step3.tpr
mdrun_dd -v -s step3.tpr -o step4.trr -x step4.xtc -cpo step4.cpt -c step4.gro -e step4.edr -g step4.log
# equilibration step 1
grompp_dd -f eq.mdp -po step6.mdp -p step2.top -pp step6.top -c step4.gro -o step6.tpr
mdrun_dd -v -s step6.tpr -o step7.trr -x step7.xtc -cpo step7.cpt -c step7.gro -e step7.edr -g step7.log
# equilibration step 2
grompp_dd -f eq2.mdp -po step8.mdp -p step6.top -pp step8.top -c step7.gro -o step8.tpr
mdrun_dd -v -s step8.tpr -o step9.trr -x step9.xtc -cpo step9.cpt -c step9.gro -e step9.edr -g step9.log
# production run
grompp_dd -f production.mdp -po step10.mdp -p step8.top -pp step10.top -c step9.gro -o step10.tpr
mdrun_dd -v -s step10.tpr -o step11.trr -x step11.xtc -cpo step11.cpt -c step11.gro -e step11.edr -g step11.log 


START HERE
If at any point you get annoying fatal errors, your first thought should be 'formatting'. I don't know how faithful blogspot is towards tabs etc. Google -- if you're listening: please allow the upload of simple ascii files! Oh, and while we're at it -- if I'm emailing text files with unix line endings, don't effing change them to windows line endings! Ehum...so...


1. Making a .gro file
I first made an .xyz file using avogadro and optimised it using the built-in MM engine.

co2.xyz:

3

C         -5.06401        3.32301        1.92535
O         -4.89871        1.97004        1.81668
O         -5.22923        4.67528        2.03397


I edited the file to
1. Add a title card (not important)
2. give the atoms specific names (the names aren't important -- but they should match those used in .top). I decided to call the carbon CO and the Oxygens OC since the C is bonded to O and the Os are bonded to C. The names must be 1-5 characters long.

co2_edited.xyz:

3
CAR
CO         -5.06401        3.32301        1.92535
OC         -4.89871        1.97004        1.81668
OC         -5.22923        4.67528        2.03397

Next, we make our .gro file and set the box size to 2 by 2 by 2 nanometres:
editconf_dd -f co2_edited.xyz -o co2.gro -box 2 2 2 -label CAR -resnr 1
co2.gro:

CAR
    3
    1 ???    CO    1   1.000   1.000   1.000
    1 ???    OC    2   1.017   0.865   0.989
    1 ???    OC    3   0.983   1.135   1.011
   2.00000   2.00000   2.00000
I opened co2.gro in vim and did 
:%s/???/CAR/g
the saved using 
:wq

co2.gro:
CAR
    3
    1 CAR    CO    1   1.000   1.000   1.000
    1 CAR    OC    2   1.017   0.865   0.989
    1 CAR    OC    3   0.983   1.135   1.011
   2.00000   2.00000   2.00000

This is our starting .gro file.


2. Making our .top file
I first made a zmat file since it contains angles and bond lengths. CO2 is a simple enough molecule that this isn't necessary, but it's good to have a standard approach.

If you haven't yet installed babel, then do so:
sudo apt-get install openbabel

Generate the zmat file:
babel co2.xyz co2.gzmat

co2.gzmat:
 CAR
0  1
Co
Xx  1  r2
Xx  1  r3  2  a3
Variables:
r2= 1.3674
r3= 1.3666
a3= 179.97

For an explanation of the zmat format, look here.

OK, time to create our .top file

Open a new file and call it step1.top

First create directive headers to outline the file -- we will have to types of molecules -- CAR (CO2) -- and SOL (water), so we have two sets of [molecultypes] (+ [atoms], [constraints] etc.):
[defaults]
[atomtypes]
[moleculetype]
[atoms]
[constraints]
[exclusions]
[angles]
[dihedrals]
[moleculetype]
[atoms]
[constraints]
[exclusions]
[system]
[molecules]


The order of the different directive is (sadly) important. 

OK, time to add information to the different sections -- let's start with the easy ones:

[defaults]:
[defaults]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 1 yes 1.0 1.0

This is a standard directive when you roll your own simulation independent of predefined forcefields. nbfunc=1 means we're using Lennard-Jones c6/c12 parameters.

[system]:
[system]
sparkling water

[system] is basically like a title-card

[molecules]:
[molecules]
CAR      1
SOL      1

OK, time to define our carbon dioxide molecule:
[moleculetype]:
[moleculetype]
;name nrexcl
CAR    3
CAR is the name and nrexcl=3 tells gromacs to exclude non-bonded interactions between atoms that are no further than 3 bonds away.

For atoms the order should be the same as in the .gro file. Here CO  and the two OC make up a single charge group (all have the same cgnr=1). I got the partial charges from an article -- http://pubs.acs.org/doi/full/10.1021/jp062723w.

The type (e.g. CO) must match that which will later be defined in [atomtypes]. atom (e..g CO) must match that in the .gro file. mass is self-explanatory, as is nr.

[atoms]
[atoms]
;   nr   type  resnr residue  atom   cgnr     charge       mass
1 CO 1 CAR CO 1 0.70         12.0107
2 OC 1 CAR OC 1 -0.35 15.9994
3 OC 1 CAR OC 1 -0.35 15.9994

We will constrain the bond distances -- which we got from the zmat file above (r2 and r3 -- I picked 1.3674 Ångström, which is 0.13674 nm:

[constraints]:
; particles bonded if defined in bonds (1, 5, 6) or constraints
[constraints]
;ai     aj func bond
1 2 1 0.13674
1 3 1 0.13674
[exclusions]:
 [exclusions]
;ai other atoms
1 2 3
2 1 3
3 1 2
Exclusions excludes non-bonded interactions between atom ai and the other atoms listed on the same line. It's probably not necessary in such a small molecule.

[angles]:

[angles]
2 1 3   1 180 1500

There's only one angle -- atom 1 is carbon, so the angle is across O=C=O, or 2=1=3 or, if you prefer, 3=1=2 -- look at atom nr. We know that the angle is 180 degrees, but the zmat data also gave us that. Again, useful for more complex molecules.

Comment out dihedral, since there aren't any. You should be aware of the existence of it though.
[dihedrals]:
;[dihedrals]

Water:
We do the same thing for water, just without explanations. This was copied verbatim from the gromacs manual. Look there for information about [settles]

[ moleculetype ]
; molname nrexcl
SOL 2
[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass
     1     OW      1    SOL     OW      1    -0.8476   15.99940
     2     HW      1    SOL    HW1      1     0.4238    1.00800
     3     HW      1    SOL    HW2      1     0.4238    1.00800
;[ constraints ]
; ai aj funct length(b0, nm) kb(kJ mol-1 nm-2)
;  1 2 1 0.100 ;  1 3 1 0.100 ;  2 3 1 0.1633
[ settles ]
;ai funct doh dhh
 1 1 0.1 0.1633
[ exclusions ]
1       2       3
2       1       3
3       1       2

We've so far avoided [atomtypes], which is probably the MOST IMPORTANT directive.

[atomtypes]:
[atomtypes]
;type z       mass        pq          ptype    c6                 c12
OC 8 15.9994 -0.35 A 2.5438e-1 2.0478e-4
CO 6 12.0107 0.70 A 5.2044e-2 2.5080e-5
OW 8 15.9994 -0.834 A 0.261710e-2 0.26331e-05
HW 1 1.0080 0.417 A 0.0000000000 0.00000e-00

So how did we get here?
Let's look at the first line:

OC is the atom type which we'll refer to under the [atoms] directives.
8 is the atomic number of oxygen.
15.9994  is the average atomic mass of carbon.
-0.35 is the partial charge. For carbon CO and oxygen OC I got it from http://pubs.acs.org/doi/full/10.1021/jp062723w. However, in the following post from 2011: "The charges from [atoms] are used. The charges in [atomtypes] are never used in any forcefield currently used with GROMACS."
A is the particle type -- A stands for Atom
2.5438e-1 is the Lennard-Jones c6 parameter for the carbon dioxide carbon -- I got it from here which gives epsilon and sigma: c6=4*epsilon*sigma^6
2.0478e-4 is the Lennard-Jones c12 parameter for the carbon dioxide carbon -- I got it from here which gives epsilon and sigma: c12=4*epsilon*sigma^12. (sigma_o=0.305 nm; epsilon_o=79 K; sigma_c=0.280 nm; epsilon_c=27 K)

I got OW and HW from a different source.

Finally, here's our entire step1.top:

[defaults]
1 1 yes 1.0 1.0
;http://pubs.acs.org/doi/full/10.1021/jp062723w
;
[atomtypes]
; sigma_o=0.305 nm; epsilon_o=79 K; sigma_c=0.280; epsilon_c=27
OC 8 15.9994 -0.35 A 2.5438e-1 2.0478e-4
CO 6 12.0107 0.70 A 5.2044e-2 2.5080e-5 OW 8 15.9994 -0.8476 A   0.261710e-2  0.26331e-05
HW 1 1.0080 0.4238 A 0.0000000000 0.00000e-00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; [moleculetype]
CAR 3
[atoms]
1 CO 1 CAR CO 1 0.70 12.0107
2 OC 1 CAR OC 1 -0.35 15.9994
3 OC 1 CAR OC 1 -0.35 15.9994
; particles bonded if defined in bonds (1, 5, 6) or constraints
[constraints]
; func bond
1 2 1 0.13674
1 3 1 0.13674
;all non-bonded interactions between atom 1 and the other atoms ignored
[exclusions]
1 2 3
2 1 3
3 1 2
[angles]
2 1 3   1 180 1500
;[dihedrals]
;no dihedrals
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[ moleculetype ]
; molname nrexcl
SOL 2
[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass
     1     OW      1    SOL     OW      1    -0.8476   15.99940
     2     HW      1    SOL    HW1      1     0.4238    1.00800
     3     HW      1    SOL    HW2      1     0.4238    1.00800
;[ constraints ]
; ai aj funct length(b0, nm) kb(kJ mol-1 nm-2)
;  1 2 1 0.100 ;  1 3 1 0.100 ;  2 3 1 0.1633
[ settles ]
;ai funct doh dhh
 1 1 0.1 0.1633
[ exclusions ]
1       2       3
2       1       3
3       1       2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[system]
fizzy water
[molecules]
CAR 1
SOL               212

3. Add water
We use co2.gro and step1.top as the input. We get our water molecules from /usr/local/gromacs/share/gromacs/top/spc216.gro. We write a new .gro file, step1.gro. It will have our carbon dioxide molecule (CAR) and 212 water molecules (SOL).

genbox_dd -cp co2.gro -o step1.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -p step1.top

Ideally genbox_dd updates step1.top and puts the correct number of SOL molecules under [molecules], but please check.

[..]
Successfully made neighbourlist
nri = 11519, nrj = 660245
Checking Protein-Solvent overlap: tested 92 pairs, removed 9 atoms.
Checking Solvent-Solvent overlap: tested 13414 pairs, removed 540 atoms.
Added 212 molecules
Generated solvent containing 636 atoms in 212 residues
Writing generated configuration to step1.gro
Back Off! I just backed up step1.gro to ./#step1.gro.1#
CAR
Output configuration contains 639 atoms in 213 residues
Volume                 :           8 (nm^3)
Density                :      811.63 (g/l)
Number of SOL molecules:    212
Processing topology
[..]
4. Do energy minimisation (EM):
We first need our em.mdp to tell mdrun what to do. Like all the other .mdp files they originally come from http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/01_pdb2gmx.html, and have been used with a minimal amount of editing.

em.mdp:

; minim.mdp - used as input into grompp to generate em.tpr
; Parameters describing what to do, when to stop and what to save
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0   ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep          = 0.01               ; Energy step size
nsteps = 50000   ; Maximum number of (minimization) steps to perform
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1         ; Frequency to update the neighbor list and long range forces
ns_type         = grid ; Method to determine neighbor list (simple, grid)
rlist              = 0.9         ; Cut-off for making neighbor list (short range forces)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 0.9 ; Short-range electrostatic cut-off
rvdw = 0.9 ; Short-range Van der Waals cut-off
pbc          =  xyz         ; Periodic Boundary Conditions (yes/no)
Alright. Let's make our binary .trp file using our step1.top and our step1.gro. We'll use the binary trp file with mdrun in the next step. grompp also output a new top file, step2.top, for us, as well as a new .mdp file, step3.mdp.The latter doesn't matter to us.

grompp_dd -f em.mdp -po step3.mdp -p step1.top -pp step2.top -c step1.gro -o step3.tpr
mdrun_dd -v -s step3.tpr -o step4.trr -x step4.xtc -cpo step4.cpt -c step4.gro -e step4.edr -g step4.log

This step is fast, and we get
Steepest Descents converged to Fmax < 1000 in 36 steps
Potential Energy  = -9.52113394667855e+03
Maximum force     =  5.44327588453194e+02 on atom 1
Norm of force     =  7.44126862763805e+01

5. Equilibration part 1
title = fizzy drink
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 90000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 100 ; save coordinates every 0.2 ps
nstvout = 100 ; save velocities every 0.2 ps
; ; no nstxtcout
nstenergy = 100 ; save energies every 0.2 ps
nstlog = 100 ; update log file every 0.2 ps
; Bond parameters
continuation = no ; first dynamics run
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;
; no pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;
; Pressure coupling is off
pcoupl = no ; no pressure coupling in NVT
;
;
;
;
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = yes ; assign velocities from Maxwell distribution
gen_temp = 300 ; temperature for Maxwell distribution
gen_seed = -1 ; generate a random seed
We then run:
grompp_dd -f eq.mdp -po step6.mdp -p step2.top -pp step6.top -c step4.gro -o step6.tpr
mdrun -v -s step6.tpr -o step7.trr -x step7.xtc -cpo step7.cpt -c step7.gro -e step7.edr -g step7.log

which gives
step 90000, remaining runtime:     0 s        
 Average load imbalance: 2.7 %
 Part of the total run time spent waiting due to load imbalance: 1.0 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

Parallel run - timing based on wallclock.
               NODE (s)   Real (s)      (%)
       Time:     27.299     27.299    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    258.447     15.110    569.700      0.042


See this tutorial for an explanation of the two-step equilibration approach that we're using. I use 90000 steps instead of 10000, and it really doesn't matter at this stage.

6. Equilibration part 2
eq2.mdp:
title = fizzy drink
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 90000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 100 ; save coordinates every 0.2 ps
nstvout = 100 ; save velocities every 0.2 ps
; ; no nstxtcout
nstenergy = 100 ; save energies every 0.2 ps
nstlog = 100 ; update log file every 0.2 ps
; Bond parameters
continuation = yes ; <---  Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; turning on pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; <-- Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; <-- Velocity generation is off
;
;


and we run
grompp_dd -f eq2.mdp -po step8.mdp -p step6.top -pp step8.top -c step7.gro -o step8.tpr
mdrun_dd -v -s step8.tpr -o step9.trr -x step9.xtc -cpo step9.cpt -c step9.gro -e step9.edr -g step9.lo

The mdrun takes 43 seconds on an intel i5 with four cores.

7. Production run
production.mdp:
title = fizzy water
; removed define
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps, 1 ns
dt = 0.002 ; 2 fs
; Output control
nstxout = 1000 ; save coordinates every 2 ps. Incr by *10
nstvout = 1000 ; save velocities every 2 ps. Incr by *10
nstxtcout = 1000 ; xtc compressed trajectory output every 2 ps. Incr by *10
nstenergy = 1000 ; save energies every 2 ps. Incr by *10
nstlog = 1000 ; update log file every 2 ps. Incr by *10
; Bond parameters
continuation = yes ; <-- Restarting after NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; turning on pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; <-- Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; <-- Velocity generation is off
;
;

and we run
grompp_dd -f production.mdp -po step10.mdp -p step8.top -pp step10.top -c step9.gro -o step10.tpr
mdrun_dd -v -s step10.tpr -o step11.trr -x step11.xtc -cpo step11.cpt -c step11.gro -e step11.edr -g step11.log 

The last run takes four minutes on four cores.

8. Analysis
Again, this follows  this tutorial almost verbatim:
trjconv -s step10.tpr -f step11.xtc -o step12.xtc -pbc mol 



RMS during production run for CAR/SOL:
g_rms -s step10.tpr -f step12.xtc -o step13_rms.xvg -tu ns




CAR
g_gyrate -s step10.tpr -f step12.xtc -o step14_gyrate.xvg

CAR/SOL:
g_rdf -s step10.tpr -f step11.xtc -o step15_rdf.xvg 

And the video has as usual been compressed to something ugly and useless by blogger:


Previous
Next Post »