For various reasons my Beowulf cluster has been dismantled and in boxes for most of the year, with only the six-core node seeing use as a normal work computer.
Anyhow, here's a very unscientific test of the performance of my six-core node (Phenom II, 2.8 GHz, 8 GB RAM) running the NWChem build compiled in the previous post.
The speed tests were performed by starting up mpd:
mpd --ncpus=6&
and then executing with
time mpdrun -n x ./nwchem input.nw
where x is an integer giving the number of cores.
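The procedure above can be scripted so that every core count is timed the same way. What follows is only a minimal sketch, not what I actually ran: the helper names `time_command` and `benchmark` are made up for illustration, it assumes mpd is already running, and it throws away the NWChem output so that only the wall time is kept.

```python
import shlex
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once and return the wall time in seconds."""
    start = time.perf_counter()
    # Output is discarded here; redirect it to a file instead if you want to keep it.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(core_counts, repeats=5, template="mpdrun -n {x} ./nwchem input.nw"):
    """Return {x: [wall times]} for each core count x.

    The runs are interleaved (one pass over all core counts per repeat) so that
    background load is spread over the core counts instead of hitting just one.
    """
    results = {x: [] for x in core_counts}
    for _ in range(repeats):
        for x in core_counts:
            cmd = shlex.split(template.format(x=x))
            results[x].append(time_command(cmd))
    return results
```

Calling `benchmark(range(1, 7))` would give five timings for each of 1 to 6 cores. Note that this measures wall time only, whereas `time` also reports user and system time.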
The nwchem.nw file I used was:
nwchem.nw
start benzene
geometry units angstroms
C 0.100 1.396 0.000
C 1.209 0.698 0.000
C 1.209 -0.698 0.000
C 0.000 -1.396 0.000
C -1.209 -0.698 0.000
C -1.209 0.698 0.000
H 0.000 2.479 0.000
H 2.147 1.240 0.000
H 2.147 -1.240 0.000
H 0.000 -2.479 0.000
H -2.147 -1.240 0.000
H -2.147 1.240 0.000
end
basis
H library sto-3g
c library sto-3g
end
dft
xc b3lyp
end
task dft optimize
Here are the results:
(x is number of cores; times in seconds)
x    Run 1  Run 2  Run 3  Run 4  Run 5
1*    40.8   37.9   40.7   40.3   39.9
1     22.2   40.7   40.6   44.8   38.2
2     22.8   22.4   16.3   23.5   21.5
3     14.1   12.3   15.7   15.5   15.1
4     14.5   11.5   12.0   14.9   14.7
5     11.4   11.5    8.9   11.9   12.5
6     16.0   12.2   13.4    9.9    9.6
* No mpd running; executed using time nwchem nwchem.nw
So here's the unscientific part: the computer is running a full desktop environment with Evolution, Chrome, etc. open in the background, so each run sees a slightly different system. I did try to vary the order in which the runs were made, though.
My guess is that a longer run would yield more reproducible results; as it is, the run times vary significantly. The only lesson I can draw is that throwing more cores at the problem doesn't help much past a certain point: the optimisation time roughly halves going from one core to two, but drops off only slowly beyond three or four cores.
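To put rough numbers on that, here is a small sketch that turns the Phenom II timings from the table above into speedup and parallel efficiency. Using the median of the five runs (rather than the mean) is my own choice, to damp outliers; the function name `scaling` is just for illustration.

```python
from statistics import median

# Wall times in seconds, copied from the Phenom II table above (mpdrun rows only).
times = {
    1: [22.2, 40.7, 40.6, 44.8, 38.2],
    2: [22.8, 22.4, 16.3, 23.5, 21.5],
    3: [14.1, 12.3, 15.7, 15.5, 15.1],
    4: [14.5, 11.5, 12.0, 14.9, 14.7],
    5: [11.4, 11.5, 8.9, 11.9, 12.5],
    6: [16.0, 12.2, 13.4, 9.9, 9.6],
}


def scaling(times):
    """Return {x: (median_time, speedup, efficiency)} relative to x = 1."""
    base = median(times[1])
    out = {}
    for x in sorted(times):
        t = median(times[x])
        speedup = base / t            # how many times faster than one core
        out[x] = (t, speedup, speedup / x)  # efficiency = speedup per core
    return out


for x, (t, s, e) in scaling(times).items():
    print(f"{x} cores: median {t:5.1f} s, speedup {s:4.2f}, efficiency {e:4.2f}")
```

On these numbers the median goes from 40.6 s on one core to 12.2 s on six, a speedup of roughly 3.3, so each of the six cores is doing only a little over half the useful work of a single core running alone.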
Edit: I've run the same file using an almost identical set-up on two more boxes.
Don't compare the benchmarks for runs using the maximum number of cores on each machine, since those are heavily affected by other processes.
Optiplex 990 (Intel i5-2400, 4 cores @ 3.1 GHz, 8 GB RAM)
x    Run 1  Run 2  Run 3  Run 4  Run 5
1    45.80  46.97  46.56  46.95  39.01
2    22.77  25.81  26.93  26.61  25.81
3    17.18  16.48  18.89  19.26  19.18
4    11.62  16.62  15.82  15.86  16.03
Homebuilt (3-core AMD Athlon II X3 @ 3.1 GHz, 4 GB RAM)
x    Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7
1    43.74  57.02  40.22  47.89  53.87
2    31.41  22.31  25.83  32.31  33.00
3    36.19  31.01  43.55  24.75  37.82  33.95  27.06