118. Solution to nwchem: SHMMAX too small

Update: also see this post: http://verahill.blogspot.com.au/2012/10/shmmax-revisited-and-shmall-shmmni.html

When running nwchem with mpirun, I've occasionally run into this error:

Error:
******************* ARMCI INFO ************************
The application attempted to allocate a shared memory segment of 44498944 bytes in size. This might be in addition to segments that were allocated succesfully previously. The current system configuration does not allow enough shared memory to be allocated to the application.

This is most often caused by:
1) system parameter SHMMAX (largest shared memory segment) being too small or
2) insufficient swap space.
Please ask your system administrator to verify if SHMMAX matches the amount of memory needed by your application and the system has sufficient amount of swap space. Most UNIX systems can be easily reconfigured to allow larger shared memory segments,
see http://www.emsl.pnl.gov/docs/global/support.html
In some cases, the problem might be caused by insufficient swap space.
*******************************************************
0:allocate: failed to create shared region : -1
(rank:0 hostname:boron pid:17222):ARMCI DASSERT fail. shmem.c:armci_allocate():1082 cond:0

Diagnosis:
Check the currently defined shmmax:
cat /proc/sys/kernel/shmmax
33554432
Since 33554432 < 44498944 (a 32 MiB limit against a request of roughly 42 MiB), the cause is reason 1 above.
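The comparison above can be scripted. This is a minimal sketch, assuming a POSIX sh with awk available; `shm_fits` is a hypothetical helper name (not part of NWChem or ARMCI), and the requested size is copied by hand from the ARMCI message.

```shell
#!/bin/sh
# shm_fits is a hypothetical helper: it exits 0 if a shared memory segment
# of $1 bytes is no larger than a SHMMAX limit of $2 bytes, and 1 otherwise.
# The comparison is done in awk (floating point) because newer kernels default
# to a SHMMAX close to 2^64, which overflows signed 64-bit shell arithmetic.
shm_fits() {
  awk -v want="$1" -v max="$2" 'BEGIN { exit !((want + 0) <= (max + 0)) }'
}

requested=44498944                      # segment size from the ARMCI message
current=$(cat /proc/sys/kernel/shmmax)  # current limit in bytes

if shm_fits "$requested" "$current"; then
  echo "shmmax ($current) is large enough for $requested bytes"
else
  echo "shmmax ($current) is too small for $requested bytes"
fi
```

With the values from this post (limit 33554432, request 44498944) the script reports that shmmax is too small.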

Solution:

Edit /etc/sysctl.conf and add the line
kernel.shmmax=44498944
Save and reboot. The exact value is up to you, as long as it is at least as large as the segment being requested: I've set my shmmax to 128*1024*1024 = 134217728, while our production cluster uses 6269961216.
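The byte arithmetic for the sysctl.conf line can be wrapped in a small function. This is a sketch only; `shmmax_line` is a hypothetical name, and the limit is given in MiB as an assumption of convenience.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel.shmmax line for a limit given in MiB.
# For example, 128 MiB -> 128*1024*1024 = 134217728 bytes.
shmmax_line() {
  echo "kernel.shmmax=$(( $1 * 1024 * 1024 ))"
}

shmmax_line 128

# To make the setting permanent (as root), append it to /etc/sysctl.conf and
# reload that file instead of rebooting:
#   shmmax_line 128 | sudo tee -a /etc/sysctl.conf
#   sudo sysctl -p
```

`sysctl -p` with no file argument re-reads /etc/sysctl.conf, so the new limit takes effect without a reboot.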

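Update: to change it on the fly, without rebooting, run
sudo sysctl -w kernel.shmmax=6269961216
A value set this way is lost at the next reboot unless it is also in /etc/sysctl.conf.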