I'm very unhappy about a newly built node which randomly crashes and reboots when running long jobs. More about that later, but here are the specs: FX 8350, 4x8 GB RAM GSkill Ripjaws, ASRock 990FX Extreme3, Corsair GS700, MSI N210, ASUS NX1101 in an Antec GX700 case, running Debian Wheezy with the stock kernel (3.2.0-4 amd64).
I've tested the RAM using memtest86+ and found no errors, the rig uses a 700 W Corsair PSU which /should/ provide enough power, and I see no evidence of overheating based on a cron job which runs every 2 minutes. Anyway, the first step in troubleshooting is finding a way of reproducing the error reliably, and prime95 is what the Windows overclockers use for stress testing.
Turns out prime95 (actually the GIMPS client, called mprime on Linux) can run in a few different modes which test different aspects of your system, which makes it sound like a pretty good program for my purposes.
See here for more information: http://www.mersenne.org/freesoft/
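Since the box reboots without warning, it helps to have the temperature readings on disk before it goes down. Below is a minimal sketch of what such a cron job could look like. It is not the exact script I use: the function name log_temps and the log format are made up for illustration, and it assumes the lm-sensors package provides the sensors command.

```shell
#!/bin/sh
# Append one timestamped temperature reading to a log file.
# Assumption: lm-sensors is installed; if `sensors` is missing or fails,
# a note is written instead so the log still shows that the job ran.
log_temps() {
    logfile="$1"
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        if command -v sensors >/dev/null 2>&1; then
            sensors 2>&1 || echo "sensors failed"
        else
            echo "sensors not found (install lm-sensors)"
        fi
    } >> "$logfile"
    # Flush to disk so the last reading survives a sudden reboot.
    sync
}

# Example crontab entry (every 2 minutes), assuming the function is saved
# in a hypothetical wrapper script ~/bin/log_temps.sh that calls it:
# */2 * * * * $HOME/bin/log_temps.sh
```

The sync call is the important bit here: without it the last few readings before a crash may never reach the disk.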
mkdir -p ~/tmp/mprime
cd ~/tmp/mprime
wget http://www.mersenne.info/gimps/p95v279.linux64.tar.gz
tar xvf p95v279.linux64.tar.gz
./mprime
Welcome to GIMPS, the hunt for huge prime numbers. You will be asked a
few simple questions and then the program will contact the primenet server
to get some work for your computer. Good luck!
Attention OVERCLOCKERS!! Mprime has gained a reputation as a useful
stress testing tool for people that enjoy pushing their hardware to the
limit. You are more than welcome to use this software for that purpose.
Please select the stress testing choice below to avoid interfering with
the PrimeNet server. Use the Options/Torture Test menu choice for your
stress tests. Also, read the stress.txt file.
If you want to both join GIMPS and run stress tests, then Join GIMPS and
answer the questions. After the server gets some work for you, stop
mprime, then run mprime -m and choose Options/Torture Test.
Join Gimps? (Y=Yes, N=Just stress testing) (Y): N
Number of torture test threads to run (3): 2
Choose a type of torture test to run.
1 = Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested
much).
2 = In-place large FFTs (maximum heat and power consumption, some RAM
tested).
3 = Blend (tests some of everything, lots of RAM tested).
11,12,13 = Allows you to fine tune the above three selections.
Blend is the default. NOTE: if you fail the blend test, but can pass the
small FFT test then your problem is likely bad memory or a bad memory
controller.
Type of torture test to run (3): 1
Accept the answers above? (Y): Y
[Main thread Sep 20 11:06] Starting workers.
[Worker #1 Sep 20 11:06] Worker starting
[Worker #1 Sep 20 11:06] Setting affinity to run worker on any logical CPU.
[Worker #2 Sep 20 11:06] Worker starting
[Worker #2 Sep 20 11:06] Setting affinity to run worker on any logical CPU.
[Worker #1 Sep 20 11:06] Beginning a continuous self-test to check your computer.
[Worker #1 Sep 20 11:06] Please read stress.txt. Hit ^C to end this test.
[Worker #2 Sep 20 11:06] Beginning a continuous self-test to check your computer.
[Worker #2 Sep 20 11:06] Please read stress.txt. Hit ^C to end this test.
[Worker #1 Sep 20 11:06] Test 1, 180000 Lucas-Lehmer iterations of M580673 using AMD K10 type-1 FFT length 28K, Pass1=112, Pass2=256.
[Worker #2 Sep 20 11:06] Test 1, 180000 Lucas-Lehmer iterations of M580673 using AMD K10 type-1 FFT length 28K, Pass1=112, Pass2=256.
CTRL+C
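For long unattended runs it is handy to keep the output in a file and check it afterwards. The sketch below is an assumption on my part rather than something from the session above: the function name check_mprime_log is made up, and the two patterns are the failure messages I would expect mprime to print when a torture test fails (see stress.txt), so adjust them if your version words things differently.

```shell
# Return 0 if the log looks clean, 1 if mprime reported a problem.
# Assumption: failures show up as "FATAL ERROR" or
# "Hardware failure detected" lines in the saved output.
check_mprime_log() {
    logfile="$1"
    if grep -E -q 'FATAL ERROR|Hardware failure detected' "$logfile"; then
        return 1
    fi
    return 0
}

# Usage sketch (the log path is only an example):
#   ./mprime 2>&1 | tee ~/tmp/mprime/torture.log
#   check_mprime_log ~/tmp/mprime/torture.log || echo "mprime reported errors"
```

As far as I know mprime also accepts -t to start the torture test directly without the menu, but check ./mprime -h for your version before relying on it. A clean log does not prove much if the machine rebooted mid-run, so compare the last timestamp in the log with the uptime as well.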