I started running malariacontrol on a dual Pentium III Linux box, and work units fail. |
Message boards : Unix/Linux : I started running malariacontrol on a dual Pentium III Linux box, and work units fail.
Author | Message |
---|---|
86236851 is a typical failure. |
|
ID: 21288 | Rating: 0 | rate: / | |
Try running it from the command line directly:
./openMalariaA_6.58_i686-pc-linux-gnu
It works ok on my linux x86_64 system without any special configuring. I installed it from the graphical BOINC manager. What complains that the application is not there? BOINC manager? Here's what's in my BOINC folder:
~/BOINC/projects/malariacontrol.net $ ls -l -h -G
total 12M
-rw-r--r-- 1 michael  20K Dec 17 10:53 autoRegressionParameters.csv
-rw-r--r-- 1 michael  38K Dec 17 10:52 densities.csv
-rwxr-xr-x 1 michael  12M Dec 17 10:53 openMalariaA_6.58_x86_64-pc-linux-gnu
-rw-r--r-- 1 michael 139K Dec 17 10:53 scenario_29.xsd
-rw-r--r-- 1 michael  29K Dec 20 16:23 wu_2899_402_943801_0_1356038234
-rw-r--r-- 1 michael  45K Dec 20 21:00 wu_2903_173_944059_0_1356054515
btw, use the Code button above to make your posting easier to read. |
|
ID: 21332 | Rating: 0 | rate: / | |
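To follow the suggestion above of running the application directly, a small wrapper can capture the exit status and stderr so they can be compared with what the project web site reports. This is only a rough sketch: the function name run_and_report is made up for illustration, and running the binary outside BOINC without its input files may fail for reasons unrelated to the original problem.

```python
import subprocess


def run_and_report(cmd, cwd=None, timeout=60):
    """Run a command and return (exit_code, stderr_text).

    exit_code is None if the process was still running when the timeout
    expired (subprocess.run kills it in that case). An OSError, for
    example a missing file or a binary built for another architecture,
    is deliberately left to propagate so the real cause stays visible.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode output as text
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stderr
```

For example, calling run_and_report(["./openMalariaA_6.58_i686-pc-linux-gnu"], cwd=...) with cwd set to the project directory (on the system above that is ~/BOINC/projects/malariacontrol.net, but it may differ elsewhere) would show whether the binary starts at all and what it writes to stderr.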
Task 149302533
Name wu_3152_517_239622_0_1356075191_1
Workunit 86541019
Created 21 Dec 2012 8:10:44 UTC
Sent 21 Dec 2012 8:17:07 UTC
Received 21 Dec 2012 8:52:25 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 73 (0x49)
Computer ID 613682
Report deadline 25 Dec 2012 23:23:47 UTC
Run time 4.16
CPU time 2.07
Validate state Invalid
Credit 0.00
Application version openMalaria: A simulator of malaria epidemology and control (Branch B) v6.58
Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 73 (0x49, -183)
</message>
<stderr_txt>
Exception: numNewInfections: NaN
in ../../projects/malariacontrol.net/openMalariaB_6.58_i686-pc-linux-gnu:
+0x330 OM::Host::InfectionIncidenceModel::numNewInfections(OM::Host::Human const&, double)
+0x55 OM::Host::Human::updateInfection(OM::Transmission::TransmissionModel*, double)
+0xc5 OM::Host::Human::update(OM::Population const&, OM::Transmission::TransmissionModel*, bool)
+0x12e OM::Population::update1()
+0x518 OM::Simulation::start()
+0x186 main()
in /lib/libc.so.6:
+0xdc __libc_start_main()
../../projects/malariacontrol.net/openMalariaB_6.58_i686-pc-linux-gnu [0x84c16f1]
OpenMalaria: No such file or directory
03:50:04 (15553): called boinc_finish
</stderr_txt>
]]>
"What complains that the application is not there? BOINC manager?"
I do not know. This is the entirety of the task description on the web site. I think this is a different work unit, but it has the same symptoms. A good one is running now and has over 6 hours on it. The failed ones all die in a few seconds.
____________ |
|
ID: 21338 | Rating: 0 | rate: / | |
The good one finished and validated. Others still fail in a few seconds.
Task 149304755
Name wu_1068_417_944371_0_1356079396_0
Workunit 86544599
Created 21 Dec 2012 8:43:24 UTC
Sent 21 Dec 2012 8:52:25 UTC
Received 23 Dec 2012 8:56:09 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 613682
Report deadline 25 Dec 2012 23:59:05 UTC
Run time 74,503.21
CPU time 70,583.00
Validate state Valid
Credit 31.69
Application version openMalaria: A simulator of malaria epidemology and control (Branch A) v6.58
Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
sim end
T/A: 409465126/155890321
02:43:12 (16620): called boinc_finish
</stderr_txt>
]]>
____________ |
|
ID: 21347 | Rating: 0 | rate: / | |
If I knew I was simply getting bad work units, I would continue, but only two have run and the rest (quite a few) all failed with exit code 73 (I think): NaN. |
|
ID: 21356 | Rating: 0 | rate: / | |
"If I knew I was getting bad work units, I would continue, but I got only two that run and the rest (quite a few) failed all with exit code 73 (I think): NaN."
I am NOT a Linux guy, but after a quick search I found this, which MAY help: http://forums.fedoraforum.org/showthread.php?s=e6b77f1a95df605875066338b1f9553b&p=1576639#post1576639
I think you are finding that there are just not many Linux folks who use the forums here very much. |
|
ID: 21360 | Rating: 0 | rate: / | |
I do not understand that link at all. |
|
ID: 21362 | Rating: 0 | rate: / | |
"I do not understand that link at all."
Sorry about that, I said I wasn't a Linux guy! I will end by saying MERRY CHRISTMAS TO ALL!! |
|
ID: 21374 | Rating: 0 | rate: / | |
"I do not understand that link at all."
Looking at the input files for the pending MC tasks on my systems, the wu_1068_* and wu_3152_* tasks have totally different structures, so they probably follow completely different paths in the code. That could mean, for example, that the failed models tried to use instructions that the Pentium III does not support (e.g. SSE2 or SSE3).
The task names are probably significant, but as all tasks for your P3 have been purged from the database, you would have to generate the list manually. You'll find the successful ones in ~/BOINC/job_log_www.malariacontrol.net.txt, but the failed ones would have to be extracted from ~/BOINC/stdoutdae.txt and ~/BOINC/stdoutdae.old.
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
|
ID: 21495 | Rating: 0 | rate: / | |
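The two checks suggested in the post above could be sketched roughly as follows: reading the CPU feature flags (a Pentium III reports sse but not sse2), and listing task names that appear in the client log but not in the job log, grouped by their wu_NNNN prefix. The function names are made up for illustration. The sketch assumes that task names have the seven-part form seen in this thread (for example wu_3152_517_239622_0_1356075191_1) and that they appear verbatim in both log files; it does not rely on any particular log line format.

```python
import re

# Assumption: a task name is "wu" followed by exactly six numeric fields,
# as in wu_3152_517_239622_0_1356075191_1. Longer names (for example
# output files with an extra _0 suffix) are deliberately not matched.
TASK_RE = re.compile(r"\bwu(?:_\d+){6}\b")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def task_names(log_text):
    """Return every distinct task name found in a block of log text."""
    return set(TASK_RE.findall(log_text))


def missing_by_prefix(daemon_log_text, job_log_text):
    """Count tasks seen in the client log but absent from the job log.

    The result maps a prefix such as "wu_3152" to the number of tasks
    with that prefix that were never recorded as completed. Tasks still
    in progress are counted too, so the list needs a manual look.
    """
    missing = task_names(daemon_log_text) - task_names(job_log_text)
    counts = {}
    for name in missing:
        prefix = "_".join(name.split("_")[:2])
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts
```

To use it, read /proc/cpuinfo and test whether "sse2" is in cpu_flags(...), then pass the contents of ~/BOINC/stdoutdae.txt (plus stdoutdae.old) and ~/BOINC/job_log_www.malariacontrol.net.txt to missing_by_prefix. If the failures cluster under one or two prefixes, that would fit the idea that only certain model types fail on this CPU.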