Testing openMalaria app version 6.30/6.31/6.32/6.33

Message boards : Number crunching : Testing openMalaria app version 6.30/6.31/6.32/6.33

Profile maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

We're about to start testing the new application release, version 6.30. Diggory has outlined a few of the new features that we added for this one in this thread.

We'll start slowly: over the next few days, workunits will probably not always be available. This will allow us to look at the results we get back and make adjustments if necessary. Please note that another reason you may not get work during testing is that this works on an opt-in basis, so you may need to change your preferences if you want to help with testing.
Thanks
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

Profile mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

And I think we are ready to start working on those units and trying to help you make the World a better place. Man, that sounds lofty and funky. What I mean to say is, we are ready to start crunching again whenever you are.

John Clark
Joined: Feb 10 08
Posts: 2149
Credit: 1,193,234
RAC: 1,472

I am sure Mikey is correct: we have activated all the appropriate preferences, set up our 12-hour update scripts (tongue in cheek), and are waiting for the new trial work.
____________
Go away, I was asleep

Said a Russell, 3 Shih-Tzus & a Bischeon Frize

Profile maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

We've updated the openMalariaBeta application to version 6.31 after we found that the Linux and Mac versions of 6.30 would crash on some of the computers which received work from the first mini-batch.
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

Profile maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

We're working on a new release (6.33). There were some minor issues with checkpointing in the previous versions, and a major problem with library incompatibilities with the Linux versions.
We'll be back with an update shortly.
Thanks
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

John Clark
Joined: Feb 10 08
Posts: 2149
Credit: 1,193,234
RAC: 1,472

Looking forward to the test work or the proper run soon.
____________
Go away, I was asleep

Said a Russell, 3 Shih-Tzus & a Bischeon Frize

BarryAZ
Joined: Apr 26 08
Posts: 36
Credit: 8,128,972
RAC: 3,503


Nick, good news.

By the way, you might look to having the homepage updated regarding status -- the last note there is from March 4.

____________

Profile maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

We've updated the openMalariaBeta Windows and Mac versions to 6.33 today, and are currently sending out a small test batch targeted at these operating systems. 6.33 differs from 6.32 in that we have added some more log messages, which should help us tackle the problems with checkpointing.

We have not released a Linux version of 6.33 so far, because we've run into a problem with the Linux build. Our application relies on a number of software libraries developed by others. We use these libraries to help us with tasks like parsing the input files, sampling random numbers, or compressing the result files before sending them back to the server. We now face the problem that some of these libraries require us to use a recent Linux distribution to build the application, whereas in the past we always used fairly dated distributions to prevent problems when the application was run on a computer with an older Linux. We're now trying to get a running Linux version that will support as wide a range of hosts as possible.

We've configured our BOINC server so that it should send work only to computers that can run version 6.33. The aim was to prevent workunits being sent to Linux hosts until we can do something about the library dependency problems.
It seems this is not working perfectly: some Linux hosts get work even though they can only use version 6.32 (and therefore all workunits crash), and others get misleading messages from the scheduler (like "not enough disk space"). We're not sure why this is, but we're looking into it.

Could be that older BOINC clients don't report the application version when requesting work?
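
For readers following along, the intended rule (send work only to hosts that can run at least 6.33) can be sketched roughly as below. This is a hypothetical illustration, not the BOINC scheduler's actual code: the `AVAILABLE` table, `MIN_VERSION` and `should_send_work` are made-up names, the version numbers use the three-digit form seen in client logs, and the per-platform values simply mirror the situation described in this post.

```python
# Hypothetical sketch: none of these names come from the BOINC server code.
MIN_VERSION = 633  # three-digit form as shown in client logs, e.g. "version 632"

# Newest openMalariaBeta build assumed available per platform (illustrative).
AVAILABLE = {
    "windows_intelx86": 633,
    "i686-apple-darwin": 633,
    "i686-pc-linux-gnu": 632,  # no 6.33 Linux build released yet
}


def should_send_work(platform: str, min_version: int = MIN_VERSION) -> bool:
    """Return True only if the newest build for this platform meets the minimum."""
    newest = AVAILABLE.get(platform)
    return newest is not None and newest >= min_version


for name in AVAILABLE:
    print(name, should_send_work(name))
```

Under these assumptions a Linux host would get no work at all, so a host that still receives 6.32 tasks is somehow bypassing a check of this kind.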

Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

u.dgl.
Joined: Mar 8 06
Posts: 26
Credit: 1,170,758
RAC: 538

Hi,
there is still a lot to do with Windows hosts as well!

Look here:

ID | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
823 | 19 May 2010 3:07:43 UTC | 19 May 2010 5:36:17 UTC | Error while computing | 0.00 | 8,541.27 | 28.21 | --- | openMalaria test version v6.33
769 | 18 May 2010 20:23:58 UTC | 19 May 2010 3:38:25 UTC | Completed, waiting for validation | 0.00 | 6,761.38 | 22.33 | pending | openMalaria test version v6.33
804 | 18 May 2010 19:11:37 UTC | 18 May 2010 22:11:36 UTC | Completed, waiting for validation | 8,004.24 | 7,933.45 | 26.13 | pending | openMalaria test version v6.33
569 | 18 May 2010 19:01:04 UTC | 19 May 2010 3:02:07 UTC | Completed, can't validate | 0.00 | 4,045.80 | 13.36 | 0.00 | openMalaria test version v6.33
877 | 18 May 2010 18:32:15 UTC | 19 May 2010 3:02:07 UTC | Completed, marked as invalid | 0.00 | 4,012.22 | 13.25 | 0.00 | openMalaria test version v6.33
799 | 18 May 2010 17:49:56 UTC | 18 May 2010 21:17:53 UTC | Completed and validated | 7,792.50 | 7,640.91 | 25.16 | 28.80 | openMalaria test version v6.33
593 | 18 May 2010 17:24:32 UTC | 18 May 2010 19:48:00 UTC | Completed, waiting for validation | 4,567.73 | 4,501.00 | 14.82 | pending | openMalaria test version v6.33
603 | 18 May 2010 17:07:43 UTC | 18 May 2010 23:09:52 UTC | Error while computing | 0.00 | 3,639.33 | 12.02 | --- | openMalaria test version v6.33
638 | 18 May 2010 16:54:15 UTC | 18 May 2010 19:02:37 UTC | Completed and validated | 7,611.43 | 7,434.18 | 24.48 | 26.14 | openMalaria test version v6.33
630 | 18 May 2010 16:54:36 UTC | 18 May 2010 18:39:54 UTC | Completed, waiting for validation | 4,757.64 | 4,599.14 | 15.15 | pending | openMalaria test version v6.33
455 | 18 May 2010 15:39:03 UTC | 18 May 2010 21:01:40 UTC | Completed, waiting for validation | 0.00 | 4,059.99 | 13.41 | pending | openMalaria test version v6.33
453 | 18 May 2010 15:39:03 UTC | 18 May 2010 19:57:43 UTC | Completed, waiting for validation | 0.00 | 4,248.89 | 14.03 | pending | openMalaria test version v6.33
451 | 18 May 2010 15:39:23 UTC | 18 May 2010 23:09:52 UTC | Completed, waiting for validation | 0.00 | 3,995.75 | 13.20 | pending | openMalaria test version v6.33
450 | 18 May 2010 15:38:42 UTC | 18 May 2010 18:59:59 UTC | Completed, marked as invalid | 0.00 | 4,417.19 | 14.59 | 0.00 | openMalaria test version v6.33
435 | 18 May 2010 15:38:42 UTC | 18 May 2010 18:30:07 UTC | Error while computing | 0.00 | 3,622.67 | 11.96 | --- | openMalaria test version v6.33

____________

Profile Saenger
Joined: Mar 8 06
Posts: 55
Credit: 143,384
RAC: 28

I've got Linux and got 2 WUs, that both crashed with the following error:

6.10.17
process exited with code 255 (0xff, -1)
Cannot create, lock or unlock a mutex


These are the lines from the BOINC manager:
Mi 19 Mai 2010 08:35:50 CEST malariacontrol.net Starting wu8_16300_1274199106_1
Mi 19 Mai 2010 08:35:50 CEST malariacontrol.net [cpu_sched] Starting wu8_16300_1274199106_1 (initial)
Mi 19 Mai 2010 08:35:50 CEST malariacontrol.net Starting task wu8_16300_1274199106_1 using openMalariaBeta version 632
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net Computation for task wu8_16300_1274199106_1 finished
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net Output file wu8_16300_1274199106_1_0 for task wu8_16300_1274199106_1 absent
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net Output file wu8_16300_1274199106_1_1 for task wu8_16300_1274199106_1 absent
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net Starting wu8_16086_1274199106_1
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net [cpu_sched] Starting wu8_16086_1274199106_1 (initial)
Mi 19 Mai 2010 08:35:51 CEST malariacontrol.net Starting task wu8_16086_1274199106_1 using openMalariaBeta version 632
Mi 19 Mai 2010 08:35:52 CEST malariacontrol.net Computation for task wu8_16086_1274199106_1 finished
Mi 19 Mai 2010 08:35:52 CEST malariacontrol.net Output file wu8_16086_1274199106_1_0 for task wu8_16086_1274199106_1 absent
Mi 19 Mai 2010 08:35:52 CEST malariacontrol.net Output file wu8_16086_1274199106_1_1 for task wu8_16086_1274199106_1 absent
Mi 19 Mai 2010 08:36:57 CEST malariacontrol.net Sending scheduler request: To fetch work.
Mi 19 Mai 2010 08:36:57 CEST malariacontrol.net Reporting 2 completed tasks, requesting new tasks
Mi 19 Mai 2010 08:37:02 CEST malariacontrol.net Scheduler request completed: got 0 new tasks
Mi 19 Mai 2010 08:37:02 CEST malariacontrol.net Message from server: (Project has no jobs available)

____________
Greetings from Sänger

Profile Francis Butts
Joined: Oct 26 07
Posts: 2
Credit: 146,906
RAC: 0

I'm not sure if this trouble report should go here or not, but I got this response to the latest download, which was tagged "Text Version V6.34".
Task 13250198

Name vip_multi_run12_1274606128_993903_6
Workunit 6475121
Created 24 May 2010 14:13:57 UTC
Sent 24 May 2010 14:16:05 UTC
Received 24 May 2010 14:57:15 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1 (0xffffffffffffffff)
Computer ID 2720
Report deadline 27 May 2010 14:16:05 UTC
Run time 0
CPU time 0
stderr out

6.10.18
- exit code -1 (0xffffffff)
Mon May 24 09:51:46 2010
!ERROR_-1! Error in : Wrong file format

Validate state Invalid
Claimed credit 0
Granted credit 0
application version Application simulating the growth of a prairie of many clonal plants v0.10
Copyright © 2010 University of Houston
____________




Profile Saenger
Joined: Mar 8 06
Posts: 55
Credit: 143,384
RAC: 28

Here is another example of a failed w/u; Test Version V6.33. Shows "error while computing."
Task 57048055

It's a WU that says MIE!
____________
Greetings from Sänger

hardy
Volunteer moderator
Project administrator
Project developer
Joined: Feb 18 09
Posts: 141
Credit: 54,376
RAC: 129

Yes. Thankfully we've reduced the number of times it says that now! MIE isn't the reason it crashed anyway, just a diagnostic.

Unfortunately, 6.32 was totally broken on Linux. 6.35 appears to be better... except on some (lots of?) 32-bit Linux systems.

talister
Joined: Mar 24 07
Posts: 1
Credit: 528,234
RAC: 122

My 64-bit Linux systems with older glibc (RHEL5 Update 3 and RHEL4 Update 8), which insta-crashed v6.35 of the test app, seem to be happily running the newest v6.36 test app so far.

EDIT: Slightly premature. The RHEL4 Update 8 system just crashed it with:


6.6.41
process exited with code 1 (0x1, -255)
../../projects/malariacontrol.net/openMalariaBeta_6.36_i686-pc-linux-gnu: /lib/tls/libc.so.6: version `GLIBC_2.4' not found (required by ../../projects/malariacontrol.net/openMalariaBeta_6.36_i686-pc-linux-gnu)


(That system has glibc 2.3.4)
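
For anyone wanting to check their own host against this kind of error, here is a minimal sketch of the version comparison behind it. The helper names `parse_version` and `glibc_satisfies` are made up for illustration; `platform.libc_ver()` is from the Python standard library and only reports something useful on glibc-based Linux systems.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn '2.3.4' into (2, 3, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(installed: str, required: str) -> bool:
    """True if the installed glibc is at least the required version."""
    return parse_version(installed) >= parse_version(required)


# The case from the error message: glibc 2.3.4 installed, GLIBC_2.4 required.
print(glibc_satisfies("2.3.4", "2.4"))

# On a glibc-based Linux host, libc_ver() reports the local library version.
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print("local glibc", version, "is enough for 2.4:", glibc_satisfies(version, "2.4"))
```

Comparing the parts as integers matters here: a plain string comparison would rank "2.10" below "2.4".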

Bill
Joined: Nov 1 09
Posts: 3
Credit: 3,625
RAC: 0

1. Could someone please delete all those MIE lines in an earlier post? They do not make this thread very user friendly.

2. Problem with v6.35 on Windows XP Home.
I have found that each checkpoint written is 50MB in size. Most other BOINC apps write only a few kB.
As a result, every time a checkpoint is written, CPU usage drops to less than 40% for 5 seconds and even foreground apps are sluggish, as 6.35 does not back off during the checkpoint process.
Also, the noise generated by my hard disk during this write is intrusive (I normally get no noise from the disk).
I have suspended Malaria processing in order to get some proper performance from my PC and to stop the disk noise. Normally I run my BOINC processes in the background and never notice them.

Profile Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3217
Credit: 5,500,753
RAC: 3,644

No, but as a moderator, I can hide it. (That is, it is hidden for all normal users but not for moderators or admins, so we have to suffer a slow-loading thread.)

Is it better now?

Bill
Joined: Nov 1 09
Posts: 3
Credit: 3,625
RAC: 0

Much better - Thanks

Profile mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

I wonder whether updating to a newer version of BOINC would help? You are still running a very old version, according to the website:
"BOINC client version 5.2.13"

The current version available for download is 6.10.56. Your PC is also quite old, again according to the website. Could your hard drive be having 'issues'? Does it make noises during normal disk-intensive operations, i.e. copying a large file on or off of it?

I am NOT demeaning your PC; we all have old PCs we use for some reason or other. What I am asking is whether the hard drive could be having trouble, since it looks like it has been around a while. The website only gives so much info about our PCs, and I am trying to diagnose from a distance.

hardy
Volunteer moderator
Project administrator
Project developer
Joined: Feb 18 09
Posts: 141
Credit: 54,376
RAC: 129

Hi Bill -- yes, checkpoints _are_ big: we typically need to represent maybe a million infections at once in a scenario. They're also well compressed, since they're written as binary.

HD noise during writing is unusual. Usually the noise you can notice from a HD is when it's seeking, not when continuously reading/writing. Maybe you need to defragment your disk?
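
As a rough sanity check on the size, the two figures quoted in this thread (a 50MB checkpoint and roughly a million infections) can be divided out. This is only a back-of-the-envelope sketch: it assumes "50MB" means 50 million bytes and that infections account for the whole file, neither of which is stated anywhere in the thread, and `bytes_per_infection` is a made-up helper.

```python
def bytes_per_infection(checkpoint_bytes: int, infections: int) -> float:
    """Average checkpoint bytes per infection, ignoring all other state."""
    return checkpoint_bytes / infections


# Assumptions: 50 MB = 50 million bytes; one million infections in the scenario.
print(bytes_per_infection(50 * 10**6, 1_000_000))
```

Under those assumptions each infection takes about 50 bytes, i.e. a handful of numeric fields.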

Bill
Joined: Nov 1 09
Posts: 3
Credit: 3,625
RAC: 0

Hi,
Thanks Mikey and Hardy for your responses.
I try to keep my disk reasonably defragmented, but with Windows it is a pretty soul-destroying task as it fragments again so quickly. I am not aware of any problems with my hard disk, but worse than the noise is the negative effect on foreground tasks during checkpointing.
I may try a longer interval between checkpoints (currently 30 seconds), but probably I will just give up on running the application on this PC since it obviously does not have the power to cope.
However my own view is that 50MB is an unreasonably large checkpoint.

susele71
Joined: Oct 23 07
Posts: 3
Credit: 29,785
RAC: 0

I have been running units of the test version 6.35 on my Mac for the last week or so and everything went well until now. Since starting wu2_6377_1275209386_1 and wu2_6432_1275209386_1, Boinc has crashed my computer at least 5 times overnight. I finally had to abort the tasks.

Profile Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3217
Credit: 5,500,753
RAC: 3,644

You can try to increase your checkpoint time. I personally use 900 seconds (15 minutes) and have done so for years. I don't shut down all that much, and it's better on the hard drives with less activity. So if I do shut down (for applications that do not checkpoint on exit) or the komputa krashes, at most 15 minutes of work is lost. It's a small trade-off, but in the long run I've probably saved more komputa time by not checkpointing more frequently than I have lost by redoing work from the previous checkpoint.

Remember too, every time you checkpoint, it takes some CPU cycles to process the commands and transfer data, and those are taken away from time that could be used for the pure number krunchin'. It may be a small amount, but over the long run it adds up.

It's up to you how to set your preferences, but yours makes 120 checkpoints an hour whereas I make 4 per hour. Someone else can do the math on how much extra time is spent on all those checkpoints.
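
Here is a rough version of that math, as a sketch only. It assumes every checkpoint is 50MB and causes about 5 seconds of slowdown (the figures reported earlier in this thread for one particular PC), and that the application actually checkpoints at every interval, which may not hold since the preference only sets a minimum time between disk writes. `hourly_cost` is a made-up helper name.

```python
def checkpoints_per_hour(interval_s: int) -> float:
    """Number of checkpoints written in one hour at the given interval."""
    return 3600 / interval_s


def hourly_cost(interval_s: int, size_mb: float = 50, slowdown_s: float = 5) -> dict:
    """Estimate disk writes and sluggish time per hour of crunching."""
    n = checkpoints_per_hour(interval_s)
    return {
        "checkpoints": n,
        "written_mb": n * size_mb,               # data written to disk per hour
        "slow_seconds": n * slowdown_s,          # seconds of reduced responsiveness
        "slow_fraction": n * slowdown_s / 3600,  # share of the hour affected
    }


for interval in (30, 300, 600, 900):
    print(interval, hourly_cost(interval))
```

With these assumptions, a 30-second interval writes about 6,000 MB per hour and leaves the machine sluggish for roughly 600 seconds (about a sixth of the hour), while a 900-second interval writes 200 MB and costs about 20 seconds.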

I see too that you have a single core, but it doesn't show the speed. This probably accounts for the performance, and you should consider a longer time between checkpoints.

I run P4-HTs with GPUs, so they are always running a GPU task and two CPU tasks at 100% CPU use, along with my regular work. I never notice any computer lag in what I do, even if running two MC at the same time, including whenever these apps checkpoint, and this includes a dozen other projects. The only project that bogs me down is SETI when it uses the GPU; all other projects have no effect on my performance.

I suggest you give it a try and see if a longer checkpoint time improves your performance; if so, then decide to keep that longer time. Also try several values, like 300, 600 and 900 seconds, and give each an hour or two to run.


Copyright © 2013 africa@home