Reached daily quota of (ridiculously low value) results

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

Hey there. Every couple of weeks, my MalariaControl project deteriorates: the usual 1000 results / day is suddenly reduced to 30 or 44 or some equally useless value. It's very frustrating to see on Monday mornings that my work computer (which I can't turn off for various reasons!) has only been twiddling its thumbs over the weekend.

I'm using Windows XP on a quad-core processor.

No other useful entry in the log, just "no work sent", "no work available", and then "reached daily quota".

In the past I could fix this by moving the entire BOINC and data folders to a new computer, letting it connect, then bringing it back. Now no amount of moving, resetting, reattaching etc. has fixed it, and I've been at it for over an hour.

What's wrong, and how do I fix it? Better yet, could the MC engineers please fix this permanently? Or does MC not want my results any more? Just say the word...

mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

If you look at your PC and then its tasks, you will see that a lot of recent tasks have had errors, which is why the maximum number of units allowed has been reduced. The error message is "Can't get shared memory segment name: shmget() failed". Can you reboot the PC? If so, try that; it has worked for me in the past. As you get more units tomorrow and return properly crunched units, your maximum daily number will go back up to normal. It is a fast downward spiral, but a not quite as fast upward spiral too. This prevents machines from getting thousands of units and trashing them all, leaving the project to resend to someone else every unit it sent to you.

Over at Seti the answer to the same problem with the same version of the software as you was "The upgrade to 6.2.18 was the right move. This is one of the issues that has hopefully been addressed." So I guess the answer is to upgrade to the newest version of the software, 6.2.19, and see if it fixes it. Good luck and let us know how it goes.

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

Thanks for the quick reply. This computer performs perfectly in every other task. The problem must be with the MC client. (This would not be the first time.)

I have since performed a reboot, then a complete BOINC reinstall on another computer (where I do have admin rights) which I copied back to the quad core computer. These steps didn't fix it.

I simply can't give this extra amount of attention to solving problems in the MalariaControl client. Someone tell me which file or registry entry records machine ID -- based on which my computer is falsely remembered as "unreliable", or provide any other simple way to fix this permanently.

I'm running the very latest BOINC.

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

OK I've now reinstalled BOINC on a computer with a different IP address which has never run BOINC before. Created a nice portable self-contained installation (data dir within the BOINC folder; no screensaver or service etc.). I then moved the whole thing to the quad core computer.

After moving, the "Attach to Project" window pops up since it still doesn't recognize the tasks that were successfully downloaded on the first computer. When I do attach, the quota again applies. This was different a couple of weeks ago -- I successfully moved my BOINC folder back then.

I can't believe I spent the whole morning trying to work around this MC client bug.

I'll keep running another project until someone fixes this.

PS. Yes I can tell a computer with a hardware problem from a perfect one; it's what I do for a living. I can't detect any problem with this computer.

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3217
Credit: 5,500,753
RAC: 3,644

The only problem with installing on one computer and then copying to another is you don't get the registry entries which are set during install.

You also do not get the user groups BOINC creates for security. This may cause problems with applications trying to write to a directory they have not been given rights to write to, since you didn't do the proper install.

Also, if the CPU and hardware that BOINC detects are not identical, then when you start BOINC it will detect this as a different computer and regenerate the key, which will be the same as the one previously attached with on that computer, not the one you copied from the other computer. Hence the problem with the quota.

Once you attach and return good results, your quota will be raised. This is to prevent computers with errors from using up all the results quickly, so that users without problems can get them instead of seeing no work available because computers with errors took all the tasks.

You would be best to install boinc directly on the computer you intend to run it on.

mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

Thanks for the quick reply. This computer performs perfectly in every other task. The problem must be with the MC client. (This would not be the first time.)

I have since performed a reboot, then a complete BOINC reinstall on another computer (where I do have admin rights) which I copied back to the quad core computer. These steps didn't fix it.

I simply can't give this extra amount of attention to solving problems in the MalariaControl client. Someone tell me which file or registry entry records machine ID -- based on which my computer is falsely remembered as "unreliable", or provide any other simple way to fix this permanently.

I'm running the very latest BOINC.


What is your setting for paused tasks? Under Your Account, Computing Preferences, there is "Leave applications in memory while suspended?" If it is set to No, change it to Yes.

mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

Thanks for the quick reply. This computer performs perfectly in every other task. The problem must be with the MC client. (This would not be the first time.)

I have since performed a reboot, then a complete BOINC reinstall on another computer (where I do have admin rights) which I copied back to the quad core computer. These steps didn't fix it.


This has been discouraged for many versions now. Too many possibilities for people to cheat that way. I am NOT saying you are or would or anything else; this was an admin decision by the writers of BOINC and it affected many people who do not have internet access on every computer. Your idea of installing on one computer and then copying to another is just one of the ways the decision has affected users.
Basically I would say that if your computer crunches another project just fine, let it, and stop crunching for Malaria on that particular PC.

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

(Replying to multiple posts here.)

I know that copying one installation to another computer has its own set of issues. I'm not saying it's ideal. But due to circumstances beyond my control, I can't be administrator on this quad core computer, hence I couldn't install BOINC with conventional methods.

I understand there's a possibility that this could be abused. But don't you think it's imaginable that some MC tasks are simply flawed and my stats are put at a disadvantage for no good reason? I'm not the kind of guy to sit on an idle quad-core computer that *must* run 24/7 anyway. That's why I'm ready to invest a certain amount of time to circumvent MC errors. This time, it was a *LOT* more than usual and I don't appreciate it. I feel I'm already contributing enough attention to this project for free.

Let's look at the error message then: "Can't get shared memory segment name: shmget() failed". This is what caused multiple tasks to fail. I haven't programmed BOINC tasks, but on a computer with 4GB of physical memory and another 4GB of swap file, most of which is unused (especially during the weekend when I'm not there), don't you think that could be an MC or a BOINC programming error?

I may be able to reboot this computer more often in the future, in case the above message is due to BOINC leaking memory over long periods. (This would be yet another way to circumvent lousy programming. But I'm starting to think this is simpler than spending an extra half day trying to reinstall it every few weeks.)

My task is set as "leave in memory when paused".

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

Out of curiosity, I've checked out some of the failed Work Units.

Here are the top 3 on that certain computer:
https://malariacontrol.net/workunit.php?wuid=15185784
https://malariacontrol.net/workunit.php?wuid=15186764
https://malariacontrol.net/workunit.php?wuid=15186763

They were distributed to four computers each. *50%* came back with the same "client error"! Client errors must be rare otherwise: I quickly checked a page of successful WUs and there was no client error.

What are the chances that the problem is on my end, *AND* on three other computers'? Can I get an honest MC developer's opinion here? They must know from their statistics which batch of WUs are screwed and if this was one of them.

Oh yes. The successful WUs are usually only distributed to 2 computers each. In this recent batch, there's *FOUR* computers in the list -- two always "client error", the other two successful. Could it be that the MC engineers found out what they missed, and re-submitted the corrected batch for reprocessing?...

My heavy duty computer, which processes around 800 tasks each day, is more likely to download an entire poor batch. IMO the MC engineers should develop better quality batches, or *at the minimum*, remove those wrongly assigned quotas for computers that received such poor quality batches.

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3217
Credit: 5,500,753
RAC: 3,644

Out of curiosity, I've checked out some of the failed Work Units.

Here are the top 3 on that certain computer:
https://malariacontrol.net/workunit.php?wuid=15185784
https://malariacontrol.net/workunit.php?wuid=15186764
https://malariacontrol.net/workunit.php?wuid=15186763

They were distributed to four computers each. *50%* came back with the same "client error"! Client errors must be rare otherwise: I quickly checked a page of successful WUs and there was no client error.

What are the chances that the problem is on my end, *AND* on three other computers'? Can I get an honest MC developer's opinion here? They must know from their statistics which batch of WUs are screwed and if this was one of them.

Oh yes. The successful WUs are usually only distributed to 2 computers each. In this recent batch, there's *FOUR* computers in the list -- two always "client error", the other two successful. Could it be that the MC engineers found out what they missed, and re-submitted the corrected batch for reprocessing?...

It doesn't work that way. Once work is on the server, it would need to be canceled and re-issued as new work units. The reissued tasks that were successful are the same as those sent to the computers that failed.

What you see is someone else with an older client.

Possibly the error affecting some computers is client related?

Have you tried upgrading to the latest recommended version, or even the latest test version, to see if the problem goes away?


My heavy duty computer, which processes around 800 tasks each day, is more likely to download an entire poor batch. IMO the MC engineers should develop better quality batches, or *at the minimum*, remove those wrongly assigned quotas for computers that received such poor quality batches.

The quota is automatic by the BOINC server software.

Once you start returning good work, your quota will automatically go up.

Bad results make your quota go down; good results make your quota go up (up to the max set by the project).
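
To make the up/down behaviour concrete, here is a minimal sketch in Python. It is not the BOINC server code: the class name `HostQuota`, the halving on error, and the fixed step of 8 on success are assumptions picked only to illustrate a quota that drops quickly, recovers more slowly, and never exceeds the project maximum. The real step sizes may well differ.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Toy model of a per-host daily result quota (hypothetical, not BOINC's code)."""
    max_quota: int        # ceiling set by the project (assumed value below)
    quota: int            # current number of results the host may fetch per day
    min_quota: int = 1    # the host is never locked out completely
    recovery_step: int = 8  # assumed gain per good result

    def record_error(self) -> None:
        # Assumed rule: each bad result halves the quota, down to the floor.
        self.quota = max(self.min_quota, self.quota // 2)

    def record_success(self) -> None:
        # Assumed rule: each good result adds a fixed step, up to the ceiling.
        self.quota = min(self.max_quota, self.quota + self.recovery_step)


# A host starting at the full quota returns five bad results in a row...
host = HostQuota(max_quota=1000, quota=1000)
for _ in range(5):
    host.record_error()
after_errors = host.quota

# ...and then three good ones.
for _ in range(3):
    host.record_success()
after_successes = host.quota

print(after_errors, after_successes)
```

Under these assumed rules, a handful of failed tasks is enough to take a host from 1000 results a day down to a few dozen, and it takes many good results to climb back, which matches the pattern described earlier in this thread.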

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

What you see is someone else with an older client.

Possibly the error affecting some computers is client related?

Have you tried upgrading to the latest recommended version, or even the latest test version, to see if the problem goes away?


I already mentioned I have the very latest BOINC installed. (v6.2.19.)

If a WU is indeed not compatible with an older client, it should say so. It should NOT fail with a "Can't get shared memory segment name" message and put potentially hundreds of unsuspecting users on a ridiculously low quota.

Remember that MC contributors come here to donate free CPU cycles. If it turns out that, through NO fault of their own, several days' worth of their contribution was lost, and several *MORE* days will be lost due to the low punishment quota, it calls into question the very reason for their contribution. That's disrespectful. Imagine that one day, due to your manager's fault, a week's worth of your work is lost, *AND* he punishes you by making your work more difficult.

Until you can prove to me that the MC team tried to bring to my attention that BOINC version 6.2.14 (which is what I used to have installed) doesn't work any more, it remains the MC developers' fault.


The quota is automatic by the BOINC server software.

Once you start returning good work, your quota will automatically go up.

Bad results make your quota go down; good results make your quota go up (up to the max set by the project).


Results are good when the hardware I provide is working correctly, *AND* the work units are programmed well. This time, the work units were screwed up. The least MC can do is try to fix this. And not put me, along with potentially many others, on a low quota.

And remember, I said nothing the first two times it happened, I just silently fixed it. This time, that didn't work. I thought someone may tell me how I could simply fix this on my end...

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3217
Credit: 5,500,753
RAC: 3,644

From what I can see, others with the same 6.2.19 client return successful results (example).

Since I'm running a higher alpha test version, I have no way to see if that particular version causes me problems.

I'm sorry my suggestions did not help. That's all I can do.

rubinhood
Joined: Mar 16 07
Posts: 10
Credit: 1,382,989
RAC: 47

Just to follow up -- if anyone debugs this issue, this may be useful.

Here's a WU that failed on my computer but was successful on two others:
https://malariacontrol.net/workunit.php?wuid=15187188

The two other computers successfully finished the task even though they were one MAJOR version behind the latest (v5.10.45)! I think this puts the previous ideas that I should have upgraded my client (v6.2.14) in perspective.

Copyright © 2013 africa@home