Posts by Neil
1) Message boards : Number crunching : openMalaria test version v6.65 (Message 20177)
Posted 188 days ago by Neil
I was just sitting there watching Star Trek On Demand, when I suddenly noticed my uProcessors were running full bore. All of my uProcessors.
Thanks to MichaelT and everyone.
My next question was going to be how to get rid of all those work units that were stuck inside my computers, but they've all cleared out -- Ready to Reports, Computing Errors, and Aborteds.
Somehow, I feel better when my Malariacontrol.net is running smoothly. Guess I should mention that to my shrink.
Best luck / Don't crunch too hard.
2) Message boards : Number crunching : openMalaria test version v6.65 (Message 20138)
Posted 190 days ago by Neil
> Do your computers really fill?
Could be I misspoke. Now, I don't know what I meant by "filled."
Computer 1: It's got one work unit that I tried to abort because it errored. Clicking "Update," but it won't go away. No more work units coming in.
Computers 2 thru N: Plenty of WUs Ready to Report, and a smaller bunch of Computation Errors that ran for 2 seconds. No more work units being crunched, none coming in, and none going out. The CPU chips are so cold.
All computers are ready to run -- Boinc is not suspended or anything, but nothing's happening. All computers repeat the error messages (Can't parse / No close tag) a dozen times. "Communication Deferred" for hours, after which I guess they will try again; best luck.
Thanks for following my ramblings. I hope I cleaned up my ambiguities.
3) Message boards : Number crunching : openMalaria test version v6.65 (Message 20134)
Posted 191 days ago by Neil
Crunching is grinding to a halt.
9/15/2012 10:23:11 AM | malariacontrol.net | Restarting task wu_3210_318_23272_0_1347324065_1 using openMalariaBeta version 665 in slot 0
9/15/2012 10:23:11 AM | malariacontrol.net | Sending scheduler request: To fetch work.
9/15/2012 10:23:11 AM | malariacontrol.net | Reporting 12 completed tasks, requesting new tasks for CPU
9/15/2012 10:23:22 AM | malariacontrol.net | [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax
9/15/2012 10:23:22 AM | malariacontrol.net | [error] No close tag in scheduler reply
9/15/2012 10:24:48 AM | malariacontrol.net | Fetching scheduler list
9/15/2012 10:24:50 AM | malariacontrol.net | Master file download succeeded
9/15/2012 10:24:55 AM | malariacontrol.net | Sending scheduler request: To fetch work.
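For anyone who wants to check their own event log for the same two errors, here is a minimal Python sketch. It assumes the log lines follow the "timestamp | project | message" layout shown above; the function name `find_errors` and the sample list are only illustrative, not part of Boinc.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log lines above:
#   timestamp | project | message
LINE_RE = re.compile(
    r"^(?P<stamp>[^|]+?)\s*\|\s*(?P<project>[^|]*?)\s*\|\s*(?P<message>.*)$"
)

def find_errors(lines):
    """Return (timestamp, project, message) for each line whose message starts with [error]."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("message").startswith("[error]"):
            hits.append((m.group("stamp"), m.group("project"), m.group("message")))
    return hits

# Sample lines copied from the log excerpt in this post.
sample = [
    "9/15/2012 10:23:11 AM | malariacontrol.net | Reporting 12 completed tasks, requesting new tasks for CPU",
    "9/15/2012 10:23:22 AM | malariacontrol.net | [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax",
    "9/15/2012 10:23:22 AM | malariacontrol.net | [error] No close tag in scheduler reply",
    "9/15/2012 10:24:55 AM | malariacontrol.net | Sending scheduler request: To fetch work.",
]

errors = find_errors(sample)
# Count how often each distinct error message shows up.
counts = Counter(message for _, _, message in errors)
for message, n in counts.items():
    print(n, message)
```

Counting by message makes it easy to see whether the same pair of errors is repeating on every scheduler request.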
One option would be to sit and wait until things straighten out, but I don't know if malariacontrol will ever be able to upload completed work units from my computer that have already failed to upload.
The completed work units are stuck in my computer -- Even if I try to abort them, they won't go away.
To get rid of stuck work units, my first effort was to uninstall and re-install Boinc, but the damaged work units re-appeared. So, I uninstalled Boinc again, and deleted the Boinc directories manually. When I re-installed, I had lost all my project data (i.e. Recent Average Credits were set to Zero).
My other 3 computers are filling fast with stuck work units, and I don't know what to do with them. Sit on my hands? Clearing out the stuck work units is obviously NOT an option. What can I do?
I wish new formattings of work units would be checked for viability before they're distributed.
4) Message boards : Number crunching : Long Run Times (Message 19123)
Posted 292 days ago by Neil
Normal is good!
The wheels of business grind slowly.
My Recent Average Credit is climbing about 100 per day, from last week's low of 1800 back to 4000 where it belongs.
"Pendings" down to 13 Tasks, from a high of about 40. I think I would have to keep better records offline to keep track of which Tasks were converted to either Valid or Errors, or if they just faded away without explanation [regarding what happened to the thousands of lost hours].
Moot question. It's a transient situation, returning to normal.
5) Message boards : Number crunching : Conflicting "Pending" links (Message 19117)
Posted 293 days ago by Neil
I thought "Pending" meant my results were being scrutinized. It's interesting that it's really about waiting for corroborating work from other clients. It changes the nature of my concerns. For instance, I can stop being paranoid about being on the Pending list.
Probably more complicated for Malariacontrol to get corroborating work from other clients during instances such as the May 28th Debacle.
I think I first saw 40 Pendings a few days ago. They were down to 20 yesterday; 15 today.
> On my home page (https://malariacontrol.net/home.php?userid=57156), there's a link for "Pending credit." Link leads to a page (https://malariacontrol.net/pending.php) that says, "Pending credit: 0.00"
>> That page is effectively redundant Neil.
Perhaps you mean the link is no longer valid? Seems like a simply-rectified source of confusion. I would suggest that the invalid link should just be removed -- the "Pendings" are otherwise properly available from the Tasks webpage.
Personally, I'm concerned about it because I'm easily confused. Thinking of downloading my consciousness into my computer. Then I can donate my body to the gardener. Brain is already half compost. Eyes and ears go into the recycling bin; can't recommend turning 50; just don't do it. Watch out for tangents -- what was I talking about?
> ... some users choose to keep their computers hidden and making the "results.php?userid=" pages publicly accessible would negate that choice.
I noticed there's an option to keep one's computers secret. Having trouble imagining why that might be necessary, but I appreciate the opportunity to opt-in before there's sharing. If I were still trying to contribute with a 1400 MHz Celeron, I might want to keep it a secret. But I think my 1.8 GHz Celeron (with modern instruction sets and modest 35 watt usage) is a beautiful thing, and I want everyone to see.
I think I have an E5400 microprocessor sitting in its retail box, waiting for me to install. Then I can chuck the beautiful Celeron into the trash. I'll get around to it.
My Boinc standings were improving more and more slowly, and I was keeping an eye on it. If I started losing standing, I would have been more motivated to change that processor. And then, the May 28th Debacle happened, and my standings became a convoluted mess. But, that wasn't the scenario that would have convinced me to replace the Celeron.
Saved by the bell.
6) Message boards : Number crunching : Conflicting "Pending" links (Message 19106)
Posted 294 days ago by Neil
Contradictions happening with my "Pending credit" list, depending on which link you use to get there.
On my home page (https://malariacontrol.net/home.php?userid=57156), there's a link for "Pending credit." Link leads to a page (https://malariacontrol.net/pending.php) that says, "Pending credit: 0.00"
But, from my homepage, if you click on the "Tasks" link, and then click on its "Pending" link (https://malariacontrol.net/results.php?userid=57156&offset=0&show_names=0&state=2), I get a webpage with a list of 20 pending tasks -- "Completed, waiting for validation," worth about 70,000 seconds.
I'm mentioning it because one Pending link says you have no pending credit (which might be misleading or incorrect), and the other Pending link has the real, long list of pending credit (which you might not notice if you think you already got the desired information from the 1st link).
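To show that the two links really point at different pages, here is a small Python sketch that pulls the second URL apart. The reading of `state=2` as the "Pending" filter is only an assumption based on where the link leads; I have not seen it documented.

```python
from urllib.parse import urlsplit, parse_qs

# The "Tasks" -> "Pending" link quoted above.
url = ("https://malariacontrol.net/results.php"
       "?userid=57156&offset=0&show_names=0&state=2")

parts = urlsplit(url)
# parse_qs returns a list per key; keep the single value for each.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)   # the page being requested
print(params)       # the filters passed to it
```

The first link goes to pending.php with no parameters at all, while this one goes to results.php with a user ID and a state filter, so the two pages are not built from the same query.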
All those Pending tasks are related to the May 28th Debacle. I think hardly any tasks usually become "Pending," so maybe it's not a big issue. But I think both links, both allegedly pointing to the same information, should agree with each other.
On a tangent, my list of Pending tasks has shrunk from 23 tasks to 20 tasks over the last 12 hours; that's good. I don't know if they'll be validated or not; that's inconclusive.
7) Message boards : Number crunching : Long Run Times (Message 19105)
Posted 294 days ago by Neil
Work unit: 69482324
Computer: 200864 (Core2 CPU 6600 @ 2.40GHz)
Sent: 28 May 2012 14:54:25 UTC
Time reported: 29 May 2012 9:25:44 UTC
Status: Completed, can't validate
Run time (sec): 54,942.67
CPU time (sec): 54,873.67
Application: openMalaria: A simulator of malaria epidemiology and control (Branch A) v6.58
Well, one core wasted 15 hours. There have probably been bigger wastes since May 28, and I hope they'll be the last.
The last task sent to me that wound up on my Pending list ("waiting for validation") was sent June 1.
Starting June 2, I've been sent 140 additional tasks, and every one of them "Completed and validated" except for 28 that are still running. No Pendings, Invalids, or Errors. That's better.
The tasks all seem to be running from 1 to 3 hours. Most start off with my client saying they'll take 17 hours, and then they whittle down to a few hours by the time they end.
8) Message boards : Number crunching : Long Run Times (Message 19080)
Posted 296 days ago by Neil
So the units for the Credit column are Cobblestones, not Seconds? (I recommend data units should be labeled.) OK, that's half an answer to my question.
I could use a little more confirmation regarding having achieved 16 Cobblestones of Credit for 20000 seconds of CPU Time, and all the rest of my successfully completed work units are of similar ratio. Does that sound approximately like what you'd expect from a Core-2? If so, I'll assume we're back to normal and I'll turn on the rest of my processing threads.
Or is the credit off by a few orders of magnitude from what's expected?
9) Message boards : Number crunching : Long Run Times (Message 19073)
Posted 296 days ago by Neil
Oh, and my list of "Pending Tasks," which is usually empty, is filled to the rim.
10) Message boards : Number crunching : Long Run Times (Message 19072)
Posted 296 days ago by Neil
Dear Malaria bugs,
On the Malariacontrol website, I checked my list of "Tasks" which goes back about three days.
All my work units have been between 1000 and 20000 seconds (roughly between 17 minutes and 5.5 hours).
For the 20000 second work unit, I got Credit = "64."
I assume that's 64 seconds of credit?
For the shorter work units, I got less credit, I assume proportionally, also in the tens-of-seconds.
If I'm only getting 1/3-of-a-percent of the credit that I'm crunching, then I'm going to keep running only 1 out of 7 processors until things are fixed.
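Here is the arithmetic behind that "1/3-of-a-percent" figure, as a quick Python sketch. It assumes, as I do above, that the Credit column is measured in seconds; if the column uses some other unit, the ratio means nothing.

```python
# Figures taken from this post.
run_time_s = 20000   # longest work unit, in seconds
credit = 64          # value shown in the Credit column (unit assumed to be seconds)

# Convert the run times to friendlier units.
longest_hours = run_time_s / 3600
shortest_minutes = 1000 / 60

# Credit as a percentage of the time crunched.
ratio_percent = credit / run_time_s * 100

print(f"Longest work unit:  {longest_hours:.2f} hours")
print(f"Shortest work unit: {shortest_minutes:.1f} minutes")
print(f"Credit / run time:  {ratio_percent:.2f}%")
```

That works out to about 0.32%, which is where the one-third-of-a-percent comes from.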
If someone thinks I'm misinterpreting the Credits column, please let me know, thanks.
Even jdvb ("Have i7 Will Travel") mentions work unit https://malariacontrol.net/workunit.php?wuid=69400915
It shows thousands of seconds of work, but only tens of seconds credited.
Unless I'm mistaken about getting a minute's worth of credit for 20,000 seconds of work, then I think we have another big problem, in addition to getting work units that expire too quickly.
Don't crunch too hard,