Important information

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 166 - Posted: 9 Jul 2024, 13:54:21 UTC

In short: the server's motherboard has two CPU slots, but one of them is damaged, so the machine is running at half capacity. Within the next 2 weeks I would like to take advantage of the project's test period and take the motherboard to a service center, where I hope they will repair the slot for the second processor.
What will this mean?
An additional 8 cores and 16 CPU threads, plus the ability to add up to 128 GB of RAM. This gives the project server extra resources for the future.
If the project happens to be unavailable, don't worry, it's just a maintenance break. And when better to do it than during testing :)
Greetings to everyone, and thank you for your understanding.
ID: 166 · Rating: 0

rilian
Joined: 21 Jun 24
Posts: 9
Credit: 10,016
RAC: 0
Message 167 - Posted: 9 Jul 2024, 14:08:29 UTC - in response to Message 166.  

good luck with the repair!
I crunch for Ukraine
ID: 167 · Rating: 0

Link
Joined: 1 Jul 24
Posts: 8
Credit: 28,630
RAC: 79
Message 168 - Posted: 9 Jul 2024, 18:59:33 UTC - in response to Message 166.  

Any idea how long the repair, or rather the entire service break, will take? In particular: longer than the deadline of our tasks? In that case it might be a good idea to stop sending out new ones before the shutdown.
ID: 168 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 169 - Posted: 10 Jul 2024, 6:13:02 UTC - in response to Message 168.  

Any idea how long the repair, or rather the entire service break, will take? In particular: longer than the deadline of our tasks? In that case it might be a good idea to stop sending out new ones before the shutdown.

If the service center can do the repair, it will be one day of work, so together with my own work I am planning a break of 2 days at most.
ID: 169 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 175 - Posted: 11 Jul 2024, 10:48:32 UTC

OK, after I sent some photos of the CPU slot to the service center, they told me they need 2 working days.
I am going to turn off the server on the evening of 15th July and take it in for service on 16th July.
I hope to be back with the repaired server on the evening of 19th July. In the worst case, the second CPU slot won't be repaired and the server will come back with its current specifications.

Keep your fingers crossed :)
ID: 175 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 176 - Posted: 15 Jul 2024, 6:00:32 UTC

Don't forget that today is the day the server goes down for 2-3 days for servicing.
ID: 176 · Rating: 0

Fardringle
Joined: 22 Jun 24
Posts: 4
Credit: 15,806,016
RAC: 45,150
Message 177 - Posted: 16 Jul 2024, 5:13:43 UTC - in response to Message 176.  

Don't forget that today is the day the server goes down for 2-3 days for servicing.


Thank you for the information and updates! Hopefully they will be able to actually fix the motherboard. CPU sockets can be pretty complicated, depending on the exact problem.
ID: 177 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 178 - Posted: 21 Jul 2024, 1:51:00 UTC

In short, things that could have gone wrong ended up going wrong.
Yes, the server now has 2 processors, but the second slot (the reason the server went in for service) still has 3 damaged pins (if I checked correctly) that are responsible for RAM support.
To sum up, the server now has 2 CPUs with 8 cores / 16 threads each, i.e. a total of 16 cores and 32 threads. Unfortunately, when it comes to RAM, instead of 256 GB it has 192 GB. I don't know yet whether I will consider replacing the 14 16 GB sticks with 32 GB ones, but that's for the future.
I tested the server under CPU and RAM load as far as I could; the rest will unfortunately only come out during use.

So far, the project has started, I have generated 2 series of tasks, and now... I'm going to sleep off two sleepless nights.
ID: 178 · Rating: 0

Dr Who Fan
Joined: 21 Jun 24
Posts: 12
Credit: 230,698
RAC: 938
Message 179 - Posted: 21 Jul 2024, 5:01:00 UTC - in response to Message 178.  

No hurry. Get some rest and figure out the rest of the configurations later on.
ID: 179 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 180 - Posted: 22 Jul 2024, 5:39:25 UTC

To put it nicely, I am very angry at the service center and what it did. The server seems to be unstable and may simply hang during normal operation (running CPS etc.).
And all this despite rigorous, many-hour stress tests of CPU, RAM, disk I/O, etc., during which the server did not hang.
Working on it.
ID: 180 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 183 - Posted: 23 Jul 2024, 13:11:17 UTC

I have no idea if this is good or bad, but...
Last Sunday and Monday I found some problems with one of the LSI disk controllers in the server. I will replace it over the weekend; luckily I have two spare LSI controllers.

At the moment I haven't seen any new problems for 2 days, but I am still monitoring it.
ID: 183 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 190 - Posted: 29 Jul 2024, 23:03:11 UTC

The current server is probably dead :( It has been freezing constantly for the last 15 hours.
I have bought new hardware, but I need to wait about 3-4 days for it. I will do my best to keep the server alive for as long as I can until then.
I hope the new one will be less moody :(
ID: 190 · Rating: 0

Conan
Joined: 21 Jun 24
Posts: 25
Credit: 321,844
RAC: 955
Message 191 - Posted: 30 Jul 2024, 1:50:04 UTC - in response to Message 190.  

The current server is probably dead :( It has been freezing constantly for the last 15 hours.
I have bought new hardware, but I need to wait about 3-4 days for it. I will do my best to keep the server alive for as long as I can until then.
I hope the new one will be less moody :(


All the best with that. Good work with the project so far, well done.

Conan
ID: 191 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 192 - Posted: 30 Jul 2024, 5:22:27 UTC

I spent all night observing the server and everything is strange :)
When my second CPU reaches 95 degrees, the system halts, but that is more or less expected.
When I take that CPU out, my first CPU goes from its usual 60 degrees to 95 degrees, and it's not caused by server load.
Server load is 6 in top or htop.

When I move the CPU from slot 1 to slot 2 and put the second CPU in slot 1, the CPU in slot 2 goes to 95 degrees.
You could say it is a cooling failure... that would almost make sense, but...
Right now I have 4 fans with an airflow of 110 m³ per hour each cooling the whole board... and yesterday evening I put 2 fans of about 40 m³ per hour directly on the two CPUs, so I have 6 fans moving about 500 m³ of air per hour in a closed case.
Only the CPUs give me such high temperatures.
The disks (I have 30 disks = 24 HDD and 6 SSD) are cool at 32-35 degrees, all heatsinks and components on the board are at about 40-45 degrees, the 3 LSI controllers are at 35-40 degrees, the PSU is at 45 degrees... only the CPUs show this strange behaviour.
What really shocked me: when the server froze this morning at about 5:30 CEST, CPU 1 was at 55 degrees, CPU 2 at 95 degrees, and the heatsink on CPU 2 at 105 degrees.. wow
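
As a quick sanity check of the airflow figure in this post, here is a minimal sketch that only adds up the rated numbers quoted above. The function name `total_airflow` is made up for illustration, and summing rated values is a simplification: real airflow through a closed case is usually lower than the sum of the fan ratings.

```python
def total_airflow(fans):
    """Sum rated airflow over (count, m3_per_hour_each) pairs."""
    return sum(count * flow for count, flow in fans)


# 4 case fans at 110 m3/h each plus 2 CPU fans at about 40 m3/h each,
# as described in the post
fans = [(4, 110), (2, 40)]
print(total_airflow(fans))  # 520, in line with the "about 500" in the post
```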
ID: 192 · Rating: 0

WezH
Joined: 27 Jun 24
Posts: 2
Credit: 797,586
RAC: 3,101
Message 194 - Posted: 31 Jul 2024, 9:52:29 UTC

Sounds like the motherboard's voltage regulator module (VRM) is broken. It's giving too much voltage to the CPUs.

Is there any way that you can monitor CPU voltages while the operating system is running?
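
One possible way to check this, sketched under the assumption that the server runs Linux and that its board sensors are exposed through the kernel's hwmon sysfs interface (`/sys/class/hwmon`). On older hardware the voltage inputs may simply not be there, in which case only temperatures will show up. The helper name `read_hwmon` is made up for this example.

```python
from pathlib import Path


def read_hwmon(root="/sys/class/hwmon"):
    """Collect voltage (in*_input) and temperature (temp*_input) readings.

    hwmon reports voltages in millivolts and temperatures in
    millidegrees Celsius, so both are divided by 1000 here.
    Returns {"hwmonN:chipname": {"in0": volts, "temp1": deg_c, ...}}.
    """
    readings = {}
    base = Path(root)
    if not base.is_dir():
        return readings  # no hwmon interface at this path
    for chip_dir in sorted(base.glob("hwmon*")):
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = "unknown"
        values = {}
        for sensor in sorted(chip_dir.glob("*_input")):
            kind = sensor.name.split("_")[0]  # e.g. "in0" or "temp1"
            if not kind.startswith(("in", "temp")):
                continue  # skip fans, power, etc.
            try:
                raw = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable
            values[kind] = raw / 1000.0
        if values:
            readings[f"{chip_dir.name}:{chip}"] = values
    return readings


if __name__ == "__main__":
    for chip, values in read_hwmon().items():
        for kind, value in values.items():
            unit = "V" if kind.startswith("in") else "C"
            print(f"{chip} {kind}: {value:.3f} {unit}")
```

Running this in a loop while the machine is under load would show whether any voltage input drifts as the CPU temperature climbs. Which `inN` channel corresponds to the CPU core voltage depends on the board and its sensor chip, so that mapping has to be checked separately.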
ID: 194 · Rating: 0

Magic Quantum Mechanic
Joined: 22 Jun 24
Posts: 10
Credit: 580,718
RAC: 1,484
Message 195 - Posted: 31 Jul 2024, 10:19:09 UTC

I am always still awake at around 3 am my time, and I noticed on this laptop that this place had come back to life. I had to go around to all my other PCs to tell them to send and receive more work from here, and I just finished that.

(just a few minutes before 10:00 AM UTC)
ID: 195 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 198 - Posted: 1 Aug 2024, 7:49:40 UTC - in response to Message 194.  

I see no possibility of that on this old server, only CPU core temperatures.
ID: 198 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 199 - Posted: 1 Aug 2024, 7:50:35 UTC - in response to Message 194.  

Sounds like the motherboard's voltage regulator module (VRM) is broken. It's giving too much voltage to the CPUs.

Is there any way that you can monitor CPU voltages while the operating system is running?


I see no possibility of that on this old server, only CPU core temperatures.
ID: 199 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 200 - Posted: 1 Aug 2024, 7:51:24 UTC

I just got the package with the new server.
I will try to get it up and running in the next 2-3 days.
ID: 200 · Rating: 0

goofyx[BOINC@Poland]
Project administrator
Project developer
Project tester
Joined: 12 Jun 24
Posts: 93
Credit: 154,062,246
RAC: 235,418
Message 202 - Posted: 4 Aug 2024, 16:27:36 UTC - in response to Message 200.  

The bad luck continues :(
In the new server two RAM slots are broken. I have to return it to the seller and get another server.
ID: 202 · Rating: 0