First, I want to apologize for the issues the site was having last night. I know how frustrating it is when a service you rely on is slow or completely unusable, and I'm sorry that's something you had to deal with.
I'm trying to find what is causing the problem and, more importantly, a solution. However, there doesn't seem to be a clear answer to the first question, which makes the second a shot in the dark.
The issues, to me, don't make sense given my understanding of performance. There's clearly a bottleneck somewhere, but I can't figure out where.
Last night, the 1-minute load average peaked at 6.28 at one specific point; the rest of the time it stayed between 4.0 and 5.5. The server has 6 cores, and to my understanding a load of 1.0 represents 100% CPU usage for a single core, so 6.0 is 100% usage across all 6 cores.
Thus, outside of that one minute, CPU load was below 100% across all cores.
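As a rough sanity check on that arithmetic, here's a minimal sketch that divides each load figure by the core count. The function name `load_per_core` is just something made up for illustration, and the numbers are the ones quoted above. One caveat worth keeping in mind: on Linux the load average also counts processes stuck waiting on I/O, so this ratio is only an approximation of CPU usage, not an exact measurement.

```python
def load_per_core(load_average, cores):
    """Return the load average divided by the core count.

    Under the assumption that 1.0 load == one fully busy core,
    a result of 1.0 means every core is saturated.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


CORES = 6  # core count of the server, as stated in the post

# Load figures quoted in the post: the peak and the typical range.
for label, load in [("peak", 6.28), ("high", 5.5), ("low", 4.0)]:
    ratio = load_per_core(load, CORES)
    print(f"{label}: load {load} -> {ratio:.0%} of {CORES} cores")
```

So the peak works out to roughly 105% of capacity, and the rest of the night sits at about 67% to 92%. On a live Unix box the same function could be fed the first value from `os.getloadavg()` and the count from `os.cpu_count()`.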
Memory usage never went above 2GB (out of 3GB), so it was well within reason too.
Inspecting the queue, which shouldn't be causing timeouts anymore anyway, didn't show any backed-up requests.
The only oddity I found was that sendmail was apparently stuck in a recursion loop, but I wouldn't think the amount of I/O it was causing would have been significant enough to introduce slowdowns. However, I didn't have any I/O logging in place at the time, which I have since addressed.
Of course, none of the above is a solution, so at present I'm working to make a few key changes to reduce load on the server. I hope to have these changes implemented by tonight. Though, again, whether they will have any impact is really just an educated guess at this point.