Few events bring a company together like a server outage. Whenever the Exchange server went down at Microsoft, our group would race towards the pool table in building 25. We would play a few games before checking to see if email was still down. Most of the time, IT was able to bring the server back online, and we would return to work. When that did not happen, we would head home for the day. Email was, and still is, the lifeblood of most organisations, and outages are no laughing matter.
Server Maintenance Matters
The cloud has helped reduce these outages, but it certainly doesn’t stop them from happening. Moreover, even if you have moved portions of your infrastructure to the cloud, there’s a good chance you are still running some servers on premises. This week I would like to discuss how you can best keep these servers running smoothly. Over the last couple of years, I have helped two companies consolidate their servers. We removed some old hardware and replaced it with new Xeon-based servers running Windows Server. I learned a lot in that process, including the value of proper server maintenance and what happens when it is neglected. I hope you can learn from my experience.
Keep your OS Updated
This seems obvious. A no-brainer. And yet all it takes is a nefarious piece of malware like the WannaCry worm to get everyone’s attention. WannaCry did most of its damage to unpatched Windows 7 computers, but it also attacked some servers running Windows Server 2003. There are a couple of issues at play here. First, you want to make sure you are running a version of Windows that Microsoft supports with regular patch releases. You then need to keep that OS up-to-date. I spoke with many people who had no idea that Microsoft had ended mainstream support for Windows 7.
I have talked to too many people in IT who, once they get their servers running properly, don’t want to touch them. A few take this approach to the extreme by turning off the Windows Update service, which is a recipe for disaster. Testing patches in a VM takes time, and Microsoft has released buggy patches in the past. However, that does not mean you should stop working towards keeping all your systems up-to-date. You might not run into issues for years, but running unpatched servers will eventually catch up to you.
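One way to keep untouched servers from being forgotten is to track when each one was last patched and flag the stragglers. The snippet below is a minimal sketch of that idea, not a real patch-management tool: the server names, dates, the 30-day threshold and the `find_stale_servers` helper are all hypothetical, and in practice the dates would come from whatever inventory or update tooling you already use.

```python
from datetime import date, timedelta


def find_stale_servers(last_patched, today, max_age_days=30):
    """Return the names of servers not patched within max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [(patched, name) for name, patched in last_patched.items() if patched < cutoff]
    return [name for patched, name in sorted(stale)]


# Hypothetical inventory: server name -> date the last update was installed.
inventory = {
    "mail01": date(2017, 5, 20),
    "files01": date(2017, 1, 10),
    "db01": date(2016, 11, 2),
}

for name in find_stale_servers(inventory, today=date(2017, 6, 1)):
    print(f"{name} is overdue for patching")
```

Sorting oldest first puts the riskiest machines at the top of the list, which is usually where you want to start testing patches.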
The newest versions of Windows give users more control over how and when updates are applied. If you are interested in how Microsoft rolls out updates for Windows Server 2016, check out this article from Redmond Magazine.
Physically Clean Your Server
But I keep my server in a closed cabinet! That is a really good start. If you are lucky enough to work for a company that provides server racks, cabinets and a proper environment for all the company’s servers, then kudos to your CEO. Even if your company provides all that, your servers can inhale dirt and dust that degrade performance and reliability. Today’s hot-running CPUs and GPUs will downclock themselves if they do not have adequate cooling.
Good quality servers have powerful fans to keep air moving over and around critical components. However, all that power means the fans can suck dirt and dust into the case. A few years ago, I went to a dentist’s office to help him upgrade his server. He told me he never removed it from a fancy glass enclosure he kept at the back of his office. The server ran his patient management software and was occasionally rebooting during the day. I asked him when the last time was that he cleaned the case filters. I had my answer when he simply stared back at me. When I removed the server from the enclosure, I found his case filters full of gunk to the point the server was throttling itself due to the heat inside the case.
I have used compressed air to clean both desktop and rackmount server cases. Be careful when shooting compressed air through the fans that you don’t damage them. Make sure you remove and clean all the filters if your server case has them. Some of the newer cases have filters that pull out from the bottom in addition to top and rear mounted ones.
Virtualization Helps Server Maintenance
Do you remember the days of the backup server? I spoke with a day trader recently who is still a fan of the backup server, despite the extra costs and administration. Thankfully, we live at a time when you can virtualise nearly any server. In fact, you would be wise to virtualise every server you can. Why? Because it is so easy to spin up a backup VM today. Consolidating multiple servers running on older hardware by virtualising them on newer hardware will nearly always result in improved uptime.
I understand that not all servers can be virtualised. Sometimes licensing, performance and hardware issues prevent it. That still leaves many opportunities to virtualise the servers where it makes sense. For a list of servers you should not virtualise, see this article from Contel Bradford on the Recovery Zone.
Check Logs for Hardware Errors
Bad components can bring a server to its knees if left unchecked. Hardware errors often show up after POST and after Windows has started all its services. Check the system logs for hardware issues as part of your server maintenance strategy. You may find that updating a driver for a GPU or RAID card fixes the issue. If the error persists, you’ll need to replace the component.
RAID Controllers, like this model from LSI, run very hot
It is not a bad idea to remove any PCI-E cards or drives you are not using. Server hardware is built to run 24/7. I do not see any issues with CPUs, boards, and RAM. Even today’s GPUs tend to run for years without issues. However, I see a fair share of power supplies, fans and expansion cards fail over time. RAID cards are notorious for running hot, which shortens their lifespan. It never hurts to keep an eye on system errors as well. However, I have found that hardware errors, when left unchecked, are far more likely to take a server down.
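The log check described in this section can be partly automated. Here is a rough sketch that assumes you have exported the System log to CSV text with `Level`, `Source` and `Message` columns. Those column names, the keyword list and the sample rows are assumptions made up for illustration, so adjust them to match what your own export and hardware actually produce.

```python
import csv
import io

# Assumed keywords; tune these to the event sources your hardware reports.
HARDWARE_KEYWORDS = ("disk", "raid", "memory", "fan", "power", "temperature")


def hardware_errors(csv_text):
    """Return Error/Critical rows whose source or message mentions a hardware keyword."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Level"] not in ("Error", "Critical"):
            continue
        text = (row["Source"] + " " + row["Message"]).lower()
        # Plain substring matching: crude, but enough to build a shortlist.
        if any(keyword in text for keyword in HARDWARE_KEYWORDS):
            hits.append(row)
    return hits


# Made-up sample rows standing in for an exported log.
sample = """Level,Source,Message
Information,Service Control Manager,The Print Spooler service entered the running state.
Error,disk,The device has a bad block.
Warning,Time-Service,Clock skew detected.
Critical,Kernel-Power,The system rebooted without cleanly shutting down first.
"""

for row in hardware_errors(sample):
    print(row["Level"], row["Source"], row["Message"])
```

Substring matching will produce some false positives, so treat the output as a list of entries worth reading rather than a diagnosis.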
Verify Your Backups
So you have scheduled server backups. Each week you confirm the backup server is running correctly. However, are you taking the extra time to verify your backups work? Verifying the integrity of your backups is often the most overlooked step of a server backup process. How do you do this? Well, you will want to run some test recoveries until you feel comfortable with your process. Going forward, spot checks may be adequate.
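A test recovery is only useful if you check what came back. One simple approach, sketched below, is to restore to a scratch folder and compare SHA-256 checksums against the source files. The `compare_restore` helper and the file names are hypothetical. The sketch also assumes the source has not changed since the backup was taken, so on a live server you would compare against a snapshot or a checksum list saved at backup time instead.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_digests(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_restore(source_dir, restored_dir):
    """Return (missing, mismatched) relative paths for a restore checked against its source."""
    source = file_digests(source_dir)
    restored = file_digests(restored_dir)
    missing = sorted(set(source) - set(restored))
    mismatched = sorted(p for p in source if p in restored and source[p] != restored[p])
    return missing, mismatched


# Demo with throwaway folders standing in for the original data and a test restore.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "a.txt").write_text("invoice data")
    Path(src, "b.txt").write_text("patient records")
    Path(src, "c.txt").write_text("notes")          # never made it into the restore
    Path(dst, "a.txt").write_text("invoice data")
    Path(dst, "b.txt").write_text("patient recor")  # simulated corruption
    missing, mismatched = compare_restore(src, dst)

print("Missing from restore:", missing)
print("Checksum mismatch:", mismatched)
```

Two empty lists mean every source file came back intact. Anything else tells you exactly which files to investigate before you need the backup for real.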
If you are outsourcing your backups to a cloud provider, you will want to understand how they go about verifying backups. Elements such as the backup location, schedule and recovery times are all critical to maintaining a solid backup plan. You should have a firm understanding of all of these elements, whether it is your team or a third party providing the service.
I have mentioned this before, but you want to use tested and trusted solutions when your reputation is on the line. Companies such as StorageCraft offer a line of backup solutions that work with all types of servers, including products for Exchange and virtualized environments.
Many factors contribute to keeping your servers running smoothly and with as little drama as possible. Some of the simplest tips are the ones people most often overlook. One would think that keeping your server off the floor would be obvious, yet I still visit companies where one or more servers are sitting on the floor. Finding a proper home for your server should be task #1.
Do you have any tips for server maintenance?