Site Maintenance
by levans, Oct 2, 2019, 5:52 PM
Today we had a weird site maintenance to assign another IP to our server. Why? Because the IP we had was blocked in Russia, and therefore a subset of people in Russia were unable to access the site. We moved back to a former IP which did not have problems, and all seems to be good now. In this post I'm going to discuss some reasons for maintenance, and what we do to prepare for and perform the maintenance.
Server maintenance is always a pain and I wish we never had to do it. The reasons behind server maintenance are generally related to hardware problems, hardware upgrades, security issues, and weird one-off things like the aforementioned IP address not working in some country. It almost always involves taking the site down for at least a short amount of time, which we, or at least I, hate to do. Server maintenance is almost never performed due to new features, as those can be applied without doing any server maintenance.
There is another reason for maintenance, and that is our hosting provider requiring maintenance on their network or server infrastructure. We have absolutely no control over these, and sadly, they often take more time than the maintenance we need to do. There is a chance such maintenance will come soon due to a flaw our hosting provider recently found in their kernels. We will keep you informed of any planned maintenance through a global announcement in the forums.
Preparing for maintenance is a huge hassle. We have meetings to define what the maintenance is. We have to decide when to do the maintenance. We have to put up global announcements about the maintenance. We have to inform our employees, both at AoPS Online and AoPS Academy, about the maintenance. We have to set up test servers to practice the maintenance. We have to document every step we plan to take. We sometimes have to build brand new servers, which can take hours. We then have to all get together and perform the maintenance. We have to take the site down for a period of time, which we never like to do. If things fail, then we go into panic mode trying to figure out how to fix things. When all is done, we then have to document the maintenance steps we took, monitor the site to make sure there are no issues, fix any issues that do pop up, inform internal staff that we are done, and then remove the global announcements. As you can imagine, all this takes a great deal of time and causes some frustration. We try to keep maintenance to a minimum, but when we have to do it, we have to do it. We apologize for any inconvenience this maintenance causes.
This post has been edited 5 times. Last edited by levans, Oct 3, 2019, 1:59 AM