Sandman in the cloud: Temporarily shut down non-critical systems

Moving our infrastructure into the cloud

We would like to present another project from our most recent hackathon in January 2019: the Sandman. But first, a little background. We have spent the last three years slowly but surely moving our entire hosting infrastructure from self-operated racks in various data centers to multiple IaaS clouds:

As you can see, it was a long process, but it is finally coming to an end. As engineers, we are a little sad about that, because running our own hardware brings a lot of freedom and fun. But the new world of the cloud also offers some unique advantages. What are they?

If you look at the usual advertising copy, there is a lot of talk about scaling on demand to handle peak loads. That rarely applies to us: as the chart above shows, our growth is very steady, with no sudden spikes. Nevertheless, we wanted to make better use of one advantage of the cloud: hourly billing.

Temporarily switch off rarely used systems

Previously, all of our systems ran around the clock: critical production systems as well as small, specialized test systems that are rarely used. At the hackathon, we asked ourselves about the latter: can't we automatically switch them off at night and on weekends? That saves a few euros here and there and perhaps even helps the environment a little.

This is how the Sandman was born, the system which, from now on, puts our non-essential systems to sleep regularly:

The whole setup consists of a small web application, shown above, and scripts that are easily rolled out to all systems via our configuration management and integrate seamlessly with our existing local snapshots, offsite backups and monitoring. This ensures that a system does not sleep when it is supposed to make backups, and that monitoring does not raise alarms just because a system goes to sleep on schedule.

Because of this tight integration with our infrastructure, we are unfortunately unable to release the Sandman as an open source solution. But as mentioned, it is a hackathon project: three people needed less than 24 hours to develop the current system, so other companies should be able to achieve a similar result with manageable effort, given the right conditions.

Flexible and customizable configuration

An important foundation of our solution is configuration management in the form of BundleWrap. Here, each system's configuration is stored in its entirety and every change is traceable. Changes across many systems are very simple. If a system or an entire group is to go to sleep, we simply add this metadata to the rest of the configuration:

'sandmann': {
    'up': {
        '45 6 * * mon-fri',
    },
    'down': {
        '15 19 * * mon-fri',
    },
},

This stores, in the style of a classic Unix cron job, the information that the system should wake up every weekday at 6:45 a.m. UTC and go back to sleep at 7:15 p.m. At night and on weekends, the virtual machine is off and costs practically nothing. Once this configuration has been committed to Git, you only need to run bw apply to roll it out to the target system. But what happens then?
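To make the semantics of such a schedule concrete, here is a minimal Python sketch. It is not the Sandman's actual code: the function names are hypothetical, and the parser only handles the small cron subset used in the metadata above (minute, hour, "* *", and a weekday or weekday range). It assumes a system is awake whenever its most recent 'up' event is newer than its most recent 'down' event.

```python
from datetime import datetime, timedelta, timezone

# Index positions match datetime.weekday(): Monday is 0, Sunday is 6.
DAYS = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']


def parse_simple_cron(expr):
    """Parse 'MIN HOUR * * DAY[-DAY]' into (minute, hour, set of weekdays)."""
    minute, hour, dom, month, dow = expr.split()
    if dom != '*' or month != '*':
        raise ValueError('only "*" is supported for day-of-month and month')
    first, _, last = dow.partition('-')
    start = DAYS.index(first)
    end = DAYS.index(last or first)
    return int(minute), int(hour), set(range(start, end + 1))


def last_fire(expr, now):
    """Return the most recent time at or before `now` at which `expr` fired."""
    minute, hour, weekdays = parse_simple_cron(expr)
    for days_back in range(8):
        day = now - timedelta(days=days_back)
        candidate = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if candidate <= now and candidate.weekday() in weekdays:
            return candidate
    return None


def should_be_awake(schedule, now):
    """Awake if the latest 'up' event is newer than the latest 'down' event."""
    ups = [last_fire(expr, now) for expr in schedule['up']]
    downs = [last_fire(expr, now) for expr in schedule['down']]
    last_up = max((t for t in ups if t), default=None)
    last_down = max((t for t in downs if t), default=None)
    if last_up is None:
        return False
    return last_down is None or last_up > last_down


def awake_hours_per_week(schedule, step_minutes=15):
    """Estimate weekly uptime by sampling one week in fixed steps."""
    start = datetime(2019, 2, 4, tzinfo=timezone.utc)  # a Monday, 00:00 UTC
    samples = 7 * 24 * 60 // step_minutes
    awake = sum(
        should_be_awake(schedule, start + timedelta(minutes=i * step_minutes))
        for i in range(samples)
    )
    return awake * step_minutes / 60


schedule = {
    'up': {'45 6 * * mon-fri'},
    'down': {'15 19 * * mon-fri'},
}
print(awake_hours_per_week(schedule))  # 62.5 of the 168 hours in a week
```

With the schedule from the metadata, the machine runs 12.5 hours on each of five weekdays, so it is switched off for 105.5 of the 168 hours in a week, roughly 63 percent of the time.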

How the Sandman works

Each configured system continuously checks whether it is currently in a time window in which it should be asleep. As soon as that is the case, it starts a backup if necessary. Once the backup is done, the system contacts the Sandman web application via its API and a wake-up call is created. The Sandman then schedules a downtime in the Icinga monitoring system. If all of this has worked without errors, the virtual machine simply shuts itself down, so that all services are still terminated properly.
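The order of these steps matters: the machine must only power off once everything before it has succeeded. A rough Python sketch of that control flow could look like the following. All names are hypothetical, and the three callables stand in for the real backup, the Sandman API call (which on the server side creates the wake-up call and the Icinga downtime) and the actual shutdown command, none of which are shown in this article.

```python
def go_to_sleep(run_backup, register_with_sandman, shutdown):
    """Run the pre-sleep steps in order; shut down only if all of them succeed."""
    steps = [
        ('backup', run_backup),
        ('register with Sandman', register_with_sandman),
    ]
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Stay awake: without a backup or a wake-up call, sleeping is unsafe.
            return f'aborted: {name} failed ({exc})'
    # A regular shutdown lets all services terminate properly.
    shutdown()
    return 'shutting down'


# Usage with stand-in steps that only print what they would do.
result = go_to_sleep(
    run_backup=lambda: print('backup finished'),
    register_with_sandman=lambda: print('wake-up call created'),
    shutdown=lambda: print('powering off'),
)
print(result)
```

Aborting on the first failure means a system with a broken backup or an unreachable Sandman simply stays awake, which costs a little money but never leaves a machine asleep without a wake-up call.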

At wake-up time, the Sandman simply contacts the API of the respective cloud provider and starts the corresponding VM. Alternatively, any colleague can press the WAKE NOW button shown above. This way, a system can also be started before its normal wake-up time, for example if someone wants to work on it on the weekend.
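The wake-up side can be pictured as a list of pending wake-up calls that is checked periodically. The sketch below is an illustration under assumptions, not the actual web application: the WakeUpCall class, the provider names and the starter functions are all hypothetical, and a real starter would call the API of the respective cloud provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class WakeUpCall:
    vm_id: str
    provider: str      # key into the table of provider-specific start functions
    wake_at: datetime


def process_wake_up_calls(calls, now, starters):
    """Start every VM whose wake-up time has passed; return the calls still pending."""
    pending = []
    for call in calls:
        if call.wake_at <= now:
            starters[call.provider](call.vm_id)
        else:
            pending.append(call)
    return pending


def wake_now(call, now):
    """Equivalent of the WAKE NOW button: make the call due immediately."""
    call.wake_at = now


# Usage with a stand-in starter that only records which VM it would start.
started = []
starters = {'example-cloud': started.append}
now = datetime(2019, 2, 9, 10, 0, tzinfo=timezone.utc)  # a Saturday
calls = [WakeUpCall('test-vm-1', 'example-cloud', now + timedelta(days=2))]

calls = process_wake_up_calls(calls, now, starters)  # nothing is due yet
wake_now(calls[0], now)                              # someone wants to work today
calls = process_wake_up_calls(calls, now, starters)
print(started)
```

Keeping the provider-specific start function behind a lookup table is one way to support multiple IaaS clouds with the same scheduling logic.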

Just over a month after the hackathon, we have configured a number of non-critical test systems and similar applications to use the Sandman, saving a solid 551 euros within 30 days. If we do not add any more systems, we will save a little over 8,300 euros in the first year. Not bad at all. In any case, we expect our next team event to be generously financed. 😉

Do you want to work on technical solutions with our system administrators?

Maybe you'll already be part of our next Hackathon project! We are currently looking for sysadmins and colleagues for our IT service management. You can find all information about our open positions and background information about working at //SEIBERT/MEDIA in our job portal (in German). Or just drop by: Every third Friday of the month we invite interested people to our Open Office (information in German). We'd love to meet you!

Further information

Expansion of //SEIBERT/MEDIA’s free community Wi-Fi infrastructure: More free Wi-Fi for the Wiesbaden city centre
IT log book: What are "CPU lock-ups"?
Agile Organization: Hackathons at //SEIBERT/MEDIA
