Dynamic Scaling of Jenkins in the Cloud

This article is the third in a series describing how the development team at RightScale does continuous integration (CI) and continuous delivery (CD) in the cloud. Previous articles provided an overview of CI/CD in the cloud and described how we do agile regression testing in the cloud at RightScale.
In this article, I will describe how the RightScale development organization uses RightScale to automatically scale the number of Jenkins slaves so that we can continuously integrate our code commits.

The Problem: Many Developers + Many Changes = Jenkins Bottleneck

At RightScale we have dozens of developers working on the RightScale platform. As each of these developers commits changes to the code repository, our CI process kicks in to automate integration and testing of the changes. However, when many changes are committed each day, jobs can pile up in the Jenkins queue and create a bottleneck in the Jenkins environment.

Our Solution: Manage Jenkins with RightScale

Jenkins is designed with a master/slave architecture. To process more Jenkins jobs, you need to add more Jenkins slaves. However, we didn’t want to provision a lot of Jenkins slaves that would sit idle when the demand for CI was low. We wanted to go further and dynamically scale up the number of Jenkins slaves based on the size of the Jenkins queues.

We created a RightScale ServerTemplate™ to deploy Jenkins in the cloud along with a RightScale auto-scaling server array. This Jenkins environment included a RightLink™ agent that monitored the size of the Jenkins queue and triggered RightScale to auto-scale the number of Jenkins slaves when the queue got too long. As the number of Jenkins slaves increased, the Jenkins queue could be processed more quickly. Similarly, when the queue got shorter, RightScale would decrease the number of Jenkins slaves.
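
To make the monitoring step concrete, here is a minimal Python sketch of one way a monitor could measure the Jenkins queue. It is not the RightLink implementation, which this article does not detail. It assumes the queue is read through Jenkins's remote access API at /queue/api/json, which returns a JSON document with an "items" list. The function names and the base URL passed in are hypothetical.

```python
import json
from urllib.request import urlopen


def count_queued_items(payload: str) -> int:
    """Count the entries in a Jenkins queue JSON document.

    Assumes the document has the shape returned by /queue/api/json,
    i.e. a top-level "items" list with one entry per waiting job.
    """
    data = json.loads(payload)
    return len(data.get("items", []))


def fetch_queue_length(base_url: str, timeout: float = 10.0) -> int:
    """Fetch the current queue length from a Jenkins master.

    base_url is a placeholder for your own Jenkins URL; authentication
    is omitted to keep the sketch short.
    """
    url = base_url.rstrip("/") + "/queue/api/json"
    with urlopen(url, timeout=timeout) as response:
        return count_queued_items(response.read().decode("utf-8"))
```

A monitor would call fetch_queue_length on a schedule and report the value as the metric that drives scaling. Keeping the parsing in its own function makes it easy to test without a running Jenkins master.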

The number of Jenkins slaves we use ranges from 1 to 6, depending on the rate of new commits entering the build queue and the duration of the individual CI jobs. We have experimented with changing the server array scaling sensitivity on weekends and outside of business hours, but have discovered that it’s better to optimize the boot time for new slaves than to try to predict when developers will have work to test!
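
The scaling decision described above can be sketched as a small function. This is an illustration under stated assumptions rather than the actual server array configuration. The bounds of 1 and 6 come from the range mentioned above, while the thresholds (scale_up_at, scale_down_at) and the one-slave-at-a-time step are hypothetical values chosen for the example.

```python
MIN_SLAVES = 1  # lower bound of the range described in the article
MAX_SLAVES = 6  # upper bound of the range described in the article


def desired_slave_count(current: int, queue_length: int,
                        scale_up_at: int = 5, scale_down_at: int = 1) -> int:
    """Return the slave count to aim for, given the current queue length.

    scale_up_at and scale_down_at are hypothetical thresholds: add a slave
    when the queue is at least scale_up_at long, remove one when it is at
    most scale_down_at long, and otherwise leave the count unchanged.
    """
    if queue_length >= scale_up_at:
        target = current + 1
    elif queue_length <= scale_down_at:
        target = current - 1
    else:
        target = current
    # Clamp so the array never leaves the configured range.
    return max(MIN_SLAVES, min(MAX_SLAVES, target))
```

For example, desired_slave_count(1, 10) asks for a second slave, while desired_slave_count(6, 10) stays at 6 because the array is already at its maximum. The gap between the two thresholds keeps the array from flapping when the queue length hovers around a single value.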

Bottom Line: Speed Development, Lower Costs

By dynamically scaling Jenkins, we ensure fast turnaround on CI and testing, so developers get feedback sooner and we ultimately deliver features to our customers faster. In addition, we right-size our pool of Jenkins slaves and use only the cloud resources that we need at any point in time. As a result, we reduce our cloud costs by eliminating overprovisioning.