Declarative Jenkinsfiles: Dynamic Parallel Stages

Been working on replacing a crusty old Jenkins job recently that’s used for scaling up a load-testing environment. Currently, it’s a good ol’ handcranked freestyle job with a 300-line bash script pasted into the Execute Shell field. When it needs debugging or changing, it’s fun for all the family.

I’ve wanted to rewrite this for a while. Right now, the script has a bunch of configuration at the top of the file which dictates how many instances to launch, what sizes of instances to use, etc. And it’s pretty repetitive:

# config for standard load tests
SERVICEONE_NORMALLOAD_INSTANCETYPE=c5.xlarge
SERVICEONE_NORMALLOAD_MINSIZE=3
SERVICEONE_NORMALLOAD_MAXSIZE=3
SERVICETWO_NORMALLOAD_INSTANCETYPE=c5.large
SERVICETWO_NORMALLOAD_MINSIZE=6
SERVICETWO_NORMALLOAD_MAXSIZE=6

# config for high load tests
SERVICEONE_HIGHLOAD_INSTANCETYPE=c5.xlarge
SERVICEONE_HIGHLOAD_MINSIZE=6
SERVICEONE_HIGHLOAD_MAXSIZE=6
SERVICETWO_HIGHLOAD_INSTANCETYPE=c5.large
SERVICETWO_HIGHLOAD_MINSIZE=9
SERVICETWO_HIGHLOAD_MAXSIZE=9

if [[ $SCALING_LEVEL = 'NORMAL' ]] ; then
    # perform scaling actions with the first set of config...
    ...

elif [[ $SCALING_LEVEL = 'HIGH' ]] ; then
    # perform scaling actions with the second set of config...
    ...

fi

We don’t have just two services, of course. Or just two load settings. So you can see how this script gets pretty gnarly, pretty quickly. Adding a new service? Sure hope you got all the variable names right. Something not working? Better step through every single command to see what’s not as you expected.

Additionally, this script works through every service sequentially, which drastically increases the time to scale the environment when you have a lot of services. The total time taken is the sum of each service’s scaling time, so one slow service holds up everything behind it. Nothing runs in parallel.

How could I make this better? For config, I envisioned something like this:

def SERVICES = [
    "service-1": [
        "NORMAL": [ 
            "InstanceType": "c5.xlarge", 
            "MinSize": 3, 
            "MaxSize": 3
        ],
        "HIGH": [
            "InstanceType": "c5.xlarge", 
            "MinSize": 6, 
            "MaxSize": 6
        ]
    ],
    "service-2": [
        "NORMAL": [ 
            "InstanceType": "c5.large", 
            "MinSize": 6, 
            "MaxSize": 6
        ],
        "HIGH": [
            "InstanceType": "c5.large", 
            "MinSize": 9, 
            "MaxSize": 9
        ]
    ]
]

Should be achievable in a Jenkins Declarative Pipeline, right? It’s easy enough to define a map of nested maps. By doing this, I can set all of my configuration values at the top of my Jenkinsfile, once per service, and anyone can come along and easily see what’s set to what. Great.

It got me thinking though. Every service scales up in the same way. We update a Launch Configuration and an Autoscaling Group resource in a CloudFormation Stack. Do we have to do this sequentially? Can we kick off each service’s scaling job in parallel to save time?

I ended up doing just this, implementing the parallelism with Parallel Stages in my Jenkinsfile. However, instead of writing out each stage individually, I wrote a function that generates one stage per item in my configuration map. Then I could use the parallel step to kick off every generated stage at the same time.

My Jenkinsfile looked something like this:

// set up a config map as per the above snippet
def SERVICES = [ ...see above... ] 

// generate a pipeline stage for each item in the config map
def parallelStagesMap = SERVICES.collectEntries {
    [ "${it.key}": generateStage(it) ]
}

// here's how we generate identical stages
def generateStage(service) {
    def service_name = service.key
    return {
        stage("Service: ${service_name}") {
            // declare these with def: without it they become global script
            // variables, shared (and raced on) by every parallel stage
            def size = service.value[env.LEVEL]['InstanceType']
            def max = service.value[env.LEVEL]['MaxSize']
            def min = service.value[env.LEVEL]['MinSize']

            // insert your scaling commands in here
            sh( script: """
                    echo "Action: ${env.LEVEL}"    && \
                    echo "Service: ${service_name}" && \
                    echo "InstanceType: ${size}" && \
                    echo "MaxSize: ${max}" && \
                    echo "MinSize: ${min}"
                """
            )
        }
    }
}
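The “insert your scaling commands in here” placeholder is where the real work goes. In our case that’s a CloudFormation stack update; here’s a sketch of what that sh step might look like, assuming a hypothetical per-service stack naming convention and parameter keys (illustrative only, not our real template):

```groovy
// hypothetical sketch: assumes each service has a stack named
// "<service>-stack" whose template exposes these three parameters
sh( script: """
        aws cloudformation update-stack \
            --stack-name "${service_name}-stack" \
            --use-previous-template \
            --parameters \
                ParameterKey=InstanceType,ParameterValue=${size} \
                ParameterKey=MinSize,ParameterValue=${min} \
                ParameterKey=MaxSize,ParameterValue=${max}
    """
)
```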

pipeline {
    agent any

    parameters {
        choice(
            name: 'LEVEL',
            description: 'What level do you want to scale to?',
            choices: [
                'NORMAL',
                'HIGH'
            ]
        )
    }
    stages {
        stage('Scale services') {
            steps {
                script {
                    // kick off all the stages we generated
                    parallel parallelStagesMap
                }
            }
        }
    }
}

When we run the job, it works and looks great:

Each item in our configuration hash has its own stage, executed in parallel.
Or if you prefer the fancier BlueOcean pipeline view…
The logs for each stage are pretty useful too.

This method is especially useful if one particular service has a problem scaling up — the error will show clearly on the stage related to that service, not halfway down a shell script somewhere that you have to dig into the logs of.
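Relatedly, the scripted parallel step lets the remaining branches run to completion by default when one of them fails. If you’d rather abort everything as soon as any service’s scaling fails, you can add the failFast flag to the generated map before running it:

```groovy
script {
    // failFast: abort all other branches as soon as one stage fails
    parallelStagesMap.failFast = true
    parallel parallelStagesMap
}
```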

The configuration hash at the top also serves as self-documentation for our scaling levels. How many levels do we scale to? Oh, there are two levels defined in the hash. How many services do we scale? They’re all there, unobscured and easy to see. No hidden settings, no secret numbers.

When our scaling levels change due to a projected increase in traffic or otherwise, we’re now one Pull Request away from having the updated levels ready to go: peer reviewed and merged. New service launched? Cool, raise a PR. No more hunting through multiple Jenkins jobs, wondering which one is live and which one isn’t. The next run of the job will pull in the latest Jenkinsfile, along with the latest config.

This also opens us up to being able to schedule the scaling up and scaling down of our load testing environment using the Parameterized Scheduler Plugin and the ubiquitous cron syntax. In the pipeline section of our Jenkinsfile, we can set something like this:

triggers {
    parameterizedCron('''
        # scale up in the morning, scale down at night (weekdays)
        H 6 * * 1-5 % LEVEL=NORMAL
        H 18 * * 1-5 % LEVEL=ZERO
    ''')
}

This takes the legwork out of manually making sure the environment is available each morning and ensures we’re saving as much as we can on instance costs during evenings and weekends. We also don’t have to mess around with AutoScalingGroup ScheduledActions, which is nice.
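One caveat: the schedule above passes LEVEL=ZERO, so ZERO also needs to exist in the parameter choices and in each service’s entry in the config map. A hypothetical ZERO entry (names and values illustrative) might look like:

```groovy
"service-1": [
    "ZERO": [
        "InstanceType": "c5.xlarge",
        "MinSize": 0,
        "MaxSize": 0
    ],
    // ...NORMAL and HIGH as before
],
```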

You can see some extremely abstract versions of Jenkinsfiles I’ve been working on recently at my GitHub, recorded because I realise it’s probably not the last time I’ll need to write something like this.

Happy Jenkinsfile-ing!