The concept of a Minimal Viable Product has a lot of traction, and for good reason. It is about building the smallest thing that will let you learn whether the product is headed in the right direction. The same strategy can be applied within established products when working on new features. Sometimes it’s clear what a new feature should be; in that case, go build it. Other times, what is clear is the general direction where value lies, but the exact feature set isn’t clear to you or your customers.
This is where the Minimal Viable Feature (MVF) strategy comes in. You want to quickly deliver the smallest incremental capability and learn from your customers whether you are on the right track.
Example: Load Balance a Build Farm
Shortly after releasing AnthillPro 3.0 in 2006, we began to look at distributing build load. As a central tool for a large enterprise, we knew that different servers in the build farm would have different capabilities and sizes. We had addressed the capabilities through filtering, but wanted to account for some boxes being faster than others. Builds are tricky performance-wise, as they tend to alternate between abusing I/O, memory, and CPU. Ideally, we would track those capabilities and assign builds whose profile best matched the spare capacity of a server, while leaving as much capacity available for as many build types as possible. As the whiteboard filled up with stubs of algorithms, the development todo list grew:
- Build native components for each supported platform to measure maximum disk, network, CPU and memory capacity, and consumption of each build.
- Bunches of database tables and analysis to track typical consumption
- Predict types of builds that will be required so we know which resources to conserve
- Build lots of user interface elements around all that stuff
- etc, etc, etc
The depressing thing was that we weren’t really sure that all that stuff would actually be terribly successful in optimizing build farm utilization.
So we started small: we added two configuration elements to the representation of a build server in the tool. 1) Max jobs at once, and 2) a throughput metric. Max jobs put a hard cap on the number of builds that would run on the box at once. The throughput metric was an arbitrary integer the admin could assign to indicate how fast the box was. The selection algorithm was basically:
- Eliminate from consideration all build machines that don’t meet the criteria of the build (wrong platform, lacking a compiler, etc)
- Eliminate from consideration all build machines already running at their max jobs
- If no machines remain, queue the job until a machine becomes available
- For the remaining machines, estimate load by dividing the number of running jobs (plus one) by the throughput metric
- Assign the build job to the machine with the lowest estimated load
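The steps above can be sketched in a few lines of code. This is an illustrative reconstruction, not AnthillPro’s actual implementation; the class and field names (`Machine`, `max_jobs`, `throughput`, and so on) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    capabilities: set   # platform, installed compilers, etc.
    max_jobs: int       # hard cap on concurrent builds
    throughput: int     # arbitrary speed rating entered by the admin
    running_jobs: int = 0

def select_machine(machines, required_capabilities):
    """Return the best machine for a build, or None to queue the job."""
    # Steps 1-2: drop machines missing a capability or already at max jobs.
    candidates = [
        m for m in machines
        if required_capabilities <= m.capabilities
        and m.running_jobs < m.max_jobs
    ]
    # Step 3: nothing eligible, so the caller queues the job.
    if not candidates:
        return None
    # Steps 4-5: estimated load = (running jobs + 1) / throughput;
    # the machine with the lowest load wins.
    return min(candidates, key=lambda m: (m.running_jobs + 1) / m.throughput)
```

For example, given an idle fast box (throughput 10) and an idle slow box (throughput 3), the fast box has an estimated load of 1/10 versus 1/3 and gets the job; once the fast box is saturated, jobs spill over to the slow one.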
This was overly simple, and required no native code or changes to the agent.
We then released the capability, almost apologetically, as a new way to distribute load, and prepared for the deluge of complaints that would drive us to refine the system toward the grand ambitions on the whiteboard. The complaints didn’t come. The simple approach that required little engineering effort worked fine. We did end up adding better handling of capacity ties.
A key concern when using this strategy is to avoid breaking things. The approach we used was likely to contribute toward an eventual fancier system if one was needed: sophisticated capacity measurement could take the place of an end user entering a number, while the rest of the algorithm and its behavior would barely change.
If we had done all the work we thought was necessary to ‘properly’ deliver the capability, we would likely have introduced bugs, limited our platform support, and generally made the product worse at great expense to ourselves. Many other times we did introduce minimal features, and got feedback quickly on what we should have built instead. That made the application better as well.
With an established product, considering the “viability” of a feature is key. Something you add must have a good chance of delivering value. Likewise, serious effort needs to go into ensuring that if the feature requires configuration, that configuration remains valuable and valid when you release the newer, better version of the feature later. Finally, if your development team can’t deliver features frequently, this approach will struggle to work: in an annual release cycle, the minimum that is viable is considerably higher.