June 20, 2012 - Google Tech Talk: Continuous Integration
Yesterday Angela and I left work a bit early to attend Google's latest tech talk in their NYC office on 15th Street. The topic was "Tools for Continuous Integration at Google Scale." Firstly, if you live in NYC and are reading this blog, you should definitely try to attend these events, which are hosted via Meetup. They always have great speakers in their lovely office cafeteria, and the fellow attendees are always good for a bit of networking. You can see past talks (and I hope this one soon) here.
We here at the Mechanism are always looking to improve our development pipeline, especially when it comes to version control. Seeing how Google handles these problems was simply mind-blowing. John Micco walked through the overall setup at Google, which lets their multitude of engineers collaborate on projects easily, with version control, testing and deployment all bundled onto a master branch. Their model consumes huge amounts of resources, mainly because of the tests run on each submit and a live dependency list that determines which tests need to be run for every submit. This allows the system to promise a 90-minute turnaround on every submit.
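To make the dependency-list idea concrete, here is a minimal sketch of how a change could be mapped to the tests that need to run. It is not Google's implementation, just the general technique as I understood it: invert the dependency graph, then walk outward from whatever changed. The target names and the `affected_tests` function are made up for illustration.

```python
from collections import defaultdict, deque


def affected_tests(deps, tests, changed):
    """Return the tests that transitively depend on any changed target.

    deps maps each target to the targets it depends on directly.
    """
    # Invert the graph: for each target, record who depends on it.
    dependents = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            dependents[dep].add(target)

    # Breadth-first walk outward from the changed targets.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    return sorted(t for t in tests if t in seen)


# Hypothetical targets, for illustration only.
DEPS = {
    "common_lib": [],
    "mail_server": ["common_lib"],
    "chat_server": ["common_lib"],
    "mail_test": ["mail_server"],
    "chat_test": ["chat_server"],
    "common_test": ["common_lib"],
}
TESTS = {"mail_test", "chat_test", "common_test"}

print(affected_tests(DEPS, TESTS, ["mail_server"]))
print(affected_tests(DEPS, TESTS, ["common_lib"]))
```

In this toy graph, a change to `mail_server` only triggers `mail_test`, while a change to `common_lib` triggers all three tests. That second case hints at why the resource bill gets so large: a change to a widely shared library fans out to nearly everything.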
While Micco could share how the tool was set up, he had no idea how it was being used by individual teams. For example, some teams such as Google+ have extremely high turnover and submit rates, deploying a product update every few weeks! Others, such as the Google Core team, take much longer and release less often for obvious reasons. Likewise, some teams force all changes to be tested locally before being sent to the network, while others have no restrictions at all. This is the product of Google's team independence philosophy.
Obviously such a continuous integration system isn't suitable for every company. In fact, only companies with many engineers and dynamic products need such tools. Yet there are many benefits that small companies can likewise exploit. For Google, the biggest plus of this system is the ability to see exactly who broke what, so there's no need to untangle whose bad commit broke the code. This emerges from Google's desire to, in Micco's words, avoid "tribalization" of knowledge in the company while still allowing teams to act freely. This is key for any company and is a tenet of coding culture in general. Code should be clear and concise, such that any other developer can easily figure out how it works and how to fix it. If only a small tribe of people (or even an individual) knows how to fix something easily, it hurts the company as a whole in the long run as that group changes or leaves.
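The "who broke what" lookup is simple once tests run on every submit. Here is a rough sketch under my own assumptions: each submit has an author and a pass/fail result per test, and a test with no earlier result counts as previously passing. The `current_breakers` function and the names in `HISTORY` are hypothetical, not anything shown in the talk.

```python
def current_breakers(history):
    """Map each currently failing test to the author whose submit broke it.

    history is a list of (author, results) tuples in submit order, where
    results maps a test name to True (passed) or False (failed).
    """
    blamed = {}
    previous = {}
    for author, results in history:
        for test, passed in results.items():
            if passed:
                # The test is green again, so nobody is on the hook for it.
                blamed.pop(test, None)
            elif previous.get(test, True):
                # Pass -> fail transition: this submit introduced the break.
                blamed[test] = author
            previous[test] = passed
    return blamed


# Hypothetical submit history, for illustration only.
HISTORY = [
    ("alice", {"mail_test": True, "chat_test": True}),
    ("bob", {"mail_test": False, "chat_test": True}),
    ("carol", {"mail_test": False, "chat_test": True}),
]

print(current_breakers(HISTORY))
```

Here `mail_test` is pinned on bob's submit rather than carol's, because the test was already red when carol submitted. With tests run only nightly or weekly, that distinction is exactly what you lose.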
Micco was keen to point out that all the Google teams using the system loved it and have become more and more addicted to it. Even the mention of the system going down for maintenance is met with horror. Yet there is a fundamental problem with such a system: as the number of users, tests and commits increases over time, the computing resources required escalate exponentially, such that keeping ahead of demand is impossible! But the engineers are now hooked, so Micco posed this problem: how do we optimize resource utilization while still being able to provide a quick turnaround?
Beyond optimizing expensive operations such as testing and dependency mapping, Micco's simplest suggestion was also the most likely to happen: impose quotas on development teams, then avoid the rationing meeting.