Google-scale computing for the masses

Cloud computing isn’t a new term, but until recently it had not really come of age. Amazon has to be credited with bringing it to the mainstream with the launch of Amazon Web Services (AWS) 3 years ago. Google played catch-up with App Engine, but it has not been able to gather as much steam as AWS. The technological genius of Amazon is a subject for another post, but suffice it to say that they have changed computing forever.

AWS allows one to deploy computing services (such as storage, messaging, hosting and any software) on infrastructure owned and managed by Amazon. Since all the provisioning of new machine instances, storage and so on is done through APIs, one can engineer a system that scales dynamically with the load. And it’s all pay-as-you-go, so one pays only for what is used, and the pricing is very reasonable. There are many case studies of companies saving huge sums after migrating their solutions to AWS.
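To make the "scales with the load" idea a little more concrete, here is a minimal sketch of the decision such a system might make. It does not call any AWS API; the function names, the per-instance capacity figure and the instance limits are all hypothetical, chosen purely for illustration.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance,
                     min_instances=1, max_instances=20):
    """Return how many machines to run for the current load.

    All parameters are illustrative assumptions, not AWS settings.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so the last partial batch of requests still gets a machine.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the allowed range so costs stay bounded.
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, target):
    """Describe what a provisioning call would have to do next."""
    if target > current:
        return ("launch", target - current)
    if target < current:
        return ("terminate", current - target)
    return ("hold", 0)
```

In a real deployment the result of `scaling_action` would be handed to whatever provisioning API is in use; here it only returns a description of the step, which keeps the sketch runnable on its own.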

Yesterday, Amazon unveiled its Elastic MapReduce service. MapReduce is a Google invention, and it has been one of the pillars on which various Google services are built. It allows a task to be split into pieces that many machines work on at the same time, and then collates their results into a single answer. This means that thousands of computers can work on a problem simultaneously and return the results much faster than a single computer doing the same work. Google was kind enough to publish a paper on MapReduce, and the open source community started building systems based on it.
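For the curious, the idea can be sketched in a few lines of Python using the classic word-count example. This is only a toy, single-process illustration of the map, group and reduce steps, not how Google or Hadoop actually implement it; the function names are my own, and in a real cluster each chunk would be handed to a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group step: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run the three steps in order over a list of text chunks."""
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would go to its own machine
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["the cat", "the dog the"]))
```

The point to notice is that `map_phase` looks at only one chunk and `reduce_phase` at only one key, which is what lets the work be spread over many machines.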

Hadoop, from the Apache Software Foundation, emerged as the most popular MapReduce implementation, and Amazon’s Elastic MapReduce is based on it. Now one can define a job, boot up multiple machines and distribute the job among them. Elastic MapReduce will manage all the machines and return the output of the job. All this from the comfort of wherever one is; well, even being there isn’t required, since the whole thing can be automated.

So why is this so important? Take Google’s example. One of the reasons they are so successful is that they have engineered solutions to difficult engineering problems, which is not something everyone can do. With Elastic MapReduce, the biggest stumbling block for anyone who wanted to crunch huge datasets has been taken away: the hard part is already done. All one needs to ensure is that the job can be divided evenly across machines (which is not always easy). And suddenly, a whole new avenue is available to everyone, because the fact that something needs too much computing power is no longer a deterrent. That is what is super cool.
