Hacker News
Map Reduce implemented over HTTP (code.google.com)
70 points by bdotdub on July 3, 2008 | 14 comments


I'm glad to see this is not a SOAP opera and has no XML b.s. of any kind in it. In fact you can see one YAML file in the sources.

Jeff Atwood probably knew something... http://www.codinghorror.com/blog/archives/001114.html


You're crediting Atwood with saying XML should go away? He is like 10 years late on making that suggestion.


I'm crediting him for suggesting YAML. To be honest, I had heard the name before but never realized it was so nice.


I've been playing with GridGain lately. It's Java, but their API is really sweet. You basically implement a simple interface that defines the map/reduce operations and then your code will be copied to the cluster nodes through a peer-to-peer classloader.
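To make the "implement a simple interface" idea concrete, here is a minimal Python sketch of that style of API. It is not GridGain's (Java) API: `MapReduceJob`, `run_locally`, and `WordCount` are hypothetical names, and the "cluster" is a single local loop rather than nodes receiving code through a classloader.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class MapReduceJob(ABC):
    """Hypothetical job interface: user code supplies only map and reduce."""

    @abstractmethod
    def map(self, record: Any) -> Iterator[Tuple[Any, Any]]:
        """Emit zero or more (key, value) pairs for one input record."""

    @abstractmethod
    def reduce(self, key: Any, values: List[Any]) -> Any:
        """Combine all values emitted for one key."""


def run_locally(job: MapReduceJob, records: Iterable[Any]) -> Dict[Any, Any]:
    """Run a job in one process; a real framework would ship the map
    calls to cluster nodes instead of looping here."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:                  # map phase
        for key, value in job.map(record):
            groups[key].append(value)       # shuffle: group values by key
    return {key: job.reduce(key, values)    # reduce phase
            for key, values in groups.items()}


class WordCount(MapReduceJob):
    def map(self, line):
        for word in line.split():
            yield word.lower(), 1

    def reduce(self, key, values):
        return sum(values)


if __name__ == "__main__":
    print(run_locally(WordCount(), ["the cat", "The dog"]))
    # {'the': 2, 'cat': 1, 'dog': 1}
```

The user-facing part is only the two methods on `WordCount`; everything about distribution lives in the runner, which is what makes this kind of API pleasant to use.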

In a recent test, I validated a million image URLs in less than a minute on a small EC2 cluster running GridGain.

It's certainly worth looking into if you are interested in that kind of stuff.

http://www.gridgain.org


It's a shame there are no comments on this yet. If you're relatively new to programming (say, a college student), this is a really cool chunk of code to copy into Notepad and figure out.

It really shows you how a tiny amount of code can do some pretty cool stuff.


The paper is a good read too: http://labs.google.com/papers/mapreduce.html


M/R is the interesting part. Making it work over HTTP is just a few lines of Perl.


And even M/R is just a framework for easily parallelizing "embarrassingly parallel" problems and incrementally aggregating the results (instead of in one step). A cool idea for sure, but not groundbreaking -- these ideas are as old as parallel computing itself.
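The "parallelize independent chunks, then aggregate incrementally" pattern described above fits in a few lines of Python. This is an illustration under simple assumptions (a thread pool standing in for cluster nodes, word counting as the task), not any particular framework's implementation: each chunk is mapped independently, and each partial result is folded into the running total as soon as it finishes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List


def count_words(chunk: List[str]) -> Counter:
    """Map step: count words in one chunk, independently of all others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: List[str], n_chunks: int = 4) -> Counter:
    # Split the input into independent chunks (round-robin by line).
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        futures = [pool.submit(count_words, c) for c in chunks if c]
        # Reduce incrementally: merge each partial result as it completes,
        # rather than waiting for all chunks and combining in one step.
        for future in as_completed(futures):
            total.update(future.result())
    return total
```

Threads keep the sketch short, but Python's GIL limits CPU-bound speedup; separate processes or machines are what would actually give the parallelism.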


Not a groundbreaking idea, but definitely takes the pain out of creating your own framework :)


Perhaps a simple version could be implemented in "a few lines of Perl" but the super redundant distributed version like Google's would be a bit more.
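Part of what makes Google's version more than a few lines is fault tolerance: the MapReduce paper describes the master re-executing tasks from workers that fail. A minimal Python sketch of just that one idea, assuming workers are plain callables that may raise an exception; `run_with_failover`, `run_job`, and the rotate-then-retry policy are hypothetical, not Google's scheduler.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class AllWorkersFailed(Exception):
    pass


def run_with_failover(task: T, workers: Sequence[Callable[[T], R]]) -> R:
    """Try the task on each worker in turn until one succeeds."""
    errors: List[Exception] = []
    for worker in workers:
        try:
            return worker(task)
        except Exception as exc:  # treat any error as a failed worker
            errors.append(exc)
    raise AllWorkersFailed(f"task {task!r} failed on all workers: {errors}")


def run_job(tasks: Sequence[T], workers: Sequence[Callable[[T], R]]) -> List[R]:
    """Run every task, starting each on a different worker to spread load."""
    if not workers:
        raise ValueError("need at least one worker")
    results = []
    for i, task in enumerate(tasks):
        k = i % len(workers)
        # Rotate the worker list so task i is tried on worker k first.
        order = list(workers[k:]) + list(workers[:k])
        results.append(run_with_failover(task, order))
    return results
```

A real system also has to detect workers that hang rather than fail loudly, and avoid double-counting output from tasks that ran twice, which is where most of the extra code goes.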


Can you show how it can be done in Perl?
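For reference, the "over HTTP" part can be sketched in Python (not Perl) with only the standard library. Each worker is an HTTP server that word-counts a JSON list of lines POSTed to it, and a coordinator splits the input, sends one chunk per worker, and sums the replies. The `/map` path, the JSON format, and the word-count task are assumptions for illustration, not how the linked project works.

```python
import json
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Bypass any configured HTTP proxy so requests to localhost go direct.
_opener = build_opener(ProxyHandler({}))


class MapHandler(BaseHTTPRequestHandler):
    """Worker: POST /map with a JSON list of lines, reply with word counts."""

    def do_POST(self):
        if self.path != "/map":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        lines = json.loads(self.rfile.read(length))
        counts = Counter(w for line in lines for w in line.lower().split())
        body = json.dumps(counts).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_worker() -> HTTPServer:
    """Start a worker on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MapHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def map_reduce_over_http(lines, worker_urls):
    """Coordinator: send one chunk per worker, then sum the partial counts."""
    total = Counter()
    for i, url in enumerate(worker_urls):
        chunk = lines[i::len(worker_urls)]
        req = Request(url + "/map", data=json.dumps(chunk).encode(),
                      headers={"Content-Type": "application/json"})
        with _opener.open(req) as resp:
            total.update(json.loads(resp.read()))
    return total
```

Usage: start a couple of workers with `start_worker()`, build URLs from each server's `server_address[1]` port, and call `map_reduce_over_http(lines, urls)`. The requests here go out one after another; a real coordinator would send them concurrently and cope with workers that never answer.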


Built for Google App Engine? I think it will kill your request if handling it takes more than 8 seconds. Not useful yet.


Then just make a lot of small requests that take less than 8 seconds. Maybe Google will allow background jobs.
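The "many small requests" workaround can be sketched as a handler that processes items until a time budget is nearly used up and then returns a cursor, so the client issues another request to continue. This is a hypothetical Python sketch, not App Engine code: `process_batch`, `drive`, the 6-second default budget, the index-based cursor, and the injectable clock are all assumptions, chosen to stay under the 8-second limit mentioned above.

```python
import time
from typing import Any, Callable, Optional, Sequence


def process_batch(items: Sequence[Any], start: int,
                  handle: Callable[[Any], None],
                  budget_s: float = 6.0,
                  clock: Callable[[], float] = time.monotonic) -> Optional[int]:
    """Handle items from index `start` until the time budget runs out.

    Returns the index to resume from, or None when every item is done.
    The budget sits below the request limit to leave a safety margin.
    """
    deadline = clock() + budget_s
    i = start
    while i < len(items):
        handle(items[i])  # always make progress on at least one item
        i += 1
        if i < len(items) and clock() >= deadline:
            return i      # cursor for the next request
    return None


def drive(items: Sequence[Any], handle: Callable[[Any], None], **kwargs) -> int:
    """Client side: keep issuing 'requests' until the cursor is None."""
    requests = 0
    cursor: Optional[int] = 0
    while cursor is not None:
        cursor = process_batch(items, cursor, handle, **kwargs)
        requests += 1
    return requests
```

Handling at least one item per call guarantees the loop terminates even if a single item is slow; the catch is that one item taking longer than the limit would still get the request killed.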


Doesn't CouchDB already do this in some way?



