Google's Secret Sauce
DUBLIN -- As Google comes to market, analysts are openly speculating about how to crawl and index the Internet better and less expensively. Zenark have long speculated about how to configure electronic robots to crawl more efficiently than the Googlebot. Now some of the developers of the Open Directory Project have dropped new information into the mix. It makes interesting reading for anyone visiting Google's server farm in Citywest (Dublin, Ireland).
Stories about Gmail have got Topix.net thinking "about seemingly incremental features that are actually massively expensive for others to match." But is Google's platform actually cheaper to acquire and simpler to maintain than any other large-scale web service?
Topix.net bloggers have written before about "Google's snippet service, which required that they store the entire web in RAM. All so they could generate a slightly better page excerpt than other search engines."
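To see why a query-specific excerpt is costly, consider a minimal sketch of the idea. It is not Google's implementation: the `PAGES` store, the `snippet` function and the `width` parameter are all hypothetical names invented here. The point is only that the excerpt depends on the query, so the full text of every page has to be on hand at search time.

```python
# Hypothetical in-memory page store: URL -> full page text.
# A real system would hold billions of pages; this one holds a single example.
PAGES = {
    "http://example.com/a": (
        "Google runs its search index on large clusters "
        "of cheap commodity servers."
    ),
}

def snippet(url, query, width=30):
    """Return an excerpt of the stored page around the first query hit."""
    text = PAGES[url]
    pos = text.lower().find(query.lower())
    if pos == -1:
        # No hit: fall back to the start of the page.
        return text[: 2 * width]
    start = max(0, pos - width)
    end = min(len(text), pos + len(query) + width)
    return text[start:end]

print(snippet("http://example.com/a", "index"))
```

Because the excerpt changes with each query, it cannot be computed once at crawl time and cached, which is what makes holding page text in fast storage attractive.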
Google's rise to dominance is a case study in itself.
Google has taken the last 10 years of systems software research out of university labs and built its own proprietary, production-quality system. What is this platform that Google is building? It's a distributed computing platform that can manage web-scale datasets on 100,000-node server clusters. It includes a petabyte, distributed, fault-tolerant filesystem, distributed RPC code, and probably network shared memory and process migration. It also includes a datacenter management system that lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.
Google has a minimum of 100 racks stacked in Dublin. Friends in Silicon Valley have mentioned that the Googleplex as a whole numbers more than 30,000 machines. Inside information puts 88 dual-CPU 2 GHz Intel Xeon servers in each rack, each server holding 2 GB of RAM and an 80 GB hard disk. Across the Googleplex, you're looking at more than 2,000 terabytes of hard drive space and more than 63,000 GB of RAM. You can store all of the Internet crawled by Google in this copious amount of RAM.
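The totals can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below uses only the figures quoted above (88 servers per rack, 2 GB of RAM and 80 GB of disk per server, 100 racks in Dublin, 30,000 machines overall). The function name, the assumption that every server is identical, and the decimal conversion 1 TB = 1000 GB are choices made here for illustration.

```python
# Figures quoted in the article (second-hand, unverified estimates).
SERVERS_PER_RACK = 88
RAM_GB_PER_SERVER = 2
DISK_GB_PER_SERVER = 80

def fleet_totals(servers):
    """Return (total RAM in GB, total disk in TB) for a given server count.

    Assumes identical servers and 1 TB = 1000 GB.
    """
    ram_gb = servers * RAM_GB_PER_SERVER
    disk_tb = servers * DISK_GB_PER_SERVER / 1000
    return ram_gb, disk_tb

# The stated minimum of 100 racks in Dublin.
dublin_servers = 100 * SERVERS_PER_RACK
dublin_ram_gb, dublin_disk_tb = fleet_totals(dublin_servers)

# The quoted lower bound of 30,000 machines overall.
plex_ram_gb, plex_disk_tb = fleet_totals(30_000)

print(f"Dublin: {dublin_servers} servers, {dublin_ram_gb} GB RAM, {dublin_disk_tb} TB disk")
print(f"30,000 machines: {plex_ram_gb} GB RAM, {plex_disk_tb} TB disk")
```

At exactly 30,000 machines this gives 60,000 GB of RAM and 2,400 TB of disk. The quoted "more than 63,000 GB" of RAM would correspond to roughly 31,500 machines at 2 GB each, which is consistent with "more than 30,000".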
Rich Skrenta -- "The secret source of Google's power" with some very lucid weblog comments on Google architecture.