Any successful website is really an army of computers working together as a unit. No single computer can handle the load that sites like Facebook or Twitter generate. However, if you overload the term 'computer' to mean lots of computers acting as one ... possibilities open up.
As an analogy, consider the movie '300' [Fox]: the small army worked smartly and efficiently together as a unit, and was able to withstand the 'load' from the much larger Persian force. Excellent movie by the way; I highly recommend it.
Back to computers: in theory, if a web-hosting architecture is designed right, it can be housed on standard desktop PCs. One would simply need to add more PCs to scale the system.
Any semi-advanced website will need these basic components:
- some data to display
- a way to store that data
- a way to analyze (search) the data
- and in some cases, create a way to get the data automatically
Some advanced components of building a web site
Here, I will briefly talk about a few ways to create smart sites. The initial setup and learning curve are steep and challenging (not to mention frustrating), but the rewards are definitely worth the effort.
- DISPLAY the web page
- creating web pages by hand is tedious (and arguably boring)
- ClearSilver, however, makes creating templates for the site a little more interesting. The templates (.cst) are merged with data files (.hdf) to render the pages the user sees.
- You can create the .hdf files by hand
- Programmatically (aka the fun way)
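Generating the data files programmatically really is the fun way. ClearSilver's HDF files are just text with dotted key paths, so a minimal sketch (not using ClearSilver's actual bindings, and with made-up page data) might flatten a Python dict into that shape:

```python
# Minimal sketch: flatten a nested Python dict into ClearSilver-style
# HDF lines ("Key.Path = value"). The template engine would then merge
# a file like this with a .cst template. The dict below is made-up data.

def to_hdf(data, prefix=""):
    """Flatten a nested dict into 'Key.Path = value' lines."""
    lines = []
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            lines.extend(to_hdf(value, path))  # recurse into sub-trees
        else:
            lines.append(f"{path} = {value}")
    return lines

page = {"Page": {"Title": "My Store", "Items": {"0": "Widget", "1": "Gadget"}}}
hdf_text = "\n".join(to_hdf(page))
print(hdf_text)
# Page.Title = My Store
# Page.Items.0 = Widget
# Page.Items.1 = Gadget
```

Once a script like this is wired up to your real data source, new pages appear without anyone touching a file by hand.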
Sooner or later, your ... big ... tables will grow ... really big, and it's going to take a long time to scan from one end to the other.
You'll need to store this stuff smartly so that you can find it fast. MapReduce is the way to go there. Check out a project here that incorporates both worlds.
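To make the MapReduce idea concrete, here is a toy single-process sketch of the pattern: map each record to (key, value) pairs, group by key, then reduce each group. Real frameworks distribute these phases across many machines; this only shows the shape of the computation.

```python
# Toy MapReduce: word count over a list of text records.
# Real frameworks run map and reduce on different machines in parallel;
# this sketch keeps everything in one process to show the data flow.
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in one record.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Collapse all counts for one word into a total.
    return key, sum(values)

def map_reduce(records):
    groups = defaultdict(list)
    for record in records:          # "map" step
        for key, value in map_phase(record):
            groups[key].append(value)
    # "shuffle" is the grouping above; "reduce" step below
    return dict(reduce_phase(k, v) for k, v in groups.items())

logs = ["the quick fox", "the lazy dog", "the fox"]
counts = map_reduce(logs)
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Because each map call and each reduce call is independent, the same program scales out simply by handing different records (or different keys) to different machines.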
In a nutshell, all the fancy papers mentioned above boil down to the following story.
- You have some sort of business.
- You generate some sort of data with your business. (sales receipts, customers, etc.)
- You store all this stuff somewhere (your garage)
- Eventually the stuff becomes so large that finding anything is like the needle-in-a-haystack problem ... except in this case it's a gazillion haystacks
- You create a way to find the stuff fast no matter how big the haystacks get (a big metal detector)
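The "big metal detector" from the story above can be sketched as an inverted index: instead of scanning every receipt for a word, you pre-compute a word-to-receipt mapping once, so lookups stay fast no matter how big the haystacks get. The receipts here are made-up example data.

```python
# A tiny inverted index: the "metal detector" for the garage full of
# receipts. Building the index is the expensive, one-time step; every
# search afterward is a cheap dictionary lookup.
from collections import defaultdict

receipts = {
    1: "2 widgets sold to alice",
    2: "1 gadget sold to bob",
    3: "5 widgets returned by alice",
}

# Build the index once, up front.
index = defaultdict(set)
for receipt_id, text in receipts.items():
    for word in text.split():
        index[word].add(receipt_id)

def find(word):
    """Constant-time lookup instead of scanning every receipt."""
    return sorted(index.get(word, set()))

print(find("alice"))    # [1, 3]
print(find("widgets"))  # [1, 3]
print(find("carol"))    # []
```

The index costs extra storage and must be updated as new receipts arrive, but that trade is exactly what keeps search time flat as the data grows.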
Conclusion:
Sites like Facebook.com, nytimes.com, and others like them receive millions of hits a day, and all those millions of people are doing millions of things. Traffic can grow exponentially very fast. The memory of any computer system, whether an army of machines or a single one, will eventually be exhausted. Anticipating this, we use the space we do have effectively and efficiently, with intelligent algorithms that build pre-computed structures (indexes) answering likely queries ahead of time. The result is super-fast response times from standard PC servers, which perform as efficiently as expensive commercial servers for a fraction of the price.
It's like dropping a free supercharged Viper engine into a Pinto.
The piggy who invested the time to get it right the first time withstood the storm from the big bad wolf.
