Apr 16th, 2006
Over the past week and a half, I have been reading up on developing for high-traffic sites. What's interesting is how much of it comes down to small optimizations here and there.
One thing all large sites seem to have in common, and it should be pretty obvious, is a PHP compiler (opcode) cache plus some kind of data caching system. Some of the more popular compiler caches are APC, Bware Afterburner, Turck MMCache, and the Zend Accelerator. A data cache, on the other hand, is easy to write yourself in PHP and can speed up template calls a ton.
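To make the compiler-cache idea concrete, here is a toy analogue in Python. It compiles a source file once and reuses the compiled code until the file's modification time changes, which is roughly what an opcode cache does for PHP scripts so they aren't parsed on every request. The class name CompiledScriptCache is hypothetical, and this is a sketch of the idea, not how APC or the Zend Accelerator are actually built.

```python
import os
import tempfile


class CompiledScriptCache:
    """Toy compiler cache: compile a file once, reuse it until its mtime changes."""

    def __init__(self):
        self._cache = {}   # path -> (mtime_ns, compiled code object)
        self.compiles = 0  # number of real compilations performed

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # cache hit: no reading or parsing needed
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")
        self.compiles += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path, **variables):
        # Execute the cached code with the given variables; return the namespace.
        namespace = dict(variables)
        exec(self.get(path), namespace)
        return namespace


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "template.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write('output = "Hello, " + name\n')
        cache = CompiledScriptCache()
        print(cache.run(script, name="Ann")["output"])  # Hello, Ann
        print(cache.run(script, name="Bob")["output"])  # Hello, Bob
        print(cache.compiles)                            # 1
```

Checking the mtime on every request costs a stat() call; that is the trade-off for picking up edited files automatically.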
You can also speed up your site with output buffering. Without it, PHP hands output to Apache in 4 KB chunks through separate write() calls; with it, the buffered output goes out in a single writev() call.
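The effect of buffering can be sketched in Python: collect the page fragments in memory, then hand them to the kernel in one os.writev() call instead of one write() per fragment. This is only a rough analogue of wrapping a PHP page in ob_start()/ob_end_flush(); the OutputBuffer class is hypothetical and is not how PHP or Apache implement it internally. os.writev() is Unix-only.

```python
import os


class OutputBuffer:
    """Collect page fragments and send them with a single writev() call."""

    def __init__(self, fd):
        self.fd = fd
        self.chunks = []

    def echo(self, text):
        # Buffer in memory instead of writing immediately.
        self.chunks.append(text.encode("utf-8"))

    def flush(self):
        # One system call for everything buffered so far. A production server
        # would loop here if writev() reported a partial write.
        if not self.chunks:
            return 0
        written = os.writev(self.fd, self.chunks)
        self.chunks = []
        return written


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    out = OutputBuffer(write_end)
    for part in ("<html>", "<body>Hello</body>", "</html>"):
        out.echo(part)
    print(out.flush())  # 31
    os.close(write_end)
    print(os.read(read_end, 1024))
    os.close(read_end)
```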
Another speedup comes from setting Apache's SendBufferSize to your page size. If the whole page fits in the kernel's send buffer, Apache can hand it over to the kernel to be sent and move on without blocking.
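Apache's SendBufferSize directive sets the TCP send buffer (the SO_SNDBUF socket option) on its sockets. The same knob can be set from Python, as sketched below; the 40 KB page size is a hypothetical figure, so measure your own typical page. The kernel may adjust the value: Linux, for example, doubles it for bookkeeping and caps it at net.core.wmem_max.

```python
import socket


def make_send_socket(page_size_bytes):
    """Create a TCP socket whose kernel send buffer can hold a whole page."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Same option Apache sets for SendBufferSize; the kernel may round or cap it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, page_size_bytes)
    return sock


if __name__ == "__main__":
    typical_page = 40 * 1024  # hypothetical: size of your largest common page
    s = make_send_socket(typical_page)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```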
To reduce bandwidth, look into gzipping your pages. I have seen gzip shrink 80 GB backups to almost a third of their size, though it did much less when most of the content was images, since those are already compressed. Text such as HTML compresses well, so you should gain a lot here, as long as you can afford the CPU overhead of compressing your data.
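A minimal sketch of gzipping a response in Python, assuming the server only compresses when the client's Accept-Encoding header offers gzip. The function name maybe_gzip and the 1024-byte threshold are hypothetical, and the header parsing is deliberately simple (it ignores q-values such as "gzip;q=0"). The demo also compresses random bytes to show why images, which look random after their own compression, gain almost nothing.

```python
import gzip
import os


def maybe_gzip(body, accept_encoding, min_size=1024):
    """Return (payload, extra_headers), gzipping only if the client accepts it."""
    offered = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" not in offered or len(body) < min_size:
        return body, {}  # tiny or unsupported: send as-is
    compressed = gzip.compress(body, compresslevel=6)
    return compressed, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}


if __name__ == "__main__":
    html = b"<tr><td>row</td><td>value</td></tr>\n" * 500  # repetitive markup
    image_like = os.urandom(len(html))  # already-compressed data looks random
    for name, data in (("html", html), ("image-like", image_like)):
        payload, headers = maybe_gzip(data, "gzip, deflate")
        print(name, len(data), "->", len(payload))
```

In practice you would also skip compression for image MIME types rather than spend CPU on them.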
There are also some simple things you can do to speed up your code. To speed up database calls, query only for the data you need: there's no need to SELECT * when you only need the id. You should also query each table only once: get the data you need from it, store it, and reuse it instead of querying again.
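Both habits can be sketched with Python's built-in sqlite3 module standing in for MySQL: select only the columns you need, and keep the result around for the rest of the request instead of re-querying. The users table, its columns, and the UserLookup class are hypothetical.

```python
import sqlite3


class UserLookup:
    """Per-request helper: fetch only the needed columns, once, then reuse them."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0
        self._names = None  # filled on first use

    def names_by_id(self):
        if self._names is None:
            # Ask for id and name only, not SELECT * (which would drag in bio/avatar).
            rows = self.conn.execute("SELECT id, name FROM users").fetchall()
            self.queries += 1
            self._names = dict(rows)
        return self._names


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, bio TEXT, avatar BLOB)")
    conn.executemany("INSERT INTO users (name, bio) VALUES (?, ?)",
                     [("alice", "a long biography"), ("bob", "another long biography")])
    lookup = UserLookup(conn)
    print(lookup.names_by_id()[1], lookup.names_by_id()[2])  # alice bob
    print(lookup.queries)                                      # 1
```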
If you are using PHP 5 or higher, you may also want to use MySQLi (the "improved" MySQL extension). It is much faster than the old mysql API and offers both a procedural and an object-oriented interface. Most applications can be converted to it easily. Plus, you get support for prepared statements with bound parameters and results.
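In mysqli the relevant calls are prepare(), bind_param(), and execute(). The pattern is sketched below with Python's sqlite3 module as a stand-in, since it uses the same placeholder idea: the SQL is compiled once with ? markers and the values are bound separately, so quotes in the data cannot break the query. The posts table and the function name are hypothetical.

```python
import sqlite3


def find_posts_by_author(conn, author):
    # The value is bound to the "?" placeholder by the driver, never pasted
    # into the SQL text, so an apostrophe in `author` is harmless.
    cur = conn.execute(
        "SELECT id, title FROM posts WHERE author = ? ORDER BY id", (author,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
    # executemany() compiles the statement once and rebinds values for each row.
    conn.executemany(
        "INSERT INTO posts (author, title) VALUES (?, ?)",
        [("alice", "High traffic PHP"), ("o'brien", "Caching"), ("alice", "MySQLi")],
    )
    print(find_posts_by_author(conn, "alice"))    # [(1, 'High traffic PHP'), (3, 'MySQLi')]
    print(find_posts_by_author(conn, "o'brien"))  # [(2, 'Caching')]
```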
You can also optimize during the design of your database. Make sure you understand the differences between the MyISAM and InnoDB storage engines. MyISAM is very efficient for very high volumes of either reads or writes, but it uses table-level locking. InnoDB has non-locking reads and row-level locking, which gives it much higher concurrency.
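Why locking granularity matters can be shown with a toy model in Python: with a single table-wide lock, a writer on row 1 blocks a writer on row 2; with one lock per row, both proceed. The class names are hypothetical, and this models writers only; it ignores MyISAM's shared read locks and InnoDB's multi-versioned (non-locking) reads.

```python
import threading


class TableLocking:
    """One lock for the whole table (MyISAM-style writes)."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_lock_row(self, row_id):
        # Every row maps to the same lock, so one writer blocks all others.
        return self._lock.acquire(blocking=False)

    def unlock_row(self, row_id):
        self._lock.release()


class RowLocking:
    """One lock per row (InnoDB-style writes)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dict of row locks

    def try_lock_row(self, row_id):
        with self._guard:
            lock = self._locks.setdefault(row_id, threading.Lock())
        return lock.acquire(blocking=False)

    def unlock_row(self, row_id):
        self._locks[row_id].release()


if __name__ == "__main__":
    for engine in (TableLocking(), RowLocking()):
        first = engine.try_lock_row(1)   # writer A updates row 1
        second = engine.try_lock_row(2)  # writer B wants row 2 at the same time
        print(type(engine).__name__, first, second)
```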
You may also want to cache query results that are not expected to change often. An example I often use is a CMS. Why query for all of the blocks on a page every time, when you can cache the results of the original query for six hours and have them quickly accessible? And if the site owner changes a block, just clear the cache.
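A minimal time-based cache along those lines, sketched in Python: results are kept for a time-to-live (six hours by default, matching the example above) and can be cleared by hand when the content changes. The QueryCache class and the "blocks:home" key are hypothetical; the clock is injectable so expiry can be tested without waiting.

```python
import time


class QueryCache:
    """Cache query results for a fixed time-to-live, with manual invalidation."""

    def __init__(self, ttl_seconds=6 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock     # injectable for testing
        self._entries = {}     # key -> (expires_at, value)

    def get(self, key, load):
        """Return the cached value for key, calling load() on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = load()
        self._entries[key] = (now + self.ttl, value)
        return value

    def clear(self, key=None):
        # Call this when the site owner edits a block; no key clears everything.
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)


if __name__ == "__main__":
    calls = []

    def load_blocks():
        calls.append(1)  # stands in for the real database query
        return ["menu", "news", "login"]

    cache = QueryCache()
    cache.get("blocks:home", load_blocks)
    cache.get("blocks:home", load_blocks)
    print(len(calls))  # 1
    cache.clear("blocks:home")
    cache.get("blocks:home", load_blocks)
    print(len(calls))  # 2
```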
Now, back to what I've been up to. Besides reading up on the above, I have also been messing with my new project for the last two weeks straight. It is amazing how many hacks IE5 and IE6 require: I know I've used at least four to get my new site to display correctly, while it worked in Firefox, Safari, Epiphany, Konqueror, Opera, and IE7 the _whole_ time.
I hope most people will like the design, though. I have shown it to around 20 people on IRC, and only one person wasn't a big fan. So I'd say I've done pretty well, considering this is the first site I've designed entirely in Photoshop and then converted to CSS/XHTML by hand. The core work for this new project starts this week and will probably continue for the next few months to a year.
Reference for high-volume PHP: http://www.oreillynet.com/onlamp/blog/2006/04/digg_phps_scalability_and_perf.html