that Sam guy

Upgrade to Sphinx and PBXT for ZuFeed.com

ZuFeedAlthough ZuFeed has been up and running for almost 2 years now, those who use it regularly (especially me) will have noticed that sometimes searches and collections time out and not load at all. Other times a collection may take 30 seconds to load.

Never fear! A solution is at hand!

What went wrong

The cause of these problems has been the architecture running behind ZuFeed. Although I designed much of the backend to scale greatly, history now shows that one major decision was incorrect… well at least is now incorrect for a site holding more than 3.3 million articles.

As ZuFeed uses MySQL as a backend, and fulltext searching is required, the original decision was to hold all the data in MyISAM tables and search using the built-in fulltext search indexes. As ZuFeed has grown searches have become a lot slower and new articles are added only seconds apart. This is due to two issues with using MyISAM and one coding error on my part.

  1. Poor scaling of MyISAM fulltext searching (for a database of 7GB+ in size)
  2. Constant MyISAM table level locking for each new article entry
  3. A missing index for the top searches

The Solution

PrimeBase XT (PBST)The third problem was easily solved by identifying the needed index and adding it to the database. Result? The search summary page now shows up without delays.

Solving the second problem required changing the storage engine to one that used row-level locking instead of table-level locking. The easiest options available were InnoDB and PBXT. Unlike InnoDB, PBXT is not included by default in MySQL installations and is instead available as a plugin. MariaDB (a fork of MySQL) however does include PBXT.

Both InnoDB and PBXT are robust, tranactional and ACID compliant storage engines. However PBXT is also log based, which allows for fast speed in a large variety of scenarios. As I have been using PBXT for a while now in another project, I decided to switch the tables to PBXT.

Since PBXT (and InnoDB) for that matter do not include fulltext searching, I needed a replacement search engine. I decided to use another open source project with a simple setup and very fast search speeds and easy integration with MySQL and php – Sphinx.

The Implementation

I attempted to have as little down time as possible with the testing and then implementation of the new systems. Overall it went something like this:

Sphinx - Open Source Search ServerPart 1 – Sphinx Search Engine

  • Add a new auto incremented unique id of type integer to the databases (as required for Sphinx)
  • Set up Sphinx to use two indexes – one main and one delta for updates to allow shorter re-indexing times (although not quite realtime)
  • Schedule delta index re-indexing every 60 seconds via a cron job.
  • Run initial indexing of article database – Warning! This can take long time!
  • Test Sphinx vs MySQL fulltext results via php.

The results showed significantly faster searches using Sphinx over the MySQL in-built fulltext search. Sphinx also allowed more information to be searched. I was now able to include summary, author and permalink url fields to the search without penalty.

Part 2 – Integrate Sphinx into ZuFeed

  • Change php classes to use Sphinx instead of MySQL fulltext searching
  • Add unique id conversion functions between new and old id’s
  • Maintain the same API to avoid changing other code
  • Test to make sure searching still works in ZuFeed

Part 3 – MyISAM to PBXT Conversion

  • Remove fulltext indexes from each table
  • Convert each table to use PBXT

When converting a table from one storage engine to another, keep in mind that ALL data has to be copied to a new temporary table first, then renamed to remove the original. As such this takes a long time for multi-gigabytes tables.

Conclusion

And there you have it.

ZuFeed is now quicker at searching, responds to new requests faster, no longer times out on loading, and also produces more relevant results.


Categorised as: Servers, Tech, ZuFeed


Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>