Google Ad

HABTM Search By Multiple Tags in CakePHP

Date: Sun, Jul 11th 2010, 15:53 Author: nick Views: 17328 Comments share

The Problem

A client came to me with an interesting request. They wanted the ability to search a songs database based on tags that defined that song. A similar problem would be faced with a real estate portal trying to search properties that have specific amenities, or posts with certain tags, etc.. etc...

The concept is fairly simple, but one that can be a bit tricky to do with a HABTM relationship in CakePHP. This isn't just a tricky problem in CakePHP, its a tricky problem on a database level.

Now searching for all songs based on a single tag is easy enough -- set the join, set the condition, everyone eat cake. However, searching based on multiple tags is a little more involved.

The setup and scope

Consider the following song list and their related tags (ie ... SongName (Tag1,Tag2,Tag3....)):
  • Song1 (Drum Intro, Guitar Solo, No Vocal)
  • Song2 (Drum Intro)
  • Song3 (Guitar Solo, No Vocal)
  • Song4 (Drum Intro, No Vocal)

Now, I want an easy way to get all songs based on an array of tags (ie ... Search(Tag1,Tag2....) = Result Set):
  • Search(Drum Intro, No Vocal) = Song1 and Song4
  • Search(Guitar Solo) = Song1, Song3
  • Search(Drum Intro, Guitar Solo) = no results
  • Search(Drum Intro, Guitar Solo, No Vocal) = Song1

Google is our friend

While approaching this problem I stumbled across a great post by teknoid and although it was close to what I wanted, it had a very large limitation -- Teknoid's solution would only return a result set of songs if the tags are an exact match -- no more, no less. I needed something more flexible. I needed a way to search for a set of songs that had at least the specific set of tags, but could possibly have more.

The Solution

So, with the help of the CakePHP's custom find methods I hit was off to work!

  1. //Song Model
  2. var $hasAndBelongsToMany = array('Tag');
  4. var $_findMethods = array('similar' => true);
  6. function _findSimilar($state, $query, $results = array()){
  7.   if($state == 'before' && isset($query['tags'])){
  8.     $query['joins'] = array(
  9.       array(
  10.         'table' => 'songs_tags',
  11.         'alias' => 'SongsTag',
  12.         'type' => 'inner',
  13.         'conditions' => array('SongsTag.song_id =')
  14.       ),
  15.       array(
  16.         'table' => 'tags',
  17.         'alias' => 'Tag',
  18.         'type' => 'inner',
  19.         'conditions' => array(
  20.           ' = SongsTag.tag_id'
  21.         )
  22.       )
  23.     );
  24.     if(!is_array($query['tags'])){
  25.       $query['tags'] = array($query['tags']); //make it an array.
  26.     }
  28.     $query['conditions'][''] = $query['tags'];
  30.     //set limit
  31.     $query['limit'] = isset($query['limit']) ? $query['limit'] : 5;
  33.     return $query;
  34.   }
  35.   elseif($state == 'after' && isset($query['tags'])){
  36.     $retval = array();
  37.     foreach($results as $key => $result){
  38.       $tags = Set::extract('/Tag/name', $result);
  39.       if(count($query['tags']) == count(array_intersect($tags, $query['tags']))){
  40.         $retval[] = $results[$key];
  41.       }
  42.     }
  43.     $results = $retval;
  44.   }
  45.   return $results;
  46. }

Breakdown ... prepare query

We're actually doing a few things here. There are two states to a custom find -- before and after. Lets break it down to bite size pieces.

  1. function _findSimilar($state, $query, $results = array()){
  2.     if($state == 'before' && isset($query['tags'])){

In the before state we prepare the incoming query. We only want to manipulate the query if we have a 'tags' key in our options array, otherwise just pass it onto find('all').

  1. $query['joins'] = array(
  2.   array(
  3.     'table' => 'songs_tags',
  4.     'alias' => 'SongsTag',
  5.     'type' => 'inner',
  6.     'conditions' => array('SongsTag.song_id =')
  7.   ),
  8.   array(
  9.     'table' => 'tags',
  10.     'alias' => 'Tag',
  11.     'type' => 'inner',
  12.     'conditions' => array(
  13.       ' = SongsTag.tag_id'
  14.     )
  15.   )
  16. );
  17. if(!is_array($query['tags'])){
  18.   $query['tags'] = array($query['tags']);
  19. }

We have to manually set our joins so we can use our friendly CakePHP condition syntax. We also make sure we're dealing with an array of tags (this allows us to pass in a single string if wanted).

  1. $query['conditions'][''] = $query['tags'];
  2. //set limit
  3. $query['limit'] = isset($query['limit']) ? $query['limit'] : 5;
  5. return $query;

We set the conditions with an IN operator, set the limit if we don't have one (5 in this case) and return the modified query. This query is rather greedy, as such, its best to limit it to maintain performance.

The end result in the before state is we're running a search on songs that have ANY of the tags we've included in our tags array. Later we'll take a look at the results and make sure we only return results that include all or more of the tags we specified.

Breakdown ... filter results

The second part of the custom find is the after state.

  1. elseif($state == 'after' && isset($query['tags'])){

So here again we make sure we're preforming a tag search, by checking if the tags key is set.

  1. $retval = array();
  2.     foreach($results as $key => $result){
  3.       $tags = Set::extract('/Tag/name', $result);
  4.       if(count($query['tags']) == count(array_intersect($tags, $query['tags']))){
  5.         $retval[] = $results[$key];
  6.       }
  7.     }
  8.   $results = $retval;
  9. }

It gets a little tricky here, we go through the result set that the query returned to us and extract all the names using CakePHP's built in Set class. $tags ends up being a one dimensional array just like our $options['tags']. We then count the $options['tags'] and make sure its equal to the array_intersect of $tags, and $options['tags']. Array_intersect is great here because it will take two arrays, compare then and give us back what is matching. The result is only songs that have at least the specified search tags (possibly more) will be given as a valid result in the returned set.

Example Usage

Example usage of this custom similar find would be:
  1. $result = $this->Song->find('similar', array(
  2.   'tags' => array('Drum Intro', 'No Vocal')
  3. )); //Song1 and Song4
  5. $result = $this->Song->find('similar', array(
  6.   'tags' => 'Guitar Solo'
  7. )); //Song1 and Song3
  9. $result = $this->Song->find('similar', array(
  10.   'tags' => array('Drum Intro', 'Guitar Solo')
  11. )); //Empty

Extra credit

A tip to paginate these results is to overwrite _findAll and _findCount, checking for 'tags', and doing the same thing. By doing so, all you'll have to do is pass in a 'tags' option key on an 'all' or a 'count' to get the behavior you want. This end result is all you need for pagination to work.

I hope you've enjoyed this tutorial.

Happy Baking,