All Posts in Apache

May 30, 2014 - Comments Off on Configuring Sunspot Solr Search Controller

Configuring Sunspot Solr Search Controller

Search is the compass of the internet. It guides us to the content that we are really looking for and helps avoid the stuff we don’t really care about. Or at least that’s how it is supposed to work. It turns out that beyond just the complexity of installing and configuring a search server, it can also be difficult to account for the various use cases of your search tool. Let’s take a quick look at how The Mechanism engineers were able to tackle this challenge when building a restaurant search application for SafeFARE.

The good folks at foodallergy.org enlisted our services to build a restaurant search application that will allow users to find allergy-aware restaurants based on any combination of 9 criteria. Using the Ruby on Rails framework and Sunspot Solr (a Ruby DSL for the Lucene-based Apache Solr search server), we built this search app and learned a few things along the way.

If a user searches for restaurants in a ZIP code, should we only return restaurants within that ZIP code, or should we include restaurants from other nearby ZIP codes in our search results? And if we include other ZIP codes, how many? How should we order the results? These and other similar questions helped us come up with the structure of our search controller.
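One way to make the "nearby" and "ordering" questions concrete is to think in terms of distance rather than ZIP codes. The following is a minimal plain-Ruby sketch, not our production code: the sample restaurants, their approximate coordinates, and the `nearby` helper are all hypothetical, and the distance is computed with the standard haversine formula instead of Solr.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two [lat, lng] pairs (haversine formula).
def haversine_km(a, b)
  lat1, lng1 = a.map { |deg| deg * Math::PI / 180 }
  lat2, lng2 = b.map { |deg| deg * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lng2 - lng1) / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
end

# Keep only restaurants within radius_km of origin, nearest first.
def nearby(restaurants, origin, radius_km)
  restaurants
    .map { |r| [r, haversine_km(origin, r[:location])] }
    .select { |_, km| km <= radius_km }
    .sort_by { |_, km| km }
    .map(&:first)
end

# Hypothetical sample data with approximate coordinates.
SAMPLE_RESTAURANTS = [
  { name: "Brooklyn Bistro", location: [40.6782, -73.9442] },
  { name: "Philly Grill",    location: [39.9526, -75.1652] },
  { name: "Midtown Diner",   location: [40.7549, -73.9840] }
]

origin = [40.7128, -74.0060] # lower Manhattan
names = nearby(SAMPLE_RESTAURANTS, origin, 25).map { |r| r[:name] }
puts names.inspect
```

A radius answers "how many other ZIP codes" without ever counting ZIP codes, and sorting by the same distance gives a natural default order.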

Figure 1.1

if params[:search].present?
  @search = Restaurant.solr_search do
    fulltext params[:restaurant_name] # runs a full text search of the Solr index for the restaurant name
    with(:approved, true) # filters down to approved restaurants
    if params[:cuisine_search].present? # user also entered a cuisine preference
      any_of do
        params[:cuisine_search].each do |tag|
          with(:cuisines_name, tag) # filter by matching cuisines
        end
      end
    end
    if params[:address].present? || params[:city_search].present? || params[:state_search].present? || params[:zip_search].present?
      # if any location fields are present, geocode that location and
      # filter based on the user-given location
      # (whereat and howfar are not defined in this excerpt: the location string and the search radius)
      with(:location).in_radius(*Geocoder.coordinates(whereat), howfar)
    end
    # order results by distance from the user's detected location
    order_by_geodist(:location, request.location.latitude, request.location.longitude)
  end
  @restaurants = @search.results
end

 

It took us about a week, but we were finally able to come up with enough if statements to cover every one of the 512 (2^9) possible combinations of our 9 search criteria. Figure 1.1 is a small sampling of how we implement search when a user types in a restaurant name, cuisine preference, and restaurant location. First we search the Solr index for whatever the user enters in the restaurant_name field, then cut that list down to only the approved restaurants. Next we check whether the user also entered a cuisine preference; if so, we filter our list down to restaurants that match that cuisine, and if not, we skip that step. Then we check whether the user entered a location to search, like a city or state, and filter our list down to only the restaurants in that area. Using this strategy we can create a sort of Venn diagram that lets us drill down to only the information we want and assign that result to the @restaurants variable. To increase the functionality of the site, The Mechanism engineers implemented an IP lookup that automatically detects the user’s IP address and location, and orders search results by how close each restaurant is to the user.
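The drill-down idea can be shown without Solr at all. Below is a minimal plain-Ruby sketch, assuming an in-memory list of hashes in place of the index; `build_filters`, `search`, and the sample data are hypothetical names for illustration, not part of Sunspot. Each optional field adds one filter only when it is present, so the combinations never have to be spelled out one by one.

```ruby
# Each filter is a lambda that returns true when a restaurant should stay in
# the result set. A filter is only built when its form field is present.
def build_filters(params)
  filters = [->(r) { r[:approved] }] # always applied

  name = params[:restaurant_name].to_s.strip.downcase
  filters << ->(r) { r[:name].downcase.include?(name) } unless name.empty?

  cuisines = Array(params[:cuisine_search])
  # any_of semantics: match at least one of the requested cuisines
  filters << ->(r) { (r[:cuisines] & cuisines).any? } unless cuisines.empty?

  filters
end

# A restaurant is returned only if it passes every active filter.
def search(restaurants, params)
  filters = build_filters(params)
  restaurants.select { |r| filters.all? { |f| f.call(r) } }
end

# Hypothetical sample data
RESTAURANT_LIST = [
  { name: "Luigi's",       approved: true,  cuisines: ["italian"] },
  { name: "Taco Casa",     approved: true,  cuisines: ["mexican", "vegan"] },
  { name: "Luigi's Annex", approved: false, cuisines: ["italian"] }
]

by_cuisine = search(RESTAURANT_LIST, { cuisine_search: ["italian", "vegan"] })
by_name    = search(RESTAURANT_LIST, { restaurant_name: "luigi" })
puts by_cuisine.map { |r| r[:name] }.inspect
puts by_name.map { |r| r[:name] }.inspect
```

Adding a new criterion in this style means adding one more conditional filter rather than doubling the number of branches.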

A second major challenge that many developers face when using a search server is deployment. In order to use Solr in a production environment, you will need a Java servlet container like Tomcat or Jetty, and you will need an instance of Apache Solr. Developers may consider installing standalone versions of Tomcat and Solr depending on their hardware capabilities, but Sunspot comes bundled with a Jetty server, which can be used in production by running the command RAILS_ENV=production rake sunspot:solr:start

And voilà! We have implemented an advanced search tool that will help users find allergy-aware restaurants all across the nation and may even save somebody’s life one day.

Published by: Sharon Terry in The Programming Mechanism

July 12, 2012 - Comments Off on Adding CCK fields to Apachesolr documents in Drupal 7

Adding CCK fields to Apachesolr documents in Drupal 7

Drupal's core searching functionality is awesome in that it can be replaced by other search modules and even other search engines. We're using Apache Solr for its lightning fast response time and powerful indexing and faceting. In a nutshell, Solr is a standalone index and search engine to which Drupal sends its search queries. The real beauty is that the indexed results are all cached and served back to the Drupal site as XML documents ridiculously fast, much faster than Drupal's core search, which has to hit the database and fully load each node/entity for every matching result.

By default Apache Solr will index most of your content type's fields; however, many CCK fields that you add will not be included in the default indexing. That is to say, if you add a checkbox or text field to your content type, you will have to explicitly direct Solr to add it to the index. Each piece of content that is indexed by Solr is processed and stored as a Solr document, which holds all of the indexed fields as well as some Solr metadata. That is how Solr can return results so quickly: it is only sending the fields you require, not loading the entire object.

Here is some sample code showing how to add some custom fields to the Solr document. These two hooks are all you need to get started; just add them to a custom module and be sure to re-index afterwards to update the Solr index.


/**
 * Implements hook_apachesolr_index_document_build().
 *
 * Add custom fields to the solr document
 */
function themech_solr_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  if ($entity->type == 'publication') {
    if (isset($entity->field_publication_author[$entity->language])) {
      foreach ($entity->field_publication_author[$entity->language] as $id => $obj) {
        if (isset($entity->field_publication_author[$entity->language][$id])) {
          $document->setMultiValue('sm_field_publication_author', $entity->field_publication_author[$entity->language][$id]['entity']->name);
        }
      }
    }
    if (isset($entity->field_publication_attachment[$entity->language])) {
      foreach ($entity->field_publication_attachment[$entity->language] as $id => $obj) {
        $document->setMultiValue('sm_field_publication_attachment', $entity->field_publication_attachment[$entity->language][$id]['uri']);
      }
    }
    if (isset($entity->field_publication_recommended[$entity->language])) {
      $document->setMultiValue('is_field_publication_recommended', $entity->field_publication_recommended[$entity->language][0]['value']);
    }
  }
}

/**
 * Implements hook_apachesolr_query_alter().
 *
 * Add the newly indexed fields from above to the query result.
 */
function themech_solr_apachesolr_query_alter($query) {
  $query->addParams(array('fl' => array('sm_field_publication_author')));
  $query->addParams(array('fl' => array('sm_field_publication_attachment')));
  // This field was indexed with the is_ prefix, so request it under the same name.
  $query->addParams(array('fl' => array('is_field_publication_recommended')));
}

In this example, we're adding any names that were selected in a select list of authors, a checkbox state, and a file attachment URI. The first function checks whether the Publication node has data in certain fields, then adds them to the $document object via setMultiValue(). The fields are now stored in Solr, but as they were custom additions to the document, you have to specify them in the query to tell Solr to pull them back out with the rest of the document.

You can index anything you can put into a content type, and each content type can have specific fields indexed. With Solr you can create thumbnail gallery search results, or integrate with your commerce site to generate product category and price range searches, as well as tune the results based on custom weights and ratings. The possibilities are almost limitless. Maybe as many as a googol (1x10^100), or in Drupal terms... a Droogol. :+)

Links:
Apache Solr
Apachesolr Search Integration

Published by: chazcheadle in The Programming Mechanism

December 3, 2008 - 6 comments

Quick guide to Apache, Subversion and SvnX on Mac OS X

theMechanism has been using Subversion for just over a month now. The following is a quick guide to installing and working with Apache, Subversion and SvnX on Mac OS X:

Apache, Subversion and SvnX on Mac OS X

Apache is the local web server (used to view and test files in the working copy), Subversion is the version control software and SvnX is a GUI-client for Subversion.

Before we can use Apache, Subversion and SvnX, we need to check:

  1. Is Apache running?
  2. Is Subversion installed?
  3. Is SvnX installed?

Apache

To test Apache, fire up the browser of your choice and enter http://localhost/ into the address bar. If you do not see an "Unable to connect," "Can’t connect to server," or "Error! Connection closed by remote server" screen, Apache is running.

If it is not running, open System Preferences. Under "Internet & Network," choose "Sharing." Make sure "Web Sharing" is checked ("Personal Web Sharing" in Tiger).

Mac OS X Internet & Network Sharing window screen grab

Note that localhost is pointed at /Library/WebServer/Documents/.

Subversion

To see if Subversion is installed, launch Terminal (Applications | Utilities | Terminal). At the prompt, type:

svn --version

Some information on the version and build should appear. If not, Subversion is not installed.

Mac OS X Terminal screen grab showing results of running svn --version.

If Subversion is not installed, download and install the latest version from http://www.collab.net/downloads/community/

SvnX

To see if SvnX is installed, check for svnX.app in the Applications directory.

If it is not there, download and install the latest version from
http://www.lachoseinteractive.net/en/community/subversion/svnx/download/

Working with Subversion via SvnX

Standard Subversion workflow:

  1. Check out a working copy
  2. Make edits to the working copy
  3. Commit the edits to the repository

Checking out a repository

Screen grab of SvnX's Repositories window.

Launch SvnX and set the focus on the "Repositories" window.

Click the "+" button to add a repository.

  • Change the name to something descriptive.
  • Enter the path to the repository.
  • If required, enter your user name and password for the repository.

Once a repository has been added, it will appear in the top half of SvnX's Repositories window. Double-click on the repository to open it:

An open repository in SvnX.

Click the "svn checkout" button at the top of the window and navigate to a directory below /Library/WebServer/Documents. Click the open button and SvnX will download the repository to your local machine.

Close the current window and the SvnX "Repositories" window.

The Working Copy

Set the focus on the SvnX "Working Copies" window. Note that SvnX has added a working copy after check out. Change the name to match the descriptive name in the previous step.

Screen grab of the SvnX Working Copies window.

Double-click on the working copy in the top half of the "Working Copies" window. The main thing to be aware of here is the "Update" button. Click this to refresh your local working copy with any changes made to the repository by any other team member.

In Finder, navigate to the working copy, open index.html in your favorite HTML editor, make some small change and save index.html.

Switch back to the active working copy window and click the "Refresh" button at the top of the window. You’ll see that index.html is now flagged with an "M" (modified).

Screen grab of an open working copy in SvnX.

If you're happy with the change and have previewed it to make sure everything is working, press the "Commit" button on the right side of the screen.

Add a meaningful commit message and press commit.
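For anyone who prefers Terminal, the SvnX buttons used above correspond roughly to four standard svn subcommands. The following is a small Ruby sketch that only builds and prints those command lines; the repository URL, working copy path, and commit message are hypothetical, and `svn_workflow` is a name made up for this illustration.

```ruby
require "shellwords"

# Builds the svn command lines that roughly correspond to the SvnX buttons.
# Nothing is executed here; pass a command to system(*cmd) to actually run it.
def svn_workflow(repo_url, working_copy, message)
  [
    ["svn", "checkout", repo_url, working_copy],    # "svn checkout" button
    ["svn", "update", working_copy],                # "Update" button
    ["svn", "status", working_copy],                # "Refresh": lists M (modified) files
    ["svn", "commit", working_copy, "-m", message]  # "Commit" with a message
  ]
end

# Hypothetical repository URL and working copy path
commands = svn_workflow(
  "http://svn.example.com/repos/site/trunk",
  "/Library/WebServer/Documents/site",
  "Fix typo in index.html heading"
)

# Print each command with shell-safe quoting
commands.each { |cmd| puts Shellwords.join(cmd) }
```

Keeping each command as an array of arguments avoids quoting problems with commit messages that contain spaces.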

Version control principles

  1. Update often!
    1. Keep your working copy up-to-date by updating often.
  2. Commit early and often!
    1. Early and often means make atomic changes. Change one thing, test and then commit with comments.
  3. Never commit broken code!

Jeffrey Barke is senior developer and information architect at theMechanism, a multimedia firm with offices in New York, London and Durban, South Africa.

Published by: jeffreybarke in The Programming Mechanism