Implementation

Profiles

Profile specifications

Profiles are implemented via Python files that are loaded as proper Python modules. These files are called profile specifications. They are located in:

/var/lib/drugle/profiles

The name of a profile specification file must match the value of the 'profile' argument sent by the web application via the Search profile drop-down widget. For example, the profile named profile-normal must have a profile specification file named profile-normal.py in the directory above.
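The mapping from profile name to file could be sketched as below. A name such as profile-normal contains a hyphen, so it cannot be loaded with a plain import statement; the sketch therefore compiles the file into a fresh module object. The function name load_profile and the constant PROFILE_DIR are illustrative assumptions and are not taken from drugle_handler.py.

```python
import os
import types

# Directory named in the text above; the constant name is an assumption.
PROFILE_DIR = "/var/lib/drugle/profiles"


def load_profile(name, directory=PROFILE_DIR):
    """Load <directory>/<name>.py and return it as a module object."""
    # Reject names that could point outside the profile directory.
    if os.path.basename(name) != name or name in ("", ".", ".."):
        raise ValueError("invalid profile name: %r" % (name,))
    path = os.path.join(directory, name + ".py")
    with open(path) as source_file:
        source = source_file.read()
    module = types.ModuleType(name)
    module.__file__ = path
    # Run the file's code with the module's namespace as its globals.
    exec(compile(source, path, "exec"), module.__dict__)
    return module
```

With this sketch, load_profile ("profile-normal") would return a module whose sort attribute is the function defined in profile-normal.py.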

The only requirements on the contents of a profile specification file are:

  1. It must be a valid Python file.
  2. It must contain the function sort (x, y).

The function sort is used when comparing the search hits x and y to determine which has the higher ranking.
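The second requirement can be checked right after a profile specification has been loaded. The helper below is a hypothetical sketch (check_profile is not part of Drugle); it only verifies that the module provides something callable named sort.

```python
def check_profile(module):
    """Raise ValueError unless 'module' provides a callable sort (x, y)."""
    sort = getattr(module, "sort", None)
    if not callable(sort):
        raise ValueError("profile specification must define sort (x, y)")
    return module
```

Failing early here gives a clearer error than an AttributeError raised in the middle of a search request.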

The ranking

For every search request handled by the Drugle handler drugle_handler.py, the appropriate profile specification is imported as a Python module and passed to drugle_text_search.py. This makes the appropriate sort function available for sorting search hits. In drugle_text_search.py there is the function

def ranked_results (results, profile):
    """Resorted list of results ranked according to 'profile'."""
    assert results is not None
    assert profile is not None
    return sorted (results, profile.sort)

which sorts the search hits in results using the sort function of the current profile.
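Passing the comparison function as the second positional argument to sorted only works in Python 2. In Python 3, sorted accepts only key and reverse, so a comparison function has to be wrapped with functools.cmp_to_key (available from Python 2.7 and 3.2). A sketch of the same function under that assumption:

```python
from functools import cmp_to_key


def ranked_results(results, profile):
    """Return a new list of results ordered by profile.sort."""
    assert results is not None
    assert profile is not None
    # cmp_to_key adapts a comparison function (negative, zero or
    # positive result) to the key= interface of sorted.
    return sorted(results, key=cmp_to_key(profile.sort))
```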

The sort function

The sort function is passed to Python's built-in function sorted as a comparison function, so it must follow the usual convention for such functions:

if x > y:
    return 1
elif x == y:
    return 0
else:
    return -1

Of course, you don't need to compare x and y directly. You will typically choose some property of x and y, or some value calculated from them.
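As an illustration, a profile specification could compare hits on a single property. The sketch below assumes that each hit is a dictionary with a numeric 'score' entry; both the dictionary shape and the key name are hypothetical. Since sorted places the "smaller" element first, the hit that should be listed first must compare as smaller, hence the -1 for the higher score.

```python
def sort(x, y):
    """Order hits so that the one with the higher score comes first."""
    # 'score' is a hypothetical property; substitute whatever the
    # search hits actually carry.
    score_x = x["score"]
    score_y = y["score"]
    if score_x > score_y:
        return -1  # x is listed before y
    elif score_x == score_y:
        return 0
    else:
        return 1
```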

The sort function will typically want to use properties from the index files to determine how to sort search hits. The evaluated contents of the index files are available via memcached (see below), where each index dictionary is available via the key:

drugle.index.<Index_file_path>

An example:

drugle.index./var/lib/drugle/sources/emea/spc/index/de/H-363-PI-de.1.py
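Building the key and fetching an index dictionary could look like the sketch below. The helper names index_key and index_for are assumptions, and client stands for any object with a get method, such as a memcache.Client. memcached limits keys to 250 bytes and does not allow whitespace in them, so the sketch rejects paths that would produce an unusable key.

```python
INDEX_KEY_PREFIX = "drugle.index."
MAX_KEY_LENGTH = 250  # memcached's limit on key length


def index_key(index_file_path):
    """Return the cache key for the index file at 'index_file_path'."""
    key = INDEX_KEY_PREFIX + index_file_path
    if len(key) > MAX_KEY_LENGTH or any(c.isspace() for c in key):
        raise ValueError("not a usable memcached key: %r" % (key,))
    return key


def index_for(client, index_file_path):
    """Return the cached index dictionary, or None if it is not cached."""
    return client.get(index_key(index_file_path))
```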

memcached for index files

Due to the great number of index files and the time it takes to read and evaluate them, it is necessary to preload the index files into memory for fast access. In a first solution the index files were read every time a search was performed, incurring a 45-second time penalty. Not fun!

We use memcached to remedy this. To use memcached you need to install:

ii  memcached                                  1.2.2-1                                 A high-performance memory object caching sys
ii  python-memcache                            1.40-1                                  pure python memcached client

To configure memcached edit:

/etc/memcached.conf

Since the EMEA index files are about 250 MB, you need to increase the maximum memory size for memcached from the default 64 MB to, say, 512 MB or whatever is appropriate. Edit the following configuration parameter (note that lines beginning with # are comments):

# Start with a cap of 64 megs of memory. It's reasonable, and the daemon default
# Note that the daemon will grow to this size, but does not start out holding this much
# memory
#-m 64
-m 512

Then restart the 'memcached' server daemon with:

/etc/init.d/memcached restart

To use the Python memcache module write:

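import memcache

mc = memcache.Client (['127.0.0.1:11211'], debug=0)
...
# Set a key/value pair
mc.set (key, value)
...
# Get the value stored under a key (None if the key is missing)
value = mc.get (key)

# Get stats from the (single) memcached server that we are using.
# get_stats () returns a list of (server name, stats dictionary) pairs.
server_name, stats = mc.get_stats ()[0]
for key in stats:
    print ("%s: %s" % (key, stats[key]))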
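Putting the pieces together, preloading the index files could be sketched as follows. This is not the actual Drugle preloader: the function name is hypothetical, and the sketch assumes that each index file contains a single Python dictionary literal that ast.literal_eval can evaluate. The client argument is any object with a set method that returns a true value on success, which is how memcache.Client behaves.

```python
import ast


def preload_index_files(client, paths):
    """Evaluate each index file and cache it; return the paths that failed."""
    failed = []
    for path in paths:
        with open(path) as index_file:
            # Assumption: the file holds one Python literal (a dictionary).
            index = ast.literal_eval(index_file.read())
        # Same key scheme as described above: drugle.index.<Index_file_path>
        if not client.set("drugle.index." + path, index):
            failed.append(path)
    return failed
```

Checking the returned list is worthwhile because memcached rejects values above its item size limit (1 MB by default), so an unusually large index file may fail to be stored.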