Read the notes on Semi-structured text parsing for the theoretical background to some of the design decisions.

Search API


Collection and indexing process

Extraction -> Documents -> Converters -> Splitters -> Index files -> Term extractions -> Terms database

Here is an updated image of the Drugle information handling process:



Script Programming

Drugle script programming

Presentation of results


Ranking search lists

Every search produces a list of hits. Unless further processing is done, the hits are ordered the way the search engine (Xapian) originally returned them. We wish to rank the search hits according to an intelligent ranking scheme.

An algorithm that implements an intelligent ranking scheme can base its ranking on one or more of the following:

  1. Data or properties of the search hits. For this we can use properties of the index files above, such as the 'entries.<id>.scores' property.
  1. Properties of the search query itself, such as the types of the search terms and/or specific combinations of search term types. This assumes we have some way of determining the type of a term, e.g. "type_of_term ('headache') = 'symptom'" or "type_of_term ('paracetamol') = 'drug'".
  1. The user-selected profile. E.g. when searching on drug terms, a 'doctor' may want search hits related to 'indications' (aspects/entries) to be ranked higher, while a 'patient' may want search hits related to 'side effects' (aspects/entries) to be ranked higher.
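
The first two inputs above can be combined roughly as in the sketch below. It is an illustration only, not the Drugle implementation: the TERM_TYPES table, the hit layout (a dict with 'id', 'aspect' and 'score') and the boosts mapping are all assumptions made for the example. In particular, the real structure of the 'entries.<id>.scores' property is not specified here, so a single numeric score per hit is assumed.

```python
# Hypothetical term-type table; a real system would presumably consult
# the terms database instead of a hard-coded dict.
TERM_TYPES = {"headache": "symptom", "paracetamol": "drug"}


def type_of_term(term):
    """Return the type of a search term, or 'unknown' if it is not listed."""
    return TERM_TYPES.get(term.lower(), "unknown")


def rank_hits(hits, query, boosts):
    """Order hits by stored score, scaled by boosts keyed on (term type, aspect).

    hits   -- list of dicts with 'id', 'aspect' and 'score' (assumed layout)
    query  -- whitespace-separated search terms
    boosts -- dict mapping (term_type, aspect) to a multiplier
    """
    types = {type_of_term(t) for t in query.split()}

    def weight(hit):
        w = hit["score"]
        for t in types:
            # Pairs without an explicit boost leave the score unchanged.
            w *= boosts.get((t, hit["aspect"]), 1.0)
        return w

    return sorted(hits, key=weight, reverse=True)


hits = [
    {"id": "a", "aspect": "side effects", "score": 0.8},
    {"id": "b", "aspect": "indications", "score": 0.6},
]
# Hypothetical rule: for drug queries, favour hits about indications.
doctor_boosts = {("drug", "indications"): 2.0}
ranked = rank_hits(hits, "paracetamol", doctor_boosts)
```

With these made-up numbers, hit 'b' (0.6 x 2.0) moves ahead of hit 'a' (0.8) for the query 'paracetamol'. With an empty boosts mapping the stored scores alone decide the order.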

User search profiles

A user search profile is a formal specification consisting of a set of rules for ranking search hits based on properties of the search query and search hits as described above.

In the first iteration we will support three profiles: "normal user", "doctor" and "alphabetical". The last is for testing purposes.
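
One possible way to write the three profiles down as data is sketched below. The rule format is hypothetical: the 'sort' and 'boosts' keys, the hit fields ('title', 'aspect', 'score') and the multiplier values are assumptions for illustration. It also assumes that "normal user" behaves like the 'patient' case described earlier, which the notes do not state explicitly.

```python
# Each profile is plain data, so it could be kept in a configuration file.
# The keys and values below are assumed for this sketch, not an existing format.
PROFILES = {
    "normal user": {"sort": "score", "boosts": {("drug", "side effects"): 2.0}},
    "doctor": {"sort": "score", "boosts": {("drug", "indications"): 2.0}},
    "alphabetical": {"sort": "title", "boosts": {}},
}


def apply_profile(profile_name, hits, query_term_types):
    """Return hits ordered according to the named profile.

    hits             -- list of dicts with 'title', 'aspect', 'score' (assumed)
    query_term_types -- set of term types found in the query, e.g. {"drug"}
    """
    profile = PROFILES[profile_name]
    if profile["sort"] == "title":
        # Testing profile: ignore scores and sort by title only.
        return sorted(hits, key=lambda h: h["title"].lower())

    def weight(hit):
        w = hit["score"]
        for t in query_term_types:
            w *= profile["boosts"].get((t, hit["aspect"]), 1.0)
        return w

    return sorted(hits, key=weight, reverse=True)


sample_hits = [
    {"title": "Paracetamol: side effects", "aspect": "side effects", "score": 0.5},
    {"title": "Paracetamol: indications", "aspect": "indications", "score": 0.6},
    {"title": "Aspirin: dosage", "aspect": "dosage", "score": 0.4},
]
doctor_order = [h["aspect"] for h in apply_profile("doctor", sample_hits, {"drug"})]
user_order = [h["aspect"] for h in apply_profile("normal user", sample_hits, {"drug"})]
alpha_order = [h["title"] for h in apply_profile("alphabetical", sample_hits, {"drug"})]
```

Keeping the rules as data rather than code means a new profile could be added without touching the ranking function. Whether that fits the formal specification intended here is an open design question.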