ejIndex

org.exodelta.j2.index

Full Text Indexing Service for JBoss



Overview:

ejIndex is a full-text indexing and search service implemented as a JBoss MBean service. It uses the Apache Lucene index engine to provide fast, efficient and stable text indexing and search facilities. ejIndex wraps the Lucene facilities in a robust service implementation with thread pooling, queueing and access synchronization, and exposes concurrent interfaces for indexing, search and management. It also includes some text filters that extract searchable text from common document formats such as MS Word, MS Excel, HTML and XML (with some help from other projects).
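To give a rough idea of what "thread pooling, queueing and access synchronization" means in practice, here is a minimal sketch of the general pattern in plain Java. It is not ejIndex code: the class QueuedIndexSketch, its methods and the in-memory list standing in for a Lucene index are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration of a queued, synchronized index; not ejIndex code. */
final class QueuedIndexSketch {
    // Stand-in for a real index; every access is guarded by the lock below.
    private final List<String> documents = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Fixed pool backed by an unbounded queue: requests wait until a worker is free.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    /** Queues a document for indexing and returns immediately. */
    void submit(final String text) {
        workers.submit(() -> {
            lock.writeLock().lock();          // writers get exclusive access
            try {
                documents.add(text);
            } finally {
                lock.writeLock().unlock();
            }
        });
    }

    /** Counts documents containing the term; several searches may run at once. */
    int search(String term) {
        lock.readLock().lock();               // readers share access
        try {
            int hits = 0;
            for (String doc : documents) {
                if (doc.contains(term)) {
                    hits++;
                }
            }
            return hits;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Stops accepting work and waits up to five seconds for the queue to drain. */
    void shutdownAndWait() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        QueuedIndexSketch index = new QueuedIndexSketch();
        index.submit("JBoss is an open source application server");
        index.submit("Lucene is a text search library");
        index.shutdownAndWait();
        System.out.println("hits: " + index.search("open source"));
    }
}
```

The point of the split is that indexing requests return quickly while the actual work happens on pool threads, and searches only block while a write is in progress.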


Features:


Quick Start (for the impatient like me):


Example Usage (SimpleTest.java in distribution file):

import java.net.URL;
import java.util.Date;
import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.InitialContext;

// The ejIndex client classes used below (IndexServiceHome, IndexService,
// IndexRequest, SearchServiceHome, SearchService, SearchRequest, SearchResults)
// come from the ejIndex distribution and must be imported as well.

public class SimpleTest
{

    public static void main(String[] args) throws Exception
    {
        // connect to the JBoss naming service
        Hashtable<String, String> props = new Hashtable<String, String>();
        props.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        props.put(Context.PROVIDER_URL, "jnp://127.0.0.1:1099");
        InitialContext context = new InitialContext(props);

        IndexServiceHome ixHome = (IndexServiceHome) context.lookup("exodelta/IndexService");
        IndexService ixService = ixHome.create();

        IndexRequest request = new IndexRequest("jboss.org/index.html"); // sets unique ID
        request.setDocumentURL(new URL("http://jboss.org/index.html"));

        // these properties are defined in schema defs..
        request.addProperty("title", "jboss home");
        request.addProperty("author", "someone-at-jboss");
        request.addProperty("rating", Float.valueOf(9.5f));
        request.addProperty("dateCreated", new Date());

        request.setMimeType("text/html");

        ixService.addItem(request);

        // allow some time to fetch & index doc
        Thread.sleep(5000);      // we wouldn't normally need this

        SearchServiceHome ssHome = (SearchServiceHome) context.lookup("exodelta/SearchService");
        SearchService ssService = ssHome.create();

        SearchRequest search = new SearchRequest();

        search.setColumns("id,title,author,rating,datecreated, summary");
        search.setQuery("Open Source");      // see Apache docs for query specs

        SearchResults results = ssService.executeSearch(search);

        // step through the result rows, reading one value per requested column
        while (results.moveNext())
        {
            String id = (String) results.getValue("id");
            String title = (String) results.getValue("title");
            String author = (String) results.getValue("author");
            Float rating = (Float) results.getValue("rating");
            Date dateAdded = (Date) results.getValue("datecreated");
            String summary = (String) results.getValue("summary");

            System.out.println("id: " + id + ", title: " + title + ", author: " + author
                               + ", rating: " + rating + ", dateAdded: " + dateAdded);
            System.out.println("summary: " + summary);
        }

        // clean up: remove the test document and release both session beans
        ixService.removeItem("jboss.org/index.html");
        ixService.remove();
        ssService.remove();
    }
}
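The example passes a plain query string to setQuery and points to the Apache docs for the syntax. As a small sketch of what a more specific query could look like, the helper below builds a string in Lucene's documented query syntax (field:"phrase" clauses joined with AND). The QuerySketch class is hypothetical and not part of ejIndex or Lucene, and it is an assumption that the schema properties ("title", "author") are searchable as Lucene fields under those names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for composing query strings; not part of ejIndex or Lucene. */
final class QuerySketch {

    /** Wraps a value in double quotes, escaping backslashes and embedded quotes. */
    static String phrase(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Joins field:"value" clauses with AND, in the order the map yields them. */
    static String allOf(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (query.length() > 0) {
                query.append(" AND ");
            }
            query.append(field.getKey()).append(':').append(phrase(field.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "jboss home");
        fields.put("author", "someone-at-jboss");
        // prints: title:"jboss home" AND author:"someone-at-jboss"
        System.out.println(allOf(fields));
    }
}
```

The resulting string would be handed to search.setQuery(...) in place of "Open Source". Quoting every value as a phrase keeps characters such as the hyphen from being read as query operators.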


License:

This software is made freely available under the terms and conditions of the GNU Lesser General Public License (LGPL). For details of the terms of this license, please refer to http://www.gnu.org/licenses/licenses.html#LGPL.


Advanced Configuration:

There are many configuration options for fine-tuning the system, too many to detail here. For an example of a default configuration, you can take a look at the standard ejindex-service.xml file in HTML format here:


Filters:

To extract text from a document in a form suitable for indexing, you need to use an appropriate ContentFilter. Filters are implemented either in Java or as an external application that extracts the text and writes it to stdout. Filters are mapped to specific MIME types or file extensions in the filtermappings section of the ejindex-service.xml file.

ejIndex comes with some filters for common document formats, or you can implement your own.
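As a rough sketch of the external-application style of filter, the program below strips tags from a markup file and writes the remaining text to stdout. It is only an illustration: the TagStripFilter class is hypothetical, it does not implement the ContentFilter interface, and it assumes the document is handed over as a file path argument, which may not be how ejIndex invokes external filters. The tag stripping is deliberately naive (no entity decoding, no special handling of script or style blocks).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical external filter; not one of the filters shipped with ejIndex. */
final class TagStripFilter {

    /** Replaces every <...> tag with a space, then collapses runs of whitespace. */
    static String extractText(String markup) {
        String withoutTags = markup.replaceAll("<[^>]*>", " ");
        return withoutTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TagStripFilter <file>");
            System.exit(2);
        }
        // Assumption: the document to filter arrives as a file path argument.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        // The extracted text goes to stdout, where the indexer can pick it up.
        System.out.println(extractText(new String(raw, StandardCharsets.UTF_8)));
    }
}
```

For example, extractText("<p>Hello <b>world</b></p>") returns "Hello world".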

The standard filters currently provided are:


About:

This project came about for a few different reasons. Firstly, I wanted to write something to get more experience with J2EE app servers, and JBoss in particular. I have been working on information/document management applications for quite some time, and this is something I wished I had before. I also wanted to contribute something to the open source community that hopefully others might find useful. If you have any comments or suggestions (good or otherwise), please let me know via the forums or by email: andy at exodelta.com.


Copyright ©2003 Andy Scholz.