Rebuilding a Lucene Cache

In a current project we are using an Apache Lucene Cache to increase search performance over an aggregation of data drawn from a SQL database. Everything works fine so far. Performance is really great.

In our first approach we created local IndexReader and IndexSearcher instances with method scope, but the performance hints in the Lucene Wiki advise keeping them open and sharing them across threads. Both are thread safe.

After doing that we noticed that the cache directory kept growing and never got purged. The reason is that any reader or searcher that is open while the cache is rebuilt keeps its data. That is intentional: it allows searches on previous versions of the cache, or simply allows the cache to be rebuilt without blocking concurrent searches.

The old reader and searcher are also still alive after calling reader.reopen(). I looked around for solutions, but did not find a precise working example. Some even suggested enumerating the files in the cache directory and deleting them. That simply doesn’t work, because the files are locked, at least in my runtime environment (WebSphere 6.1).

Therefore the old reader and searcher must be closed explicitly. There is still one problem, which is perhaps WebSphere-specific, perhaps not. After republishing the web application, the existing cache files don’t get removed; only those created after republishing are cleaned up. It seems the owner of the aforementioned file locks is no longer the current thread, so Lucene cannot clean up the files. However, it will do the cleanup after the next restart of the WebSphere instance and its JVM, when the next IndexWriter is created, optimized and committed.

The following example works for me and keeps the old IndexSearcher available for queries while the cache is being updated, so users can keep issuing queries. I should note that I use LuceneAccess as a singleton, so all references share the same IndexReader and IndexSearcher.

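import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class LuceneAccess
{
    // The single shared instance (declaration missing from the original listing).
    private static LuceneAccess luceneAccess = null;

    // Assumption: the original listing does not show which analyzer is used;
    // StandardAnalyzer is only a placeholder here.
    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // Shared by all query threads; both classes are thread safe.
    private IndexReader reader = null;
    private IndexSearcher searcher = null;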
 
    public static synchronized LuceneAccess getInstance()
    {
        // Create the singleton lazily on first use.
        if (luceneAccess == null)
        {
            luceneAccess = new LuceneAccess();
        }
        return luceneAccess;
    }
 
    public synchronized void rebuildCache()
    {
        Directory directory = null;
        IndexWriter iwriter = null;
 
        try
        {
 
            File location = new File(...cache-directory...);
 
            directory = new SimpleFSDirectory(location);
            iwriter = new IndexWriter(directory, analyzer, true, MaxFieldLength.UNLIMITED);
            iwriter.deleteAll();
 
            addAllData(iwriter);
            iwriter.optimize();
            iwriter.commit();
 
            if (reader != null)
            {
                IndexReader newReader = reader.reopen();
                if (newReader != reader)
                {
                    IndexReader oldReader = reader;
                    IndexSearcher oldSearcher = searcher;
 
                    // TODO: protect the following 2 lines with a semaphore, so that
                    // query threads never see the new reader paired with the old searcher
                    reader = newReader;
                    searcher = new IndexSearcher(reader);
 
                    oldSearcher.close();
                    oldReader.close();
                }
            }
        } catch (Exception ex)
        {
            // ..error handling...
        } finally
        {
            if (iwriter != null)
            {
                try
                {
                    iwriter.close();
                } catch (Exception ex)
                {
                    // ..error handling...
                }
            }
 
            if (directory != null)
            {
                try
                {
                    directory.close();
                } catch (Exception ex)
                {
                    // ..error handling...
                }
            }
        }
    }
 
    private void addAllData(IndexWriter iwriter)
    {
    ...
    }
}
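
The listing swaps the reader and searcher in two separate assignments and closes the old pair right away, which leaves two gaps: a query thread could briefly see a mismatched pair, and a query still running on the old searcher could find it closed underneath it. One possible way to close both gaps is to publish the pair as a single object and to count references to it. The following is only a sketch of that idea using plain JDK classes: Snapshot, SnapshotHolder and SwapDemo are hypothetical names, and Snapshot merely stands in for an IndexReader/IndexSearcher pair. Lucene's IndexReader offers incRef() and decRef() for a similar purpose, but this sketch does not use them.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for one IndexReader/IndexSearcher pair; not a Lucene class. */
final class Snapshot {
    final String version;
    // Starts at 1: that reference belongs to the holder that publishes the snapshot.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    Snapshot(String version) { this.version = version; }

    /** Takes a reference; fails if the snapshot has already been fully released. */
    boolean tryIncRef() {
        while (true) {
            int n = refCount.get();
            if (n <= 0) return false;
            if (refCount.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Drops one reference; whoever drops the last one closes the resources. */
    void decRef() {
        if (refCount.decrementAndGet() == 0) close();
    }

    boolean isClosed() { return closed; }

    // Real code would close the searcher and then the reader here.
    private void close() { closed = true; }
}

final class SnapshotHolder {
    private final AtomicReference<Snapshot> current;

    SnapshotHolder(Snapshot initial) { current = new AtomicReference<>(initial); }

    /** Query threads call this, search, then call decRef() in a finally block. */
    Snapshot acquire() {
        while (true) {
            Snapshot s = current.get();
            if (s.tryIncRef()) return s;
            // Lost a race with swap(): s was released, so retry with the newer one.
        }
    }

    /** The rebuild thread calls this after the new index has been committed. */
    void swap(Snapshot fresh) {
        Snapshot old = current.getAndSet(fresh); // reader and searcher change together
        old.decRef(); // closes now, or later when the last running query finishes
    }
}

class SwapDemo {
    public static void main(String[] args) {
        SnapshotHolder holder = new SnapshotHolder(new Snapshot("v1"));
        Snapshot inFlight = holder.acquire();   // a query starts on v1
        holder.swap(new Snapshot("v2"));        // the rebuild finishes meanwhile
        System.out.println(inFlight.version + " closed? " + inFlight.isClosed()); // v1 closed? false
        inFlight.decRef();                      // the query ends
        System.out.println(inFlight.version + " closed? " + inFlight.isClosed()); // v1 closed? true
    }
}
```

In rebuildCache() the two assignments and the two close() calls would then collapse into a single swap(), and every query method would wrap its search in acquire() and decRef(). Whether the extra bookkeeping is worth it depends on how long queries run compared with how often the cache is rebuilt.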

This article refers to Lucene version 3.0.3
