Rebuilding a Lucene Cache

In a current project we are using an Apache Lucene Cache to increase search performance over an aggregation of data drawn from a SQL database. Everything works fine so far. Performance is really great.

In a first approach we created local instances of IndexReader and IndexSearcher with method scope, but the performance hints in the Lucene Wiki advise keeping them open and sharing them across threads. Both are thread safe.

After doing that we noticed that the cache directory kept growing and was never purged. The reason is that any reader or searcher that is open while the cache is being rebuilt keeps its data. That is intentional: it allows searches on previous versions of the cache, or simply allows rebuilding the cache without blocking concurrent searches.

The old reader and searcher are also still alive after calling reader.reopen(). I looked around for solutions, but did not find a precise working example. Some even suggested enumerating the files in the cache directory and deleting them. That simply doesn’t work, because the files are locked, at least in my runtime environment (WebSphere 6.1).

Therefore the old reader and searcher must be explicitly closed. There is still one problem, which is perhaps WebSphere-specific, perhaps not. After republishing the web application, the existing cache files don’t get removed. Only those created after republishing are cleaned up. It seems the owner of the aforementioned locks is no longer the current thread, so Lucene cannot clean up the files. However, it will do the cleanup after the next restart of the WebSphere instance and its JVM, when the next IndexWriter is created, optimized and committed.

The following example works for me and keeps the old IndexSearcher available for queries while the cache is being updated, so users can keep issuing queries. I should note that I use LuceneAccess as a singleton, so all references share the same IndexReader and IndexSearcher.

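import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class LuceneAccess
{
    // the single shared instance, created lazily in getInstance()
    private static LuceneAccess luceneAccess = null;

    // assumption: the analyzer is not shown in the original listing;
    // any analyzer that matches your index can be used here
    private final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // shared by all threads; both are thread safe
    private IndexReader reader = null;
    private IndexSearcher searcher = null;
 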
    public static synchronized LuceneAccess getInstance()
    {
        // create the singleton on first use
        if (luceneAccess == null)
        {
            luceneAccess = new LuceneAccess();
        }
        return luceneAccess;
    }
 
    public synchronized void rebuildCache()
    {
        Directory directory = null;
        IndexWriter iwriter = null;
 
        try
        {
 
            File location = new File(...cache-directory...);
 
            directory = new SimpleFSDirectory(location);
            iwriter = new IndexWriter(directory, analyzer, true, MaxFieldLength.UNLIMITED);
            iwriter.deleteAll();
 
            addAllData(iwriter);
            iwriter.optimize();
            iwriter.commit();
 
            if (reader != null)
            {
                IndexReader newReader = reader.reopen();
                if (newReader != reader)
                {
                    IndexReader oldReader = reader;
                    IndexSearcher oldSearcher = searcher;
 
                    // TODO: protect the following 2 lines with semaphore
                    reader = newReader;
                    searcher = new IndexSearcher(reader);
 
                    oldSearcher.close();
                    oldReader.close();
                }
            }
        } catch (Exception ex)
        {
            // ..error handling...
        } finally
        {
            if (iwriter != null)
            {
                try
                {
                    iwriter.close();
                } catch (Exception ex)
                {
                    // ..error handling...
                }
            }
 
            if (directory != null)
            {
                try
                {
                    directory.close();
                } catch (Exception ex)
                {
                    // ..error handling...
                }
            }
        }
    }
 
    private void addAllData(IndexWriter iwriter)
    {
        // ...
    }
}

This article refers to Lucene version 3.0.3
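
The TODO in the listing above is there because a search thread could otherwise pick up the new reader together with the old searcher between the two assignments. One way around that, sketched here without any Lucene dependency, is to bundle whatever the searches need into a single object and swap one reference atomically. The class name SwappableResource and the use of a plain Closeable are hypothetical stand-ins for the reader/searcher pair, not part of my actual code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: publishes one shared resource and swaps it atomically. */
final class SwappableResource<T extends Closeable> {
    // Holds the resource that is currently published to all threads.
    private final AtomicReference<T> current = new AtomicReference<T>();

    /** Returns the resource that searches should use right now (null if none yet). */
    public T get() {
        return current.get();
    }

    /** Publishes the replacement in one step, then closes the previous resource. */
    public void swap(T replacement) throws IOException {
        T previous = current.getAndSet(replacement);
        if (previous != null && previous != replacement) {
            previous.close();
        }
    }
}
```

Note that this sketch, like the listing above, closes the old resource immediately, so a search that is still running on it may fail. If that matters, some form of reference counting is needed on top (IndexReader offers incRef() and decRef() for this purpose).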

Posted in Apache Lucene, Java, Web development

Detecting memory leaks in Qt applications with Visual Studio

Visual Studio can report memory leaks after running an application in debug mode. This is generally quite satisfactory, but it causes some problems when developing a Qt application. Qt allocates memory in its DLLs and this memory is released when the DLLs are unloaded. Unfortunately this happens after Visual Studio reports the supposed leaks.

I’m not sure whether this always happens or only in applications that link against the MFC libraries. I read somewhere that the MFC cleanup code triggers the check, and that this unfortunately happens before third-party DLLs can do their cleanup. Thus, the check falsely reports memory leaks.

In case of my application (WinFIG) about 1600 leaks are reported, although none of them is real. This has two annoying implications:

  • Ending the debugger is delayed until the dump is complete
  • It gets very hard to find the real memory leaks

OK, so far so bad, but what to do? There is a way to replace the function that actually writes the leak info to the output console. I managed to replace it with a function that filters out everything that does not originate from my own code. To sort out which leaks are from my code and which are from somewhere else, I need to mark “my” memory allocations. I do this with the following definition.

setDebugNew.h

#if defined(WIN32) && defined(_DEBUG)
  #define new DEBUG_NEW
#endif

Add this to all your code files. I placed this in a small include file that I always include as the last one in each CPP file.

Let’s assume you produce the following memory leak in main.cpp:

int main(int argc, char *argv[])
{
  char *foo = new char[100];
  return 0;
}

Now we get the following report:

.\main.cpp(2) : {271457} normal block at 0x020D6220, 100 bytes long.

Note that we can see the source file name (main.cpp) and the line number (2).

Now I replace the default reporting function with something that only reports leaks that come with source file/line number info.

This is the header file reportingHook.h:

#pragma once
 
#if defined(WIN32)
  void setFilterDebugHook(void);
#endif

This is the body file reportingHook.cpp:

#if defined(WIN32)
 
#include <string.h>
#include <crtdbg.h>
 
#define FALSE   0
#define TRUE    1
 
_CRT_REPORT_HOOK prevHook;
 
int reportingHook(int reportType, char* userMessage, int* retVal)
{
  // This function is called several times for each memory leak.
  // Each time a part of the error message is supplied.
  // This holds number of subsequent detail messages after
  // a leak was reported
  const int numFollowupDebugMsgParts = 2;
  static bool ignoreMessage = false;
  static int debugMsgPartsCount = 0;
 
  // check if the memory leak reporting starts
  if ((strncmp(userMessage,"Detected memory leaks!\n", 10) == 0)
    || ignoreMessage)
  {
    // check if the memory leak reporting ends
    if (strncmp(userMessage,"Object dump complete.\n", 10) == 0)
    {
      _CrtSetReportHook(prevHook);
      ignoreMessage = false;
    } else
      ignoreMessage = true;
 
    // something from our own code?
    if(strstr(userMessage, ".cpp") == NULL)
    {
      if(debugMsgPartsCount++ < numFollowupDebugMsgParts)
        // give it back to _CrtDbgReport() to be printed to the console
        return FALSE;
      else
      {
        // fully handled here: let _CrtDbgReport() return 0 (continue execution)
        *retVal = 0;
        return TRUE;  // ignore it
      }
    } else
    {
      debugMsgPartsCount = 0;
      // give it back to _CrtDbgReport() to be printed to the console
      return FALSE;
    }
  } else
    // give it back to _CrtDbgReport() to be printed to the console
    return FALSE;
}
 
void setFilterDebugHook(void)
{
  //change the report function to only report memory leaks from program code
  prevHook = _CrtSetReportHook(reportingHook);
}
 
#endif

The function setFilterDebugHook must be called at program end:

#include ...
#include "reportingHook.h"
#include "setDebugNew.h"
 
int main(int argc, char *argv[])
{
  QApplication app(argc, argv);
  int result = app.exec();
 
#if defined(WIN32) && defined(_DEBUG)
  setFilterDebugHook();
#endif
 
  return result;
}

That’s all! Now the memory leak report will only show those occurrences that originate from our own source code. But don’t forget to add #include "setDebugNew.h" to all your files or you won’t see your own memory leaks! I tested all this with Visual Studio 2008 and Qt 4.5.

I hope this helped.

Posted in MFC, Qt, Visual Studio

deploying a web application with maven into apache tomcat

This article refers to Maven 2.1.0 and Apache Tomcat 6.0.18 on a Windows machine.

Auto-deploying a web application with Maven into Tomcat should be straightforward, but has some minor pitfalls. After I got it working today, I want to give a short summary of the required steps, since the “official documentation” leaves some open questions.

Let’s assume we want to deploy a web application with the name “foo” into Tomcat. To deploy “foo” automatically with Maven you have to enable the Tomcat goals. I’m using the Codehaus Mojo Tomcat plug-in for that.

Open your pom.xml, locate the <plugins> tag (inside <build>) and add the following code as an additional plug-in definition:

<plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>tomcat-maven-plugin</artifactId>
        <configuration>
                <url>http://localhost:8080/manager</url>
                <username>admin</username>
                <password>admin</password>
                <path>/foo</path>
                <warFile>target/foo.war</warFile>
        </configuration>
</plugin>

The variables url, username and password must be set according to your Tomcat installation. The variable path is the context root of the “foo” application and warFile is the location of your build target, wherever your build step creates the war file. I’m using a relative path here, because I call mvn from the project root directory.

For the automatic deployment of a war file you can use the following goals:

  • mvn tomcat:deploy
  • mvn tomcat:redeploy
  • mvn tomcat:undeploy

Actually, the redeploy goal is sufficient, because it can also deploy when there is no existing web application. But the actual deployment step, which uploads the war file to the Tomcat server, needs a lot of heap memory, so I increase the heap size that is available to Maven. Add the following line at the beginning of your mvn.bat (after all the @REM comments):

set MAVEN_OPTS=-Xmx256M

Use any other value if you feel 256M is too small or too big.

Now the configuration is theoretically done and

mvn tomcat:deploy

can be executed. But if you are using JSF libraries, it may happen that only the first deployment works and any subsequent deployment fails. The reason is that some of the JSF-related jars get locked by Tomcat, so the undeployment remains unfinished. It seems that the XML parser inside Tomcat reads some DTD files from the jar files and does not properly release them, so the jars remain in use even after the application has been stopped. They can only be deleted after stopping the whole Tomcat server, so the server has to be stopped and the files removed manually. As a workaround I wrote a script that wraps these steps:

call E:\Apache\apache-tomcat-6.0.18\bin\shutdown.bat
 
ping -n 10 localhost
 
rmdir E:\Apache\apache-tomcat-6.0.18\webapps\foo /S /Q
del E:\Apache\apache-tomcat-6.0.18\webapps\foo.war /Q
 
call E:\Apache\apache-tomcat-6.0.18\bin\startup.bat
 
mvn tomcat:redeploy

Save this snippet, for instance, as “deployfoo.bat”. If you wonder about the ping: the Windows CMD shell does not have a sleep or wait command, but it is important that the shutdown has finished before the script tries to delete the files. Without the wait that would not be guaranteed, because shutdown.bat is called asynchronously. The asynchronous call is necessary, because if the shutdown failed (e.g. when Tomcat is not running), it would otherwise terminate the whole script and nothing would get deployed. That’s why we need a wait, and ping can provide it.
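
The fixed ten-second ping is only a guess at how long the shutdown takes. As an alternative, here is a minimal sketch of a helper that polls until nothing accepts connections on Tomcat’s HTTP port any more. The class name PortWait, its method names and the polling intervals are hypothetical and not part of my script. Also note the limits of the idea: a closed port only shows that Tomcat has stopped listening, not that its JVM has already exited and released every file lock, so a short extra pause may still be needed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: waits until nothing accepts connections on a port. */
final class PortWait {
    /**
     * Polls host:port until a connection attempt fails or the timeout expires.
     * Returns true if the port stopped accepting connections in time.
     */
    static boolean waitUntilClosed(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            if (!isOpen(host, port)) {
                return true;
            }
            Thread.sleep(250); // pause between two connection attempts
        } while (System.currentTimeMillis() < deadline);
        return false;
    }

    /** Returns true if a TCP connection to host:port can be established. */
    static boolean isOpen(String host, int port) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), 500);
            return true;
        } catch (IOException e) {
            return false; // refused or timed out: nobody is listening
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful can be done if closing fails
            }
        }
    }
}
```

For the setup in this article the call would be something like PortWait.waitUntilClosed("localhost", 8080, 30000), with the port taken from the url setting in the pom.xml.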

That’s basically all. Now you can call “deployfoo.bat” for deploying the foo web application.

Posted in Java, Web development

getting a local wordpress installation

I spent this evening installing WordPress locally to have an offline playground. I already had an Apache 2.2 web server. So, next I installed MySQL 5.1, used the command line tool to create a database (create database wordpress;) and then tried to install WordPress. But there I stumbled across two problems:

First, trying to run the install.php resulted in the following error: "Your PHP installation appears to be missing the MySQL extension which is required by WordPress".

But I had properly configured the MySQL extension:

  • uncomment the lines “extension=php_mysql.dll” and “extension=php_mysqli.dll” in php.ini
  • configure the correct path to C:\php528\ext (or wherever you installed php) in extension_dir in php.ini
  • copy libmysql.dll to C:\WINDOWS\system32 (not nice; adding the DLL’s location in the PHP directory to the Windows PATH variable would probably work too)

But I still got the error message. The reason was that php.ini has to be copied to C:\Windows.

Now I could run the install.php and everything installed, but trying to open the blog just gave me a directory listing. It took me some time to find out that the following line has to be modified in httpd.conf:

DirectoryIndex index.html index.shtml
to
DirectoryIndex index.html index.shtml index.php

This tells Apache that index.php is a proper index page. Now I have a local version of the blog for playing around with layout and styles etc. before applying any changes to my online blog.

Posted in Web development

getting Qt

I’m in the process of transitioning WinFIG from Microsoft Foundation Classes (MFC) to the Qt framework. There are various reasons:

  • Qt has a much nicer programming model
  • MFC seems almost abandoned by MS. There doesn’t seem to be much commitment to continue and really develop MFC now that .NET and C# are Microsoft’s favorite horses.
  • MFC user interface programming is a pain
  • Qt is platform independent
  • Qt supports anti-aliasing for nicer graphical rendering
  • Qt provides a more comprehensive widget set and it’s easier to create custom widgets.
  • I also hope to get some graphics performance gains, possibly through using the OpenGL paint device as a configuration option.

So far, porting the code has been a fairly smooth process, helped by the fact that the GDI+ API and the QPainter API are very similar. You create pens and brushes and define paths, really not a big difference so far. But there are also a few problems, one of which is something I didn’t really expect, since it’s a feature that is available in GDI+, but missing in Qt’s QPainter class.

QPainter simply doesn’t support excluding an area from clipping. It’s possible to include, but not to exclude. I wrote a post on qtcentre.org, but I couldn’t get a solution there. I also talked to someone from Nokia recently at the Linux Day in Berlin, but the answer was that clipping is a thing to avoid and that I had better try to do without it. Not a very satisfying answer.

It looks like everybody wants to add eye-catching effects and similar stuff that impresses people at presentations, while the “boring” (but nevertheless useful) basic features are drifting out of focus. However, I don’t want to complain too much. Qt is really a great framework and it’s fairly comprehensive for the purpose of developing an application like WinFIG.

Posted in MFC, Qt

getting started

I thought it would be nice to have a place where I could share some ideas, knowledge or whatever things I stumble across while working on WinFIG, but I will also write about more general things related to software development.

Posted in Uncategorized