Tuesday, May 28, 2013

Memory leaks and memory management in Java applications

One of the more prominent features of the Java platform is its automatic memory management. Many people erroneously take this feature to mean that there are no memory leaks in Java. However, this is not the case, and I am under the impression that modern Java frameworks and Java-based platforms, especially the Android platform, increasingly contradict this erroneous assumption. To get an impression of how memory leaks can occur on the Java platform, look at the following implementation of a stack:

class SimpleStack {

    private final Object[] objectPool = new Object[10];
    private int pointer = -1;

    public Object pop() {
        if(pointer < 0) {
            throw new IllegalStateException("no elements on stack");
        }
        return objectPool[pointer--];
    }

    public Object peek() {
        if(pointer < 0) {
            throw new IllegalStateException("no elements on stack");
        }
        return objectPool[pointer];
    }

    public void push(Object object) {
        if(pointer > 8) {
            throw new IllegalStateException("stack overflow");
        }
        objectPool[++pointer] = object;
    }
}

This stack implementation stores its content in the form of an array and additionally manages an integer that points to the currently active stack cell. This implementation introduces a memory leak every time an element is popped off the top of the stack. More precisely, the stack keeps a reference to the popped element in the array, even though it will not be used again. (Unless the same element is pushed onto the stack again, which overwrites the array cell with the exact same reference.) As a consequence, Java will not be able to garbage collect this object, even after all other references to it are released. Since the stack implementation does not allow direct access to the underlying object pool, nothing outside the stack can clear this stale reference, so it prevents garbage collection of the referenced object until a new element is pushed onto the same index of the stack.

Fortunately, this memory leak is easy to fix:

public Object pop() {
    if(pointer < 0) {
        throw new IllegalStateException("no elements on stack");
    }
    try {
        return objectPool[pointer];
    } finally {
        // Clear the cell so that the stack no longer references the popped object
        objectPool[pointer--] = null;
    }
}
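If you need such a stack in more than one place, the fix can be folded into a small generic class. The following is a sketch under my own naming (BoundedStack and its capacity parameter are not part of the original example). It tracks the number of elements instead of a pointer and clears each popped cell. As far as I know, the JDK's own ArrayDeque follows the same pattern when removing elements, so in everyday code it is usually the better choice.

```java
/**
 * Generic fixed-capacity stack that clears the array cell of every popped element,
 * so the stack never keeps a popped object reachable.
 */
final class BoundedStack<T> {

    private final Object[] elements; // generic arrays cannot be created directly
    private int size = 0;            // number of stored elements; the top is at size - 1

    BoundedStack(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        elements = new Object[capacity];
    }

    void push(T element) {
        if (size == elements.length) {
            throw new IllegalStateException("stack overflow");
        }
        elements[size++] = element;
    }

    @SuppressWarnings("unchecked") // only instances of T are ever stored
    T peek() {
        if (size == 0) {
            throw new IllegalStateException("no elements on stack");
        }
        return (T) elements[size - 1];
    }

    T pop() {
        T top = peek();          // also performs the emptiness check
        elements[--size] = null; // drop the stack's reference to the popped element
        return top;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        BoundedStack<String> stack = new BoundedStack<>(10);
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());  // prints "b"
        System.out.println(stack.peek()); // prints "a"
    }
}
```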

Of course, implementing a data structure is not a very common task in day-to-day Java development. Therefore, let us look at a more common example of a Java memory leak, one that is often introduced by the popular observer pattern:

class Observed {

    public interface Observer {
        void update();
    }

    private Collection<Observer> observers = new HashSet<Observer>();

    void addListener(Observer observer) {
        observers.add(observer);
    }

    void removeListener(Observer observer) {
        observers.remove(observer);
    }

}

This time, there exists a method that allows you to remove a reference from the underlying object pool directly. As long as every registered observer is unregistered from the outside once it is no longer needed, there are no memory leaks to fear in this implementation. However, imagine a scenario where you or the users of your framework forget to unregister an observer after its use. Again, the observer will never be garbage collected because the observed object keeps a reference to it. Even worse, without holding a reference to this now useless observer, it is impossible to remove it from the observed object's pool from the outside.

This potential memory leak also has an easy fix that involves weak references, a Java platform feature I wish more programmers were aware of. In a nutshell, a weak reference lets you reach an object like a normal reference but does not prevent that object from being garbage collected. Thus, a weakly referenced object can suddenly disappear (WeakReference.get() then returns null) once no strong references to it remain and the JVM has performed a garbage collection. Using weak references, we can change the above code like this:

private Collection<Observer> observers = Collections.newSetFromMap(
        new WeakHashMap<Observer, Boolean>());

WeakHashMap is a ready-made map implementation that holds its keys through weak references. With this change, the observed object will no longer prevent garbage collection of its observers. However, you should always document this behavior in your Javadoc! It can be quite confusing if users of your code want to register a permanent observer, such as a logging utility, to which they do not plan to keep a reference: such an observer may silently stop receiving updates once it has been garbage collected. For example, Android's SharedPreferences holds its registered OnSharedPreferenceChangeListener instances only weakly, without documenting this behavior. This can keep you up at night!
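To see the pieces together, here is a runnable sketch of the observed class with the weak observer set. The notifyObservers method is my own addition (the original snippet leaves notification out), and the class is renamed WeakObserved to keep it apart from the original.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;

/**
 * Observed subject that references its observers only weakly. Callers must keep a
 * strong reference to every observer that should stay registered.
 */
class WeakObserved {

    interface Observer {
        void update();
    }

    // WeakHashMap keys are weakly referenced, so this set does not keep observers alive.
    private final Set<Observer> observers =
            Collections.newSetFromMap(new WeakHashMap<Observer, Boolean>());

    void addListener(Observer observer) {
        observers.add(observer);
    }

    void removeListener(Observer observer) {
        observers.remove(observer);
    }

    void notifyObservers() {
        // Iterate over a strongly referenced copy, so observers that (un)register
        // themselves during update() cannot break the iteration.
        List<Observer> snapshot = new ArrayList<>(observers);
        for (Observer observer : snapshot) {
            observer.update();
        }
    }

    public static void main(String[] args) {
        WeakObserved observed = new WeakObserved();
        // The local variable is the strong reference that keeps this observer registered.
        Observer logger = () -> System.out.println("updated");
        observed.addListener(logger);
        observed.notifyObservers(); // prints "updated"
        observed.removeListener(logger);
        observed.notifyObservers(); // prints nothing
    }
}
```

Note that this set is not thread-safe; if observers register from several threads, wrap it, for example with Collections.synchronizedSet, or synchronize access to it.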

At the beginning of this blog entry, I suggested that many of today's frameworks require careful memory management from their users. I want to give two examples to illustrate this concern.

Android platform:

Programming for Android introduces a life cycle programming model to your core application classes. This means that you are not in control of creating and managing the instances of these classes; instead, the Android OS creates them for you whenever they are needed (for example, when your application is supposed to show a certain screen). In the same manner, Android decides when it no longer needs a certain instance (for example, when the user has closed your application's screen) and informs you about its removal by calling a specific life cycle method on the instance. If, however, you let a reference to this object slip away into some global context, Android's virtual machine will not be able to garbage collect the instance, contrary to the platform's intent. Since Android phones are usually rather constrained in memory, and because Android's object creation and destruction routines can grow pretty wild even for simple apps, you have to take extra care to clean up your references.
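Since Android classes cannot run outside of Android, the following plain-Java sketch only imitates the pattern: a screen registers a listener with a global, static registry when it is created and must unregister it in its destruction callback. GlobalEventBus and LifecycleScreen are hypothetical names; the onCreate and onDestroy methods merely mirror the names of the corresponding Activity life cycle methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical global registry, standing in for any static, application-wide context. */
final class GlobalEventBus {
    private static final List<Runnable> LISTENERS = new ArrayList<>();

    static void register(Runnable listener) { LISTENERS.add(listener); }

    static void unregister(Runnable listener) { LISTENERS.remove(listener); }

    static int listenerCount() { return LISTENERS.size(); }
}

/** Hypothetical stand-in for an Android screen; the OS would call the life cycle methods. */
class LifecycleScreen {
    // Stored in a field so that the very same listener instance can be unregistered later.
    // The method reference captures this screen, so the static list references the screen too.
    private final Runnable refreshListener = this::refresh;

    void onCreate() {
        GlobalEventBus.register(refreshListener);
    }

    void onDestroy() {
        // Without this call, the static list would keep the destroyed screen reachable.
        GlobalEventBus.unregister(refreshListener);
    }

    void refresh() {
        System.out.println("refreshing " + this);
    }

    public static void main(String[] args) {
        LifecycleScreen screen = new LifecycleScreen();
        screen.onCreate();
        System.out.println(GlobalEventBus.listenerCount()); // prints 1
        screen.onDestroy();
        System.out.println(GlobalEventBus.listenerCount()); // prints 0
    }
}
```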

Unfortunately, a reference to a core application class slips away quite easily. Can you spot the slipped reference in the following example?

class ExampleActivity extends Activity {

    @Override
    public void onCreate(Bundle bundle) {
        super.onCreate(bundle); // Activity subclasses must call through to the superclass
        startService(new Intent(this, ExampleService.class).putExtra("mykey",
                new Serializable() {
                    public String getInfo() {
                        return "myinfo";
                    }
                }));
    }
}

If you thought it was the this reference passed to the intent's constructor, you are wrong. The intent only serves as a start command to the service and will be discarded after the service has started. Instead, the anonymous inner class holds an implicit reference to its enclosing instance, which is the ExampleActivity. If the receiving ExampleService keeps a reference to the instance of this anonymous class, it will consequently also keep a reference to the ExampleActivity instance. Because of this, I advise Android developers to avoid anonymous classes in such places and to prefer static nested classes, which do not hold a reference to an enclosing instance.
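A minimal plain-Java sketch of the static nested class alternative follows. ExampleScreen is a hypothetical stand-in for ExampleActivity (it is not an Android class and, like an Activity, it does not implement Serializable). Because the nested Info class has no reference to an ExampleScreen instance, it can be serialized on its own; the roundTrip helper is my own addition to demonstrate this with standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ExampleScreen { // hypothetical stand-in for ExampleActivity, deliberately not Serializable

    // Static nested class: it holds no reference to an ExampleScreen instance.
    static final class Info implements Serializable {
        private static final long serialVersionUID = 1L;

        public String getInfo() {
            return "myinfo";
        }
    }

    Serializable createExtra() {
        return new Info(); // replaces: new Serializable() { ... }
    }

    /** Serializes and deserializes a value using standard Java serialization. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Info copy = (Info) roundTrip(new ExampleScreen().createExtra());
        System.out.println(copy.getInfo()); // prints "myinfo"
    }
}
```

On Android itself, you would declare Info as a static nested class of ExampleActivity and pass new Info() to putExtra in place of the anonymous class.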

Web application frameworks (Wicket, in particular):

Web application frameworks usually store semi-permanent user data in sessions. Whatever you write into a session will usually stay in memory for an undetermined period of time. If you litter your sessions while having a significant number of visitors, your servlet container's JVM will run out of memory sooner or later. An extreme example of needing to take extra care of your references is the Wicket framework: Wicket serializes every page a user visits, in a versioned state. Oversimplified, this means that if one of your website's visitors clicks your welcome page ten times, Wicket will, in its default configuration, store ten serialized objects on your hard drive. This requires extra care because any object referenced by a Wicket page object will be serialized together with the page. Look, for example, at this bad-practice Wicket example:

class ExampleWelcomePage extends WebPage {

    private final List<People> peopleList;

    public ExampleWelcomePage(PageParameters pageParameters) {
        peopleList = new Service().getWorldPhonebook();
    }
}

By clicking your welcome page ten times, your user has just stored ten copies of the world's phone book on your server's hard drive. Therefore, always use LoadableDetachableModel in your Wicket applications: it releases its loaded object at the end of each request, so the object is reloaded on demand instead of being serialized together with the page.
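To illustrate the idea behind this advice without depending on Wicket, here is a plain-Java sketch. DetachableModel, PhonebookModel and DetachableDemo are hypothetical names and not Wicket's API; the sketch only mimics the principle of keeping the loaded object in a transient field that is dropped on detach and reloaded on demand, so it never ends up in the serialized page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

/** Keeps its loaded object in a transient field, so serialization skips it. */
abstract class DetachableModel<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient T object; // never written by Java serialization

    /** Loads the object, e.g. from a service or database. */
    protected abstract T load();

    public T getObject() {
        if (object == null) {
            object = load(); // lazily (re)load after detach() or deserialization
        }
        return object;
    }

    /** Drops the cached object; a framework would call this at the end of a request. */
    public void detach() {
        object = null;
    }
}

/** Hypothetical model standing in for new Service().getWorldPhonebook(). */
class PhonebookModel extends DetachableModel<List<String>> {
    private static final long serialVersionUID = 1L;
    static int loads = 0; // counts load() calls for demonstration purposes

    @Override
    protected List<String> load() {
        loads++;
        return Arrays.asList("Alice", "Bob");
    }
}

final class DetachableDemo {

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PhonebookModel model = new PhonebookModel();
        System.out.println(model.getObject()); // prints "[Alice, Bob]" (first load)
        byte[] data = serialize(model);        // the list itself is not part of data
        PhonebookModel copy = (PhonebookModel) deserialize(data);
        System.out.println(copy.getObject());  // prints "[Alice, Bob]" (loaded again)
        System.out.println(PhonebookModel.loads); // prints 2
    }
}
```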

Tracing memory leaks in Java applications can be tiresome, and therefore I want to name JProfiler as a useful (but unfortunately non-free) debugging tool. It allows you to browse the internals of your running Java application, for example in the form of heap dumps. If memory leaks are a problem for your applications, I recommend giving JProfiler a try. An evaluation license is available.
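If you want to look at a heap dump of your own application, for example in a profiler or another heap analysis tool, you can also trigger one from inside the JVM. The sketch below assumes a HotSpot-based JVM (such as OpenJDK or the Oracle JDK), since it relies on the JDK-specific com.sun.management.HotSpotDiagnosticMXBean; HeapDumper is a hypothetical helper name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {

    /**
     * Writes an HPROF heap dump into a new temporary directory and returns its path.
     * Requires a HotSpot-based JVM.
     */
    static Path dumpToTempFile(boolean liveObjectsOnly) {
        try {
            // Use the ".hprof" suffix expected by dumpHeap; the file must not exist yet.
            Path file = Files.createTempDirectory("heapdump").resolve("heap.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // liveObjectsOnly = true restricts the dump to objects that are still reachable.
            bean.dumpHeap(file.toString(), liveObjectsOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = dumpToTempFile(true);
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

From the command line, the JDK's jmap tool (jmap -dump:live,format=b,file=heap.hprof followed by the process id) produces the same kind of file.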

For further reading: if you want to see another interesting source of memory leaks, namely customized class loaders, refer to the ZeroTurnaround blog.

Friday, May 24, 2013

Converting Microsoft DOC or DOCX files into PDF using Java without contortions

I will give you a heads-up: there is no simple, well-performing solution using pure Java. To get an intuition for why this is the case, just try to open a DOC-formatted file with a non-Microsoft text editor, usually Apache OpenOffice or LibreOffice. If your file contains more than a few lines of standard formatting, you are likely to experience layout displacements. The same is true for the DOC format's XML-based successor, the DOCX format.

Unfortunately, converting a file to PDF amounts to opening the DOC file and printing it into another file. Consequently, the resulting PDF file will contain the same layout displacements as the software you used to open the DOC file. Of course, this does not only apply to OpenOffice: you would face the same difficulties (or probably even worse ones) if you read a DOC(X) file using any Java library offering such functionality.

Therefore, a fully functioning DOC(X) to PDF conversion will always require you to use Microsoft Word. Unfortunately, Microsoft Word does not offer command line switches for direct printing or PDF conversion.

Recently, I was faced with this problem, which led me to implement the small workaround I will introduce in the remainder of this blog entry. To begin with, you need a working installation of Microsoft Word 2007 or higher on your machine. If you are using Microsoft Word 2007, make sure that the PDF plugin is installed; later versions of MS Word already include this functionality. Secondly, you need to make sure that the Windows Script Host is installed on your computer, which is the case for practically any Windows operating system. The Windows Script Host allows us to run Visual Basic scripts such as this one:

' See http://msdn2.microsoft.com/en-us/library/bb238158.aspx
Const wdFormatPDF = 17  ' PDF format. 
Const wdFormatXPS = 18  ' XPS format. 

Const WdDoNotSaveChanges = 0

Dim arguments
Set arguments = WScript.Arguments

' Make sure that there are one or two arguments
Function CheckUserArguments()
  If arguments.Unnamed.Count < 1 Or arguments.Unnamed.Count > 2 Then
    WScript.Echo "Use:"
    WScript.Echo "<script> input.doc"
    WScript.Echo "<script> input.doc output.pdf"
    WScript.Quit 1
  End If
End Function


' Transforms a doc to a pdf
Function DocToPdf( docInputFile, pdfOutputFile )

  Dim fileSystemObject
  Dim wordApplication
  Dim wordDocument
  Dim wordDocuments
  Dim baseFolder

  Set fileSystemObject = CreateObject("Scripting.FileSystemObject")
  Set wordApplication = CreateObject("Word.Application")
  Set wordDocuments = wordApplication.Documents

  docInputFile = fileSystemObject.GetAbsolutePathName(docInputFile)
  baseFolder = fileSystemObject.GetParentFolderName(docInputFile)

  If Len(pdfOutputFile) = 0 Then
    pdfOutputFile = fileSystemObject.GetBaseName(docInputFile) + ".pdf"
  End If

  If Len(fileSystemObject.GetParentFolderName(pdfOutputFile)) = 0 Then
    pdfOutputFile = baseFolder + "\" + pdfOutputFile
  End If

  ' Disable any potential macros of the word document.
  wordApplication.WordBasic.DisableAutoMacros

  Set wordDocument = wordDocuments.Open(docInputFile)

  ' See http://msdn2.microsoft.com/en-us/library/bb221597.aspx 
  wordDocument.SaveAs pdfOutputFile, wdFormatPDF

  wordDocument.Close WdDoNotSaveChanges
  wordApplication.Quit WdDoNotSaveChanges
  
  Set wordApplication = Nothing
  Set fileSystemObject = Nothing

End Function

' Execute script
Call CheckUserArguments()
If arguments.Unnamed.Count = 2 Then
 Call DocToPdf( arguments.Unnamed.Item(0), arguments.Unnamed.Item(1) )
Else
 Call DocToPdf( arguments.Unnamed.Item(0), "" )
End If

Set arguments = Nothing

Copy this script and save it on your machine. Name the file something like doc2pdf.vbs. I will not go into the details of Visual Basic scripting at this point, since this blog is addressed to Java developers. In a nutshell, this script expects one or two command line arguments. The first argument represents the DOC(X) file to be converted. The second argument is optional and represents the output file. If it is omitted, the script replaces the input file's extension with .pdf and saves the output in the same directory as the input file. The conversion is achieved by calling Microsoft Word silently. More advanced implementations of this functionality can be found on the net.
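If your Java code calls the script without the optional second argument, it needs to know where the PDF will end up. The following hypothetical helper (PdfNaming.resolvePdfPath is my own name) sketches the script's naming rules in Java: without an output name, the input's extension is replaced by .pdf, and an output name without a folder is placed next to the input file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PdfNaming {

    /** Mirrors the output naming rules of doc2pdf.vbs. */
    static Path resolvePdfPath(Path docFile, String pdfOutput) {
        Path doc = docFile.toAbsolutePath();
        String outputName = pdfOutput;
        if (outputName == null || outputName.isEmpty()) {
            // Like GetBaseName in the script: drop the extension, then append ".pdf"
            String name = doc.getFileName().toString();
            int dot = name.lastIndexOf('.');
            outputName = (dot > 0 ? name.substring(0, dot) : name) + ".pdf";
        }
        Path pdf = Paths.get(outputName);
        // A bare file name goes into the input file's folder; otherwise it is used as given
        return pdf.getParent() == null ? doc.resolveSibling(pdf) : pdf;
    }

    public static void main(String[] args) {
        Path doc = Paths.get("example", "myfile.docx");
        System.out.println(resolvePdfPath(doc, ""));        // absolute path ending in myfile.pdf
        System.out.println(resolvePdfPath(doc, "out.pdf")); // absolute path ending in out.pdf
    }
}
```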

You can now call this script from a Windows console (cmd) by typing:

C:\example\doc2pdf.vbs C:\example\myfile.docx

After executing this script, you will find C:\example\myfile.pdf on your machine. Make sure that this conversion works in order to confirm that your system is configured correctly.

But there is more bad news: you will not be able to call this script from Java directly. Attempting to run the script via Runtime.exec will result in a java.io.IOException. The reason for this exception can be found in its description:
Cannot run program "C:\example\doc2pdf.vbs": CreateProcess error=193, %1 is not a valid Win32 application
A .vbs file is not an executable: Windows normally opens it through a file association with the Windows Script Host, but the CreateProcess call that Java uses only starts actual programs. This requires us to apply another workaround: we will write a small batch file that executes the Visual Basic script for us. This batch file will look something like this:

@Echo off
pushd %~dp0
cscript C:\example\doc2pdf.vbs %1 %2

Save this file as doc2pdf.bat. Again, I will spare you the details of this short batch file: it only executes the Visual Basic script and passes its first two command line arguments (if there are any) on to it. Try this script by typing

C:\example\doc2pdf C:\example\myfile.docx

into your command line to see if your script is set up correctly. The advantage of this batch file over the Visual Basic script is that it can be called from Java:

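try {
    String docToPdf = "C:\\example\\doc2pdf.bat";
    // toURI() decodes characters such as spaces, which URL.getFile() would leave as %20
    File docPath = new File(getClass().getResource("/mydocument.docx").toURI());
    File pdfPath = new File(docPath.getAbsolutePath() + ".pdf");
    // Pass the arguments separately so that paths containing spaces are not split up
    Process process = Runtime.getRuntime().exec(new String[] {
            docToPdf, docPath.getAbsolutePath(), pdfPath.getAbsolutePath()});
    // The next line is optional and will force the current Java
    // thread to block until the script has finished its execution.
    int exitCode = process.waitFor();
    if (exitCode != 0) {
        System.err.println("doc2pdf.bat failed with exit code " + exitCode);
    }
} catch (URISyntaxException | IOException e) {
    e.printStackTrace();
} catch (InterruptedException e) {
    // Restore the interrupt flag so that callers can still react to the interruption
    Thread.currentThread().interrupt();
}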

By calling Process.waitFor, you can block your execution thread until the batch file has finished its execution and the PDF file has been produced. Additionally, you will receive a status code as a return value, which informs you whether the batch file has terminated correctly. The PDF file can be accessed through the variable pdfPath in the above code.
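Since Word may hang, for example on a dialog box that nobody can see, it can be worth adding a timeout. The following sketch (ConverterRunner.run is a hypothetical helper, and the 120-second timeout in main is an arbitrary example value) uses ProcessBuilder and Process.waitFor(long, TimeUnit), which require Java 8, and only reports success if the process exits with 0 and the expected PDF exists.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ConverterRunner {

    /**
     * Runs an external conversion command, waits at most timeoutSeconds and reports
     * success only if the process exited with 0 and the expected output file exists.
     */
    static boolean run(List<String> command, File expectedOutput, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .inheritIO() // show the script's output instead of leaving a pipe unread
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // give up on a process that does not finish
                return false;
            }
            return process.exitValue() == 0 && expectedOutput.isFile();
        } catch (IOException e) {
            return false; // the command could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        File doc = new File("C:\\example\\myfile.docx");
        File pdf = new File(doc.getAbsolutePath() + ".pdf");
        boolean ok = run(Arrays.asList("C:\\example\\doc2pdf.bat",
                doc.getAbsolutePath(), pdf.getAbsolutePath()), pdf, 120);
        System.out.println(ok ? "Converted to " + pdf : "Conversion failed");
    }
}
```

Keep in mind that destroyForcibly only terminates the process Java started (here the command interpreter running the batch file); a hung WINWORD.EXE may survive and need to be cleaned up separately.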

It remains disappointing that this solution will most likely only run on Windows systems. However, you might get it going on Linux via Wine and winetricks. (Winetricks can install Visual Basic scripting support for the Windows Script Host via its wsh56vb option.) Any feedback on such further experiments is appreciated.