Performance | JavaFX News, Demos and Insight // FX Experience

Sequences Performance Tip

Jonathan Giles — Thu, 08 Jul 2010 01:43:54 +0000

Sure, there are plenty of tips discussing how to use sequences in JavaFX, but the one I wanted to cover quickly today is regarding the fact that sequences are immutable. That is, once you create them, their content can’t change.

‘You lie!’ I hear you all shout, running off to your IDEs to show me the following fully legal and compiler-friendly code:

[jfx]
var seq = [ 1, 2, 3 ];
insert 4 into seq;
insert 5 into seq;
delete 1 from seq;

println(seq);
[/jfx]

If you run this code, the output will be [ 2, 3, 4, 5 ], as expected. It’s understandable if you think the sequence is being updated in-place – it certainly looks that way from where we’re all standing – except that under the hood it isn’t, and there are performance implications we need to be clear on. To get straight to the point, the insert and delete instructions create a brand new sequence instance, but fortunately they also ensure that the seq variable always refers to the new instance, quietly leaving the old instance to get garbage collected.
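The rebinding described above can be modelled in plain Java. This is only a sketch of the semantics, not how the JavaFX runtime is implemented: the class `ImmutableInsertSketch` and its `insert` helper are hypothetical names, and an unmodifiable `List` stands in for a sequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ImmutableInsertSketch {
    // Hypothetical helper: returns a NEW unmodifiable list holding the old
    // elements plus the new one. The list passed in is never touched.
    static List<Integer> insert(List<Integer> seq, int value) {
        List<Integer> copy = new ArrayList<>(seq);
        copy.add(value);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<Integer> seq = List.of(1, 2, 3);
        List<Integer> old = seq;          // keep a handle on the first instance

        seq = insert(seq, 4);             // the variable now refers to a new instance

        System.out.println(old);          // [1, 2, 3]
        System.out.println(seq);          // [1, 2, 3, 4]
        System.out.println(old == seq);   // false
    }
}
```

Once nothing refers to the old instance any more, it is eligible for garbage collection, which is the hidden cost the rest of this post is about.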

Of course, as I just noted, from a developer’s perspective we can treat seq as a reference to a sequence instance and be oblivious to these ‘behind-the-scenes’ sequence shenanigans. There is nothing wrong with this, but it’s important to know that creating sequences isn’t free, and minimising the number of sequences we create is certainly a smart idea.

To be slightly more technical, I should note that there are some exceptions to this ‘sequences are immutable’ rule when dealing with non-shared sequences of specific types, but I don’t want to muddy the point of this blog, which is….

Use the expression language features of JavaFX to reduce the number of sequence instances you need to create.

For example, you might have code that looks like the following:

[jfx]
var seq:Integer[];

for (i in [1..MAX]) {
    insert i*i into seq;
}
[/jfx]

I hope, with this blog post firmly at the forefront of your mind, you can now see why perhaps this isn’t such a great idea: we’re creating MAX sequence instances, one for each iteration of the for loop – ouch. Importantly, this example isn’t far-fetched and made to prove a point: I just found a very similar example in my code the other day (which I promptly fixed up), and even found a few blog posts on this site and other sites that use this approach. What you actually should do is something like the following:

[jfx]
var seq:Integer[];

seq = for (i in [1..MAX]) {
    i*i
}
[/jfx]

In this case rather than create MAX instances of the sequence, we’re creating only one – the result of the for expression. This is considerably more efficient, and can lead to huge performance gains, depending on what you’re doing. In my case it made the difference between my code being usable and being jittery. Obviously in my case this was a very critical for loop, and it may be similar in your use cases also.
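To put a rough number on the difference, here is a Java model of the two approaches. It assumes the worst case, where every insert allocates a new array and copies the old elements across; as noted earlier, the real runtime has exceptions to this rule, so treat the figures as an upper bound. The class and method names are made up for the illustration.

```java
import java.util.Arrays;

class SequenceCopyCost {
    // Appending to an immutable array: allocate length + 1 and copy everything.
    static int[] append(int[] seq, int value) {
        int[] next = Arrays.copyOf(seq, seq.length + 1);
        next[seq.length] = value;
        return next;
    }

    // Models the loop of inserts; returns how many elements were copied in total.
    static long copiesWhenInsertingOneAtATime(int max) {
        int[] seq = new int[0];
        long copied = 0;
        for (int i = 1; i <= max; i++) {
            copied += seq.length;       // every existing element is copied again
            seq = append(seq, i * i);
        }
        return copied;
    }

    // Models the for expression: one allocation, each element written once.
    static int[] buildOnce(int max) {
        int[] seq = new int[max];
        for (int i = 1; i <= max; i++) {
            seq[i - 1] = i * i;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(copiesWhenInsertingOneAtATime(1000)); // 499500
        System.out.println(buildOnce(1000).length);              // 1000
    }
}
```

In this model the one-at-a-time version copies MAX * (MAX - 1) / 2 elements and creates MAX intermediate instances, while the single-expression version creates one instance and copies nothing.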

I hope that this helps you eke out a little more performance from your apps. Now, go forth and optimise your sequences.

Custom Cell Caching

Jonathan Giles — Sat, 26 Jun 2010 06:09:23 +0000

Ok, I know we’ve been going on about custom cells / cell factories a bit recently, but I wanted to do one more post about a very useful topic: caching within cell content.

These days ‘Hello World’ has been replaced by building a Twitter client, so I’ve decided to frame this topic in terms of building a Twitter client. Because I don’t actually care about the whole web service side of things, I’ve neglected to implement the ‘real data’ / web services aspect of it. If you want to see an actual running implementation with real data, have a look at William Antônio’s Twitter client, which is using this ListCell implementation.

So, in all the posts to this site related to cells, I’m sure you’ve come to appreciate the ways in which you should create a ListView or TreeView with custom cell factories. Therefore, what I really want to cover in this post is just the custom cell implementation, and the importance of caching. A Twitter client wouldn’t be a true client without showing the user’s profile image, so this is my target for caching. Without caching, each time the cell was updated (i.e. the content changes due to scrolling, or when we scroll a user off screen and then back in), we’d have to download and load the image again. This would lead to considerable lag and a poor user experience. What we need to do is load the image once, cache it, and reuse it whenever the image URL is requested by a cell. At the same time, we don’t want to run the PC dry of memory by loading all profile images into memory. Enter: SoftReference caching.

Word of warning: I’m not a caching expert. It is possible that I’ve done something stupid, and I hope you’ll let me know, but I believe that the code below should at least be decent. I’ll happily update this example if anyone gives me useful feedback.

Check out the code below, and I’ll continue to discuss it afterwards.

[jfx]
import model.Tweet;

import java.lang.ref.SoftReference;
import java.util.HashMap;

import javafx.geometry.HPos;
import javafx.geometry.VPos;
import javafx.util.Math;
import javafx.scene.control.Label;
import javafx.scene.control.ListCell;
import javafx.scene.image.Image;
import javafx.scene.image.ImageView;
import javafx.scene.layout.Container;
import javafx.scene.text.Font;
import javafx.scene.text.FontWeight;

// controls whether the cache is used or not. This _really_ shouldn’t be false!
def useCache = true;

// map of String -> SoftReference (of Image)
def map = new HashMap();

def IMAGE_SIZE = 48;

public class TwitterListCell extends ListCell {

// used to represent the users image
var imageView:ImageView;

// a slightly bigger and bolder label for the persons name
var personName:Label = Label {
font: Font.font("Arial", FontWeight.BOLD, 13);
}

// the message label
var message:Label = Label {
textWrap: true
}

override var node = Container {
content: bind [ imageView, personName, message ]

override function getPrefHeight(width:Number):Number {
def w = listView.width;
Math.max(IMAGE_SIZE, personName.getPrefHeight(w) + message.getPrefHeight(w));
}

override function doLayout():Void {
var x:Number = -1.5;
var y:Number = 0;
var listWidth = listView.width;
var cellHeight = height;

// position image
Container.positionNode(imageView, x, y, IMAGE_SIZE, cellHeight,
HPos.CENTER, VPos.TOP, false);

// position text at the same indent position regardless of whether
// an image exists or not
x += IMAGE_SIZE + 5;
var textWidth = listWidth - x;
var personNameHeight = personName.getPrefHeight(textWidth);
Container.resizeNode(personName, textWidth, personNameHeight);
Container.positionNode(personName, x, y, listWidth - x, personNameHeight,
HPos.LEFT, VPos.TOP, false);

y += personNameHeight;
Container.resizeNode(message, textWidth, message.getPrefHeight(textWidth));
Container.positionNode(message, x, y, listWidth - x, height - personNameHeight,
HPos.LEFT, VPos.TOP, false);
}
}

override var onUpdate = function():Void {
var tweet = item as Tweet;

personName.text = tweet.person.name;
message.text = tweet.message;

// image handling
if (map.containsKey(tweet.person.image)) {
// the image has possibly been cached, so lets try to get it
var softRef = map.get(tweet.person.image) as SoftReference;

// get the image out of the SoftReference wrapper
var image = softRef.get() as Image;

// check if it is null – which would be the case if the image had
// been removed by the garbage collector
if (image == null) {
// we need to reload the image
loadImage(tweet.person.image);
} else {
// the image is available, so we can reuse it without the
// burden of having to download and reload it into memory.
imageView = ImageView {
image: image;
}
}
} else {
// the image is not cached, so lets load it
loadImage(tweet.person.image);
}
};

function loadImage(url:String) {
// create the image and imageview
var image = Image {
url: url
height: IMAGE_SIZE
preserveRatio: true
backgroundLoading: true
}
imageView = ImageView {
image: image;
}

if (useCache) {
// put into cache using a SoftReference
var softRef = new SoftReference(image);
map.put(url, softRef);
} else {
map.remove(url);
}
}
}
[/jfx]

You’ll note that in this example most of the code is pretty standard. A few variables are created for the image and text, and then I’ve gone the route of laying the content out in a Container, but you can achieve a similar layout using the available layout containers. Following this I have defined an onUpdate function, which is called whenever the cell should be updated. This is usually called due to a user interaction, which may potentially change the Cell.item value, which would of course require an update of the cell’s visuals.

The bulk, and most important part, of the onUpdate function deals with loading the user’s profile image, or retrieving and reusing the cached version of it. Note the use of the global HashMap, which maps between the URL of the user’s image and a SoftReference wrapping the Image itself. Because it is global (i.e. static), this map will be available to, and used by, all TwitterListCell instances. Also important to note is that I didn’t put the ImageView itself into the HashMap, as a Node cannot be placed in multiple positions in the scenegraph, but an Image can be.

The rest of the code in this class really just deals with the fact that a SoftReference may clear out its reference to the Image object if the garbage collector needs the memory, in which case we need to reload the image. The other obvious part is the need to put the image into the cache if it’s not already there.
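The same check-then-reload pattern can be distilled into a small, reusable class. The sketch below is plain Java rather than JavaFX Script, and the `SoftCache` class, its `loader` function and the `loads` counter are hypothetical names introduced for the illustration; only `SoftReference` and `HashMap` come from the listing above. It is not thread-safe.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;   // e.g. "download the image at this URL"
    int loads = 0;                          // demo counter: how often the loader ran

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        // value is null if the key was never cached OR the collector cleared it
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            loads++;
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>(url -> new byte[16]);
        byte[] first = cache.get("http://example.com/avatar.png");
        byte[] second = cache.get("http://example.com/avatar.png");
        System.out.println(first == second); // true, while 'first' is strongly held
        System.out.println(cache.loads);     // 1
    }
}
```

One thing this version does not handle is pruning map entries whose references have been cleared, so the map keeps one small entry per URL ever requested. For a list of profile images that is probably acceptable, but it is worth knowing about.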

Shown below is the end result, but remember that there is a working version of this demo in William Antônio’s Twitter client, which is a very early work in progress.

I hope this might be useful to people, and as always we’re keen to hear your thoughts and feedback, and what you’d like us to cover. Until next time – cheers!

UI Virtualization

Richard Bair — Fri, 25 Sep 2009 16:10:12 +0000

When you have a lot of data to display in a Control such as a ListView, you need some way of virtualizing the Nodes created and used. For example, if you have 10 million data items, you don’t want to create 10 million Nodes. So you create enough Nodes to fill the display dynamically. Because of our heritage in Swing, we know how critical this is for real apps. I got an optimization issue reported this morning on “UI Virtualization”. The report included the following link to a Silverlight blog describing what they’re doing in Silverlight 3:
http://bea.stollnitz.com/blog/?p=338

Having spent several weeks on this problem during the spring and summer of this year, I found the blog post on how Silverlight is addressing the issue very interesting. It’s fun for me as a developer to see my counterparts in other toolkits wrestling with the same issues and coming to many of the same conclusions. In this case I think we have a stronger solution. I’m going to do a little compare and contrast, not in the spirit of competition but rather just to describe what we’re doing.

In the 1.2 release of JavaFX we released the ListView (our equivalent of ListBox in WPF/Silverlight or JList in Swing). The ListView wasn’t very customizable. Each list item would only display the “toString” of the associated data item. However, the guts of the ListView were very complex to support UI Virtualization. Following JavaOne I completely rewrote the Control. The current implementation (still being finalized by some great engineers on the UI Controls team) now exposes API for allowing you to completely customize each cell while also being very efficient in terms of handling very large lists.

During the refactoring I created a (currently private) class called VirtualizedFlow which handles most of the dirty work. Reading the Silverlight blog linked above, I found that WPF has a similar class called VirtualizingStackPanel. It’s always fun to see how other people address the same problem, and a lot of fun when you find out you came to the same conclusions.

WPF has supported UI virtualization for a long time. The ListBox and ListView controls use VirtualizingStackPanel as their default panel, and VSP knows to create UI containers (ListBoxItems or ListViewItems) when new items are about to be shown in the UI, and to discard those containers when items are scrolled out of view.

This is a good start, but actually not good enough. Another feature I built into ListView from the start (1.2 release) was the notion of recycling the cells used. In the Silverlight blog, it is mentioned that .NET 3.5 SP1 also supports this:

.NET 3.5 SP1 supports the reuse of UI containers already in memory. For example, imagine that when a ListBox is loaded, 30 ListBoxItems are created to display the visible data. When the user scrolls the ListBox, instead of discarding ListBoxItems that scroll out of view and creating new ones for the data items that scroll into view, WPF reuses the existing ListBoxItems. This results in significant performance improvements compared to previous versions because it decreases the time spent initializing ListBoxItems. And since garbage collection is not instantaneous, it also reduces the number of ListBoxItems in memory at one time.

However, unlike .NET which requires you to take a separate step to enable recycling, we do it always and for free. The JavaFX API is designed such that it is a natural part of how the system works. When designing the JavaFX API, I wanted to avoid some of the odd corner cases that came out of Swing’s cell renderer concept, but keep the insane scalability. I think we’ve managed that. It is a little more complicated than the simple naive approach of creating a new Cell for every item, but not by much.

The basic design in JavaFX revolves around the concept of a Cell (which is a Node). Cells are used to display data items in a ListView (and in the future, a TreeView, TableView, TreeTableView, etc.). Each Cell represents a data item, though which data item is represented may change over time. For example, suppose I had the following ListView:

ListView {
    items: [1..1000]
}

This ListView will display the numbers 1 through 1000. Initially, enough Cells will be created to display each visible row. So you may have a Cell which will represent data item “1”, and another which represents data item “2”, and so forth maybe up to “10”. If the user then scrolls, then we will attempt to reuse Cells. So the Cell which used to represent “1” now represents “33”.

When you look at the Cell API this is apparent:

public class Cell extends CustomNode, Resizable {
    /**
     * The data value associated with this Cell. This value is set by the
     * multiviewed Control when the Cell is created or updated. This represents
     * the raw data value.
     */
    public var item:Object;

    /**
     * Indicates whether or not this cell has been selected.
     */
    public var selected:Boolean;
    
    /**
     * Called by the system whenever the state of the Cell has been updated.
     */
    public var onUpdate:function():Void;

    /**
     * A container for the nodes that represent the content of this cell. Since
     * Cells must be resized to fit within some space, and since their preferred
     * sizes are important for determining the size of a cell, a Container of
     * some kind is required as the root of the content for the cell.
     */
    public var node:Node;
    
    /**
     * The container for the nodes used to represent the background of the cell.
     * If this is left empty, then the Skin for the Control is responsible for
     * creating a default background, if it so chooses. In this way, custom
     * cells can be created that define both the content and background, or that
     * define only the content with the background being provided by the skin.
     */
    public var background:Node;
}

As you can see in this API, each Cell has an item and the item can change over time. The Cell also has a node which represents the cell’s body, as well as a background which is typically used for rendering selection and so forth and if left null will be supplied by the ListView skin implementation. So to create a custom Cell, all you would need to do is provide the node, and wire it up (bind it) to the item. The system then will reuse Cells, simply updating the item over time as the Cell represents something different, and the bindings will then update the node. There is also a procedural approach where we invoke an onUpdate function variable, which allows the Cell author to respond procedurally if it makes more sense (for example, when working with FXD content exported from the JavaFX Production Suite this might make sense).
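To make the recycling idea concrete, here is a deliberately tiny Java model of it. This is not the VirtualizedFlow implementation (which is private and not shown in this post); `RecyclingListSketch`, its nested `Cell` and the `scrollTo` method are hypothetical names. It creates one cell per visible row and, on scroll, only changes which item each existing cell represents.

```java
import java.util.ArrayList;
import java.util.List;

class RecyclingListSketch {
    // Hypothetical stand-in for a Cell: holds whichever item it currently shows.
    static class Cell {
        Integer item;
        void update(Integer newItem) { item = newItem; }
    }

    private final List<Integer> items;
    private final List<Cell> cells = new ArrayList<>();

    RecyclingListSketch(List<Integer> items, int visibleRows) {
        this.items = items;
        // Create only as many cells as there are visible rows.
        for (int row = 0; row < visibleRows; row++) {
            cells.add(new Cell());
        }
        scrollTo(0);
    }

    // Reuse the existing cells: no allocation, only the item changes.
    void scrollTo(int firstVisibleIndex) {
        for (int row = 0; row < cells.size(); row++) {
            int index = firstVisibleIndex + row;
            cells.get(row).update(index < items.size() ? items.get(index) : null);
        }
    }

    int cellCount() { return cells.size(); }
    Integer itemInRow(int row) { return cells.get(row).item; }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) items.add(i);

        RecyclingListSketch list = new RecyclingListSketch(items, 10);
        System.out.println(list.itemInRow(0)); // 1
        list.scrollTo(32);
        System.out.println(list.itemInRow(0)); // 33
        System.out.println(list.cellCount());  // 10
    }
}
```

In the real control, changing the item is what triggers the bindings (or the onUpdate callback) to refresh the cell’s node.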

The last part of the puzzle is some way for the system to create your custom Cell whenever it needs a new one (which it pools up itself). This is actually very trivially done by exposing a function variable callback on ListView which allows you to create and return a new Cell. In this way you can create custom Cells for representing data richly in your JavaFX apps.

ListView {
    items: [1..1000]
    cellFactory: function():ListCell {
        ListCell {
            node: Panel { ... }
        }
    }
}

Since cell factories are simple functions, it is also easy to create shared libraries of static functions which return your own implementations and reuse them across your application. You can also subclass ListCell with your own custom implementations so as to reuse them across your application.

There is one last thing that JavaFX does that I think is quite unique (from the blog it doesn’t look like Silverlight supports it, and I don’t think Flex does; Android might, but I’m not sure). First I’ll quote from the Silverlight blog as to the problem:

Currently WPF supports UI virtualization only when scrolling by item. Pixel-based scrolling is also called “physical scrolling” and item-based scrolling is also called “logical scrolling”.
[…]
I’m often asked if there is a way to work around this limitation. Well, anything is possible, but there is no *easy* workaround. You would have to re-implement significant portions of the current virtualization logic to combine pixel-based scrolling with UI virtualization. You would also have to solve some interesting problems that come with it. For example, how do you calculate the size of the thumb when the item containers have different heights? (Remember that you don’t know the height of the virtualized containers – you only know the height of the containers that are currently displayed.) You could assume an average based on the heights you do know, or you could keep a list with the item heights as items are brought into memory (which would increase accuracy of the thumb size as the user interacts with the control). You could also decide that pixel-based scrolling only works with items that are of the same height – this would simplify the solution. So, yes, you could come up with a solution to work around this limitation, but it’s not trivial.

We have the exact same problem space: we support non-homogeneous row heights, potentially incredibly large numbers of items, and scrolling. This presents some really interesting problems. Our solution, for typical ListViews, combines pixel scrolling with position (or item) scrolling in such a way that the user cannot detect that we’re doing anything special behind the scenes. You can “pan” the list using the mouse and it looks like pixel scrolling; move by items using the keyboard and it looks like item scrolling; drag the scrollbar and you won’t know whether you are pixel scrolling or item scrolling (except in some very, very rare edge cases); or click the “up” and “down” arrows on the scrollbar and get pixel scrolling. We’re still working on the thumb size problem. I think the answer will turn out to be some heuristic that adjusts the thumb size as it finds out more about the size of the list. Typically for large lists you can assume the thumb needs to be the minimum size, while for small lists you might just create enough cells to get a reasonably accurate thumb. It only goes awry if you have whacked-out differences in cell heights.
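As an illustration of the kind of heuristic being discussed, the Java sketch below takes the “assume an average based on the heights you do know” idea from the quoted Silverlight post and clamps the result to a minimum thumb size. It is not the JavaFX implementation, which was still undecided at the time of writing; the class, method and parameter names are all assumptions.

```java
class ThumbSizeEstimate {
    // Estimate the total content height from the cells measured so far.
    static double estimateTotalHeight(double[] knownHeights, int itemCount) {
        if (knownHeights.length == 0 || itemCount == 0) {
            return 0;
        }
        double sum = 0;
        for (double h : knownHeights) {
            sum += h;
        }
        double average = sum / knownHeights.length;
        return average * itemCount;
    }

    // The thumb covers the same fraction of the track as the viewport covers
    // of the content, but never shrinks below minThumb.
    static double thumbLength(double viewportHeight, double totalHeight,
                              double trackLength, double minThumb) {
        if (totalHeight <= viewportHeight) {
            return trackLength;   // everything fits, nothing to scroll
        }
        return Math.max(minThumb, trackLength * viewportHeight / totalHeight);
    }

    public static void main(String[] args) {
        double[] measured = {20, 30, 40};   // heights of the cells created so far

        double large = estimateTotalHeight(measured, 1000);
        System.out.println(large);                              // 30000.0
        System.out.println(thumbLength(300, large, 300, 16));   // 16.0

        double small = estimateTotalHeight(measured, 5);
        System.out.println(thumbLength(300, small, 300, 16));   // 300.0
    }
}
```

The large list lands on the minimum thumb size almost immediately, which matches the observation that for large lists you can simply assume the minimum.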

So, since I want to close out the optimization issue report, I decided to write this blog post detailing what we’re planning to do. Each of our multi-valued controls (list, tree, table, etc.) will use the same base Cell API (unlike TreeCellRenderer, TableCellRenderer and ListCellRenderer, which were all different in some annoying ways). They’ll all support UI virtualization. Initially we won’t support “model virtualization” (à la TableModel, ListModel, TreeModel), though that will be coming in a future release.

Performance: improving insertion times

Richard Bair — Tue, 15 Sep 2009 23:14:55 +0000

We’re in the middle of a huge performance push on JavaFX attacking the problem from many different angles. The compiler guys (aka: Really Smart Dudes) are fixing a lot of the long standing problems with binding. The graphics guys (aka: Really Hip Smart Dudes) are engaged in writing our lightweight hardware accelerated backend story. I’m working with Kevin Rushforth and Chien Yang on attacking the performance issues in the JavaFX scenegraph (which in large measure are impacted by the compiler work, so we’re working closely with them).

Today I was dredging up an old issue we’d looked at earlier this year regarding insertion times, and specifically, some really radically bad insertion times into Groups. It turns out to be timely as Simon Brocklehurst was encountering this very issue. This post will go into a bit more depth as to what is currently going on, what we’re doing about it, and some other cool / interesting tidbits.

So first, some code:

import javafx.scene.Group;
import javafx.scene.Node;
import javafx.scene.shape.Rectangle;

java.lang.System.out.print("Creating 100,000 nodes...");
var startTime = java.lang.System.currentTimeMillis();
var nodes = for (i in [0..<100000]) Rectangle { };
var endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to sequence one at a time...");
startTime = java.lang.System.currentTimeMillis();
var seq:Node[];
for (n in nodes) {
    insert n into seq;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to group one at a time...");
startTime = java.lang.System.currentTimeMillis();
var group = Group { }
for (n in nodes) {
    insert n into group.content;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

group.content = [];

java.lang.System.out.print("Adding 100,000 nodes to group all at once...");
startTime = java.lang.System.currentTimeMillis();
var group2 = Group { }
group2.content = nodes;
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

In this test we go ahead and create 100,000 nodes (if you run this at home be sure to bump up memory to accommodate — the compiler work going on will make it so this fits in memory but for now we have to increase the heap). We then have 3 tests. The first one adds the nodes, one at a time, to a plain sequence. The second adds the nodes, one at a time, to a Group (and to try to keep things fair, the Group isn’t in a scene). And the third test adds all the nodes to the Group in one go.

Here are the numbers I recorded:

Creating 100,000 nodes...took 13321ms
Adding 100,000 nodes to sequence one at a time...took 39ms
Adding 100,000 nodes to group one at a time...took 1203783ms
Adding 100,000 nodes to group all at once...took 213ms

Ouch! 1,203 seconds (or about 20 minutes) to insert nodes one at a time into the group’s content, whereas it took only 39ms to fill up a plain old sequence. The second Ouch! is that it took 13s to create this many nodes. By comparison, creating the node “peers” (which is an implementation detail, but basically the rendering pipeline representation of the node) only took a half second.

So first, there is clearly some work to do on startup and I’m confident we’ll get that sorted, it isn’t rocket science. Just gotta reduce redundant work. Check.

So how about that second part? Well, for reference, I wrote the same test in pure Java talking to the swing-based node peers directly. The numbers for that:

Creating 100,000 nodes...took 495ms
Adding 100,000 nodes to sequence one at a time...took 10ms
Adding 100,000 nodes to group one at a time...took 47ms
Adding 100,000 nodes to group all at once...took 122ms

Ya, so obviously there is a big difference between 47ms and 20 minutes. 47ms represents something we know we can get close to; after all, we have already done so in the swing-based peer. There are, however, two big things that are different between the FX Group code and the peer Group code. The FX Group has checks for circularity and also for duplication, whereas the peer does not (since it knows the FX side has already handled the problem).

Commenting out the circularity check and the duplication check gets us from 20 minutes down to about 21 seconds. Still several orders of magnitude too long, but a heck of a lot better. There are various other things going on in the FX Group code that we could single out too, and in the end get really close to 40ms.

So, what does this mean? One option is to throw all semblance of safety out the window, giving developers / designers all kinds of rope and letting them hang themselves. Which probably isn’t a good way to treat your users. Another option is to optimize the checks as best we can. While that is probably going to give some win, it won’t give the big win.

Probably the best answer (and I have yet to prove it) is to simply defer the work, sort of batching it up behind the scenes. Basically, suppose you insert 100 nodes into a group, one at a time, but never ask for the group content. What if we were to defer circularity checks and so forth (actually, defer nearly all the work) until the group’s content was read? This would let us keep the nice checks, at the cost of deferring error reporting (potentially bothersome) until the value is read, while giving us the performance of a batched-up insert. And that would be closer to 213ms than 20 minutes.
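The deferral idea can be sketched in a few lines of Java. To be clear, this is only a model of the proposal and not Group’s actual code: `DeferredGroupSketch` and its methods are hypothetical, plain `Object`s stand in for nodes, and only the duplicate check is modelled (no circularity check).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DeferredGroupSketch {
    private final List<Object> content = new ArrayList<>();
    private final List<Object> pending = new ArrayList<>();
    private int validationPasses = 0;

    // Cheap: just remember the node, run no checks yet.
    void insert(Object node) {
        pending.add(node);
    }

    // Reading the content is the point at which the deferred checks run.
    List<Object> getContent() {
        if (!pending.isEmpty()) {
            validationPasses++;
            List<Object> batch = new ArrayList<>(pending);
            pending.clear();

            Set<Object> seen =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            seen.addAll(content);
            for (Object node : batch) {
                if (!seen.add(node)) {
                    // Reported late: at read time, not at insert time.
                    throw new IllegalArgumentException("duplicate node in group");
                }
            }
            content.addAll(batch);
        }
        return Collections.unmodifiableList(content);
    }

    int validationPasses() { return validationPasses; }

    public static void main(String[] args) {
        DeferredGroupSketch group = new DeferredGroupSketch();
        for (int i = 0; i < 100; i++) {
            group.insert(new Object());
        }
        System.out.println(group.validationPasses());   // 0
        System.out.println(group.getContent().size());  // 100
        System.out.println(group.validationPasses());   // 1
    }
}
```

The trade-off is visible in the exception: a bad insert is only reported when the content is next read, and in this sketch the whole offending batch is dropped.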

That’s the idea anyway, I’ll see if I can make it work. Even so I bet we could take that 213 and cut it in half (remember 40ms of it is being eaten by the backend peer, so if we got it down to 80ms we’d be smokin’).

Update

I’ve been doing more work on this and have a prototype that indeed gets the insert times for 100,000 nodes down from 23 minutes to 200ms. I do it not by batching up the changes, but simply by making the duplicate checks more efficient. There is also a bit of trickery related to convincing the compiler not to create a duplicate of the oldNodes sequence, which is a topic for another day.
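The post doesn’t say how the duplicate checks were made more efficient, so the following Java sketch only shows why the cost of such a check matters so much at this scale. It compares a naive scan of the existing content on every insert with an identity-based set lookup. Both methods and the class name are hypothetical, and neither is claimed to be what Group does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DuplicateCheckCost {
    // Naive check: compare the new node against every node already present.
    static long linearScanComparisons(int n) {
        List<Object> content = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            Object node = new Object();
            for (Object existing : content) {
                comparisons++;
                if (existing == node) {
                    throw new IllegalArgumentException("duplicate node");
                }
            }
            content.add(node);
        }
        return comparisons;
    }

    // Set-based check: one identity lookup per insert.
    static long identitySetLookups(int n) {
        Set<Object> seen =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        long lookups = 0;
        for (int i = 0; i < n; i++) {
            Object node = new Object();
            lookups++;
            if (!seen.add(node)) {
                throw new IllegalArgumentException("duplicate node");
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        System.out.println(linearScanComparisons(1000)); // 499500
        System.out.println(identitySetLookups(1000));    // 1000
    }
}
```

The scan performs n * (n - 1) / 2 comparisons, which for 100,000 nodes is roughly five billion, while the set performs 100,000 lookups.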