We’re in the middle of a huge performance push on JavaFX, attacking the problem from many different angles. The compiler guys (aka: Really Smart Dudes) are fixing a lot of the long-standing problems with binding. The graphics guys (aka: Really Hip Smart Dudes) are engaged in writing our lightweight hardware-accelerated backend story. I’m working with Kevin Rushforth and Chien Yang on attacking the performance issues in the JavaFX scenegraph (which in large measure are impacted by the compiler work, so we’re working closely with them).

Today I was dredging up an old issue we’d looked at earlier this year regarding insertion times, and specifically some radically bad insertion times into Groups. It turns out to be timely, as Simon Brocklehurst was encountering this very issue. This post will go into a bit more depth as to what is currently going on, what we’re doing about it, and some other cool / interesting tidbits.

So first, some code:

java.lang.System.out.print("Creating 100,000 nodes...");
var startTime = java.lang.System.currentTimeMillis();
var nodes = for (i in [0..<100000]) Rectangle { };
var endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to sequence one at a time...");
startTime = java.lang.System.currentTimeMillis();
var seq:Node[];
for (n in nodes) {
    insert n into seq;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to group one at a time...");
startTime = java.lang.System.currentTimeMillis();
var group = Group { }
for (n in nodes) {
    insert n into group.content;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

group.content = [];

java.lang.System.out.print("Adding 100,000 nodes to group all at once...");
startTime = java.lang.System.currentTimeMillis();
var group2 = Group { }
group2.content = nodes;
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

In this test we go ahead and create 100,000 nodes (if you run this at home, be sure to bump up the heap to accommodate them — the compiler work going on will eventually make this fit in the default memory, but for now we have to increase the heap). We then have 3 tests. The first one adds the nodes, one at a time, to a plain sequence. The second adds the nodes, one at a time, to a Group (and to try to keep things fair, the Group isn’t in a scene). And the third test adds all the nodes to the Group in one go.

Here are the numbers I recorded:

Creating 100,000 nodes...took 13321ms
Adding 100,000 nodes to sequence one at a time...took 39ms
Adding 100,000 nodes to group one at a time...took 1203783ms
Adding 100,000 nodes to group all at once...took 213ms

Ouch! 1,203 seconds (or about 20 minutes) to insert nodes one at a time into the group’s content, whereas it took only 39ms to fill up a plain old sequence. The second Ouch! is that it took 13s to create this many nodes. By comparison, creating the node “peers” (which is an implementation detail, but basically the rendering pipeline representation of the node) only took a half second.

So first, there is clearly some work to do on startup, and I’m confident we’ll get that sorted; it isn’t rocket science. Just gotta reduce redundant work. Check.

So how about that second part? Well, for reference, I wrote the same test in pure Java talking to the swing-based node peers directly. The numbers for that:

Creating 100,000 nodes...took 495ms
Adding 100,000 nodes to sequence one at a time...took 10ms
Adding 100,000 nodes to group one at a time...took 47ms
Adding 100,000 nodes to group all at once...took 122ms

Ya, so obviously there is a big difference between 47ms and 20 minutes. 47ms represents something we know we can get close to — after all, we have already done so in the swing-based peer. There are, however, two big things that are different between the FX Group code and the peer Group code: the FX Group has checks for circularity and also for duplication, whereas the peer does not (since it knows the FX side has already handled the problem).
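To get a feel for why per-insert checks hurt so much, here is a minimal sketch in plain Java (not the actual Group code, which is internal) of what a naive duplicate check looks like when it runs on every single insert:

import java.util.ArrayList;
import java.util.List;

// Sketch only: a group-like container that validates each child as it is added.
class NaiveGroup {
    private final List<Object> content = new ArrayList<Object>();

    void insert(Object node) {
        // Linear scan of the existing children: O(n) per insert, so inserting
        // n children one at a time costs O(n^2) comparisons overall.
        if (content.contains(node)) {
            throw new IllegalArgumentException("duplicate child");
        }
        // A circularity check would walk the graph here as well, adding even
        // more per-insert work on top of the duplicate scan.
        content.add(node);
    }
}

With 100,000 children, that quadratic scan alone is on the order of five billion comparisons, which is why the one-at-a-time numbers blow up the way they do.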

Commenting out the circularity check and the duplication check gets us from 20 minutes down to about 21 seconds. Still several orders of magnitude too long, but a heck of a lot better. There are various other things going on in the FX Group code that we could single out too, and in the end get really close to 40ms.

So, what does this mean? One option is to throw all semblance of safety out the window, giving developers / designers all kinds of rope and letting them hang themselves. Which probably isn’t a good way to treat your users. Another option is to optimize the checks as best we can. While that is probably going to give some win, it won’t give the big win.

Probably the best answer (and I have yet to prove it) is to simply defer the work, sort of batching it up behind the scenes. Basically, suppose you insert 100 nodes into a group, one at a time, but never ask for the group content. What if we were to defer the circularity checks and so forth (actually, defer nearly all the work) until the group’s content was read? This would allow us to keep the nice checks, deferring error reporting (potentially bothersome) until the value is read, while giving us the performance of a batched-up insert. And that would be closer to 213ms than 20 minutes.
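To make the idea concrete, here is a rough sketch in plain Java (again, not the real Group code) of what “defer nearly all the work until the content is read” could look like. The class and method names are purely illustrative:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: inserts are queued cheaply and validated lazily, in one batch,
// the first time the content is actually read.
class DeferredGroup {
    private final List<Object> content = new ArrayList<Object>();
    private final List<Object> pending = new ArrayList<Object>();

    void insert(Object node) {
        // O(1): just remember the insert, no checks yet.
        pending.add(node);
    }

    List<Object> getContent() {
        if (!pending.isEmpty()) {
            // Validate the whole batch in a single pass. Duplicate checks are
            // shown here; circularity checks would ride along in the same pass.
            Set<Object> seen = new HashSet<Object>(content);
            for (Object node : pending) {
                if (!seen.add(node)) {
                    // Error reporting happens at read time, not insert time.
                    throw new IllegalArgumentException("duplicate child");
                }
            }
            content.addAll(pending);
            pending.clear();
        }
        return content;
    }
}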

That’s the idea, anyway; I’ll see if I can make it work. Even so, I bet we could take that 213ms and cut it in half (remember, 40ms of it is being eaten by the backend peer, so if we got it down to 80ms we’d be smokin’).

Update

I’ve been doing more work on this and have a prototype that indeed gets the insert times for 100,000 nodes down from 23 minutes to 200ms. I do it without batching up the changes, but by simply making the duplicate checks more efficient. There is also a bit of trickery related to convincing the compiler not to create a duplicate of the oldNodes sequence — a topic for another day.
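I won’t walk through the prototype here, but the flavor of “making the duplicate checks more efficient” is roughly the following sketch in plain Java: keep an identity-based set alongside the children so each membership test is O(1) instead of a linear scan. The names are illustrative, not the real implementation:

import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Sketch only: an identity set maintained next to the content list turns the
// per-insert duplicate check into a constant-time operation.
class FastCheckGroup {
    private final List<Object> content = new ArrayList<Object>();
    private final Set<Object> members =
        Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    void insert(Object node) {
        // O(1) membership test instead of scanning the whole content list.
        if (!members.add(node)) {
            throw new IllegalArgumentException("duplicate child");
        }
        content.add(node);
    }

    void clear() {
        content.clear();
        members.clear();
    }
}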