Performance: improving insertion times

by Richard Bair | Sep 15, 2009 | Performance | 10 comments

We’re in the middle of a huge performance push on JavaFX, attacking the problem from many different angles. The compiler guys (aka: Really Smart Dudes) are fixing a lot of the long-standing problems with binding. The graphics guys (aka: Really Hip Smart Dudes) are engaged in writing our lightweight hardware-accelerated backend story. I’m working with Kevin Rushforth and Chien Yang on attacking the performance issues in the JavaFX scenegraph (which in large measure are impacted by the compiler work, so we’re working closely with them).

Today I was dredging up an old issue we’d looked at earlier this year regarding insertion times, and specifically, some really radically bad insertion times into Groups. It turns out to be timely as Simon Brocklehurst was encountering this very issue. This post will go into a bit more depth as to what is currently going on, what we’re doing about it, and some other cool / interesting tidbits.

So first, some code:

import javafx.scene.Group;
import javafx.scene.Node;
import javafx.scene.shape.Rectangle;

java.lang.System.out.print("Creating 100,000 nodes...");
var startTime = java.lang.System.currentTimeMillis();
var nodes = for (i in [0..<100000]) Rectangle { };
var endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to sequence one at a time...");
startTime = java.lang.System.currentTimeMillis();
var seq:Node[];
for (n in nodes) {
    insert n into seq;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

java.lang.System.out.print("Adding 100,000 nodes to group one at a time...");
startTime = java.lang.System.currentTimeMillis();
var group = Group { }
for (n in nodes) {
    insert n into group.content;
}
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

group.content = [];

java.lang.System.out.print("Adding 100,000 nodes to group all at once...");
startTime = java.lang.System.currentTimeMillis();
var group2 = Group { }
group2.content = nodes;
endTime = java.lang.System.currentTimeMillis();
println("took {endTime - startTime}ms");

In this test we go ahead and create 100,000 nodes (if you run this at home be sure to bump up memory to accommodate — the compiler work going on will make it so this fits in memory but for now we have to increase the heap). We then have 3 tests. The first one adds the nodes, one at a time, to a plain sequence. The second adds the nodes, one at a time, to a Group (and to try to keep things fair, the Group isn’t in a scene). And the third test adds all the nodes to the Group in one go.

Here are the numbers I recorded:

Creating 100,000 nodes...took 13321ms
Adding 100,000 nodes to sequence one at a time...took 39ms
Adding 100,000 nodes to group one at a time...took 1203783ms
Adding 100,000 nodes to group all at once...took 213ms

Ouch! 1,203 seconds (or about 20 minutes) to insert nodes one at a time into the group’s content, whereas it took only 39ms to fill up a plain old sequence. The second Ouch! is that it took 13s to create this many nodes. By comparison, creating the node “peers” (which is an implementation detail, but basically the rendering pipeline representation of the node) only took a half second.

So first, there is clearly some work to do on startup and I’m confident we’ll get that sorted; it isn’t rocket science. Just gotta reduce redundant work. Check.

So how about that second part? Well, for reference, I wrote the same test in pure Java talking to the swing-based node peers directly. The numbers for that:

Creating 100,000 nodes...took 495ms
Adding 100,000 nodes to sequence one at a time...took 10ms
Adding 100,000 nodes to group one at a time...took 47ms
Adding 100,000 nodes to group all at once...took 122ms

Ya, so obviously there is a big difference between 47ms and 20 minutes. 47ms represents something we know we can get close to; after all, we have already done so in the swing-based peer. There are, however, two big things that are different between the FX Group code and the peer Group code: the FX Group has checks for circularity and also for duplication, whereas the peer does not (since it knows the FX side has already handled the problem).

Commenting out the circularity check and the duplication check gets us from 20 minutes down to about 21 seconds. That is still more than two orders of magnitude slower than the 47ms peer, but a heck of a lot better. There are various other things going on in the FX Group code that we could single out too, and in the end get really close to 40ms.
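To get a feel for why a per-insert check can hurt this much, here is a minimal Java sketch. It assumes, purely for illustration, that each insert scans the existing content linearly looking for the new node; the post does not say how the real Group check is written, and `NaiveDuplicateCheck` and `comparisonsForInserts` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is not the real JavaFX Group code.
class NaiveDuplicateCheck {

    // Inserts n distinct objects one at a time, scanning the whole list for
    // a duplicate before each insert. Returns how many comparisons were made.
    static long comparisonsForInserts(int n) {
        List<Object> content = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            Object node = new Object();
            for (Object existing : content) {
                comparisons++;
                if (existing == node) {
                    throw new IllegalArgumentException("duplicate node");
                }
            }
            content.add(node);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000}) {
            System.out.println(n + " inserts -> "
                    + comparisonsForInserts(n) + " comparisons");
        }
    }
}
```

Under that assumption the work grows as n(n-1)/2, so 100,000 one-at-a-time inserts would mean roughly 5 billion comparisons, while a single bulk assignment only needs to validate the content once.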

So, what does this mean? One option is to throw all semblance of safety out the window, giving developers / designers all kinds of rope and letting them hang themselves. Which probably isn’t a good way to treat your users. Another option is to optimize the checks as best we can. While that is probably going to give some win, it won’t give the big win.

Probably the best answer (and I have yet to prove it) is to simply defer the work, sort of batching it up behind the scenes. Basically, suppose you insert 100 nodes into a group, one at a time, but never ask for the group content. What if we were to defer circularity checks and so forth (actually, defer nearly all the work) until the group’s content was read? This would allow us to keep the nice checks while getting the performance of a batched-up insert. The cost is that error reporting is also deferred until the value is read (potentially bothersome). But the insert time would be closer to 213ms than 20 minutes.
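The deferral idea can be sketched in plain Java. This is only a rough sketch under assumptions: `DeferredGroup` is a hypothetical class, not part of JavaFX, and it shows just the duplicate check (the circularity check would be deferred the same way).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: DeferredGroup is not a JavaFX class.
class DeferredGroup<T> {
    private final List<T> content = new ArrayList<>();
    private final List<T> pending = new ArrayList<>();
    // Nodes already committed to content, compared by reference identity.
    private final Set<T> members =
            Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());

    // Cheap: just queue the node, no checks yet.
    void insert(T node) {
        pending.add(node);
    }

    // Reading the content forces validation of everything queued so far.
    List<T> getContent() {
        flush();
        return Collections.unmodifiableList(content);
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        // Validate the whole batch before committing any of it.
        Set<T> seen =
                Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
        for (T node : batch) {
            if (members.contains(node) || !seen.add(node)) {
                throw new IllegalStateException("duplicate node in group content");
            }
        }
        members.addAll(batch);
        content.addAll(batch);
    }

    public static void main(String[] args) {
        DeferredGroup<Object> group = new DeferredGroup<>();
        Object node = new Object();
        group.insert(node);
        group.insert(node); // no error yet, the check is deferred
        try {
            group.getContent();
        } catch (IllegalStateException e) {
            System.out.println("Reported on read: " + e.getMessage());
        }
    }
}
```

Note the trade-off this makes visible: the second `insert` succeeds silently and the error only surfaces at `getContent()`, far from the line that caused it. In this sketch a failed batch is simply discarded, which is one possible policy among several.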

That’s the idea anyway, I’ll see if I can make it work. Even so I bet we could take that 213 and cut it in half (remember 40ms of it is being eaten by the backend peer, so if we got it down to 80ms we’d be smokin’).

Update

I’ve been doing more work on this and have a prototype that indeed gets the insert times for 100,000 nodes down from 23 minutes to 200ms. I do it without batching up the changes, simply by making the duplicate checks more efficient. There is also a bit of trickery related to convincing the compiler not to create a duplicate of the oldNodes sequence, but that is a topic for another day.
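The post doesn’t show how the duplicate checks were made more efficient, so the following is only one common way to do it, sketched in Java under that assumption: keep an identity-based set next to the content list so each check is a lookup rather than a scan. `CheckedGroup` is a hypothetical name, not a JavaFX class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: CheckedGroup is not a JavaFX class, and this is not
// necessarily how the prototype described above works.
class CheckedGroup<T> {
    private final List<T> content = new ArrayList<>();
    // Tracks membership by reference identity, so the duplicate check is an
    // expected constant-time lookup instead of a scan over the content list.
    private final Set<T> members =
            Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());

    // Still checks eagerly on every insert, so errors are reported at the
    // point of the mistake.
    void insert(T node) {
        if (!members.add(node)) {
            throw new IllegalArgumentException("node is already in this group");
        }
        content.add(node);
    }

    int size() {
        return content.size();
    }
}
```

Unlike a deferred approach, this keeps the error at the offending insert; the price is the extra memory for the set.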

10 Comments

Osvaldo Pinali Doederlein on September 15, 2009 at 7:28 pm

Great stuff. But if I could vote, I’d raise both hands for “giving developers / designers all kinds of rope”: I can use this rope to hang myself, but I can also use a strong, simple, cheap piece of rope to lift pretty heavy stuff, without paying the cost of a high-tech hydraulic elevator. And perhaps that’s just my lack of experience programming scene graphs (JavaFX is my first), but circularity seems the kind of Awesome Stupidity bug that I’d never expect the average developer to make with any significant frequency. (As for designers, they are supposed to use visual tools like CS4 or the upcoming Designer Tool, so many structural errors can be made impossible, or validated, by the tool.)

Idea: have such high-level validations conditionally compiled (or enabled through the assertion mechanism), so they are only active in the SDK for debugging/testing purposes (and even there with an option to disable), but absent in the end-user runtimes. Cutting this stuff out of the runtime would also remove some download and loading-time overhead. Right now JavaFX has a worse deployment story than competing RIA toolkits; it’s perhaps the Achilles heel of all Java-based desktop technologies, so the last thing we need is extra bloat that serves only to prevent people from shooting their own feet.
- Richard on September 15, 2009 at 9:50 pm
  Hey Osvaldo, thanks for the comments (and voting is always allowed and appreciated!). There isn’t any gain by cutting out the circularity check in terms of download or loading time. It’s one function:
  
  package function wouldCreateCycle(parent:Node, child:Node): Boolean {
      var n: Node = parent;
      while (n != child) {
          if (n.parent != null) {
              n = n.parent;
          } else if (n.clipParent != null) {
              n = n.clipParent;
          } else {
              return false;
          }
      }
      true
  }
Tbee on September 16, 2009 at 1:06 am

Another option, since you are working closely with the compiler guys, is splitting up the insert and the check in special methods.

So there is the “official insert” that does “insert” + “check”.
And there is the “dangerous insert” that only does “insert”.

Since insert is a special keyword, the compiler guys could route that keyword to “dangerous insert” and call “check” just before the next time the group is used inside the block (directly or as a parameter) or when exiting the block. This way you keep the check close to where the magic is happening instead of getting an error somewhere completely different.

A challenge is how to deal with non-local groups (instance variables).

We could also do both approaches: insert does not check, check at the block boundary or first usage AND check when reading. A boolean “dirty” could be used to mark if checking is required.
Pedro Duque Vieira on September 16, 2009 at 5:22 am

Sadly, I’m in the middle of giving up javafx, after some months of commitment and working hours.
There is still no news of an official way to embed javafx in java, which is a main killer point for me and, I expect, for a lot of other developers. On the contrary, there are some guesses that the next release will make it even more difficult/impossible for that to happen.
All those improvements are good but they are not targeting the main issues.
Sorry for being so negative, just my view on things.
- Richard on September 16, 2009 at 8:43 am
  
  The main reason we’ve not released a supported method for embedding fx into an existing swing app is that there is not yet a nice way to call from java directly to javafx. Emphasis on the word “yet”.
  
  Cheers!
Pedro Duque Vieira on September 17, 2009 at 5:11 am

That’s nice Richard!

I’m really looking forward to that becoming available.

I think javafx is awesome, it would be a tremendous waste not to make it available to existing swing developers who want to spice up their apps. 🙂

Thanks.
Hans on September 18, 2009 at 5:16 am

One remark on the circularity check: if child has no children there can be no circularity, can there? So you could use that to speed it up?

Cheers, Hans
- Richard on September 18, 2009 at 9:26 am
  
  Yes, nearly. A childless/leaf node might also be used as a clip on some node, so you have to make sure that it also doesn’t have a clip parent. For example, I could do:
  
  var r:Rectangle = Rectangle { clip: r }
Noel Grandin on September 22, 2009 at 8:39 am

Surely you can supply an API for batch inserts – then the checks can be done once per batch instead of for every object.

That does push the responsibility to the client programmer, but sometimes that is the easiest answer 🙂
Pedro Duque Vieira on September 22, 2009 at 10:48 am

Richard,

Can you tell us when the embedding of javafx in java will be available? Will it be in the next release?

Why doesn’t the JavaFX team simply make JXScene (blogged about on Josh Marinacci’s blog) an official API?

Having some decisions to make and some openness on JavaFX would be greatly appreciated.

Thanks a lot!