SF/JavaOne, Day 3, Java 1.6 Collections


I got to go see the BirdsOfAFeather talk with Josh Bloch concerning the new collection in Java1.5 (which you can read about: here) and the upcoming collections they have planned for the future (which you can read about: here). 
I’ve long been a fan of the java collections and i’ve found that
they’ve normally held a good balance between simplicity and
power.  It’s actually a case of “Goldilocks and the Three Bears”
for me.  stl is too complex and painful to use, .Net is too
simplistic and limited in power, whereas java get’s it just about
right.  It’s not perfect, but it’s flexible enough to handle
almost anything you’d ever need. I always found the deep use of
abstractions to be enormously helpful for writing your own special
collections while only writing 1/10th or evern 1/100th of the code
necessary to implement the full interface.  It’s also not
cluttered with a gazillion interfaces like i’ve seen in other packages
which isn’t especially helpful in a language like java which doesn’t
have unions.



However, it’s starting to seem like the new collections are starting to
buckle under their own weight in recent java releases.  With the java.util.concurrent
package it seems like the number of collections grew enormously with
new axes of concurrent access and blocking access being threaded
through (no pun intended) every type of collection and interface. 
In my experience, having a collection surrounded by a reader/writer lock
usually fit the bill just fine.  Of course, i only ever had to
handle a couple of hundred threads at a time, and it’s quite possible
that to be able to scale to thousands better and deeper integration of
lock-free data structures are really needed.  I also question the
need for skip lists to be introduced into the core collection
implementation methods.  I think it’s important to provide core
implementations that offer linear, logarythmic and
amortized-constant/constant time performance with the associated
benefits and drawbacks.  But i don’t see why skip-lists
are needed in the core collections, but i suppose the fact that Bill
Pugh is on the JSR166 expert group might have something to do with it.



However, i shouldnt’ knock it until i’ve tried it.  I’ll have to
dig up my massively overthreaded web app that i built at one point and
see if these new collections can make that code much simpler, safer,
easier to understand, and possibly faster.  If it achieves any of
those benfits without any significant drawbacks, then i guess it will
have been worth it.


Comments (11)

  1. DrPizza says:

    1) STL isn’t too complex. It’s barely complex enough. .NET is far too simplistic, I think we can all agree there, but even Java has significant gaps.

    2) The advantage of lockless and waitless data structures is not so much better scaling (oftentimes they don’t actually scale better than lightweight sychronization primitives such as spinlocks or (in most circumstances) things like Win32’s Critical Sections). Rather, it’s reduced complexity. With a lockless linked list you don’t have to worry about synchronization or priority inversion or deadlocks or anything like that. They always work, all of the time.

    This has particular benefits in some situations where Java isn’t usable; within the NT kernel, for example, the use of lockless lists means that even routines that can’t readily yield (because they need to run at high IRQL) can share data with lower priority routines without risk of deadlock.

  2. barrkel says:

    Skip lists are logarithmic too, but have better average complexity & cache locality than red-black trees, the traditional logarithmic collection base. That’s what I’ve been able to gather.

    .NET could definitely be improved with better collections.

  3. BradA says:

    Have you checked PowerCollections [1]? How does it fit in the power-simplicity balance?

    http://wintellect.com/powercollections/

  4. CyrusN says:

    Brad: I have lots of problems with it. Primarily, it seems to be lacking a lot of consistancy in it’s design, and i find consistancy to be highly valuable in an api, (as long as is doesn’t have a major negative effect).

  5. CyrusN says:

    barkkel: "Skip lists are logarithmic too, but have better average complexity & cache locality than red-black trees, the traditional logarithmic collection base"

    True. But then, i would have just preferred that the internals of TreeSet and TreeMap get replaces with SkipLists. I don’t see the need to actually introduce a whole new collection for this. Were there that many people saying "i need collections with the same algorithmic complexity as the current sorted collections, but i must have the constant factore reduced!"?

  6. DrPizza says:

    "True. But then, i would have just preferred that the internals of TreeSet and TreeMap get replaces with SkipLists. I don’t see the need to actually introduce a whole new collection for this. Were there that many people saying "i need collections with the same algorithmic complexity as the current sorted collections, but i must have the constant factore reduced!"? "

    Ah, but that’s not the trade-off being better. Rather, it’s closer to the "QuickSort vs. Merge Sort" trade-off. Skip lists give you logarithmic access *on average* (just as QS gives nlogn sorting *on average*) but can give you linear access in the worst case (just as QS can give you n^2 sorting in the worst case).

    One may decide that it’s worth the risk for the better constant factors–just as one may decide QS is worth the risk, again for its better constant factors–but one should not be forced to make the switch. It does indeed need a whole new collection.

  7. CyrusN says:

    DrPizza: I disagree. By this measure we should have AVL trees, Skip Lists, Red black trees, and every other collection prescribed in every data structure book out there. I think a balance needs to be struck so that you don’t have a collections API with 1000 different collections in it, all subtly different.

    One of the balances that the java API’s had *previously* was that they chose collections based on algorithmic complxity. So you could say "i want a list with fast adding and fast lookup, but poor insertion" or "i need somethign that’s sorted, but i need better than linear lookup" or "i want constant lookup in the expected case" etc. etc. etc. Now, do we need an AVLTreeSet? After all, it would give you different performance characteristics from a TreeSet and one might want to weigh one against eachother. Personally, i would say no. Especially since i’ve never seen any bugs with any significant amount of votes at java.sun.com saying "i need a set that has logarithmic access in the normal case, but with a better constant factor than TreeSet, but i’m willing to have it be linear in the worst case". So they added a collection and cluttered an API when there did not seem to be any real need to.

    On the other hand, they could have done stuff like add collections where there are currently *no* existing collections that suffice. For example, a Graph interface would be very appreciated.

  8. DrPizza says:

    "DrPizza: I disagree. By this measure we should have AVL trees, Skip Lists, Red black trees, and every other collection prescribed in every data structure book out there. I think a balance needs to be struck so that you don’t have a collections API with 1000 different collections in it, all subtly different."

    No, we don’t. AVL and Red/Black both have the same best-case, worst-case, and average complexity, and comparable locality and so on. RB trees probably have a slight advantage (fewer rotates required on insertions and deletions IIRC, and on some platforms slightly reduced storage requirements, though you’ve limited scope to take advantage of that in Java) but there’s bugger all between them.

    Skip lists and the trees differ rather more. Same average complexity, same best-case complexity, but different worst-case complexity, and significantly different locality, and easier, more performant lock-free implementations.

    "One of the balances that the java API’s had *previously* was that they chose collections based on algorithmic complxity."

    I think you’re confusing the Java collections with the C++ Standard Library.

    The C++ Standard Library is specified predominantly in terms of algorithmic complexity. A C++ std::map, for example, could be either a red/black or an AVL tree. Both meet the interface’s requirements. In practice, you only ever get red/black (better constant factors at the cost of a more complicated implementation), but according to the spec, either would do.

    The Java collections are not. TreeMap, for example, is *specced* as being a red/black tree. Neither of its interfaces (Map, SortedMap) specify its complexity. Likewise TreeSet, Set, sortedSet. TreeSet is defined as being implemented with a TreeMap, but none of the interfaces specify algorithmic complexity. It’s all just implementations.

    "So you could say "i want a list with fast adding and fast lookup, but poor insertion" or "i need somethign that’s sorted, but i need better than linear lookup" or "i want constant lookup in the expected case" etc. etc. etc. Now, do we need an AVLTreeSet? After all, it would give you different performance characteristics from a TreeSet"

    Not so as you could tell. Unlike for skiplists.

    Lock-free skiplists are better studied than lock-free r/b- and AVL trees. Lock-free skiplists are more performant than lock-free r/b- and AVL trees. They’re different enough to merit inclusion.

    "and one might want to weigh one against eachother. Personally, i would say no. Especially since i’ve never seen any bugs with any significant amount of votes at java.sun.com saying "i need a set that has logarithmic access in the normal case, but with a better constant factor than TreeSet, but i’m willing to have it be linear in the worst case". So they added a collection and cluttered an API when there did not seem to be any real need to. "

    They’ve done no such thing; these things aren’t in Java 1.5, and Java 1.6 isn’t final yet.

    "On the other hand, they could have done stuff like add collections where there are currently *no* existing collections that suffice. For example, a Graph interface would be very appreciated. "

    The difficulty there is in providing a set of interfaces that’s actually generally useful. People have certainly tried (boost, for example, has such a set of classes) but it’s not clear that they’ve been entirely successful–it’s not obvious that their generic graphs are actually useful for the graph manipulations a person might actually want to make.

    I think a principal difference is that people wanting graphs actually want to manipulate the *graphs*. The collections interfaces are completely divorced from their implementation (the concrete implementations don’t share this trait, but the implementations do). That is, a SortedSet nowhere offers any "tree" operations; there’s no rotating or splaying or indication of node colour, nor even any concession of the existance of nodes. It happens that a TreeSet is a red black tree, but nowhere do you perform tree manipulations on it.

    The same isn’t true of a graph; if you’re manipulating graphs you are, more likely than not, interested in the actual nodes and edges that make up that graph. This makes providing useful abstractions hard, and means that there’s little the class library can offer above and beyond a hand-written set of Graph interfaces.

  9. CyrusN says:

    DrPizza: Sorry for the delay! You got marked as spam (go outlook!) and i only check that folder once in a while.

    You’ve made some very good points and i’ll definitley have to mull over them for a while.

    Thanks!