Today i tried to optimize the sorting program that Diomidis Spinellis first wrote, in an attempt to provoke thoughts regarding the efficiency of Java and C++. The original article is here and a second attempt to optimize it by me. My implementation was significantly faster, but as Diomidis Spinellis stated in the second post, the second program does not have the same properties as the first one.

I started the optimization attempt timing the original programs, the C++ (g++ -O3 SortInt.cpp).
nefarian:~/test bkarak$ make benchmark
time ./SortInt
2147480553
1716679741
1288671507
858361834
428390130
-277624
-429954625
-859625580
-1288566623
-1718347631
	        
1.51 real         1.47 user         0.03 sys
	
time java SortInt
-2147480515
-1718341395
-1288561118
-859625208
-429952243
-269290
428394700
858362557
1288674589
1716691491

3.22 real         3.09 user         0.12 sys
The benchmarking environment is a macbook pro with Mac OS X (10.4), 2GB of RAM and a dual core 2 duo. We can initially notice that the java program has almost 100% user time and 400% system time. The initial programs are here for Java and C++.
When i did basic profiling on the Java implementation, i discovered that the 79% of the execution time was spent on the TreeSet. So, my initial approach was the replacement of the TreeSet and create a CustomSet. You can find the code here (SortIntNew) (all the code in one tar can be found here) and the benchmarking results follow:
time ./SortInt
2147480553
1716679741
1288671507
858361834
428390130
-277624
-429954625
-859625580
-1288566623
-1718347631

1.54 real         1.47 user         0.04 sys

time java SortInt
-2147480515
-1718341395
-1288561118
-859625208
-429952243
-269290
428394700
858362557
1288674589
1716691491

3.22 real         3.08 user         0.12 sys

time java SortIntNew
-2147480515
-1718341395
-1288561118
-859625208
-429952243
-269290
428394700
858362557
1288674589
1716691491

1.37 real         1.23 user         0.09 sys
ha! The Java implementation is now faster than the C++ one and has the same abstract characteristics (the problem found in my first approach). I believe that with a little more code tweaking we can get even more better results, but i think the point was exhibited adequately in this case.
I agree that in Java, you really have to know what you are doing to avoid pitfalls (usage of TreeSet in this case) and bad performance, but i cannot accept that C++ is faster than Java in everything ... and remember ...

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)

Post Update (10/11/2007 21:40)

Later the same day i received an e-mail from Diomidis Spinellis and George Gousios pointing out the following:
  1. That i "cheat" using the sort only before i get the iterator
  2. That the CustomSet is not a Set allowing duplicates among others
In addition, Diomidis Spinellis listed the complexity guarantees of the STL's Set container, which are:
Complexity guarantees
---------------------
key_comp() and value_comp() are constant time.

Erase element is constant time.

Erase key is O(log(size()) + count(k)).

Erase range is O(log(size()) + N), where N is the length of the range.

Find is logarithmic.

Count is O(log(size()) + count(k)).

Lower bound, upper bound, and equal range are logarithmic.

Invariants
----------
1. Definition of value_comp: If t1 and t2 are objects of type X::value_type and k1 
and k2 are the keys associated with those objects, then a.value_comp() returns 
a function object such that a.value_comp()(t1, t2) is equivalent to a.key_comp()(k1, k2).

2. Ascending order: The elements in a Sorted Associative Container are always 
arranged in ascending order by key. That is, if a is a Sorted 
Associative Container, then is_sorted(a.begin(), a.end(), a.value_comp()) is always true.
I must confess that i did not took all these parameters in mind when i designed the optimisation. Mainly because that was not the case when i began the design, that was not the problem (maybe my misunderstanding), and my optimisation is strongly coupled with the original problem. Right now, to prove my concept i have to meet all this guarantees, even though most of them are not used in the program. It is true that the most fit container to match the STL's Set is the TreeSet (that explain why he chose it). Maybe to provide a clean comparison i must write the code in C++ with a most simple container. I do not dare to use Java's default containers, cause -in my opinion- are sacrificing performance for stability and probably that is the most wise thing to do.

Even with all those in mind, i do not change my opinion, or the point i am trying to make out here in this post. To achieve Java's performance you need to program knowing a lot more than in C++. I'm damn sure, that if i give it a shot, i will probably achieve better performance than TreeSet, maybe even better than C++. I will let you know, that for sure.

I will not remove this post :). I like it here, pointing out the mess you can achieve by optimising without having all data in mind :).