I started the optimization attempt timing the original programs, the C++ (g++ -O3 SortInt.cpp).
nefarian:~/test bkarak$ make benchmark time ./SortInt 2147480553 1716679741 1288671507 858361834 428390130 -277624 -429954625 -859625580 -1288566623 -1718347631 1.51 real 1.47 user 0.03 sys time java SortInt -2147480515 -1718341395 -1288561118 -859625208 -429952243 -269290 428394700 858362557 1288674589 1716691491 3.22 real 3.09 user 0.12 sysThe benchmarking environment is a macbook pro with Mac OS X (10.4), 2GB of RAM and a dual core 2 duo. We can initially notice that the java program has almost 100% user time and 400% system time. The initial programs are here for Java and C++.
When i did basic profiling on the Java implementation, i discovered that the 79% of the execution time was spent on the TreeSet. So, my initial approach was the replacement of the TreeSet and create a CustomSet. You can find the code here (SortIntNew) (all the code in one tar can be found here) and the benchmarking results follow:
time ./SortInt 2147480553 1716679741 1288671507 858361834 428390130 -277624 -429954625 -859625580 -1288566623 -1718347631 1.54 real 1.47 user 0.04 sys time java SortInt -2147480515 -1718341395 -1288561118 -859625208 -429952243 -269290 428394700 858362557 1288674589 1716691491 3.22 real 3.08 user 0.12 sys time java SortIntNew -2147480515 -1718341395 -1288561118 -859625208 -429952243 -269290 428394700 858362557 1288674589 1716691491 1.37 real 1.23 user 0.09 sysha! The Java implementation is now faster than the C++ one and has the same abstract characteristics (the problem found in my first approach). I believe that with a little more code tweaking we can get even more better results, but i think the point was exhibited adequately in this case.
I agree that in Java, you really have to know what you are doing to avoid pitfalls (usage of TreeSet in this case) and bad performance, but i cannot accept that C++ is faster than Java in everything ... and remember ...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)
Post Update (10/11/2007 21:40)
Later the same day i received an e-mail from Diomidis Spinellis and George Gousios pointing out the following:
- That i "cheat" using the sort only before i get the iterator
- That the CustomSet is not a Set allowing duplicates among others
Complexity guarantees --------------------- key_comp() and value_comp() are constant time. Erase element is constant time. Erase key is O(log(size()) + count(k)). Erase range is O(log(size()) + N), where N is the length of the range. Find is logarithmic. Count is O(log(size()) + count(k)). Lower bound, upper bound, and equal range are logarithmic. Invariants ---------- 1. Definition of value_comp: If t1 and t2 are objects of type X::value_type and k1 and k2 are the keys associated with those objects, then a.value_comp() returns a function object such that a.value_comp()(t1, t2) is equivalent to a.key_comp()(k1, k2). 2. Ascending order: The elements in a Sorted Associative Container are always arranged in ascending order by key. That is, if a is a Sorted Associative Container, then is_sorted(a.begin(), a.end(), a.value_comp()) is always true.I must confess that i did not took all these parameters in mind when i designed the optimisation. Mainly because that was not the case when i began the design, that was not the problem (maybe my misunderstanding), and my optimisation is strongly coupled with the original problem. Right now, to prove my concept i have to meet all this guarantees, even though most of them are not used in the program. It is true that the most fit container to match the STL's Set is the TreeSet (that explain why he chose it). Maybe to provide a clean comparison i must write the code in C++ with a most simple container. I do not dare to use Java's default containers, cause -in my opinion- are sacrificing performance for stability and probably that is the most wise thing to do.
Even with all those in mind, i do not change my opinion, or the point i am trying to make out here in this post. To achieve Java's performance you need to program knowing a lot more than in C++. I'm damn sure, that if i give it a shot, i will probably achieve better performance than TreeSet, maybe even better than C++. I will let you know, that for sure.
I will not remove this post :). I like it here, pointing out the mess you can achieve by optimising without having all data in mind :).