Language Bureaucracy

Laziness, impatience and hubris are the three great virtues that every programmer should have, at least according to Larry Wall [1]. My experience so far has shown me that he was right. All programmers have these characteristics; those who do not are usually not real programmers. Since they express these virtues through several programming languages, they tend to compare them. This comparison usually ends in a phenomenon called flame wars: programmers take part in endless quarrels, exchanging arguments about language features, their standard (or not) libraries, etc.

In my academic and professional life, I have participated in various conversations of that kind, ending up talking for hours about the coolness of C and C++, the inconsistencies of PHP and the ease of Python programming. I know; we all fought our holy wars.

Almost two years ago, I co-authored a publication with my PhD advisor Dr. Spinellis and Dr. Louridas, in which we conducted an experiment that involved several programming languages [2]. The experiment involved the creation of a corpus that included simple and more complex algorithmic problems, each one implemented in every language. Our primary source of code back then was the Rosetta Code repository. The corpus is available on GitHub [3].

Then it occurred to me that I could use this code to finally find out which language is the most bureaucratic. How could one measure that, and what does the term bureaucratic mean in a programming language context?

Terms & Hypothesis

The answer is simple: we measure the LoC (Lines of Code), the lines of executable code of a computer program. Since all programs perform identical tasks, we directly compare the LoC for each language, and the one with the fewest lines wins. At least this is a straightforward way to do it.
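A minimal sketch of such a count, assuming that a line of code is any line that is neither blank nor a whole-line // comment. The class name LocCounter and this counting rule are assumptions made for illustration; they are not the tooling that produced the numbers in this post, and block comments are not handled.

```java
import java.util.List;

class LocCounter {
    // Counts the lines that are neither blank nor whole-line "//" comments.
    // Assumption: block comments and other comment syntaxes are ignored.
    static long countLoc(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .filter(line -> !line.startsWith("//"))
                .count();
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "// prints a greeting",
                "public class Main {",
                "",
                "    public static void main(String[] args) {",
                "        System.out.println(\"Boom\");",
                "    }",
                "}");
        System.out.println(countLoc(source)); // prints 5
    }
}
```

A different counting rule (for example, ignoring lines that hold only a brace) would shift the totals for every language, so the rule has to be the same across the whole corpus.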

The contestants were nine (9) of the most popular [4] programming languages: Java, C, C++, PHP, C#, Python, Perl, JavaScript, and Ruby. Fortran and Haskell were excluded because many tasks were not implemented in these two languages (we are working on that).

The 72 selected tasks varied from string tokenisation algorithms to anagrams and Horner’s rule for polynomial evaluation.

Counting the LoC

The following graph illustrates the total LoC for all tasks per language:

[Figure: total LoC for all tasks, per language]

It seems that Python is the big winner with only 892 LoC for all the tasks, and C is the big loser with 2626 LoC. If we divide the languages into two categories, statically typed and dynamic, the latter are winning, at least on the program size front. The dynamic languages total 5762 lines, while the static languages have a combined LoC of around 9237, roughly 1.6 times as many.

Counting the Winners

In addition, let’s examine which languages won first place (had the minimum LoC) across all tasks. The following figure illustrates the number of wins for each language.

[Figure: number of wins per language]

One may notice that the wins add up to 68 instead of 72. The remaining four were won by Fortran, which was excluded from this experiment. Since the repository is still very active, and many of the tasks are being re-organised and re-implemented to better suit the ongoing research process, I never sanitised the data, so there may be errors. I do not think that the results were affected though, since the dynamic languages won the match and dominated the statically typed languages.
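The win count can be sketched as follows. The class name WinCounter and the input shape (one map per task, from language name to LoC) are assumptions for illustration. The post does not say how ties were handled; this sketch arbitrarily gives a tie to the language that comes first alphabetically, so that each task has exactly one winner.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WinCounter {
    // tasks: one map per task, from language name to its LoC for that task.
    // Returns how many tasks each language won (had the minimum LoC).
    static Map<String, Integer> countWins(List<Map<String, Integer>> tasks) {
        Map<String, Integer> wins = new TreeMap<>();
        for (Map<String, Integer> task : tasks) {
            String winner = null;
            int best = 0;
            // A TreeMap iterates alphabetically, so on a tie the first name keeps the win
            // (an arbitrary tie-break chosen for this sketch).
            for (Map.Entry<String, Integer> entry : new TreeMap<>(task).entrySet()) {
                if (winner == null || entry.getValue() < best) {
                    winner = entry.getKey();
                    best = entry.getValue();
                }
            }
            if (winner != null) {
                wins.merge(winner, 1, Integer::sum);
            }
        }
        return wins;
    }
}
```

With one winner per task, the wins always add up to the number of non-empty tasks, which is the sanity check applied to the figure above.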

While writing this blog entry I consulted two friends of mine, who offered two very interesting perspectives. Dimitris Mitropoulos suggested also taking into account the character count of each line, and George Oikonomou suggested applying various voting systems [5] to the language ranking for each task, thus finding the real winner.
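As one possible reading of the voting idea, here is a sketch of a Borda count, which is one positional voting system among several; nothing in this post says that it is the one George had in mind. Each task acts as a voter that ranks the languages by ascending LoC: with n languages, the shortest program earns n - 1 points and the longest earns 0. The class name BordaRanking, the alphabetical tie-break and the LoC values in main are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BordaRanking {
    // tasks: one map per task, from language name to its LoC for that task.
    // Returns the total Borda points per language (higher is better).
    static Map<String, Integer> score(List<Map<String, Integer>> tasks) {
        Map<String, Integer> points = new TreeMap<>();
        for (Map<String, Integer> task : tasks) {
            List<Map.Entry<String, Integer>> ranked = new ArrayList<>(task.entrySet());
            // Fewest lines first; ties are broken by name (an arbitrary choice).
            ranked.sort(Map.Entry.<String, Integer>comparingByValue()
                    .thenComparing(Map.Entry.comparingByKey()));
            int n = ranked.size();
            for (int position = 0; position < n; position++) {
                points.merge(ranked.get(position).getKey(), n - 1 - position, Integer::sum);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Made-up LoC values, not taken from the corpus.
        List<Map<String, Integer>> tasks = List.of(
                Map.of("C", 30, "Python", 8, "Ruby", 10),
                Map.of("C", 12, "Python", 9, "Ruby", 7),
                Map.of("C", 5, "Python", 6, "Ruby", 20));
        System.out.println(score(tasks)); // prints {C=2, Python=4, Ruby=3}
    }
}
```

In this made-up data every language wins exactly one task, yet Python comes out ahead because it is never last. That is the kind of difference a positional system can expose compared with counting first places only.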

I considered both approaches, and I think that they would produce interesting results, but first I wanted to sanitise the data set more and better examine quality attributes of the code.

References

[1] Larry Wall, Programming Perl, 1st Edition, O’Reilly and Associates
[2] Diomidis Spinellis, Vassilios Karakoidas, and Panagiotis Louridas. Comparative language fuzz testing: Programming languages vs. fat fingers. In PLATEAU 2012: 4th Annual International Workshop on Evaluation and Usability of Programming Languages and Tools–Systems, Programming, Languages and Applications: Software for Humanity (SPLASH 2012). ACM, October 2012.
[3] https://github.com/bkarak/fuzzer-fat-fingers
[4] Ritchie S. King. The top 10 programming languages. IEEE Spectrum, 48(10):84, October 2011.
[5] http://en.wikipedia.org/wiki/Positional_voting_system

Original Post can be found on XRDS

Top Ten Quotes (Republished)

Here is a list of my top ten favorite quotes:

  1. Commander, you know everything about your stone garden. But clearly, you have not spent nearly enough time looking at it.
    – Delenn to Sinclair in Babylon 5: “The Gathering”
  2. And so it begins.
    – Kosh (to Sinclair), “Chrysalis”
  3. I am a scientist. Nothing shocks me.
    – Indiana Jones in Indiana Jones and the Temple of Doom
  4. A programming language is for thinking of programs, not for expressing programs you’ve already thought of. It should be a pencil, not a pen.
    – Hackers and Painters, Paul Graham
  5. Win, lose or draw, this thing’s going to know it was in a fight.
    – Garibaldi in “Infection”, Babylon 5
  6. Our situation has not improved.
    – Henry Jones, Indiana Jones and the Last Crusade
  7. Make everything as simple as possible, but not simpler.
    – Albert Einstein (1879-1955)
  8. We are not retreating – we are advancing in another direction.
    – General Douglas MacArthur (1880-1964)
  9. Research is what I’m doing when I don’t know what I’m doing.
    – Wernher von Braun (1912-1977)
  10. Wit is the epitaph of an emotion.
    – Friedrich Nietzsche

Empty Main Pattern (republished)

I always liked to play with static initializers in Java. I remember that back in the summer of 2003 I accidentally discovered the empty main pattern. Later on, a very close friend of mine, Kostantinos Saidis, also found the no main pattern.

I don’t know if it’s a bug or a feature, but I sure know it’s fun! 🙂

Following the examples of …

Normal Class

public class Main {
	public static void main(String[] args) {
		System.out.println("Boom");
	}
}

Empty Main Class

public class Main {
	static {
		System.out.println("Boom!!");
	}

	public static void main(String[] args) {
		//do nothing
	}
}

No Main Pattern

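public class Main {
	// Runs when the class is initialized. Java 6 and earlier initialized the class
	// before looking for main, so this printed before the launcher reported the
	// missing method. Since Java 7 the launcher checks for main first and the
	// block never runs.
	static {
		System.out.println("Boom!!");
	}
}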
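Both patterns rely on the fact that a class's static initializer runs once, on first use of the class, before any of its methods execute. The following sketch makes that order observable. The names InitOrder, Target and EVENTS are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    // Records what happened, in order.
    static final List<String> EVENTS = new ArrayList<>();

    static class Target {
        static {
            EVENTS.add("static initializer");
        }

        static void run() {
            EVENTS.add("method body");
        }
    }

    public static void main(String[] args) {
        Target.run(); // first use of Target triggers its static initializer
        System.out.println(EVENTS); // prints [static initializer, method body]
    }
}
```

Calling Target.run() a second time adds only "method body", because the initializer runs at most once per class.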

Integer Class, Auto-boxing and Caching (republished)

Today, I decided to dig a little into the latest j2sdk sources. My initial search began with the use of the final modifier in the signature of a method: what exactly it is used for, etc.

I wrote some sample programs and decompiled them, finding nothing of interest. Then I tried to investigate auto-boxing. I wrote a sample program that calls a method that takes an Integer as a parameter. Something like this:

public void bar(Integer i) { ... }

int i = 1;
bar(i); // the int argument is auto-boxed to an Integer

When I used auto-boxing, the compiler emitted a call to the Integer.valueOf(int) method. So I started investigating whether using auto-boxing is more efficient or not.

When I changed the i variable to the Integer type and called the constructor of the Integer class directly, I saw that the constructor was invoked as written, with an int as parameter.

The mystery was solved when I read the original Integer class sources.

The constructor simply instantiates a new Integer object. On the other hand, Integer.valueOf(int) does the following:

public static Integer valueOf(int i) {
	final int offset = 128;
	if (i >= -128 && i <= 127) { // must cache 
		return IntegerCache.cache[i + offset];
	}
	
	return new Integer(i);
}

The valueOf method uses a cache object and keeps 256 integers cached all the time. The code for the integer cache follows:

private static class IntegerCache {
	private IntegerCache(){}
	
	static final Integer cache[] = new Integer[-(-128) + 127 + 1];

	static {
		for(int i = 0; i < cache.length; i++)
			cache[i] = new Integer(i - 128);
	}
}

So, if you use auto-boxing (or simply call the Integer.valueOf(int) method) instead of the constructor, you get a cached Integer reference instead of a new object, as long as the value lies between -128 and 127. This is a documented feature (see the Integer.valueOf(int) javadoc entry), and is demonstrated by the following code.

public class Foo {
	public static void main(String[] args) {
		Integer i = new Integer(10);
		Integer ii = new Integer(10);
		Integer iii = Integer.valueOf(10);
		Integer iiii = Integer.valueOf(10);

		System.out.println("i == ii - " + (i == ii));
		System.out.println("ii == iii - " + (ii == iii));
		System.out.println("iii = i - " +  (i == iii));
		System.out.println("iiii = iii - " + (iiii == iii));
	}
}

Sample execution output follows:

nefarian:~/devel/ew bkarak$ java Foo
i == ii - false
ii == iii - false
iii = i - false
iiii = iii - true
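The same effect shows up with plain auto-boxing, without calling valueOf explicitly. The sketch below, with the hypothetical class name BoxedCompare, boxes the same int twice and compares the results. Reference equality is only relied upon for values between -128 and 127, the range covered by the cache shown above; for anything else, equals() is the safe comparison.

```java
class BoxedCompare {
    // Boxes the same int twice and compares the two references.
    // For values in -128..127 both boxes are the same cached object.
    static boolean sameReference(int value) {
        Integer a = value; // auto-boxing, compiled to Integer.valueOf(value)
        Integer b = value;
        return a == b;
    }

    // Compares by value, which works regardless of caching.
    static boolean sameValue(int x, int y) {
        Integer a = x;
        Integer b = y;
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameReference(10));     // prints true
        System.out.println(sameValue(1000, 1000)); // prints true
    }
}
```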

But the caching works only with the Integer.valueOf(int) method. See the implementation of Integer.valueOf(String, int), for example:

public static Integer valueOf(String s, int radix) 
	throws NumberFormatException {
		return new Integer(parseInt(s,radix));
}

This method always returns a new Integer object. Doh!
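With the implementation listed above, one way to still get a cached reference from a string is to parse to an int first and then box through Integer.valueOf(int). This is a sketch; the helper name parseCached is hypothetical, and other JDK releases may implement valueOf(String, int) differently from the listing above.

```java
class CachedParse {
    // Parses the text first, then boxes through valueOf(int),
    // so values in -128..127 come from the Integer cache.
    static Integer parseCached(String s, int radix) {
        return Integer.valueOf(Integer.parseInt(s, radix));
    }

    public static void main(String[] args) {
        Integer a = parseCached("a", 16);   // 10 in hexadecimal
        Integer b = parseCached("1010", 2); // 10 in binary
        System.out.println(a == b);         // prints true: both are the cached Integer for 10
    }
}
```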

The Zen of Multiplexing (republished)

I always argue with network administrators. They use firewalls to block and monitor the traffic of the intranet and refuse to open any port other than 80 (of course) for any other service.

Ports in TCP and UDP exist for the sole purpose of multiplexing many virtual channels on one physical medium. Those guys (and many others) seem to forget that and re-invent multiplexing techniques on higher layers, using web services for example (see figure below).

How can you stay in the market? Sell the same technology again and again.
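To make the point about ports concrete, here is a toy in-memory model of what the transport layer already does: a single delivery path that hands each payload to a service chosen by port number. It is only an illustration, not real networking code; the class PortDemux and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PortDemux {
    // One "physical medium", many virtual channels keyed by port number.
    private final Map<Integer, Function<String, String>> services = new HashMap<>();

    void bind(int port, Function<String, String> service) {
        services.put(port, service);
    }

    // Delivers the payload to whichever service is bound to the port.
    String deliver(int port, String payload) {
        Function<String, String> service = services.get(port);
        if (service == null) {
            return "port " + port + " closed";
        }
        return service.apply(payload);
    }

    public static void main(String[] args) {
        PortDemux host = new PortDemux();
        host.bind(80, payload -> "http:" + payload);
        host.bind(25, payload -> "smtp:" + payload);
        System.out.println(host.deliver(25, "hello")); // prints smtp:hello
    }
}
```

Tunnelling every service through port 80 means rebuilding this dispatch table one layer up, for example by routing on URL paths inside a web service.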
