Providing Message Pack Support to an ExpressJS Node App

Recently, an extended body-parser that adds MessagePack support on top of the standard body-parser module was released on Mobiltron’s GitHub page.

The module is meant as a drop-in replacement for the basic body-parser module, so that app programmers have less to deal with. You just need to change the require to body-parser-with-msgpack. An example project that shows the module’s usage is published on GitHub.

var express = require('express');
var bodyParser = require('body-parser-with-msgpack');
var app = express();

// Enable MIME types: application/json, application/x-www-form-urlencoded, and application/x-msgpack
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({extended: true}));
app.use(bodyParser.msgpack());

app.post('/data', function (req, res) {
  res.status(200).json({
    message: 'success',
    data: req.body
  });
});

app.listen(8080, function () {
  // Server started! [...]
});

The above example creates a POST route /data that accepts the application/json, application/x-www-form-urlencoded, and application/x-msgpack MIME types. Whatever the format, the parsed body is always available as a JavaScript object in req.body.

You can try it for yourself by cloning the GitHub repository.
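To see what a client of this route might look like, here is a minimal Python sketch. It is an illustration under assumptions, not part of the module: pack_small is a hypothetical helper that hand-encodes only a tiny subset of MessagePack (small non-negative integers, short strings, and small maps), and the URL assumes the server above is running locally on port 8080. A real client would use a full MessagePack library instead.

```python
import urllib.request


def pack_small(value):
    """Encode a tiny, illustrative subset of MessagePack (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; keep it out of this subset.
        raise TypeError("bool is outside this illustrative subset")
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])  # positive fixint: the value itself
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 101xxxxx + bytes
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])  # fixmap: 1000xxxx + key/value pairs
        for key, item in value.items():
            out += pack_small(key) + pack_small(item)
        return out
    raise TypeError("value is outside this illustrative subset")


def build_request(payload, url="http://localhost:8080/data"):
    """Build (but do not send) a POST request with a MessagePack body."""
    return urllib.request.Request(
        url,
        data=pack_small(payload),
        headers={"Content-Type": "application/x-msgpack"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request({"answer": 42})
    # Sending requires the Express app above to be running:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
    print(request.data.hex())
```

The request is only built here; the commented-out lines show where it would be sent once the server is up.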

Enjoy!

Domain-Specific Languages: A Small Introduction

Domain-specific languages (DSLs), also known as micro-languages or little languages, are programming languages designed to focus on a particular domain. Well-known DSLs include regular expressions, Markdown, the Extensible Markup Language (XML), and the Structured Query Language (SQL). General-purpose languages (GPLs) have a wider scope. They provide a set of processing capabilities applicable to different problem domains. Mainstream GPLs include Java, C/C++, Python, and Scala.

To better understand the differences between DSLs and GPLs, consider the following example. The C programming language is a GPL. It provides the basic forms for abstractions and computation. What happens if someone wants to define a matrix of integers in C? A pointer to an array of row pointers must be declared, like the following:

int **matrix;

To access the values of the matrix, the programmer has to write complex pointer arithmetic statements. To implement an algorithm for the multiplication of matrices, a function must be defined that accepts the two matrices as parameters and returns the result.

int **multiply(int **m_first, int **m_sec);

More advanced languages such as C++ and Java provide richer ways to create abstractions, such as classes and interfaces. A typical implementation of the matrix multiplication algorithm would create a class named Matrix with a method called multiply. For example, in Java, the code would look like the following:

public class Matrix {
   public void multiply(Matrix m) { ... }
}

This approach has many benefits compared to the C version. The domain abstraction, the matrix, is declared directly as a type. In addition, the type contains the method multiply, which is closer to the reality of the mathematical domain.

With modern programming languages, it is easy to create complex libraries that declare and implement the abstractions of specific domains, but there is a barrier: the syntax of the host language must always be used.

Consider now Octave or Mathematica, languages created specifically to deal with this kind of algorithm implementation. These DSLs are used heavily for simulations and mathematical modelling. Would anyone consider Mathematica’s language for developing a web server or a database management system? Those languages focus on the mathematical domain only. Outside it, they have no meaning. The languages of Mathematica and Octave are DSLs.

The rest of this entry is structured as follows: first, a brief glimpse of DSL advantages and disadvantages is presented, along with some basic terminology. Three popular DSLs are also presented, with small practical examples of their use. The next section focuses on DSL design and implementation patterns. Finally, the entry concludes with an analysis of various works on programming language embeddings and the basic elements of how all these methods can be combined to enhance the overall DSL design and implementation process.
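For readers who want to run the idea, here is a small sketch of the same class-based approach in Python. It is an assumption of how such a type could look, not code from the encyclopedia entry; unlike the Java signature above, multiply here returns a new Matrix rather than void.

```python
class Matrix:
    """A matrix of numbers stored as a list of rows (illustrative sketch)."""

    def __init__(self, rows):
        # Copy the rows so that later changes to the input do not leak in.
        self.rows = [list(row) for row in rows]

    def multiply(self, other):
        """Return the matrix product self * other as a new Matrix."""
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        # Transpose the right-hand matrix to iterate over its columns.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, column)) for column in columns]
             for row in self.rows]
        )


if __name__ == "__main__":
    product = Matrix([[1, 2], [3, 4]]).multiply(Matrix([[5, 6], [7, 8]]))
    print(product.rows)
```

The domain concept is now a type with its own operation, yet the caller still writes ordinary method calls in the host language.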

The rest of the entry can be found in the latest volume (2016) of the Encyclopedia of Computer Science and Technology (Taylor & Francis).

The Thesis

All PhD candidates around the world know about the thesis. You always knew about the thesis. It marks the beginning of the end of your career as a PhD candidate, and if you actually do it, you can have that cool “Dr.” title that you always wanted on your business card. What is the problem then? Why does it seem so frustrating when you sit down to do it? The following is based on a true story, actually my story: how I managed to write it down and track my progress.

Problem Definition

A typical PhD follows a simple process: read, think, propose, publish, and write the thesis. It is straightforward, and one can imagine that if you are already done with the rest of the stuff, the write-up would be rather easy. But it is not.

The problem lies mostly in that writing the thesis is a lengthy and lonely act. You have to do it yourself; nobody will come to your aid, except maybe your advisor.

In my case, I faced the following problem: for quite some time, I could not motivate myself to write it down. I began writing and half a page later I always stopped. I tried everything, but nothing seemed to motivate me. My advisor got uncomfortable, and we began talking about a method to track my progress that would motivate me.

The Idea

Then I saw it: Georgios Gousios’s Thesis-o-meter (see link below). This was a couple of scripts that posted the progress of the PhD in each chapter every day. I decided to do it myself, introducing some alterations that would work better for me.

First, I had to find a tangible way to measure the progress. I thought that was easy: the number of pages. The number of pages of a document is nice if you want to measure the size of the text, but it cannot act as a day-to-day key performance indicator (KPI). Why is that? Simply because if you bootstrap your thesis in LaTeX and put in all the standard chapters, bibliography, etc., you will find yourself with at least 15 pages. So, on that day I would show enormous progress. The next day, I would write only text, say one or two pages. The day after, I would write text and add some charts, which would count as three or four pages. Better, huh? This is the problem.

If you are a person like me, you could add one or two figures and say “OK, I am good for today, I added two pages!” This is a nice excuse if you want to procrastinate. I needed something that would present the naked truth, something that would make me sit there and make some serious progress.

So, the number of pages was out as a daily metric, but I thought that we could still use it. The number of pages would be the end goal, with a minimum and a maximum. In Greece, a PhD thesis is usually 150 to 200 pages long (in my discipline, computer science, of course). So, I thought, this is the goal: a large block of text within those limits.

Then I decided that my metric should be the number of words in the text instead of the number of pages. Since I wrote my thesis in LaTeX, I just counted the words in each file with standard UNIX tools, for example with the command wc -w myfile.tex. So, the algorithm has the following steps:

  • The goal is set to 150-200 pages in total
  • Each day,
    • Count the words for all files
    • Count the pages of the actual thesis file, for example the output PDF
    • Find the word contribution for that day by subtracting the previous day’s word count
    • Find an average of words per number of pages
    • Finally, provide an estimation for the completion of the thesis
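The steps above can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the original scripts: the function names are hypothetical, and the use of floor division is inferred from the figures in the report shown later in this post, which this sketch reproduces for the word range and the remaining days.

```python
def count_words(text):
    """Count whitespace-separated words, similar in spirit to wc -w."""
    return len(text.split())


def daily_contribution(words_today, words_yesterday):
    """Words added since the previous day's count."""
    return words_today - words_yesterday


def estimate(total_words, total_pages, words_per_day,
             min_pages=150, max_pages=200):
    """Estimate the word-count goal and the days left to reach it.

    Floor division is an assumption inferred from the report's numbers.
    """
    words_per_page = total_words // total_pages
    word_goal = (min_pages * words_per_page, max_pages * words_per_page)
    # A negative value means the minimum goal has already been passed.
    days_left = tuple((goal - total_words) // words_per_day
                      for goal in word_goal)
    return {
        "words_per_page": words_per_page,
        "word_goal": word_goal,
        "days_left": days_left,
    }


if __name__ == "__main__":
    # Inputs taken from the last report in this post.
    print(estimate(total_words=55747, total_pages=179, words_per_day=184))
```

Feeding in the totals from the last report gives 311 words per page, a goal of 46650 to 62200 words, and between -50 and 35 days left.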

Experience Report

I implemented this in Python and shell script. The process worked: each day a report was generated and sent to my advisor. The best thing was that each day I saw the estimation trimmed down a little. This is the last report I produced:

10c10
     1899 build/2-meta-programming.tex
13c13
     1164 build/3-requirements.tex
60,61c60,61
<    13931 build/thesis.bib
    14058 build/thesis.bib
>    55747 total

---- Progress ----
Worked for 167 day(s) ... 
Last submission: 20121025
Word Count (last version): 55747
Page Count (last version): 179
Avg Words per Page (last version): 311
Last submission effort: 142

---- Estimations ----
Page Count Range (final version): (min, max) = (150, 200)
Word Count Range (final version): (min, max) = (46650, 62200)
Avg Effort (Words per Day): 184
Estimated Completion in: (min, max) = (-50, 35) days, (-2.50, 1.75) months
Estimated Completion Date: (best, worst) = (2012-08-11, 2012-12-16)

The average was 311 words per page, and I wrote about 184 words each day.

Epilogue

I wrote my thesis, but I have not submitted it (at least not yet, but I hope to soon) for a number of practical reasons. Still, the process succeeded: I found my KPIs and they actually led me to finish the work. This is a fact, and now I have to find another motivation-driven method to do the rest of the required stuff. C’est la vie.

Related Links and Availability

I plan to release an open source version of my thesis-o-meter on my GitHub profile soon. I also found various alternative thesis-o-meters.

The original post can be found on the XRDS blog.

Language Bureaucracy

Laziness, impatience and hubris are the three great virtues that each programmer should have, at least according to Larry Wall [1]. My experience so far has shown me that he was right. All programmers have these characteristics; if they do not, they are usually not real programmers. Since they express these virtues through several programming languages, they tend to compare them. Usually this comparison ends up in a phenomenon called flame wars. The programmers take part in endless quarrels, exchanging arguments about language features, their standard (or not) libraries, etc.

In my academic and professional life, I have participated in various conversations of that kind, ending up talking for hours about the coolness of C and C++, the inconsistencies of PHP and the ease of Python programming. I know; we all fought our holy wars.

Almost two years ago, I co-authored a publication with my PhD advisor Dr. Spinellis and Dr. Louridas, where we conducted an experiment that involved several programming languages [2]. The experiment involved the creation of a corpus that included simple and more complex algorithmic problems, each one implemented in every language. Back then, we used the Rosetta Code repository as our primary source of code. The corpus is available on GitHub [3].

Then it occurred to me that I could use this code to finally find out which language is the most bureaucratic. How could one measure that, and what does one mean by the term bureaucratic in a programming language context?

Terms & Hypothesis

The answer is simple: we measure the LoC (Lines of Code), which are the lines of executable code of a computer program. Since all programs perform identical tasks, we directly compare the LoC for each language, and the one with the fewest lines wins. At least this is a straightforward method to do it.

The contestants were nine (9) of the most popular [4] programming languages: Java, C, C++, PHP, C#, Python, Perl, JavaScript, and Ruby. Fortran and Haskell were excluded because many tasks were not implemented in these two languages (we are working on that).

There were 72 selected tasks, varying from string tokenisation algorithms to anagrams and an implementation of Horner’s rule for polynomial evaluation.
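As a rough illustration of the measurement, the sketch below counts lines that are neither blank nor whole-line comments. The counting rules actually used for the corpus are not described in this post, so this is an assumption: count_loc and its comment_prefixes parameter are hypothetical, and block comments or trailing comments are not handled.

```python
def count_loc(source, comment_prefixes=("#", "//")):
    """Count lines that are neither blank nor whole-line comments."""
    loc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(tuple(comment_prefixes)):
            continue  # whole-line comment
        loc += 1
    return loc


def least_bureaucratic(total_loc_by_language):
    """Return the language with the fewest total lines of code."""
    return min(total_loc_by_language, key=total_loc_by_language.get)


if __name__ == "__main__":
    sample = "# a comment\n\nx = 1\n  // another comment\ny = x + 1\n"
    print(count_loc(sample))
    # The two totals below are the ones quoted in this post.
    print(least_bureaucratic({"Python": 892, "C": 2626}))
```

Summing count_loc over every task file of a language gives its total, and the language with the smallest total wins.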

Counting the LoC

The following graph illustrates the total LoC for all tasks per language:

[Figure: total LoC per language]

It seems that Python is the big winner with only 892 LoC for all the tasks, and C is the big loser with 2626 LoC. If we divide the languages into two categories, statically typed and dynamic, the latter are winning, at least on the program size front. The dynamic languages count 5762 lines in total, while the static languages have a combined LoC of around 9237, roughly 1.6 times the size.

Counting the Winners

In addition, let’s examine which languages won first place (had the minimum LoC) across all tasks. The following figure illustrates the number of wins for each language.

[Figure: number of wins per language]

One may notice that the wins sum to 68 instead of 72. The remaining four tasks were won by Fortran, which was excluded from this experiment. Since the repository is still very active, and many of the tasks are being re-organised and re-implemented to better suit the ongoing research process, I never sanitised the data, so there may be errors. I do not think that the results were affected though, since the dynamic languages won the match and dominated the statically typed languages.

While writing this blog entry I consulted two friends of mine, who offered two very interesting perspectives. Dimitris Mitropoulos suggested also taking into account the character count of each line, and George Oikonomou suggested applying various voting systems [5] to the language ranking for each task, thus finding the real winner.
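Counting first places per task can be sketched as follows. The data layout (a mapping from task name to per-language LoC) and the sample numbers are hypothetical, and ties simply go to whichever language comes first in the mapping, which may differ from how the figure was produced.

```python
from collections import Counter


def count_wins(loc_by_task):
    """Count, per language, the tasks where it has the minimum LoC."""
    wins = Counter()
    for loc_by_language in loc_by_task.values():
        # min() returns the first language among those tied for fewest lines.
        winner = min(loc_by_language, key=loc_by_language.get)
        wins[winner] += 1
    return wins


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    sample = {
        "anagrams": {"Python": 8, "Java": 30, "C": 45},
        "tokenise": {"Python": 5, "Java": 12, "C": 20},
        "horner": {"Python": 6, "Java": 14, "C": 5},
    }
    print(count_wins(sample).most_common())
```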
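One possible positional voting scheme is the Borda count, sketched below purely as an illustration; the post does not say which voting systems were meant. Each task acts as a voter that ranks the languages by LoC, and a language earns one point for every language it beats on that task. The function name and the sample data are hypothetical.

```python
def borda_scores(loc_by_task):
    """Sum Borda points per language; fewer LoC means a better rank."""
    scores = {}
    for loc_by_language in loc_by_task.values():
        ranked = sorted(loc_by_language, key=loc_by_language.get)
        count = len(ranked)
        for position, language in enumerate(ranked):
            # First place earns count - 1 points, last place earns 0.
            points = count - 1 - position
            scores[language] = scores.get(language, 0) + points
    return scores


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    sample = {
        "t1": {"A": 10, "B": 20, "C": 30},
        "t2": {"A": 25, "B": 15, "C": 40},
        "t3": {"A": 5, "B": 6, "C": 7},
    }
    print(borda_scores(sample))
```

Unlike a plain count of wins, this rewards a language that is consistently near the top even when it rarely comes first.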

I considered both approaches, and I think that they would produce interesting results, but first I wanted to sanitise the data set more and better examine quality attributes of the code.

References

[1] Larry Wall, Programming Perl, 1st Edition, O’Reilly and Associates
[2] Diomidis Spinellis, Vassilios Karakoidas, and Panagiotis Louridas. Comparative language fuzz testing: Programming languages vs. fat fingers. In PLATEAU 2012: 4th Annual International Workshop on Evaluation and Usability of Programming Languages and Tools–Systems, Programming, Languages and Applications: Software for Humanity (SPLASH 2012). ACM, October 2012.
[3] https://github.com/bkarak/fuzzer-fat-fingers
[4] Ritchie S. King. The top 10 programming languages. IEEE Spectrum, 48(10):84, October 2011.
[5] http://en.wikipedia.org/wiki/Positional_voting_system

Original Post can be found on XRDS