Domain-Specific Languages: A Small Introduction

Domain-specific languages (DSLs), also known as micro-languages or little languages, are programming languages designed to focus on a particular domain. Well-known DSLs include regular expressions, Markdown, the Extensible Markup Language (XML), and the Structured Query Language (SQL). General-purpose languages (GPLs) have a wider scope: they provide a set of processing capabilities applicable to many different problem domains. Mainstream GPLs include Java, C/C++, Python, and Scala.
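
As a quick illustration, consider how a regular expression, itself a DSL, is typically embedded inside a GPL such as Java; the date pattern below is just an arbitrary example:

import java.util.regex.Pattern;

public class RegexDemo {
   public static void main(String[] args) {
      // The string "\\d{4}-\\d{2}-\\d{2}" is written in the regex DSL;
      // Java merely hosts it, compiling and applying the pattern.
      Pattern date = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
      System.out.println(date.matcher("2016-01-31").matches()); // prints true
   }
}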

To better understand the differences between DSLs and GPLs, consider the following example. The C programming language is a GPL. It provides the basic forms for abstractions and computation. What happens if someone wants to define a matrix of integers in C? An array of pointers must be declared, like the following:

int **matrix;

To access the values of the matrix, the programmer will have to write complex pointer arithmetic statements. If one attempts to implement an algorithm for the multiplication of matrices, a function must be defined that accepts the two matrices as parameters and returns the result.

int **multiply(int **m_first, int **m_sec);

More advanced languages such as C++ and Java provide richer mechanisms for creating abstractions, namely classes and interfaces. A typical implementation of the matrix multiplication algorithm would define a class named Matrix with a method called multiply. For example, in Java, the code would look like the following:

public class Matrix {
   public void multiply(Matrix m) { ... }
}

This approach has many benefits compared to the C version. The domain abstraction, the matrix, is declared directly as a type, and it carries the multiply method with it, which is closer to the reality of the mathematical domain.
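
To make the benefit concrete, here is a minimal sketch of what such a multiply method might look like, assuming, purely for illustration, that a matrix is backed by a two-dimensional int array and that the operand dimensions are compatible:

public class Matrix {
   private final int[][] values;

   public Matrix(int[][] values) {
      this.values = values;
   }

   // Returns the product this * m, assuming compatible dimensions.
   public Matrix multiply(Matrix m) {
      int rows = values.length;
      int cols = m.values[0].length;
      int inner = m.values.length;
      int[][] result = new int[rows][cols];
      for (int i = 0; i < rows; i++)
         for (int j = 0; j < cols; j++)
            for (int k = 0; k < inner; k++)
               result[i][j] += values[i][k] * m.values[k][j];
      return new Matrix(result);
   }
}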

With modern programming languages it is easy to create complex libraries that declare and implement the abstractions of specific domains, but there is a barrier: the syntax of the host language must always be used.

Consider now Octave or Mathematica, languages created specifically to deal with this kind of algorithm. These DSLs are used extensively for simulations and mathematical modelling. Would anyone consider Mathematica's language for developing a web server or a database management system? These languages focus on the mathematical domain only; outside it, they have no meaning. The languages of Mathematica and Octave are DSLs.

The rest of this entry is structured as follows: first, a brief glimpse of DSL advantages and disadvantages is presented, along with basic terminology. Three popular DSLs are then presented, with small practical examples of their use. The next section focuses on DSL design and implementation patterns. Finally, the entry concludes with an analysis of various works on programming language embeddings and the basic elements of how all these methods can be combined to enhance the overall DSL design and implementation process.

The rest of the entry can be found in the latest volume (2016) of the Encyclopedia of Computer Science and Technology (Taylor & Francis).

Language Bureaucracy

Laziness, impatience, and hubris are the three great virtues that every programmer should have, at least according to Larry Wall [1]. My experience so far has shown me that he was right. All programmers have these characteristics; if they do not, they are usually not real programmers. Since they express these virtues through the several programming languages they use, they tend to compare them. Usually this comparison ends up in a phenomenon called flame wars: programmers participate in endless quarrels, exchanging arguments about language features, their standard (or not) libraries, and so on.

In my academic and professional life, I have participated in various conversations of that kind, ending up talking for hours about the coolness of C and C++, the inconsistencies of PHP, and the ease of Python programming. I know; we have all fought our holy wars.

Almost two years ago, I co-authored a publication with my PhD advisor, Dr. Spinellis, and Dr. Louridas, in which we conducted an experiment involving several programming languages [2]. The experiment involved the creation of a corpus that included simple and more complex algorithmic problems, each one implemented in every language. Back then we used the Rosetta Code repository as our primary source of code. The corpus is available on GitHub [3].

Then it occurred to me: I could use this code to finally find out which language is the most bureaucratic. But how could one measure that, and what does the term bureaucratic mean in a programming language context?

Terms & Hypothesis

The answer is simple: we measure the LoC (Lines of Code), that is, the lines of executable code of a computer program. Since all programs perform identical tasks, we can directly compare the LoC for each language, and the one with the fewest lines wins. At the very least, it is a straightforward method.
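
The counting tool itself is not shown here, so the following is a minimal sketch of such a counter in Java; treating every non-blank line as executable is of course a simplification (comment handling would be language-specific):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LocCounter {
   public static void main(String[] args) throws IOException {
      for (String file : args) {
         // Count non-blank lines as a crude proxy for executable LoC.
         long loc = Files.lines(Paths.get(file))
                         .filter(line -> !line.trim().isEmpty())
                         .count();
         System.out.println(file + ": " + loc);
      }
   }
}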

The contestants were nine (9) of the most popular [4] programming languages: Java, C, C++, PHP, C#, Python, Perl, JavaScript, and Ruby. Fortran and Haskell were excluded because many tasks were not implemented in these two languages (we are working on that).

There were 72 selected tasks, ranging from string tokenisation algorithms to anagrams and an implementation of Horner's rule for polynomial evaluation.

Counting the LoC

The following graph illustrates the total LoC for all tasks per language:

[Figure: total LoC for all tasks per language]

It seems that Python is the big winner, with only 892 LoC for all the tasks, while C is the big loser, with 2626 LoC. If we divide the languages into two categories, statically typed and dynamic, the latter win, at least on the program-size front. The dynamic languages count 5762 lines in total, while the static languages have a combined LoC of around 9237. Since the groups are unequal in size (four static languages versus five dynamic ones), the per-language averages are more telling: roughly 2300 LoC for the static group against roughly 1150 for the dynamic one, almost double the size.

Counting the Winners

In addition, let's examine which language won first place (had the minimum LoC) in each task. The following figure illustrates the number of wins for each language.

[Figure: number of wins per language]

One may notice that the wins actually sum to 68 instead of 72. The remaining four tasks were won by Fortran, which was excluded from this experiment. Since the repository is still very active, and many of the tasks are re-organised and re-implemented to better suit the ongoing research process, I never sanitised the data, so there may be errors. I do not think the results were affected, though, since the dynamic languages won the match and dominated the statically typed ones.

While writing this blog entry I consulted two friends of mine, who offered two very interesting suggestions. Dimitris Mitropoulos suggested also taking into account the character count of each line, and George Oikonomou suggested applying various voting systems [5] to the language rankings of each task, thus finding the real winner.

I considered both approaches, and I think they would produce interesting results, but first I want to sanitise the data set further and take a closer look at quality attributes of the code.

References

[1] Larry Wall. Programming Perl. O'Reilly and Associates, 1st edition.
[2] Diomidis Spinellis, Vassilios Karakoidas, and Panagiotis Louridas. Comparative language fuzz testing: Programming languages vs. fat fingers. In PLATEAU 2012: 4th Annual International Workshop on Evaluation and Usability of Programming Languages and Tools–Systems, Programming, Languages and Applications: Software for Humanity (SPLASH 2012). ACM, October 2012.
[3] https://github.com/bkarak/fuzzer-fat-fingers
[4] Ritchie S. King. The top 10 programming languages. IEEE Spectrum, 48(10):84, October 2011.
[5] http://en.wikipedia.org/wiki/Positional_voting_system

The original post can be found on XRDS.

The Zen of Multiplexing (Revisited)

Technologies related to software engineering occasionally exhibit the following phenomenon: they re-invent the wheel. Not only that, but these technologies always present themselves as mind-blowing, we-are-going-to-save-the-world solutions.

First Blood, Part One: Responsive Design

There is a bunch of programmers who always designed websites that did not fit well on smartphones and tablets. Then they decided to make them work properly, and produced tons of libraries and related technologies focused on presenting content correctly on all screen sizes. OK, and they think that this is a new piece of technology and not a bug fix. They have this great proposition: "Hey, now you can see this website on all devices, and we call it responsive design", and they charge accordingly.

First Blood, Part Two: The Zen of Multiplexing

Not so long ago, IP networking was invented, and people thought it would be cool to have several services multiplexed over the newly founded network data channel. So they conceived ports at the transport layer. Ports were numbers assigned to specific services; thus we had port 80 for the HTTP service, 22 for secure shell (SSH), 20 and 21 for FTP, and so on.
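
As a toy sketch of this multiplexing, the following Java program claims a port (8080, chosen arbitrarily here); the transport layer uses that number to route each incoming connection to the right service:

import java.io.IOException;
import java.net.ServerSocket;

public class PortDemo {
   public static void main(String[] args) throws IOException {
      // Each service claims its own port number; the transport layer
      // demultiplexes incoming connections by that number.
      try (ServerSocket server = new ServerSocket(8080)) {
         System.out.println("Toy service listening on port " + server.getLocalPort());
         server.accept().close(); // accept a single connection, then exit
      }
   }
}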

So far, so good; but it seemed that our fellow software engineers were not happy with that approach, and they decided to do something else instead. First, the idea of permitting only the HTTP port (80) appeared. All other ports were designated as insecure (!), and one by one network administrators closed them with firewalls. But the need to multiplex was still there, and the re-invention pattern became reality. The concept was simple: all protocols (or at least many of them) and services were rewritten to work over HTTP.

What Goes Around, Comes Around

The insecure port mechanism gave way to a new set of vulnerabilities, such as SQL injections. In addition, modern approaches appeared, and many services were rewritten from scratch utilising the new paradigm.

But in reality the gain was minimal: firewalls were replaced by intrusion detection systems performing deep packet inspection to identify whether the packets transferred over HTTP were indeed legitimate. All protocols were replaced by HTTP, thus inheriting its good and bad characteristics.

The Final Frontier

It seems that there will never be an end to this. Software engineers seem to feel the need to reinvent the wheel (or parts of it) every now and then, forcing their clients to pay for costly software rewrites.

Of course, many of those alterations contain improvements, sometimes significant ones, but I think that all this is a matter of control. Maybe programmers think it is better to control all aspects of the software, and service multiplexing is a significant one that cannot be left to the operating system or the network to handle. But the question is: who pays the bill?

Revenge of the SQL

The mainstream is always under attack. So it is with SQL and, in general, with the various RDBMSs that implement its various incarnations. After the golden age of MySQL and the LAMP (Linux-Apache-MySQL-PHP) stack, various NoSQL databases appeared, preaching that data storage as we knew it was coming to an end.

The Crusade

Everyone thought back then that web applications could not benefit from data with strongly enforced relations, and that the query language for these systems, SQL, was complex and introduced overhead in terms of performance and security, with the infamous SQL-injection attacks.
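
To see why the injection attacks were so infamous, here is the classic shape of the problem and the standard parameterised-query defence, sketched with plain JDBC; the users table and the queries are hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class InjectionDemo {
   // Vulnerable: attacker-controlled input is concatenated into the query.
   // An input such as "' OR '1'='1" turns it into a query matching every row.
   static ResultSet findUserUnsafe(Connection conn, String name) throws SQLException {
      String sql = "SELECT * FROM users WHERE name = '" + name + "'";
      return conn.createStatement().executeQuery(sql);
   }

   // Safe: a parameterised query keeps the input out of the SQL syntax.
   static ResultSet findUserSafe(Connection conn, String name) throws SQLException {
      PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
      ps.setString(1, name);
      return ps.executeQuery();
   }
}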

In addition, every programmer I know (myself included) was tempted to join the crusade against the RDBMS and SQL. At first, the problem was addressed with ORMs, which mapped the database entities to abstractions of the programming language, such as classes. After that, millions of lines of code were written for support tools and IDEs, all of course in the name of the crusade.
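
To make that mapping concrete, here is a minimal sketch of an ORM-mapped entity using JPA annotations in Java; the books table and its columns are hypothetical:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// The ORM maps this class to a relational table and generates the
// underlying SQL; the programmer manipulates objects instead of rows.
@Entity
@Table(name = "books")
public class Book {
   @Id
   @GeneratedValue
   private Long id;

   @Column(name = "title")
   private String title;

   // Getters and setters omitted for brevity.
}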

The Elusive Holy Grail

Even with all this effort and all these resources, the problem was in fact recursive. SQL is a Domain-Specific Language (DSL), which in plain English means that it focuses on a specific problem domain. General-Purpose Languages (GPLs) like Java, C++, and Python could never offer the richness of a DSL focused on expressing queries over data entities. Since ORMs offered abstractions in the GPLs, other query languages were designed to offer querying capabilities over the mapped objects. Programmers got rid of SQL and the interaction with the RDBMS, but instead they had to learn new, exotic variations of query languages, which in the end suffered from the same set of problems.

The NoSQL approaches were successful, but their usage is often misjudged. NoSQL databases are perfect for solving specific types of problems, but they are not here to replace RDBMSs.

The Empire Strikes Back

It seemed that the big players that provide infrastructure (IaaS/PaaS) noticed that RDBMSs are an irreplaceable piece of technology; thus they started offering turn-key solutions on their platforms, like the Heroku Postgres service and Google Cloud SQL.

On the language front, it seems that the problem was not SQL itself, but the way programmers interacted with it. Next-generation languages like Scala and Java 8 come with libraries such as Anorm, Slick, and jOOQ that enhance the interaction between these languages and SQL, providing compile-time error detection in queries and out-of-the-box optimisations.
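
For a flavour of this style, here is a minimal sketch of a type-safe query with jOOQ; it assumes a code-generated BOOK table class for a hypothetical book table, produced by jOOQ's source generator:

import static org.example.db.Tables.BOOK; // hypothetical generated table class

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class JooqExample {
   public static void main(String[] args) throws Exception {
      try (Connection conn = DriverManager.getConnection("jdbc:postgresql:library")) {
         DSLContext create = DSL.using(conn, SQLDialect.POSTGRES);
         // A misspelled column or a type mismatch here is a compile-time
         // error, not a runtime SQL exception.
         create.select(BOOK.TITLE)
               .from(BOOK)
               .where(BOOK.PUBLISHED_IN.gt(2000))
               .fetch()
               .forEach(record -> System.out.println(record.get(BOOK.TITLE)));
      }
   }
}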

Epilogue

History shows that SQL and RDBMSs are here to stay. Of course, new related technologies have been invented and more will be in the future, but SQL and RDBMSs will never be replaced.