Tuesday, 12 January 2010

Scala Considerations


Note

This is not a Scala tutorial. If you don't have the time & money to buy Odersky's "Programming in Scala" you can read the O'Reilly "Programming Scala" online at http://programming-scala.labs.oreilly.com/index.html. Additionally there are a lot of articles and documents on the Scala homepage at http://www.scala-lang.org/node/1305 and http://www.scala-lang.org/node/960

Scala is a new programming language introduced by the computer language specialist Martin Odersky. He was strongly involved in Java language development and also wrote one of the newer versions of the Java compiler. He did not just take the Java ideas further; he also integrated best practices from functional programming, a less well-known part of the IT world. The Scala compiler builds Java class files, but code can be compiled for the .NET platform, too.

So Scala is a hybrid language between object-oriented and functional programming, between imperative and declarative principles. Scala is short and concise: it removes the need for a lot of object-oriented boiler-plate (purely technical) code, and thus it shrinks source files (the less code, the fewer mistakes!). Encapsulation and separation of concerns are promoted, though not enforced.

Due to the integration of functional ideas, the object-oriented developer might have to learn some new terms when working with Scala: closures, currying, partial functions, lambdas, monads, continuations, folding, reducing, ... Although these are described on Wikipedia, the normal programmer will not easily figure out what they are needed for (explanations are given mainly in a very scientific and intrinsic functional style).

To me Scala seems to be a little like atomic power. It is seductive, but dangerous. The question arises whether people will apply it or abuse it. Remember C++ and Perl. Such languages can lead to lots of unreadable source code. The problem with Scala is not so much cryptic notation as the many internal rules you need to know to understand Scala code, although its advanced form (DSL = domain-specific language) might look very natural to a domain expert.

Java was a great idea; simplicity and platform independence were strong requirements at the time it was introduced. Nowadays new requirements await. One of them is that source code has grown so big everywhere: we need to write less and do more. Another one is concurrent programming, which is becoming more and more important. Remember the long struggle Java fought to achieve platform-independent multi-threading and truly synchronized access (e.g. the lazy initialization bug). And remember how long it takes to write well-working multi-threaded code in Java (the java.util.concurrent packages have been available only since 1.5). Although Scala builds on Java, there is no synchronized keyword in Scala; a new way to solve such things has been introduced, called Actors, adopted from Erlang, a functional language a little older than Java. Scala claims to be scalable, which also means that Scala applications can use several processors effectively without the code having to be rewritten. Great expectations!

Impressive Ideas, Big Dangers

My first response to Scala was "No":

  • A playground for hackers producing unreadable code. Each of them will have their own domain-specific dialect, the domain being the programmer. Operator overloading was one of the most rejected C++ constructs. OK, I admit that Scala has no operator overloading, but allowing symbols as method names has the same effect. Remember the Perl principle - "you should be able to do it in different ways" - is this really good for source code maintained by many people who have individual ways to read and write code? I mean, don't we rather need languages that enforce a standard way to express things unambiguously?

  • Scala comes with an abbreviation style that was always discouraged by Java/Sun. Long names are much more significant. Why "mkString" and not "makeString"? Why "Elem" and not "Element"? Why "val" and not "constant"? Why "var" and not "variable"? Why "def" and not "method"? Why "AnyRef" and not "AnyReference"? Why "r" and not at least "toRegExp"? And, by the way, why "object" and not "singleton"?

  • The usage of switch-cases is encouraged by a really mighty language construct, the match-case clause. But switch-case is an anti-pattern in OO languages. Such code should be written by introducing subclasses that each implement one of the cases. Even the usage of the class information (Java instanceof operator) is an anti-pattern - such code belongs to the class itself. Moreover, Scala tutorials encourage the use of the default _ case, instead of allowing an exception to be thrown when an object could not be classified (which surely indicates a programming mistake).

  • Too many symbols, too few words. Method names like /: (alias foldLeft) are hard to read and will encourage developers to write further cryptic code that uses symbols instead of names.

  • API documentation is missing or insufficient because of being too intrinsic.

  • The default access modifier for methods is public instead of private, the default access modifier for classes is public instead of package-visible. When declaring a constructor parameter as var (variable), its associated member field is publicly visible (this is an anti-pattern, an object's internal state should not be manipulable from outside).

  • Backslash \ is a legal name for a Scala variable: who needs this?

  • Scala is too big (even though the number of keywords is small). Getting familiar with it might take a month even for a professional.

  • Until now no best practices for writing Scala code have been published.
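The match-case concern from the list above can be made concrete with a minimal sketch. The types Shape, Circle and Square and the method names are hypothetical, invented only for this illustration. It contrasts a match over a sealed hierarchy, a match with a default _ case, and a match without one.

```scala
object MatchSketch {
  // Hypothetical domain types, invented for this illustration.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Square(side: Double) extends Shape

  // No default case: because Shape is sealed, the compiler can warn
  // when one of its subclasses is not handled here.
  def area(shape: Shape): Double = shape match {
    case Circle(r) => math.Pi * r * r
    case Square(s) => s * s
  }

  // A default `_` case accepts every value nobody thought of.
  def describe(value: Any): String = value match {
    case n: Int    => "number " + n
    case s: String => "text " + s
    case _         => "something else"
  }

  // Without a default case, an unclassified value raises scala.MatchError.
  def strictDescribe(value: Any): String = value match {
    case n: Int    => "number " + n
    case s: String => "text " + s
  }
}
```

Calling MatchSketch.strictDescribe(1.5) fails with a MatchError, which is the exception the bullet above would rather see than a silent default.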

My second reaction was "Wow":

  • Scala has only a few operators; e.g. arithmetic symbols like + or / and even == are methods in Scala, not operators. The dot between object and method/field is optional for fields and for methods with at most one parameter, and so are the parentheses (person.speak() == person speak). So Scala actually is just a construction set for domain-specific languages (which is what arithmetic in fact is). Yes, I know, this was also listed above as bad practice, but maybe we will have to face new aspects of programming in the future (separation of technical and domain-specific code).

  • Coupling object-oriented and functional programming is a demand of the times. We need to get rid of those tons of fragile, cross-linked, boiler-plate, thread-unsafe spaghetti code.

  • The elegance is convincing. No more annoying constructor code with this.x = x. In Scala I can separate the body of a loop from the loop implementation. I do not even need to write the loop: collection iterations are already implemented, and it is enough to prepare the loop body as an anonymous function (or closure). Remember writing so many finder methods in Java, all duplicating the loop control code.

  • Functions are objects and can be passed to methods (a method is converted to a function object when it is passed); the uniform access principle makes no difference between fields and methods.

  • Scala is statically typed. This is a MUST for bigger projects.

  • The so frequently used design patterns Singleton and Factory have been provided elegantly in the language.

  • No more statics! These were a major drawback for software reuse and component design.

  • No more break and continue! These have cluttered loop code for a very long time.

  • No more for (int i = 0; i < limit; i++) loops! Java adopted this from C, but it makes loop implementations fragile because the i variable is visible and mutable inside the loop.

  • No more i++ or ++i statements! Remember the statement array[++i] = value; I never liked having to work out which index is actually changed there.

  • No more mandatory ";" at the end of a statement, this is optional in Scala.

  • No more "boiler-plate" definitions like final Integer i = new Integer(1), it is val i = 1 now! Scala performs type inference, and additionally it has no more primitives, everything is an object.

  • The access modifier mechanism is much more powerful than that of Java (I ignore the fact that a lot of programmers do not use access modifiers for the expressiveness of their code). You can use the private and protected keywords on many levels and in many places. Scope specifications help to adapt that access control in special cases. There is var to declare a variable and val to declare a constant. The sealed keyword restricts subclasses to those defined in the same Scala source file, a feature that cannot be found in Java. Scala access rights are a little complicated, but more accurate than Java's.

  • You can embed XML literally into Scala source files; the compiler switches to XML mode when it finds it. Scala claims to be an alternative to XSLT!

  • Scala code can be compiled to the .NET platform!
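Some of the "Wow" points above (no this.x = x, no hand-written loops, type inference) can be illustrated with a minimal sketch. The Person class and the sample data are hypothetical. The calls find, filter and map are standard List methods.

```scala
object FinderSketch {
  // Constructor parameters declared with `val` become fields;
  // no `this.name = name` is needed.
  class Person(val name: String, val age: Int)

  // Hypothetical sample data.
  val people = List(new Person("Ann", 31), new Person("Bob", 17), new Person("Cid", 45))

  // The loop lives in the collection; only the "loop body" is supplied
  // as an anonymous function.
  val firstAdult: Option[Person] = people.find(p => p.age >= 18)
  val minors: List[Person] = people.filter(_.age < 18)

  // The result type List[String] is inferred by the compiler.
  val names = people.map(_.name)
}
```

Each Java-style finder method would repeat the iteration code; here only the predicate changes.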

So put "No" and "Wow" together and you get "Now" - but wait, I am not yet sure.

As you see the arguments for and against Scala are contradictory in some points. Dangers are overwhelming, elegance is seductive. And so is Scala. For example it provides multiple inheritance with traits (like Java interfaces, but can contain implementations), and there are many really good pros and cons about this (Scala's motto in this question seems to be: the last wins!).
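What "the last wins" means for traits can be seen in a small sketch. The trait and class names are hypothetical. The same two traits are mixed in twice, in opposite order, and the trait named last is consulted first.

```scala
object TraitSketch {
  trait Greeter { def greet: String = "hello" }
  trait Loud extends Greeter {
    override def greet: String = super.greet.toUpperCase
  }
  trait Polite extends Greeter {
    override def greet: String = super.greet + ", please"
  }

  // The trait mixed in last is called first; its `super` call goes to
  // the trait mixed in before it.
  class LoudThenPolite extends Loud with Polite
  class PoliteThenLoud extends Polite with Loud

  val a: String = new LoudThenPolite().greet // "HELLO, please"
  val b: String = new PoliteThenLoud().greet // "HELLO, PLEASE"
}
```

Swapping the mixin order changes the result, which is one of the arguments both for and against this form of multiple inheritance.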

Readability

A major criterion for a language's survivability might be:

Will it be used in literature, for example to describe design patterns?

Remember that books were rewritten with Java samples as soon as it turned out that Java's notation is more comprehensible than C++. Books make history. Simplicity is an agile principle, called KIS (Keep It Simple). Simplicity means accessibility for more people, and thus it means productivity.

Is Scala source code intuitive and easy to understand? After having tried to read some of the samples Scala ships with I said "No". But maybe I have to learn a little more about encapsulation and separation of concerns. Functional applications are definitely more robust and less error-prone than OO programs. Only their readability has a bad reputation.

So what do I understand when seeing code like

val thrill = "Will" :: "fill" :: "until" :: Nil

As you can see you need no knowledge of Scala libraries to understand this. Only a programmer might worry.

  • Nil denotes the empty list (a singleton object, not a new instance)

  • the "::" is a List method which returns a new List with the argument at the head of it and itself as tail

  • because "::" ends with a ":" it is a right-associative method, and thus the receiver is Nil and its argument is "until"

  • the dereferencing "." between receiver and method is optional for methods with just one argument, and so are parentheses

  • the returned List containing "until" takes the next argument "fill", finally resulting in a List that contains in order "Will", "fill", "until" - Hey, that's what is written here!
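The steps in the list above can be written out explicitly. This sketch only uses the standard List API. The object and value names are made up for the illustration.

```scala
object ConsSketch {
  // The notation from the text.
  val sugared: List[String] = "Will" :: "fill" :: "until" :: Nil

  // The same expression with explicit receiver, dot and parentheses.
  // Method names ending in ':' are invoked on the right operand,
  // so evaluation starts at Nil and works towards the left.
  val desugared: List[String] = Nil.::("until").::("fill").::("Will")
}
```

Both values are equal lists containing "Will", "fill", "until" in that order.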

No question that this notation is elegant. To know Scala is to love Scala ... Maybe we have to read this just intuitively, without wanting to understand what happens technically? However, a little boiler-plate "noise" remains with the :: Nil.

There are other examples, better ones and worse ones. Here is a quite cryptic method implementation I found on the web:

def sum(list: List[Int]) = (0 /: list) {_ + _}

Understand? No? So here is the same in a more readable way:

def sum(list: List[Int]) = list.foldLeft(0)(_ + _)

You don't understand this either?

def sum(list: List[Int]): Int = list.foldLeft(0)((a: Int, b: Int) => { a + b })

Now everything is clear. No? You need to love it more and learn what folding is, and what an anonymous function is! By the way, foldLeft is tail-recursive and thus very fast, unlike foldRight ... there is really a lot a programmer must know about Scala. Anyway, here comes the most readable version of this, in a Java-like style.

def sum(list: List[Int]): Int = {
  def sumOf2(a: Int, b: Int): Int = {
    return a + b
  }
  return list.foldLeft(0)(sumOf2)
}

As one can see this method sums up all Integers in the passed List, starting with amount 0, and returns the result. The inner function sumOf2 is passed to the folding method (which is a "curried" method). As we see, "write less, do more" has its price.

All in all this was a silly example, only to show potential source cryptography. In practice one would use "reducing" for summing up a List (as long as the start amount is not required):

def sum(list: List[Int]) = list.reduceLeft(_ + _)

Thank God there is no symbol-style sibling for reduceLeft like the one above for foldLeft.
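The remark about the start amount matters for empty lists: foldLeft returns its start value, while reduceLeft has nothing to return and throws. The following sketch uses only standard List methods. The method names sumFold, sumReduce and sumReduceOption are made up for the comparison.

```scala
object SumSketch {
  // Folding: works for the empty list and yields the start amount 0.
  def sumFold(list: List[Int]): Int = list.foldLeft(0)(_ + _)

  // Reducing: throws UnsupportedOperationException for the empty list.
  def sumReduce(list: List[Int]): Int = list.reduceLeft(_ + _)

  // reduceLeftOption returns None for the empty list instead of throwing.
  def sumReduceOption(list: List[Int]): Option[Int] = list.reduceLeftOption(_ + _)
}
```

For a non-empty list such as List(1, 2, 3) all three variants agree on 6 (wrapped in Some for the last one).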

Domain Expert Chewing Gum

Scala lets you adapt the language to the things you want to express without "noise" (boiler-plate code), e.g. you can define methods named "+" and "-" to implement arithmetic expressions (this is also a DSL). Then someone can write arithmetic expressions using the Scala interpreter directly. DSLs are for experts who are not programmers but want to solve use-cases of their domain by writing text. They want to work with the terms and symbols of their domain, and they want to use only a minimum of technical "noise".
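As a minimal sketch of methods named "+" and "-": the Complex class below is hypothetical and written only for this illustration; it is not taken from any library.

```scala
object ArithmeticSketch {
  // Hypothetical value type; `+` and `-` are ordinary method names.
  case class Complex(real: Double, imaginary: Double) {
    def +(that: Complex): Complex =
      Complex(real + that.real, imaginary + that.imaginary)
    def -(that: Complex): Complex =
      Complex(real - that.real, imaginary - that.imaginary)
  }

  // Infix notation: no dot, no parentheses.
  val sum = Complex(1.0, 2.0) + Complex(3.0, 4.0)
  val difference = Complex(5.0, 5.0) - Complex(2.0, 1.0)

  // The same call written like a normal method invocation.
  val explicitSum = Complex(1.0, 2.0).+(Complex(3.0, 4.0))
}
```

To the user of the class the expressions look like built-in arithmetic, although only method calls are involved.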

For Example: Search Expressions

Web search expressions are an example of a DSL that must be interpreted at runtime: a so-called external DSL. The AltaVista web search engine provided such expressions. The user inputs e.g.

+Steinbeck John -(Charlie OR James)

which means that the result pages MUST contain "Steinbeck", CAN contain "John", and MUST NOT contain "Charlie" or "James".

When using this as a so-called internal DSL we could wrap the input text into a source template and let the Scala script interpreter run over it (this generates class files and executes them afterwards). Maybe that runtime-generated source would look like the following:

/** Just a fictitious Scala DSL */
class AdhocQuery extends Application with SearchEngine {
  val query = super.search +Steinbeck John -(Charlie OR James)
  println(query)
}

I do not know if a Scala expert could implement such a DSL. Never say never!

The following are the tricks that Scala DSL designers apply when making an internal DSL:

  • the fact that method parentheses are optional for methods with just one parameter

  • the fact that the dot between receiver and field or method is optional and can be replaced by space(s)

  • implicit conversions (allows methods to be called upon types that do not provide such a method, see String / RichString)

  • currying (allows parameter lists to be split and thus helps to avoid parentheses)

  • the "fluent interface": a method returns an object upon which you can call another method, that returns an object upon which ... (Hm, the nightmare of Adaptive Programming / Law Of Demeter)

  • anonymous function bodies

  • singleton factories with apply methods
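A few of these tricks can be combined in a small sketch. It is a simplified, word-based variant of the search expression and does not reproduce the literal + and - syntax. All names (Query, Words, WordOps, search, must, can, mustNot, or) are hypothetical and invented for this illustration.

```scala
object QueryDslSketch {
  // Hypothetical query model.
  case class Query(required: List[String] = Nil,
                   optional: List[String] = Nil,
                   excluded: List[String] = Nil) {
    // One-parameter methods: usable infix, without dot and parentheses.
    def must(word: String): Query = copy(required = required :+ word)
    def can(word: String): Query = copy(optional = optional :+ word)
    def mustNot(words: Words): Query = copy(excluded = excluded ++ words.items)

    // Optional words do not influence whether a page matches.
    def matches(page: String): Boolean = {
      val tokens = page.split("\\s+").toSet
      required.forall(tokens.contains) && !excluded.exists(tokens.contains)
    }
  }

  case class Words(items: List[String]) {
    def or(word: String): Words = Words(items :+ word)
  }

  // Implicit conversion: gives a plain String an `or` method.
  implicit class WordOps(first: String) {
    def or(second: String): Words = Words(List(first, second))
  }

  // Singleton object as the entry point of the fluent interface.
  object search {
    def must(word: String): Query = Query().must(word)
  }

  val query = search must "Steinbeck" can "John" mustNot ("Charlie" or "James")
}
```

The last line reads almost like the search expression, but a reader needs to know the implicit class and the singleton to see that it is a chain of four method calls.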

Even if this example is too complicated for an internal DSL, Scala has tools for scanning, parsing and traversing the syntax tree of an external DSL. The API contains a lot of packages starting with scala.util.parsing, and there are samples in the examples/parsing folder. External DSLs can be implemented very elegantly with these libraries. Again, you need to know a lot about these things. The resulting code might then look like this:

val input = textfield.getText  // reads the input String +Steinbeck John -(Charlie OR James)
val syntaxTree = new WebQueryParser().parse(input)
println(syntaxTree.evaluate(new SearchEngine()))
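To give an idea of what the external variant has to do, here is a self-contained sketch that reads the search expression at runtime. It deliberately does not use the scala.util.parsing libraries; it swaps in a plain regular expression, which is enough for this one flat grammar but is not a general parser. The names WebQuerySketch, ParsedQuery, parse and evaluate are assumptions made for this illustration.

```scala
import scala.util.matching.Regex

object WebQuerySketch {
  // Hypothetical result type: three word lists instead of a syntax tree.
  case class ParsedQuery(required: List[String],
                         optional: List[String],
                         excluded: List[String]) {
    def evaluate(page: String): Boolean = {
      val tokens = page.split("\\s+").toSet
      required.forall(tokens.contains) && !excluded.exists(tokens.contains)
    }
  }

  // A term is an optional sign followed by a word or by "(a OR b ...)".
  private val Term: Regex = """([+-]?)(?:\(([^)]*)\)|(\w+))""".r

  def parse(input: String): ParsedQuery = {
    val terms = Term.findAllMatchIn(input).toList.map { m =>
      val words =
        if (m.group(2) != null) m.group(2).trim.split("""\s+OR\s+""").toList
        else List(m.group(3))
      (m.group(1), words)
    }
    ParsedQuery(
      required = terms.collect { case ("+", ws) => ws }.flatten,
      optional = terms.collect { case ("", ws) => ws }.flatten,
      excluded = terms.collect { case ("-", ws) => ws }.flatten
    )
  }
}
```

Calling WebQuerySketch.parse("+Steinbeck John -(Charlie OR James)") yields the required word "Steinbeck", the optional word "John" and the excluded words "Charlie" and "James".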

Separation of Technical and Domain Code

After having sniffed at a sample DSL implementation I came to a nicer definition for Scala than atomic power: Scala is like chewing gum. Using Scala you can express things in so many different ways that the language underneath feels like chewing gum that just sticks everything together. A language construction set.

This opens an interesting aspect. Traditionally the programmer is the man-to-machine-translator for the domain expert. The expert specifies, the programmer codes. With Scala these roles would change. The programmer would provide a Scala DSL, and the expert would specify the domain's use-cases himself by means of that DSL.

This would promote the separation of technical and domain-specific source code, which is a hushed-up but very important topic. Until now there were no real means to do so. All programmers wrote classes like "HouseBuilder" where "House" is the domain term and "Builder" is the technical term. The class methods contained business logic as well as technical logic. If the separation of these two really is possible with Scala, this could cause a kind of IT-social (r)evolution.

Readability Revisited

Back to the question from before: Will Scala be used in literature? So, which part of the Scala code will be printed in that book, the technical or the domain-specific one? You cannot understand Scala DSL code without having studied the associated implicits and singleton factories before, and this is VERY technical code. The problem already starts with Scala collections: folding and reducing are functional domain code. However, Scala code could be read intuitively, and books that deal with a certain domain might prefer such a solution while listing the overall Scala glue code in the appendix.

Already there are a lot of Scala libraries around that ease Scala development. This world is going to be as big as the Java or Ruby worlds. One of them is the implementation of a DSL for behavior-driven development, which features textual specifications embedded within test-cases (taken from test-driven development). Looking at some sample code, we get a glimpse of the fact that readability has at least two aspects:

  • Can the programmer read it?

  • Can the domain expert read it?

import org.specs._   // here we find the specs DSL code

object ComplexSpec extends Specification {
  "Complex addition" should {
    "return a new Complex made up from real' and imaginary' parts sums" in {
      val c1 = Complex(1.2, 3.4)
      val c2 = Complex(5.6, 7.8)
      (c1 + c2).real mustEqual (c1.real + c2.real)
      (c1 + c2).imaginary mustEqual (c1.imaginary + c2.imaginary)
    }
  }
}

For the domain expert this is clear. The programmer will have to learn that this DSL contains an implicit conversion from String to some specs-class that provides "should" and "in" methods that take function literals as arguments ...
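How a String can suddenly understand "should" and "in" is shown in the following sketch. It is not the specs implementation, only a hypothetical miniature (MiniSpecSketch, SpecText and results are invented names) that uses an implicit class and by-name parameters to get the same surface syntax.

```scala
object MiniSpecSketch {
  import scala.collection.mutable.ListBuffer

  // Collected results: (full description, passed?).
  val results = ListBuffer.empty[(String, Boolean)]
  private var context = ""

  // Implicit conversion from String to a class providing `should` and `in`.
  implicit class SpecText(text: String) {
    // `body` is passed by name: the block is handed over unevaluated
    // and runs only when it is referenced here.
    def should(body: => Unit): Unit = {
      context = text
      body
    }
    def in(body: => Boolean): Unit = {
      results += ((context + " should " + text, body))
      ()
    }
  }

  // Runs when the object is initialized.
  "Integer addition" should {
    "be commutative" in { 1 + 2 == 2 + 1 }
    "have zero as neutral element" in { 5 + 0 == 5 }
  }
}
```

The specification part at the bottom is what the domain expert reads; the implicit class above it is what the programmer has to understand.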

Coding Scala will be either designing a DSL or using a DSL. Scala itself is just the chewing gum in between. Most of the currently available Scala libraries (like the "specs" above) come as DSLs.

Scala--

The question arises whether Scala is for the masses, like Java was Smalltalk for the masses. Do we really have to convert to functional programming? Isn't an add() better than the :: to build a collection? Does it really make sense to provide DSL capabilities, doesn't that make everything even more difficult?

Learning Scala soon made me think of something like Scala--, a Scala suitable for the masses. I want the elegance, I want safe collections, I want separation of concerns and flexibility, I want scalability, but what about the pending abuse of the language? Murphy's law is still valid: what can be abused will be abused.

Currently Java is the dominant programming language. It has made its way by featuring simplicity and comprehensibility, and by the open-source community providing masses of libraries. Nevertheless it has turned out that Java has design flaws:

  • Untyped collections were fixed with 1.5 that introduced Java Generics for type-safety, but this has become rather complex.

  • Platform-independent multi-threading took years to get to maturity; the corresponding API was not available until 1.5.

  • Closures are a pending feature for 1.7, but the functional programming aspect will hardly come into Java.

  • I saw the problems programmers run into when trying to avoid copy & paste code. Truly reusable code is still hard to write in Java; it mostly ends up in static implementations.

  • Writing a robust singleton in Java is expert work (lazy allocation, synchronization, serialization).

  • Code contains too many boiler-plate technical things; there is no type inference.

  • Knowledge about anti-patterns is absolutely necessary for ambitious Java projects. Think of DontCallOverridableMethodsFromConstructor, which can cause hard-to-find bugs.

  • Static implementations constrain design changes and code reuse.

  • Public classes are visible even in packages that need not see them.

  • API flaws like unmodifiable Lists that offer add(...) methods (that will crash) make programming tiresome.

  • Dynamic class loading via reflection breaks static typing.

  • Java supports primitives; the autoboxing feature introduced in 1.5 can now cause NullPointerExceptions.

  • The instanceof operator makes it possible to work around design weaknesses.
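The DontCallOverridableMethodsFromConstructor item from the list above is worth a look at the mechanism. The sketch below reproduces it in Scala, since the pitfall rests on the initialization order of the JVM object model and not on Java syntax. The class names Base and Derived are hypothetical.

```scala
object InitOrderSketch {
  class Base {
    def label: String = "base"
    // Evaluated as part of Base's constructor; calls an overridable method.
    val greeting: String = "Hello " + label
  }

  class Derived extends Base {
    val suffix: String = "derived"
    // Invoked from Base's constructor, before `suffix` has been assigned,
    // so it still returns null at that point.
    override def label: String = suffix
  }

  val surprising: String = new Derived().greeting // "Hello null"
}
```

No exception is thrown here, which is exactly why such bugs are hard to find.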

This list is not complete. Nothing will ever be perfect. Java's advantages would make up a list that is much longer. During the past fifteen years a learning process took place that culminated in tools like FindBugs, CPD (Copy&Paste Detection), CheckStyle, RevJava and others. They were made to reveal design flaws in source code, and to enforce best coding practices. Wouldn't it be nice to have a language that itself makes it impossible to commit such flaws? I mean, there is a large amount of knowledge in these things, and it takes programmers years to learn all of that.

Is Scala a step forward? Design patterns like Singleton and Factory have been integrated into the Scala language. Functional programming has been integrated. The NullObject pattern is supported by Maps, and many more. But what about the Visitor pattern, Adaptive Programming and the Law of Demeter? What about preventing design flaws, even silly ones like "don't use a derived type in its super-type"? Why encourage the usage of match-case constructs when switch-case is an anti-pattern in object-oriented programming? I do not know much about the maintainability of functional programs, but I dare to doubt it.
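The Map support mentioned here can be sketched with the standard Map API: a lookup returns an Option (Some or None) instead of null. The sample data and value names are made up for the illustration.

```scala
object OptionSketch {
  // Hypothetical sample data.
  val capitals = Map("France" -> "Paris", "Austria" -> "Vienna")

  // A lookup never returns null; a missing key yields None.
  val known: Option[String] = capitals.get("France")
  val unknown: Option[String] = capitals.get("Atlantis")

  // A fallback value replaces the null check.
  val fallback: String = capitals.getOrElse("Atlantis", "unknown")

  // Further processing works the same way for present and missing keys.
  val lengths: List[Int] =
    List("France", "Atlantis").map(c => capitals.get(c).map(_.length).getOrElse(0))
}
```

The caller is forced by the type to think about the missing case, which is the intent of the NullObject pattern.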

It looks like Scala is a programming language demanded by the change of times. It tries to master new requirements and avoid old mistakes, like Java did. Design skills are still required for Scala, even more so for its DSLs. There will be a FindBugs version for Scala, too. Maybe one day there will be separate design languages and programming languages.

Should we say "Bye bye interface, hello trait"? I don't know. Scala is a good choice in areas where domain specific languages play a role, and where the separation of technical and domain code is important. I will continue evaluating Scala for small projects, maybe this will evolve.