
Saturday, May 12, 2018

Why Short Is Not Always Good

Yes, I know, "TL;DR" means "Too Long; Didn't Read". Kilroy is everywhere. Nevertheless, this long, long, long blog post is about the (still famous?) saying

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”
Now they have two problems.

which has been exhaustively discussed on Coding Horror and Jeffrey Friedl's Blog. Here is a history of this epoch-making joke template:


Whatever you solve with regular expressions will break soon, because they encode imprecise assumptions and are mostly unreadable. The best explanation of the problem with regex on Jeffrey's blog was

It’s hard to distinguish between the data and operators.

That's the point, well said!
By the way, doesn't that somehow also apply to C++ operator overloading and to Scala's symbols as function names?
To me it looks like the common goal of all these constructs is to make source code shorter!

Less Code, Fewer Bugs?

The shorter the code, the fewer mistakes are inside.

This may be true in some cases. In other cases there will not be fewer mistakes; they will just sit deeper, be less visible, and be much harder to find. This is especially so when the code builds upon implications that are known to experts only, as is the case with Domain Specific Languages. Regular expressions are a DSL, and many dialects of it exist.

But to achieve shorter code you need programmers who avoid copy & paste coding and agree on shared code, not regular expressions or a better general programming language!

There is also an addiction to using abbreviations and acronyms everywhere (see the jQuery API). Laziness? Need for speed? In the old days there were reasons for shortness: memory and disk space were scarce. Today these restrictions are gone. A short class, field, function or variable name is much more error-prone than a long one, because it will not be understood, or will be misunderstood. It's time to learn the art of being precise without being too long. Don't slaughter words, use them.

Write Less Do More?

This is the slogan of jQuery, a JavaScript library that (besides many other things) provides CSS selector expressions to access DOM elements in an HTML document, something that HTML5 browsers meanwhile also provide via element.querySelectorAll().

If you read code that uses jQuery you will be reminded of regular expressions. It is the same concept: access a structure by duplicating its identifiers. When the structure changes, because the designer edited the HTML, something stops working. If you need to fix the resulting bug, you may be confronted with hundreds of jQuery expressions that all point to certain places in the HTML, and just one of them fails because it points to something that doesn't exist any more. How do you find something that doesn't exist any more? Didn't the author maybe write less at the expense of the maintainer, who now has to do more?

Get Rid of Boilerplate Code?

"Boilerplate is that part of a form which does not change from one form to another."

Can we agree on this? So then, let's look at some web form: on top is the label "Date of Birth", below is the date-chooser. Do we really want to get rid of the label, which obviously doesn't change from one form to another? How can the user then know that it is "Date of Birth"?

Some boilerplate, like the C #ifndef include guards, is certainly dispensable. But isn't this an ancient language that has already been replaced by things like Java? So why talk about it? Rewrite your software in a modern language!

Doesn't Java also have boilerplate? For example, everything has to be inside a class, but we do not always want to express something through a class.
Yes, true. But now it gets cultural. I prefer the object-oriented culture; functional languages are not for the masses, and structured languages (C, JavaScript) are outdated. Code reuse by inheritance is elegant and efficient, and lacks the boilerplate of delegation, which in fact duplicates the public interface of the delegate. If you show me a programming language that expresses the

Object-Oriented Paradigms
  1. Classes: Data and methods that work on them are grouped together in classes

  2. Polymorphism: An interface is an outline of concrete classes that lists some methods they have in common

  3. Abstraction: An abstract class contains default implementations, and can declare and call abstract methods that are implemented only by sub-classes

  4. Inheritance: A class can extend another class, and reuse and modify the public and protected behaviour of the super-class

  5. Encapsulation: Access modifiers like public and private let you distinguish between the internal implementation and its external interface, thus reducing complexity for the outer world

  6. Dynamic Overrides: At runtime, the method implementation of the most specialized class of an inheritance hierarchy is executed

  7. Method Overloading: You can have equally named but differently implemented methods foo(), foo(bar), foo(bar1, bar2) ... in a class, distinguished by their parameters

  8. Open Recursion: the this keyword provides access to methods and properties of the same object instance

with less boilerplate than Java, I will immediately switch to that language. But don't fool me with old mistakes like preprocessors, operator overloading, or other tricks that give unlimited freedom of expression and result in poor readability. And: for source code beyond 10000 lines there is no way around strict type checks! And: the runtime environment needs to be operating-system independent!

How would you get rid of boilerplate when you need to express all of the OO concepts above? Isn't it important to put everything into a function or class, because a code block without a name won't be reusable?
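To make the benchmark concrete, here is a minimal sketch of how the eight points of the list above could look in Java. All class and method names (Shape, AbstractShape, Rectangle, Square, ShapeDemo) are invented for this illustration, and the numbers in the comments refer to the list items.

```java
interface Shape {                                    // 2. polymorphism: the common outline
    double area();
    String describe();
}

abstract class AbstractShape implements Shape {      // 3. abstraction
    private final String name;                       // 1. data lives next to its methods
                                                     // 5. private: hidden from the outer world
    protected AbstractShape(String name) {
        this.name = name;
    }

    @Override
    public String describe() {                       // default implementation ...
        return name + " with area " + this.area();   // 8. ... calling the sub-class via this
    }
}

class Rectangle extends AbstractShape {              // 4. inheritance
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        super("Rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }

    Rectangle scale(double factor) {                 // 7. overloading: same name,
        return scale(factor, factor);                //    different parameters
    }

    Rectangle scale(double widthFactor, double heightFactor) {
        return new Rectangle(width * widthFactor, height * heightFactor);
    }
}

class Square extends Rectangle {
    Square(double side) {
        super(side, side);
    }

    @Override
    public String describe() {                       // 6. dynamic override
        return "Square, a special " + super.describe();
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape shape = new Square(2.0);               // static type Shape, runtime type Square
        System.out.println(shape.describe());        // Square, a special Rectangle with area 4.0
    }
}
```

Whether the class, constructor and @Override lines count as boilerplate or as documentation is exactly the question: each of them names something a reader needs to know.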

Use Comments!

Surprisingly, not a single contribution on Jeffrey's blog mentions the possibility of disarming regular expressions with comments.

The comment, despised by the OO community, can shine here. What about commenting your regular expression? Can it also be expressed in human language? We need to know what it should do, and why it was necessary to use regex.

For jQuery CSS selectors we would like to know what is at the place the expression points to. Is it the "Submit" button? Or is it the scroll-container of the table? This information is much more valuable than the CSS class in the selector, which may not exist any more.
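A minimal sketch of what such a commented expression could look like in Java: the Pattern.COMMENTS flag makes the regex engine ignore whitespace and everything from # to the end of the line, so the explanation can sit right inside the pattern. The class name, the date format and the helper method are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: find a date like 2018-05-12 somewhere in free text.
class CommentedRegexDemo {

    // Why regex: the date may appear anywhere in the text,
    // so splitting on a fixed separator would not find it.
    static final Pattern ISO_DATE = Pattern.compile(
        "(\\d{4})   # year: exactly four digits\n" +
        "-          # literal separator\n" +
        "(\\d{2})   # month: two digits, range is NOT checked here\n" +
        "-          # literal separator\n" +
        "(\\d{2})   # day: two digits, range is NOT checked here\n",
        Pattern.COMMENTS);

    /** Returns the first date found in the text, or null if there is none. */
    static String findFirstDate(String text) {
        Matcher matcher = ISO_DATE.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // prints 2018-05-12
        System.out.println(findFirstDate("Posted on 2018-05-12 by the author"));
    }
}
```

The comments state both the intent and the known limits (no range check), which is the information a maintainer needs before touching the expression.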


Conclusion

Take a deep breath. Give your audience the time to think over what you just said. Don't go with those who are bored because they already know everything. You want to explain it to those who have never heard about it. We don't need to be that fast and short. Let's permit the part of boilerplate that is common sense, and reject just the purely technical workarounds.

When coding, you need some structures in which your solutions can live. Keep data and the functions working on them together. Give your solution a class name, or at least a function name. Use anonymous classes and functions just when their content is very specific to the surrounding code, and keep them short by calling properly named functions that are implemented outside.

Language designers compete for the shortest implementation of the quicksort algorithm. It is not important how many lines of code you need for that. It is important that everybody understands, just from reading the code, how quicksort works! Such a programming language will be used in publications of any kind, because it promotes common sense and human understanding.
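As an illustration of the difference, here is a sketch of a quicksort written for the reader rather than for the line count. It is deliberately not the classic in-place variant: it allocates new lists, trading memory for readability. The class and variable names are my own choice for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ReadableQuicksort {

    /** Returns a new sorted list; the input list is not modified. */
    static List<Integer> sort(List<Integer> numbers) {
        if (numbers.size() <= 1) {
            return new ArrayList<>(numbers);    // nothing to sort
        }

        // Step 1: choose any element as the pivot.
        Integer pivot = numbers.get(0);

        // Step 2: partition all elements relative to the pivot.
        List<Integer> smallerThanPivot = new ArrayList<>();
        List<Integer> equalToPivot = new ArrayList<>();
        List<Integer> greaterThanPivot = new ArrayList<>();
        for (Integer number : numbers) {
            int comparison = number.compareTo(pivot);
            if (comparison < 0) {
                smallerThanPivot.add(number);
            } else if (comparison > 0) {
                greaterThanPivot.add(number);
            } else {
                equalToPivot.add(number);
            }
        }

        // Step 3: sort both outer partitions the same way, then join the parts.
        List<Integer> sorted = new ArrayList<>(sort(smallerThanPivot));
        sorted.addAll(equalToPivot);
        sorted.addAll(sort(greaterThanPivot));
        return sorted;
    }
}
```

A call like ReadableQuicksort.sort(Arrays.asList(3, 1, 2, 3, 0)) returns the list [0, 1, 2, 3, 3]. It is far from the shortest version, but the three steps of the algorithm can be read off the code.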

Why is short not always good? Because what is short for one may become long for many others. And you will have to pay them all. So better do it well right from the start.



