fritzthecat-blog: Mai 2015

Mittwoch, 27. Mai 2015

The Immortal AWK Language

It spells AWK because it stands for the names Aho, Weinberger and Kernighan, all AT&T Bell Labs employees in 1977 when awk first appeared.

AWK is an amazing tool. It's a UNIX tool, but even WINDOWS addicts know it. Despite the existence of its successor perl it does not die out. (Maybe perl code is a little hard to read? Maybe the attitude "We let you do it in many ways" is not so useful for source code?-)

AWK is one of the utilities occurring in almost any shell script bigger than 100 lines. Anytime the UNIX shell-script interpreter is coming to its limits, AWK helps out. I would not say that AWK code is easy to read, but not as hard as perl, and even a bit easier than shell scripts. Its syntax feels like a free form of C without pointers (how relieving!).
Warning: it is an interpreted script language and has no explicitly declared data types!

The Shortest Useful is an AWK Program

Let's see how many awk variants are installed. Open a terminal prompt on your LINUX or WINDOWS + CYGWIN system, and enter the following command line (without "$", this is the system's prompt)

$ ls -l /usr/bin/*awk

lrwxrwxrwx 1 root root     21 Jan 17 22:26 /usr/bin/awk -> /etc/alternatives/awk
-rwxr-xr-x 1 root root 538224 Jul  2  2013 /usr/bin/dgawk
-rwxr-xr-x 1 root root 441512 Jul  2  2013 /usr/bin/gawk
-rwxr-xr-x 1 root root   3188 Jul  2  2013 /usr/bin/igawk
-rwxr-xr-x 1 root root 117768 Mar 24  2014 /usr/bin/mawk
lrwxrwxrwx 1 root root     22 Jan 17 22:26 /usr/bin/nawk -> /etc/alternatives/nawk
-rwxr-xr-x 1 root root 445608 Jul  2  2013 /usr/bin/pgawk

Maybe you need to enter ls -la /bin/*awk, this depends on your LINUX variant.
When you enter the ls file-list command with an awk filter pipe, you see this:

$ ls -l /usr/bin/*awk | awk '{print $9, $11}'

/usr/bin/awk /etc/alternatives/awk
/usr/bin/dgawk 
/usr/bin/gawk 
/usr/bin/igawk 
/usr/bin/mawk 
/usr/bin/nawk /etc/alternatives/nawk
/usr/bin/pgawk

We used AWK as column filter, to see only column 9 and 11 (when present). Column 11 represents the target when the file node is a symbolic link.

This is really a short program: {print $9, $11}, don't you think?
What can we learn from that?

AWK reads its input line by line, although you could set the RS (record separator) variable to something else than newline
AWK splits every input record (line) into parts, using the FS (field separator) variable that defaults to whitespace, and makes the parts available as $1 - $N, the whole line is in $0
we need to enclose such an AWK program into 'single quotes', else the shell would see all $-variables and would try to substitute them with most likely no content

That is what all people do with AWK: feed in lines of a file and convert column contents to some new shape.

AWK Program in a File

Put {print $9, $11} into a file named NameAndLink.awk (the extension is not obligatory) ....

NameAndLink.awk

{ print $9, $11 }

.... and then do this:

$ ls -l /usr/bin/*awk | awk -f NameAndLink.awk

# same result as above

Within the file you do not need the 'single quotes' any more. For bigger AWK applications, a separate file for the source code is very recommendable.

Another thing you can do is to tag the file in its head with the according command-interpreter, so that it can be executed as a script:

NameAndLink.awk

#!/usr/bin/awk -f
{ print $9, $11 }

Mind that now you need to set execute-permissions on it:

$ chmod u+x NameAndLink.awk
$ ls -l /usr/bin/*awk | NameAndLink.awk

# same result as above

A Standard AWK Program

Here is a skeleton of how most AWK programs look like.

awk '
  BEGIN {
    print "Starting";
  }
  /a/ {
    print "Got a";
  }
  /b/ || /c/ {
    print "Got b or c, see yourself: " $0
  }
  /b/ {
    print "Got b"
  }
  {
    print "Generally I got " $0;
  }
  END {
    print "Ending"
  }
' <file.txt

Assuming we have a file.txt with content

a
b
c
d

we would see following output:

Starting
Got a
Generally I got a
Got b or c, see yourself: b
Got b
Generally I got b
Got b or c, see yourself: c
Generally I got c
Generally I got d
Ending

What can we learn from that?

The BEGIN section is executed before the first line is read, the END section after the last line has been read
the { brace section } that has no pattern is executed for every line, even when that line was matched against other patterns
the sections headed by /regular expression/ patterns are executed for every line that matches their pattern
when some input matches several patterns, their sections are executed in the order they occur in source code
the trailing ";" semicolon is optional

Similarities to XSLT and CSS are obvious: it is a pattern-matching language.

AWK Capabilities in an Example

Instead of starting "yet another AWK tutorial", I want to demonstrate the power of it in a little application I needed recently. That application should process a Maven pom.xml file and enrich it with version numbers.

Inputs to Process

pom.xml

<?xml version="1.0"?>

<project>
  <groupId>com.mycompany.app</groupId>
  <artifactId>my-module</artifactId>
  <version>1.0.0-SNAPSHOT</version>
 
  <parent>
    <groupId>com.mycompany.app</groupId>
    <artifactId>my-app</artifactId>
    <version>1.0-SNAPSHOT</version>
    <relativePath>../parent/pom.xml</relativePath>
  </parent>

  <dependencies>
  
    <dependency>
      <groupId>fri.example.test</groupId>
      <artifactId>module-one</artifactId>
    </dependency>
  
    <dependency>
      <artifactId>module-two</artifactId>
      <groupId>fri.example.test</groupId>
    </dependency>
  
    <dependency>
      <artifactId>module-hundred</artifactId>
      <groupId>fri.example.test</groupId>
    </dependency>
  
    <dependency>
      <groupId>fri.example.test</groupId>
      <artifactId>module-three</artifactId>
      <exclusions>
        <exclusion>
          <groupId>fri.example.test</groupId>
          <artifactId>module-five</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  
    <dependency>
      <artifactId>module-four</artifactId>
      <groupId>fri.example.test</groupId>
    </dependency>
  
  </dependencies>

</project>

There is a parent referenced, let's assume that it holds a dependencyManagement section where versions for modules are defined, and we processed these into maven-resolve.txt file by calling mvn dependency:resolve (for example by using another awk script;-).

maven-resolve.txt

module-four 1.4
module-two 1.2
module-three 1.3
module-one 1.1

Output to Achieve

We want the associated versions to be put into the module dependency elements like this:

<?xml version="1.0"?>

<project>
  <groupId>com.mycompany.app</groupId>
  <artifactId>my-module</artifactId>
  <version>1.0.0-SNAPSHOT</version>
 
  <parent>
    <groupId>com.mycompany.app</groupId>
    <artifactId>my-app</artifactId>
    <version>1.0-SNAPSHOT</version>
    <relativePath>../parent/pom.xml</relativePath>
  </parent>

  <dependencies>
  
    <dependency>
      <groupId>fri.example.test</groupId>
      <artifactId>module-one</artifactId>
      <version>1.1</version>
    </dependency>
  
    <dependency>
      <artifactId>module-two</artifactId>
      <groupId>fri.example.test</groupId>
      <version>1.2</version>
    </dependency>
  
    <dependency>
      <artifactId>module-hundred</artifactId>
      <groupId>fri.example.test</groupId>
    </dependency>
  
    <dependency>
      <groupId>fri.example.test</groupId>
      <artifactId>module-three</artifactId>
      <exclusions>
        <exclusion>
          <groupId>fri.example.test</groupId>
          <artifactId>module-five</artifactId>
        </exclusion>
      </exclusions>
      <version>1.3</version>
    </dependency>
  
    <dependency>
      <artifactId>module-four</artifactId>
      <groupId>fri.example.test</groupId>
      <version>1.4</version>
    </dependency>
  
  </dependencies>

</project>

As you can see there are some subtleties in the example pom.xml. To irritate the script, there is an exclusion tag containing an artifactId tag. Sometimes the groupId and artifactId tags are swapped. And there is a module-hundred for which we have no version.

The Program

We need to read the maven-resolve.txt file at BEGIN. Then we will read the pom.xml file and input a version tag wherever an according module occurs. The resulting AWK source is amazing short. I wrote it as shell script, to show how shell variables can be integrated into the AWK program.

#!/bin/bash

pom=pom.xml
versions=maven-resolve.txt
newPom=pom-with-versions.xml

awk '
  BEGIN  {    # read versions from file in format "module version" into associative array
    while ((getline < "'$versions'") > 0)  {
      moduleVersions[$1] = $2;
    }
  }
  
  /^[ \t]*<exclusions>/  {    # set a state for artifactId tags appearing here
    inExclusion = 1;
  }
  /^[ \t]*<\/exclusions>/  {    # reset state
    inExclusion = 0;
  }
  
  {
    if (inExclusion != 1)  {
      if (match($1, "^<artifactId>(.+)</artifactId>$", matchArray))  {
        currentVersion = moduleVersions[matchArray[1]];
      }
      else if (length(currentVersion) > 0 && $1 == "</dependency>")  {    # having version for artifact, print it
        print "\t\t\t<version>" currentVersion "</version>";
        currentVersion = "";
      }
    }
    
    print $0;    # print out any line of POM
  }
' < $pom > $newPom

First we read maven-resolve.info line by line using the built-in getline command. We can use an input redirection with the getline command like in a shell script. Mind that the input redirection file name needs to be wrapped into "double quotes" inside the AWK program. But the shell variable needs to be outside the AWK program, so it is enclosed in single quotes, which 'splits' the AWK program and exposes $versions to the shell for substitution.

When reading the file, AWK works as usual by splitting the line and provding $1 - $N. We use this to fill up the "associative array" moduleVersions with our module/version informations. In AWK you don't need to declare variables, you simply use them. AWK will create them when needed. They're all global.

So after this we have a map of module/version associations. Now we process every line of pom.xml. Because we need the module name from within the artifactId tags, I decided to match them in the { common braces }. The built-in match() function can give us the the text within the artifactId tags, because I enclosed that into ( parentheses ). That enclosed text will appear in the third parameter matchArray. Mind that AWK has array indexes from 1-n, not from 0-n, so I get the name of the module from matchArray[1]. And I query the map with that module name to get its version.

When then a line appears that contains a closing dependency tag, the script checks whether a version exists for that passed artifactId, and inserts a version tag if so. Finally every line of the pom.xml gets printed out unchanged by print $0.

The exclusions patterns are there because within a Maven exclusion also artifactId tags can appear. And these exclusions are within dependency elements. So such would break the artifact version found just before, and thus the script uses the inExclusion state to avoid this. Without that state, module-three would miss its version.

Please mind that awk does not understand XML, so e.g. an XML comment at the wrong place could break the script. This is is just a quick line-reader solution!

That's it, AWK contains a lot more, so use it and enjoy the brevity. I've also seen bigger applications written in AWK, but like all script languages it lacks encapsulation, and thus is not suitable for big modular projects.

Freitag, 15. Mai 2015

JS Document Treeification

Yet another Blog about information visualization. The JavaScript working in this page extends the script I wrote for generating table-of-contents, see my according Blog article.

The idea is to not have a separate table-of-contents but to have the document within the TOC, or, in other words, to have a document that is a TOC. This is possible only with expand-controls as used in an UI tree, else the document could not mimic a TOC. Have a look at the treeification example below (it was not only Daphne, there were also Philemon and Baucis :-).

Treeification Example

▷
One

▷
Two

▷
Three

▷
Four

You see just the top-level headings. You can choose which one you are interested in and expand it. When you decide to go into breadth first, you could expand another top-level heading. Or you go into detail and expand the first one further.

The contrary approach is to have the document initially fully expanded. You can skip chapters you are not interested in by collapsing them.

Treeification Example

▽
One

This is chapter 1.

▽

One - One

This is chapter 1.1.

▽

One - One - One

This is chapter 1.1.1

▽

One - One - Two

This is chapter 1.1.2

▽
Two

This is chapter 2

▽

Two - One

This is chapter 2.1

▽

Two - One - One

This is chapter 2.1.1

▽

Two - One - One - One

This is chapter 2.1.1.1

▽

Two - One - One - Two

This is chapter 2.1.1.2

▽

Two - One - One - Three

This is chapter 2.1.1.3

▽

Two - One - Two

This is chapter 2.1.2

▽

Two - Two

This is chapter 2.2

▽

Two - Three

This is chapter 2.3

▽

Two - Three - One

This is chapter 2.3.1

▽

Two - Three - Two

This is chapter 2.3.2

▽

Two - Three - Two - One

This is chapter 2.3.2.1

▽
Three

This is chapter 3

▽

Three - One

This is chapter 3.1

▽
Four

This is chapter 4

A further idea would be to have just the headers visible, not the chapter text. Wherever there is chapter text, an expand control would be there to make it visible. Haven't implemented this yet.

What is missing for sure are buttons to (1) fully expand and (2) fully collapse the document.

The remainder of this Blog will be about how this was implemented in JS.
And I will layout that as treeification, so please click on the left-side arrows to see text.

▷
1 Use Old

The essence of writing is rewriting. This also applies to software!
So the first thing to do was rewrite the TOC-generator to have overridable functions that can be used to change its behaviour. In this case I introduced a new function customizeHeading(level, heading, titleElement) that gets called for every chapter heading found. Remember that the title element could be different from the heading element, e.g. the heading is the parent div, and the title is some nested h3 element. To the title element the expand control must be attached, and the heading element determines the content to be folded by the expand control.

Another adaption was to split the too big parameter list of tableOfContents() into parameters of statical and dynamical character. The staticals went up to the enclosing factory function, while the dynamicals stayed on tableOfContents(), which is now tableOfContents(topElement, headingTags, tocContainer). And I added a statical parameter initiallyFolded, which makes sense only for treeification, not for TOC-generation.

I also introduced a new function globalSettings() that initially receives all defaults and settings used during the document processing. Extensions thus have the chance to know a little about what happens in the TOC generator.

    var that = {};
    
    /**
     * Factory producing content-tree creator instances.
     * @param treeIndentInEm optional, for nested elements, the amount of tree-indentation in em,
     *     defaults to undefined.
     * @param doTocNumbering optional, chapter numbers are prepended to TOC when true,
     *     defaults to true.
     * @param doHeadingNumbering optional, chapter numbers are prepended to headings when true,
     *     defaults to true.
     * @param chapterNumberSeparator optional, separator for chapter numbers,
     *     defaults to ".".
     * @param doTrailingSeparator optional, when true chapter numbers will be
     *     "1.2." instead of "1.2", defaults to false.
     * @param initiallyFolded optional, whether chapters should be initially collapsed,
     *     defaults to false.
     * @return an instance that can treeify a document.
     */
    that.create = function(
              treeIndentInEm,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator,
              initiallyFolded)
    {
      var foldingUtil = folding.create();
      
      var tocUtil = tocCreator.create(
              treeIndentInEm,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator);
      
      var superTableOfContents = tocUtil.tableOfContents;
      
      /** Overwritten to install expand controls into the document tree before terminating. */
      tocUtil.tableOfContents = function(topElement, headingTags, tocContainer) {
        var toReturn = superTableOfContents(topElement, headingTags, tocContainer);
        installExpandControls();
        return toReturn;
      }

      .....

    };
    
    return that;

The create() function is the factory that produces instances you can call tableOfContents() upon. It receives the statical parameters mentioned above. Inside there are two objects that are given to this snippet globally (e.g. through AMD factory arguments). First is folding factory that we need to install expand controls. Second is tocCreator which creates the TOC generator instance we will extend here. Remember that in JS you always need an instance to perform inheritance. The private variable tocUtil receives that instance, and further code uses that instance to extend it with overrides and new functionality. Finally that instance is returned for calling the modified tableOfContents() on it. (That function name does not meet the purpose any more, a convenience function should be introduced that delegates to it.)

The override of tableOfContents() first calls superTableOfContents to execute the document processing. Then, after having received tree information through overrides, it installs expand controls by using the foldingUtil and some new functions that find the content to be folded.

Some functions should not work like they do when treeifying the document. First there should not be a TOC, because the document itself will be like a TOC. Then we do not need hyperlinks from the TOC to the chapter heading. Finally we want to receive tree information from the document traversal, and we override the new function customizeHeading(level, heading, titleElement) for that. Additionally we receive the ground-tree-level and the fact whether structure analysis is done by h1 - h6 elements in globalSettings().

      var tree = [];

      var levelByH1To6;
      var groundLevel;
            
      /** Overwritten to store global settings. */
      tocUtil.globalSettings = function(headingsParam, topElementParam, groundLevelParam, headingTagsParam, levelByH1To6Param) {
        levelByH1To6 = levelByH1To6Param;
        groundLevel = groundLevelParam;
      };
      
      /** Overwritten to NOT insert TOC into document. */
      tocUtil.createDefaultTocContainer = function() {
        return document.createElement("div");
      };

      /** Overwritten to do nothing, nothing gets linked here. */
      tocUtil.setHyperlinkIdToHeading = function() {
        return undefined;
      };

      /** Overwritten to create just an LI item for chapter number counting. */
      tocUtil.createTocItem = function() {
        return tocUtil.createTocListItem();
      };
      
      /** Overwritten to build a tree from super's iterations through heading elements. */
      tocUtil.customizeHeading = function(level, heading, titleElement) {
        tree.push({
          level: level,
          heading: heading,
          titleElement: titleElement
        });
      };

For storing tree information we hold the private instance-bound array tree. Each tree node, ordered depth-first, will be in there as an object with heading, title and tree-level. After super's document iteration we will use this array to install folding, and it must stay untouched during the whole page lifetime, because it will be used any time an expand control is clicked to retrieve child elements to be shown or hidden.

▷
2 To Create New

      var installExpandControls = function() {
        for (var i = 0; i < tree.length; i++) {
          var t = tree[i];
          installExpandControl(t.level, t.heading, t.titleElement);
        }
      };

Unfortunately it is quite complex to find the correct elements to show and hide on expand-control clicks. When you read my Blog about treetables you might remember that on "collapse" we make invisible all elements below the control, but on "expand" we must not make visible elements that are managed by other expand controls. Else the expansion state of child items will be lost.
To make it even harder we must distinguish between the chapter hierarchy and the HTML element hierarchy, which may be identical, but could be different. In case of h1 - h6 structuring a structural child might be a HTML sibling. And that kind of structuring is more popular than nesting div or section elements into each other.

Following function shows how the tree array is used to find a list of structural children, whereby that list contains children of any depth, meaning the whole sub-tree of the given heading, recursively.

      var structuralChildren = function(heading) {
        var structuralChildren = [];
        var level;
        
        for (var i = 0; i < tree.length; i++) {
          var h = tree[i].heading;
          var l = tree[i].level;
          
          if (level !== undefined) {
            if (l > level)
              structuralChildren.push(tree[i]);
            else if (l <= level)
              return structuralChildren;
          }
          else if (h === heading) {
            level = l;
          }
        }
        return structuralChildren;
      };

      var nextStructuralSibling = function(heading) {
        var level;
        for (var i = 0; i < tree.length; i++) {
          var h = tree[i].heading;
          var l = tree[i].level;
          if (level !== undefined && l <= level)
            return h;
          else if (h === heading)
            level = l;
        }
        return undefined;
      };

      var directChildrenWithout = function(titleElement, heading) {
        var directChildren = [];
        for (var i = 0; i < heading.children.length; i++) {
          var child = heading.children[i];
          if ( ! domUtil.isElementOrParentOf(child, titleElement, heading) )
            directChildren.push(child);
        }
        return directChildren;
      };

      var siblingsUntil = function(heading, stopElement) {
        var siblings = [];
        var inBranch = false;
        var parent = heading.parentNode;
        for (var i = 0; i < parent.children.length && parent.children[i] !== stopElement; i++) {
          var sibling = parent.children[i];
          if (inBranch)
            siblings.push(sibling);
          else if (sibling === heading)
            inBranch = true;
        }
        return siblings;
      };

      var installExpandControl = function(level, heading, titleElement) {
        var elementsToHide = getElementsToHide(heading, titleElement);
        var childHeadings = structuralChildren(heading);
        
        var content = function(displayed) {
          if (displayed)
            return getElementsToShow(elementsToHide, childHeadings);
          else
            return elementsToHide;
        };

        foldingUtil.connect(
                titleElement,
                content,
                initiallyFolded === undefined ? false : initiallyFolded,
                false);
        
        if ( ! levelByH1To6 && treeIndentInEm !== undefined)
          heading.style["padding-left"] = ""+((level - groundLevel) * treeIndentInEm)+"em";
      };

This installs the expand controls, and indents the heading when possible and required. How folding works is beyond the scope of this article. In short, it prepends an arrow to the given titleElement, and connects the click callback of that arrow to a function that sets given content visible or invisible. These content elements can be given as single element, as array of elements, or as a function that returns element(s). Here a function is used, that in the case of "setting displayed" returns what getElementsToShow(elementsToHide, childHeadings) delivers, and in case of "setting not displayed" returns the array of elements to hide.

As a precondition for that, the function reads the list of elements to hide from getElementsToHide(heading, titleElement). Parts of that list will also be used in case of "setting displayed".

      var getElementsToHide = function(heading, titleElement) {
        var directChildren = directChildrenWithout(titleElement, heading);
        var stopElement = nextStructuralSibling(heading);
        var siblings = siblingsUntil(heading, stopElement);
        
        var elementsToHide = [];
        for (var i = 0; i < directChildren.length; i++)
          elementsToHide.push(directChildren[i]);
        for (var i = 0; i < siblings.length; i++)
          elementsToHide.push(siblings[i]);
        
        return elementsToHide;
      };

      var getElementsToShow = function(elementsToHide, childHeadings) {
        var collapsedChildHeadings = [];
        for (var i = 0; i < childHeadings.length; i++) {
          var childHeading = childHeadings[i];
          if ( ! foldingUtil.isExpanded(childHeading.titleElement) )
            collapsedChildHeadings.push(childHeading);
        }
        
        var elementsToShow = [];
        for (var j = 0; j < elementsToHide.length; j++) {
          var elementToHide = elementsToHide[j];
          var isBelowCollapsed = false;
          
          for (var k = 0; k < collapsedChildHeadings.length && isBelowCollapsed === false; k++) {
            var collapsedHeading = collapsedChildHeadings[k];
            var elementsBelowCollapsed = getElementsToHide(collapsedHeading.heading, collapsedHeading.titleElement);
            if (contains(elementsBelowCollapsed, elementToHide))
              isBelowCollapsed = true;
          }
          
          if (isBelowCollapsed === false)
            elementsToShow.push(elementToHide);
        }
        
        return elementsToShow;
      };

      var contains = function(array, element) {
        for (var i = 0; i < array.length; i++)
          if (array[i] === element)
            return true;
        return false;
      };

The parameter elementsToHide contains all elements that would be hidden when closing the control. This is filtered by means of the second parameter that contains all child headings (recursively) that also can expand or collapse content.

First those child headings are retrieved that are currently collapsed. Then the new array elementsToShow is created. It receives all elements from elementsToHide that are not below any of the collapsed child headings.

What has not been shown here are the folding and domUtil modules. You can find these in the folded full source on bottom of this page.

<script type="text/javascript">
  "use strict";

  var treeCreatorInstance = treeCreator.create(
      1,          /* indentation */
      undefined,  /* whether TOC items should have numbering, true/false */
      false,      /* whether chapter headings should have numbering, true/false */
      undefined,  /* chapter number separator different from "." */
      undefined,  /* whether chapter numbers should have a trailing separator, true/false */
      true        /* whether document tree should be initially collapsed, true/false */
  );
  var blogDiv = document.getElementById("blog");
  treeCreatorInstance.tableOfContents(blogDiv);
  
</script>

There are a lot of new ideas now to continue this work. One thing is making the initiallyFolded parameter more flexible. It should be configurable for any chapter individually whether it is collapsed or not.

▷

Click here to see full source code.

  "use strict";


  var tocCreator = function()
  {
    var that = {};
    
    /**
     * Factory producing TOC creator instances.
     * @param treeIndentInEm optional, for nested elements, the amount of tree-indentation in em,
     *     defaults to 1.
     * @param doTocNumbering optional, chapter numbers are prepended to TOC when true,
     *     defaults to true.
     * @param doHeadingNumbering optional, chapter numbers are prepended to headings when true,
     *     defaults to true.
     * @param chapterNumberSeparator optional, separator for chapter numbers,
     *     defaults to ".".
     * @param doTrailingSeparator optional, when true chapter numbers will be
     *     "1.2." instead of "1.2", defaults to false.
     * @return an instance that can create dynamic table-of-contents.
     */
    that.create = function(
              treeIndentInEm,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator)
    {
      
      var tocUtil = {};
    
      /**
       * Reads headings, numbers them, extracts them to a linked table-of-contents
       * and inserts that TOC below a single h1 heading, or as first child of given top-element.
       * @param topElement optional, the HTML element to search for headings,
       *     defaults to document body.
       * @param headingTags optional, the CSS selector to use for search of structuring elements,
       *     defaults to "h2, h3, h4, h5, h6".
       * @param tocContainer optional, the container where to insert the created TOC,
       *     defaults to topElement below a single H1, or on top.
       * @return the given or created container of the table-of-contents.
       */
      tocUtil.tableOfContents = function(topElement, headingTags, tocContainer) {
        topElement = topElement || document.body;
        headingTags = headingTags || guessHeadingTags(topElement);
        doTocNumbering = (doTocNumbering === undefined) ? true : doTocNumbering;
        doHeadingNumbering = (doHeadingNumbering === undefined) ? true : doHeadingNumbering;
        chapterNumberSeparator = chapterNumberSeparator || ".";
        doTrailingSeparator = (doTrailingSeparator === undefined) ? false : doTrailingSeparator;

        var headings = tocUtil.findHeadingsInOrder(topElement, headingTags);
        if ( ! headings || ! headings.length )
          return;

        tocContainer = tocContainer || tocUtil.createDefaultTocContainer(topElement);

        var levelByH1To6 = /h[1-6]/i.test(headingTags);
        var groundLevel= tocUtil.getLevel(headings[0], topElement, headingTags, levelByH1To6);
        tocUtil.globalSettings(headings, topElement, groundLevel, headingTags, levelByH1To6);

        var stack = [];
        stack.push({
          level: groundLevel,
          chapterList: appendTocList(tocContainer)
        });

        for (var i = 0; i < headings.length; i++) {
          var heading = headings[i];
          var currentList = peek(stack);
          var currentLevel = currentList.level;
          var headingLevel = tocUtil.getLevel(heading, topElement, headingTags, levelByH1To6);

          if (headingLevel === currentLevel + 1) {
            stack.push({
              level: headingLevel,
              chapterList: appendTocList(getLastListElement(currentList.chapterList))
            });
          }
          else if (headingLevel < currentLevel) {
            for (; headingLevel < currentLevel; currentLevel--)
              stack.pop();
          }
          else if (headingLevel > currentLevel + 1) {
            throw "Inconsistent document structure, incrementing level from "+currentLevel+" to "+headingLevel;
          }

          appendTocItem(
                  stack,
                  levelByH1To6,
                  heading,
                  doTocNumbering,
                  doHeadingNumbering,
                  chapterNumberSeparator,
                  doTrailingSeparator);
        }

        return tocContainer;
      };

      /**
       * Publish global settings (that won't change) to sub-objects.
       * This implementation does nothing, it is to be overridden.
       */
      tocUtil.globalSettings = function(headings, topElement, groundLevel, headingTags, levelByH1To6) {
      };
      
      var guessHeadingTags = function(topElement) {
        var defaultHeadingTags = "h2, h3, h4, h5, h6";
        var defaultHeadings = tocUtil.findHeadingsInOrder(topElement, defaultHeadingTags);
        var firstTagName;
        for (var i = 0; i < defaultHeadings.length; i++) {
          if ( ! firstTagName )
            firstTagName = defaultHeadings[i].tagName;
          else
            if (firstTagName !== defaultHeadings[i].tagName)
              return defaultHeadingTags; /* found at least two different heading tags */
        }

        if (hasMoreThanOne(topElement, "div"))
          return "div";
        if (hasMoreThanOne(topElement, "section"))
          return "section";
        if (hasMoreThanOne(topElement, "article"))
          return "article";

        return defaultHeadingTags;
      };

      var hasMoreThanOne = function(topElement, cssSelector) {
        var headings = tocUtil.findHeadingsInOrder(topElement, cssSelector);
        return headings && headings.length > 1;
      };

      /**
       * Reads all headings to extract and number, in document order:
       * querySelectorAll() does it, see
       * http://www.w3.org/TR/selectors-api/#queryselectorall.
       * @param topElement the HTML element to search for headings.
       * @param headingTags the CSS selector to use for search.
       * @return something that has a length and can be iterated with a for loop.
       */
      tocUtil.findHeadingsInOrder = function(topElement, headingTags) {
        return topElement.querySelectorAll(headingTags);
      };

      /**
       * Searches for a h1, places TOC as div below it when found, else
       * places TOC as first child of given HTML element or body.
       * @return what is to be used as container for the table of contents.
       */
      tocUtil.createDefaultTocContainer = function(topElement) {
        var toc = document.createElement("div");
        var headings1 = topElement.getElementsByTagName("h1");
        var heading1 = (headings1 && headings1.length === 1) ? headings1[0] : undefined;
        if (heading1)
          heading1.parentElement.insertBefore(toc, heading1.nextSibling);
        else
          topElement.insertBefore(toc, topElement.children[0]);
        return toc;
      };

      tocUtil.getLevel = function(heading, topElement, headingTags, levelByH1To6) {
        return levelByH1To6 ?
            tocUtil.getHeadingLevel(heading) :
            tocUtil.getNestingLevel(heading, topElement, headingTags);
      };

      /** @return the given heading's number, e.g. "3" from "h3". */
      tocUtil.getHeadingLevel = function(heading) {
        return parseInt(heading.tagName.substring(1));
      };

      /** @return the 0-n level of given element, relative to given topElement. */
      tocUtil.getNestingLevel = function(element, topElement, nestingTags) {
        var parent = element.parentNode;
        var i = 0;
        while (parent !== topElement) {
          if (parent.tagName && new RegExp(parent.tagName, "i").test(nestingTags))
            i++; /* when tag is in nesting-tags, count this as level */
          parent = parent.parentNode;
        }
        return i;
      };

      var peek = function(stack) {
        return stack[stack.length - 1];
      };

      var getLastListElement = function(list) {
        return list.children[list.children.length - 1];
      };

      /**
       * This implementation returns the input number unchanged.
       * @return a chapter number converted to arbitrary numbering system (Roman, letter, ...).
       */
      tocUtil.convertChapterNumber = function(number) {
        return number;
      };

      var createNextChapterNumber = function(stack, chapterNumberSeparator) {
        var number = "";
        for (var i = 0; i < stack.length; i++) {
          var currentList = stack[i];
          var currentNumber = currentList.chapterList.children.length;
          if (i === stack.length - 1) /* end of nestings, at rightmost chapter number */
            currentNumber++; /* create NEXT number */
          currentNumber = tocUtil.convertChapterNumber(currentNumber);
          number = number+(number === "" ? "" : chapterNumberSeparator)+currentNumber;
        }
        return number;
      };

      /**
       * This implementation creates an UL element without list-style-type (no bullets).
       * @return a list that will hold a chapter headings.
       */
      tocUtil.createTocList = function() {
        var list = document.createElement("ul");
        list.style.cssText = 
                "list-style-type: none; "+
                (treeIndentInEm !== undefined ? "padding-left: "+treeIndentInEm+"em;" : "");
        return list;
      };

      var appendTocList = function(parent) {
        var list = tocUtil.createTocList();
        parent.appendChild(list);
        return list;
      };

      /**
       * This implementation creates a LI element.
       * @return an TOC item element that will receive a chapter heading link.
       */
      tocUtil.createTocListItem = function() {
        return document.createElement("li");
      };

      /**
       * This implementation creates and links an A element.
       * @return a TOC link element that will receive a chapter heading text.
       */
      tocUtil.createHyperlink = function(headingHtml, headingId) {
        var hyperLink = document.createElement("a");
        hyperLink.innerHTML = headingHtml;
        hyperLink.setAttribute("href", "#"+headingId);
        return hyperLink;
      };

      var getTextContent = function(nestingElement) {
        var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
            nestingElement.childNodes[0].textContent :
            undefined;
        return (title && (title = title.trim())) ? title : undefined;
      };

      /**
       * @return this is called when the document is structured by
       *    nested elements, and it is expected to return the heading-element
       *    for given nested container, e.g. the heading in a section.
       *    This implementation returns the first child element that
       *    is not a single child.
       */
      tocUtil.getTitleElementFromNesting = function(nestedElement) {
        while (nestedElement.children.length === 1)
          nestedElement = nestedElement.children[0]; /* in case there are wrappers */
        
        var titleElement = nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
        while (titleElement.children.length === 1)
          titleElement = titleElement.children[0]; /* in case there are wrappers */
        
        return titleElement;
      };

      tocUtil.setHyperlinkIdToHeading = function(nextChapterNumber, titleElement) {
        var headingId = "h"+nextChapterNumber.replace(/\./g, '_');
        titleElement.setAttribute("id", headingId);
        return headingId;
      };

      tocUtil.prependToElement = function(levelByH1To6, toPrepend, titleText, titleElement) {
        if (levelByH1To6) /* assuming H[1-6] has no children */
          titleElement.innerHTML = toPrepend+"&nbsp;"+titleText;
        else
          titleElement.insertBefore(document.createTextNode(toPrepend+" "), titleElement.childNodes[0]);
      };

      tocUtil.createTocItem = function(
              doTocNumbering,
              nextChapterNumber,
              headingText,
              headingId)
      {
        var tocItem = tocUtil.createTocListItem();
        if (doTocNumbering) {
          var chapterNumber = document.createElement("span");
          chapterNumber.innerHTML = nextChapterNumber+"&nbsp;";
          tocItem.appendChild(chapterNumber);
        }

        var hyperLink = tocUtil.createHyperlink(headingText, headingId);
        tocItem.appendChild(hyperLink);

        return tocItem;
      };

      var appendTocItem = function(
              stack,
              levelByH1To6,
              heading,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator)
      {
        var nextChapterNumber = createNextChapterNumber(stack, chapterNumberSeparator);
        if (doTrailingSeparator)
          nextChapterNumber = nextChapterNumber + chapterNumberSeparator;

        var headingText = levelByH1To6 ? heading.innerHTML : getTextContent(heading);
        var titleElement = (levelByH1To6 || headingText) ? heading : tocUtil.getTitleElementFromNesting(heading);
        if ( ! headingText )
          headingText = titleElement.innerHTML;

        if (doHeadingNumbering)
          tocUtil.prependToElement(levelByH1To6, nextChapterNumber, headingText, titleElement);

        var currentList = peek(stack);
        tocUtil.customizeHeading(currentList.level, heading, titleElement);

        var headingId = tocUtil.setHyperlinkIdToHeading(nextChapterNumber, titleElement);
        var tocItem = tocUtil.createTocItem(
                doTocNumbering,
                nextChapterNumber,
                headingText,
                headingId);

        currentList.chapterList.appendChild(tocItem);
      };

      /*
       * Called once for each chapter heading. Does nothing. To be overridden.
       * @param level the tree level of heading.
       * @param heading the current heading/nesting element to customize.
       * @param titleElement the element holding the chapter title, mostly identical with heading.
       */
      tocUtil.customizeHeading = function(level, heading, titleElement) {
      };


      return tocUtil;
    };
    
    
    return that;
    
  }();
  
  
  
  
  var domUtil = function()
  {
    var setDisplayed = function(element, displayed) {
      if (displayed === true && ! isDisplayed(element)) {
        var displayStyle = element.oldDisplayStyle ? element.oldDisplayStyle : "";
        element.style.display = displayStyle;
      }
      else if ( ! displayed && isDisplayed(element)) {
        element.oldDisplayStyle = element.style.display;
        element.style.display = "none";
      }
    };

    var isDisplayed = function(element) {
      return element.style.display !== "none";
    };

    var isElementOrParentOf = function(toSearch, startElement, topMostElement) {
      var parent = startElement;
      while (parent && parent !== topMostElement) {
        if (parent === toSearch)
          return true;
        parent = parent.parentNode;
      }
      return false;
    };
    
    
    return {
      isDisplayed: isDisplayed,
      setDisplayed: setDisplayed,
      isElementOrParentOf: isElementOrParentOf,
    };

  }();
  
  
  
  
  var folding = function()
  { 
    var that = {};

    that.blackTriangleRight = "\u25B6";
    that.blackTriangleDown = "\u25BC";
    that.whiteTriangleRight = "\u25B7";
    that.whiteTriangleDown = "\u25BD";
    that.plus = "+";
    that.minus = "\u2012";
    that.blackTriangleRightSmall = "\u25B8";
    that.blackTriangleDownSmall = "\u25BE";
    that.whiteTriangleRightSmall = "\u25B9";
    that.whiteTriangleDownSmall = "\u25BF";

    that.create = function(expandSymbol, collapseSymbol, styling)
    {
      var foldingUtil = {};

      var symbolExpand = expandSymbol || that.whiteTriangleRight;
      var symbolCollapse = collapseSymbol || that.whiteTriangleDown;

      foldingUtil.connect = function(titleElement, content, initiallyFolded, canClickWholeTitle) {
        var expandControl = prependExpandControlContainer(titleElement);
        expandControl.innerHTML = (initiallyFolded ? symbolExpand : symbolCollapse);

        var notifier = canClickWholeTitle ? titleElement : expandControl;
        notifier.addEventListener("click", function() {
          clickListener(expandControl, content);
        });

        if (canClickWholeTitle)
          notifier.style.cursor = "pointer";

        if (initiallyFolded)
          setDisplayed(content, false);

        return notifier;
      };

      foldingUtil.isExpanded = function(titleElement) {
        var expandControl = getExpandControl(titleElement);
        return expandControl ? (expandControl.innerHTML === symbolCollapse) : true;
      };


      var prependExpandControlContainer = function(titleElement) {
        var expandControl = getExpandControl(titleElement);

        if ( ! expandControl ) {
          expandControl = document.createElement("div");
          expandControl.style.display = "inline-block";
          expandControl.style.width = "1em";
          expandControl.style.height = "1em";
          expandControl.style.cursor = "pointer";
          expandControl.style.cssText += "; text-align: center; margin-right: 0.5em;";

          if (styling)
             styling(expandControl);

          titleElement.insertBefore(expandControl, titleElement.childNodes[0]);
        }

        return expandControl;
      };

      var getExpandControl = function(titleElement) {
        var expandControl = titleElement.children[0];
        if (expandControl) {
          var titleString = expandControl.innerHTML;
          if (titleString !== symbolCollapse && titleString !== symbolExpand)
            expandControl = undefined;
        }
        return expandControl;
      };

      var clickListener = function(expandControl, content) {
        var displayed = isDisplayed(content);
        setDisplayed(content, displayed === false);
        expandControl.innerHTML = displayed ? symbolExpand : symbolCollapse;
      };

      var setDisplayed = function(content, displayed) {
        if (content.style)
          domUtil.setDisplayed(content, displayed);
        else if (content instanceof Array)
          for (var i = 0; i < content.length; i++)
            setDisplayed(content[i], displayed);
        else
          setDisplayed(content(displayed), displayed);
      };

      var isDisplayed = function(content) {
        if (content.style)
          return domUtil.isDisplayed(content);
        else if (content instanceof Array)
          return content.length > 0 && isDisplayed(content[0]);
        else
          return isDisplayed(content(true));
      };


      return foldingUtil;
      
    };


    return that;

  }();




  var treeCreator = function()
  {
    var that = {};
    
    /**
     * Factory producing content-tree creator instances.
     * @param treeIndentInEm optional, for nested elements, the amount of tree-indentation in em,
     *     defaults to undefined.
     * @param doTocNumbering optional, chapter numbers are prepended to TOC when true,
     *     defaults to true.
     * @param doHeadingNumbering optional, chapter numbers are prepended to headings when true,
     *     defaults to true.
     * @param chapterNumberSeparator optional, separator for chapter numbers,
     *     defaults to ".".
     * @param doTrailingSeparator optional, when true chapter numbers will be
     *     "1.2." instead of "1.2", defaults to false.
     * @param initiallyFolded optional, whether chapters should be initially collapsed,
     *     defaults to false.
     * @return an instance that can treeify a document.
     */
    that.create = function(
              treeIndentInEm,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator,
              initiallyFolded)
    {
      var foldingUtil = folding.create();
      
      var tocUtil = tocCreator.create(
              treeIndentInEm,
              doTocNumbering,
              doHeadingNumbering,
              chapterNumberSeparator,
              doTrailingSeparator);
      
      var superTableOfContents = tocUtil.tableOfContents;
      
      var tree = [];

      var levelByH1To6;
      var groundLevel;
      
      /** Overwritten to install expand controls into the document tree before terminating. */
      tocUtil.tableOfContents = function(topElement, headingTags, tocContainer) {
        var toReturn = superTableOfContents(topElement, headingTags, tocContainer);
        installExpandControls();
        return toReturn;
      }
      
      /** Overwritten to store global settings. */
      tocUtil.globalSettings = function(headingsParam, topElementParam, groundLevelParam, headingTagsParam, levelByH1To6Param) {
        levelByH1To6 = levelByH1To6Param;
        groundLevel = groundLevelParam;
      };
      
      /** Overwritten to NOT insert TOC into document. */
      tocUtil.createDefaultTocContainer = function() {
        return document.createElement("div");
      };

      /** Overwritten to do nothing, nothing gets linked here. */
      tocUtil.setHyperlinkIdToHeading = function() {
        return undefined;
      };

      /** Overwritten to create just an LI item for chapter number counting. */
      tocUtil.createTocItem = function() {
        return tocUtil.createTocListItem();
      };
      
      /** Overwritten to build a tree from super's iterations through heading elements. */
      tocUtil.customizeHeading = function(level, heading, titleElement) {
        tree.push({
          level: level,
          heading: heading,
          titleElement: titleElement
        });
      };


      var directChildrenWithout = function(titleElement, heading) {
        var directChildren = [];
        for (var i = 0; i < heading.children.length; i++) {
          var child = heading.children[i];
          if ( ! domUtil.isElementOrParentOf(child, titleElement, heading) )
            directChildren.push(child);
        }
        return directChildren;
      };

      var siblingsUntil = function(heading, stopElement) {
        var siblings = [];
        var inBranch = false;
        var parent = heading.parentNode;
        for (var i = 0; i < parent.children.length && parent.children[i] !== stopElement; i++) {
          var sibling = parent.children[i];
          if (inBranch)
            siblings.push(sibling);
          else if (sibling === heading)
            inBranch = true;
        }
        return siblings;
      };

      var nextStructuralSibling = function(heading) {
        var level;
        for (var i = 0; i < tree.length; i++) {
          var h = tree[i].heading;
          var l = tree[i].level;
          if (level !== undefined && l <= level)
            return h;
          else if (h === heading)
            level = l;
        }
        return undefined;
      };

      var structuralChildren = function(heading) {
        var structuralChildren = [];
        var level;
        
        for (var i = 0; i < tree.length; i++) {
          var h = tree[i].heading;
          var l = tree[i].level;
          
          if (level !== undefined) {
            if (l > level)
              structuralChildren.push(tree[i]);
            else if (l <= level)
              return structuralChildren;
          }
          else if (h === heading) {
            level = l;
          }
        }
        return structuralChildren;
      };

      var getElementsToHide = function(heading, titleElement) {
        var directChildren = directChildrenWithout(titleElement, heading);
        var stopElement = nextStructuralSibling(heading);
        var siblings = siblingsUntil(heading, stopElement);
        
        var elementsToHide = [];
        for (var i = 0; i < directChildren.length; i++)
          elementsToHide.push(directChildren[i]);
        for (var i = 0; i < siblings.length; i++)
          elementsToHide.push(siblings[i]);
        
        return elementsToHide;
      };
      
      var getElementsToShow = function(elementsToHide, childHeadings) {
        var collapsedChildHeadings = [];
        for (var i = 0; i < childHeadings.length; i++) {
          var childHeading = childHeadings[i];
          if ( ! foldingUtil.isExpanded(childHeading.titleElement) )
            collapsedChildHeadings.push(childHeading);
        }
        
        var elementsToShow = [];
        for (var j = 0; j < elementsToHide.length; j++) {
          var elementToHide = elementsToHide[j];
          var isBelowCollapsed = false;
          
          for (var k = 0; k < collapsedChildHeadings.length && isBelowCollapsed === false; k++) {
            var collapsedHeading = collapsedChildHeadings[k];
            var elementsBelowCollapsed = getElementsToHide(collapsedHeading.heading, collapsedHeading.titleElement);
            if (contains(elementsBelowCollapsed, elementToHide))
              isBelowCollapsed = true;
          }
          
          if (isBelowCollapsed === false)
            elementsToShow.push(elementToHide);
        }
        
        return elementsToShow;
      };

      var contains = function(array, element) {
        for (var i = 0; i < array.length; i++)
          if (array[i] === element)
            return true;
        return false;
      };
      
      
      var installExpandControls = function() {
        for (var i = 0; i < tree.length; i++) {
          var t = tree[i];
          installExpandControl(t.level, t.heading, t.titleElement);
        }
      };
      
      /**
       * Installs expand controls on document headings.
       * An expand control toggles the visibility of direct
       * children of the given heading element (excluding the titleElement
       * holding the expand control, and none of that title's parents),
       * and all its siblings down to next heading element of same
       * or smaller level.
       */
      var installExpandControl = function(level, heading, titleElement) {
        var elementsToHide = getElementsToHide(heading, titleElement);
        var childHeadings = structuralChildren(heading);
        
        var content = function(displayed) {
          if (displayed)
            return getElementsToShow(elementsToHide, childHeadings);
          else
            return elementsToHide;
        };

        var folded;
        if (initiallyFolded === undefined)
          folded = false;
        else if (initiallyFolded === true || initiallyFolded === false)
          folded = initiallyFolded;
        else
          folded = initiallyFolded(heading, level);
        
        foldingUtil.connect(titleElement, content, folded, false);
        
        if ( ! levelByH1To6 && treeIndentInEm !== undefined)
          heading.style["padding-left"] = ""+((level - groundLevel) * treeIndentInEm)+"em";
      };


      return tocUtil;
    };
    
    
    return that;
    
  }();

On my homepage you can always see the current state of this project.

fritzthecat-blog

Blog-Archiv

Mittwoch, 27. Mai 2015

The Immortal AWK Language

The Shortest Useful is an AWK Program

AWK Program in a File

A Standard AWK Program

AWK Capabilities in an Example

Inputs to Process

Output to Achieve

The Program

Freitag, 15. Mai 2015

JS Document Treeification

Treeification Example

▷
One

▷
Two

▷
Three

▷
Four

Treeification Example

One

Two

Three

Four

Treeification Example

▽
One

▽
Two

▽
Three

▽
Four

▷
1 Use Old

▷
1.1 Rewrite

▷
1.2 Reuse

▷
1.3 Override

▷
2 To Create New

▷
2.1 Add

▷
2.1.1 Structural Children

▷
2.1.2 Elements to Toggle

▷
2.1.3 Putting It Together

▷
2.2 Use it

Über mich

fritzthecat-blog

Blog-Archiv

Mittwoch, 27. Mai 2015

The Immortal AWK Language

The Shortest Useful is an AWK Program

AWK Program in a File

A Standard AWK Program

AWK Capabilities in an Example

Inputs to Process

Output to Achieve

The Program

Freitag, 15. Mai 2015

JS Document Treeification

Treeification Example

▷One

▷Two

▷Three

▷Four

Treeification Example

One

Two

Three

Four

Treeification Example

▽One

▽Two

▽Three

▽Four

▷1 Use Old

▷1.1 Rewrite

▷1.2 Reuse

▷1.3 Override

▷2 To Create New

▷2.1 Add

▷2.1.1 Structural Children

▷2.1.2 Elements to Toggle

▷2.1.3 Putting It Together

▷2.2 Use it

Über mich

▷
One

▷
Two

▷
Three

▷
Four

▽
One

▽
Two

▽
Three

▽
Four

▷
1 Use Old

▷
1.1 Rewrite

▷
1.2 Reuse

▷
1.3 Override

▷
2 To Create New

▷
2.1 Add

▷
2.1.1 Structural Children

▷
2.1.2 Elements to Toggle

▷
2.1.3 Putting It Together

▷
2.2 Use it