Thursday, 16 April 2015

JS Table of Contents

Ever since we struggled with the table-of-contents of our first W*****d document, we have had a dynamic TOC in mind: one that appears automatically, numbers the chapters, generates a hierarchical list of contents at the top, and provides links that scroll directly to the clicked chapter.

Here it is, for HTML documents, implemented in our favourite toy language, JavaScript. The script is only about 230 lines long (without comments), 3 KB when minified. No external library is used, so you need an up-to-date browser that supports querySelectorAll(). You will find the full, commented source code at the bottom of this blog post. The text in between explains how it works. The table-of-contents below is the proof of concept, because it was generated by the script itself :-)

Conventions

I recommend functional inheritance for JavaScript. In the following code snippets, public functions are written like tableOfContents ...

  var tocCreator = function()
  {
    var that = {};
    
    that.tableOfContents = function() {
      ....
    };

... while private functions are written like appendTocItem ...

    var appendTocItem = function() {
      ....
    };
    
    return that;
  };

Public functions can be overridden, private functions cannot. Both live inside the same closure, so both can access the that instance.
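
To see the convention at work outside the TOC script, here is a minimal, self-contained sketch. The greeter module and all of its function names are invented purely for illustration; only the pattern matters.

```javascript
// Hypothetical module, only for illustrating the convention.
var greeter = function() {
  var that = {};

  // private: reachable only through the closure, cannot be replaced from outside
  var decorate = function(text) {
    return "[" + text + "]";
  };

  // public: attached to that, so callers can replace it
  that.greeting = function() {
    return "hello";
  };

  // public: always calls greeting() through that, so an override takes effect
  that.greet = function() {
    return decorate(that.greeting());
  };

  return that;
};

var instance = greeter();
instance.greeting = function() {
  return "servus";
};
var result = instance.greet();  // "[servus]"
```

The override only works because greet() calls that.greeting() instead of holding a private reference to the original function.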

Preconditions

Now let's jump into the problems of chapter numbering and TOC generation.
Somehow the chapter structure must be represented by the document's HTML tags:

  1. either using headings <h1> to <h6>,
  2. or by nesting elements like <div> or <section> into each other

Here is an example for case 1, which I would like to call ...

Structured by Headings

    <h1>Heading Title</h1>
    
    <h2>One</h2>
    <p>This is chapter 1.</p>

    <h3>One - One</h3>
    <p>This is chapter 1.1.</p>

    <h4>One - One - One</h4>
    <p>This is chapter 1.1.1</p>

    <h4>One - One - Two</h4>
    <p>This is chapter 1.1.2</p>

    <h2>Two</h2>
    <p>This is chapter 2</p>

As we can see, the hierarchical structure of the chapters is represented by the type of h[1-6] heading tag. That means we do not have chapters nested into each other, like a <div> for chapter 1.1 that contains another <div> for chapter 1.1.1. Instead there is a continuous flow of chapters that get their hierarchical order from the heading number.

And here is an example for case 2, which I would like to call ...

Structured by Nesting

    <h1>Nesting Title</h1>

    <div><h3>One</h3>
      <p>This is chapter 1.</p>

      <div>One - One
        <p>This is chapter 1.1.</p>

        <div>One - One - One
          <p>This is chapter 1.1.1</p>
        </div>

        <div>One - One - Two
          <p>This is chapter 1.1.2</p>
        </div>

      </div>
    </div>

    <div><h3>Two</h3>
      <p>This is chapter 2</p>
    </div>

This kind of structuring makes it possible to move or delete big parts of the document by touching just one element. On the other hand the hierarchical structure might get complex when the nesting level gets deep.

Other Structure

For any other kind of hierarchy you would have to implement a JS override. I will explain this in chapter 4.1.3, "Example Override: Hierarchy by CSS-class".

Searching Chapters

The first task in extracting a table-of-contents is finding the chapters. For Heading Structures this could be done as follows:

var searchRoot = topElement || document.body;
var chapters = searchRoot.querySelectorAll("h2, h3, h4, h5, h6");

If no topElement containing the chapters is given, it defaults to document.body. Then we call the DOM method querySelectorAll(cssSelector) on that element to retrieve all chapter elements in document order. The result order of this function is specified by the W3C.

For Nesting Structures the chapters could be extracted by this, in case a mix of <div> and <section> elements was used as nesting elements:

var chapters = searchRoot.querySelectorAll("div, section");

Now we have the chapters, maybe nested into each other, but in document order. When we can assign a hierarchy level to each of them, we would have all information needed to extract a table-of-contents.
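
"Document order" means depth-first, parents before children, earlier siblings before later ones. The following sketch imitates that order on plain stand-in objects so it can run without a browser. The function collectInDocumentOrder and the stand-in tree are assumptions for illustration; this is not how querySelectorAll() is actually implemented.

```javascript
// Collects all nodes below root whose tag is contained in tags,
// visiting each parent before its children (depth-first pre-order).
var collectInDocumentOrder = function(root, tags) {
  var found = [];
  var visit = function(node) {
    for (var i = 0; i < node.children.length; i++) {
      var child = node.children[i];
      if (tags.indexOf(child.tagName) >= 0)
        found.push(child);
      visit(child);
    }
  };
  visit(root);
  return found;
};

// stand-in objects for:
// <body><div id="a"><div id="b"></div></div><section id="c"></section></body>
var b = { tagName: "DIV", id: "b", children: [] };
var a = { tagName: "DIV", id: "a", children: [ b ] };
var c = { tagName: "SECTION", id: "c", children: [] };
var body = { tagName: "BODY", children: [ a, c ] };

var ids = collectInDocumentOrder(body, [ "DIV", "SECTION" ]).map(function(node) {
  return node.id;
});
// ids is [ "a", "b", "c" ]: the nested chapter b comes right after its parent a
```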

Assigning Chapter Numbers

Assigning chapter numbers means finding out the hierarchy level of a chapter element and then building a dotted number for it. For a tree level of 4 the dotted number consists of four parts, for example 1.2.1.3. Each part is the position of the corresponding element among its sibling elements on the same level.

Hierarchy Level

The function that finds out the hierarchy level of an element should be overridable. That makes the script reusable for documents that model their structure differently, e.g. by assigning classes. Here are two default implementations for the default cases "Heading" and "Nesting".

Headings

    var that = {};

    that.getHeadingLevel = function(heading) {
      return parseInt(heading.tagName.substring(1), 10);  // explicit radix 10
    };

This implementation assumes that headings are h1 - h6, and thus simply returns the trailing number from the tag.
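
A quick check of that assumption, using plain stand-in objects instead of real DOM elements (the objects below are invented for illustration):

```javascript
var getHeadingLevel = function(heading) {
  return parseInt(heading.tagName.substring(1), 10);
};

var levelOfH3 = getHeadingLevel({ tagName: "H3" });    // 3
var levelOfDiv = getHeadingLevel({ tagName: "DIV" });  // NaN, because "IV" is not a number
```

So the function must only be fed with h1 - h6 elements; anything else yields NaN.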

Nesting

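    that.getNestingLevel = function(element, topElement, nestingTags) {
      var parent = element.parentNode;
      var i = 0;
      // also stop at the document root, in case element is not below topElement
      while (parent && parent !== topElement) {
        // \b prevents e.g. an <i> parent from matching the "i" inside "div"
        if (parent.tagName && new RegExp("\\b"+parent.tagName+"\\b", "i").test(nestingTags))
          i++;
        parent = parent.parentNode;
      }
      return i;
    };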

This implementation counts the number of parents conforming to the set of nesting tags given as parameter nestingTags. The case-insensitive RegExp is necessary to find "DIV" within the CSS selector "section, article, div".
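
As an alternative sketch, the tag names can be compared exactly instead of being searched inside the selector string. This assumes the selector is a plain comma-separated tag list like "div, section"; the name nestingLevelOf and the stand-in objects are hypothetical, used so the example runs without a browser.

```javascript
var nestingLevelOf = function(element, topElement, nestingTags) {
  // split "div, section" into [ "DIV", "SECTION" ]
  var tags = nestingTags.toUpperCase().split(/\s*,\s*/);
  var level = 0;
  for (var parent = element.parentNode; parent && parent !== topElement; parent = parent.parentNode) {
    if (parent.tagName && tags.indexOf(parent.tagName.toUpperCase()) >= 0)
      level++;
  }
  return level;
};

// stand-in objects instead of real DOM nodes
var body = { tagName: "BODY", parentNode: null };
var chapter1 = { tagName: "DIV", parentNode: body };
var italic = { tagName: "I", parentNode: chapter1 };       // not a nesting tag
var chapter11 = { tagName: "DIV", parentNode: italic };
var chapter111 = { tagName: "SECTION", parentNode: chapter11 };

var levels = [ chapter1, chapter11, chapter111 ].map(function(chapter) {
  return nestingLevelOf(chapter, body, "div, section");
});
// levels is [ 0, 1, 2 ]: the <i> wrapper does not add a level
```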

Example Override: Hierarchy by CSS-class

Assuming you assigned CSS-classes to your chapter elements, you could override the getNestingLevel() function. Look at this HTML:

    <header class="h2">One</header>
    <p>This is chapter 1.</p>

    <header class="h3">One - One</header>
    <p>This is chapter 1.1.</p>

The script can be reused for this document structure with the following JS override:

    var tocCreator = ....

    tocCreator.getNestingLevel = function(element, topElement, nestingTags) {
      return parseInt(element.className.substring(1), 10) - 1;  // "h2" -> level 1
    };

    tocCreator.tableOfContents();

The JS module is loaded, wherever it comes from (AMD or name-spaces ...). Then the exposed function getNestingLevel() is overridden by an implementation that accesses the element's class and reads the hierarchical level from it. Finally the module is called to generate a TOC for the document. That's all, try out the power of code reuse!
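
The override above relies on the level class being the only class of the element. A slightly more tolerant sketch, assuming the level class is one of h1 to h6 but may stand among other classes, could look like this (levelFromClass is a hypothetical name, and the elements are stand-in objects):

```javascript
var levelFromClass = function(element) {
  // find a class that is exactly "h1" ... "h6" among the space-separated classes
  var match = /(?:^|\s)h([1-6])(?:\s|$)/.exec(element.className);
  return match ? parseInt(match[1], 10) - 1 : NaN;
};

var levelA = levelFromClass({ className: "h2" });             // 1
var levelB = levelFromClass({ className: "intro h3 wide" });  // 2
var levelC = levelFromClass({ className: "hero" });           // NaN, no level class
```

It could then be assigned to getNestingLevel in the same way as the override shown above.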

Building Dotted Numbers

When we can assign a tree level to each chapter element, and we have the elements in document order, we can build a stack engine that dispatches the chapter elements and gives us the opportunity to generate chapter numbers and a table-of-contents at the same time with just one pass.

The Stack Engine

We want to loop over the chapter elements, push some state onto the stack when the tree level increases (it can never increase by more than one at a time), and pop from the stack when it decreases (by any number of levels).

The state we push must contain the current chapter list and its tree level. With the tree level we can build a dotted chapter number, iterating the stack from start to end and taking the current number of each level. The chapter lists represent the table-of-contents.

For every dispatched chapter element we generate a list item for the TOC, and we generate a chapter number and prepend it to both the chapter title and the TOC item. At the same time we can also link the item to the chapter.
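
Stripped of all DOM work, the idea can be sketched on a plain array of tree levels. This is a simplified stand-in, not the script itself: the function name numberChapters is invented, and each stack entry holds a counter where the real engine holds a TOC list.

```javascript
// Takes the tree levels of the chapters in document order,
// returns the dotted chapter number for each of them.
var numberChapters = function(levels) {
  if (levels.length === 0)
    return [];

  var stack = [ { level: levels[0], count: 0 } ];
  var result = [];

  for (var i = 0; i < levels.length; i++) {
    var level = levels[i];
    var current = stack[stack.length - 1];

    if (level === current.level + 1) {
      stack.push({ level: level, count: 0 });  // one level deeper: new counter
    }
    else if (level > current.level + 1) {
      throw new Error("Inconsistent structure, level " + current.level + " to " + level);
    }
    else {
      // same level or higher up: drop the deeper counters (never the first one)
      while (stack.length > 1 && stack[stack.length - 1].level > level)
        stack.pop();
    }

    stack[stack.length - 1].count++;
    result.push(stack.map(function(state) { return state.count; }).join("."));
  }
  return result;
};

// levels of the headings example: h2, h3, h4, h4, h2
var chapterNumbers = numberChapters([ 2, 3, 4, 4, 2 ]);
// chapterNumbers is [ "1", "1.1", "1.1.1", "1.1.2", "2" ]
```

The same call works for nesting levels, e.g. [ 0, 1, 2, 2, 0 ], because only the differences between levels matter.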

Utilities

Here are some functions we will need for the stack engine.

    that.findHeadingsInOrder = function(topElement, headingTags) {
      return topElement.querySelectorAll(headingTags);
    };

    that.createDefaultTocContainer = function(topElement) {
      var toc = document.createElement("div");
      var headings1 = topElement.getElementsByTagName("h1");
      var heading1 = (headings1 && headings1.length === 1) ? headings1[0] : undefined;
      if (heading1)
        heading1.parentElement.insertBefore(toc, heading1.nextSibling);
      else
        topElement.insertBefore(toc, topElement.children[0]);
      return toc;
    };

    var getLevel = function(levelByHeading, heading, topElement, headingTags) {
      return levelByHeading ?
          that.getHeadingLevel(heading) :
          that.getNestingLevel(heading, topElement, headingTags);
    };

    var peek = function(stack) {
      return stack[stack.length - 1];
    };

    that.createTocList = function() {
      var list = document.createElement("ul");
      list.style.cssText = "list-style-type: none;";
      return list;
    };
    
    var appendTocList = function(parent) {
      var list = that.createTocList();
      parent.appendChild(list);
      return list;
    };

    that.createHyperlink = function(headingHtml, thisHeadingId) {
      var hyperLink = document.createElement("a");
      hyperLink.innerHTML = headingHtml;
      hyperLink.setAttribute("href", "#"+thisHeadingId);
      return hyperLink;
    };

The findHeadingsInOrder() function finds all chapter elements for a given CSS selector.

The createDefaultTocContainer() function creates a container for the TOC when none was given by the caller. When exactly one <h1> is present, it is presumably the document title, so the container is placed directly below it. Otherwise the container becomes the first child of the top-element.

The getLevel() function will give us the tree level of each chapter element. It calls two delegates according to the used structuring, which is determined by levelByHeading = /h[1-6]/i.test(headingTags), which reads as "true when the given headingTags contain h1, h2, ... or h6, case-insensitive".

The peek(stack) function returns the last element from the given stack, without removing it.

The createTocList() function creates a new TOC chapter list to append chapter items to, and the appendTocList() function appends it for a certain tree level to the TOC.

The createHyperlink() function creates the hyperlink in the TOC.
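
The levelByHeading test mentioned above is a plain substring search, which is worth trying on a few selectors. The wrapper name usesHeadingTags is invented for this sketch; the regular expression is the one quoted in the text.

```javascript
var usesHeadingTags = function(headingTags) {
  return /h[1-6]/i.test(headingTags);
};

var byHeading = usesHeadingTags("h2, h3, h4, h5, h6");  // true
var byNesting = usesHeadingTags("div, section");        // false
var byHeader = usesHeadingTags("header");               // false, "he" is not h1 - h6
var byClass = usesHeadingTags("div.h2");                // true, although no heading tag is used
```

The last case shows the limit of the test: a selector that merely contains "h2" in a class name is treated as a heading structure as well.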

Chapter Loop

With these utilities (and another one that will follow) we can implement the following stack engine, which loops over the chapter elements and gives them the tree structure originating from getLevel().
This is the main function of the TOC generator.

    that.tableOfContents = function(topElement, headingTags, tocContainer) {
      topElement = topElement || document.body;
      var headings = that.findHeadingsInOrder(topElement, headingTags);
      if (headings.length === 0)
        return;  // no chapters found, headings[0] below would be undefined
      tocContainer = tocContainer || that.createDefaultTocContainer(topElement);
      var levelByHeading = /h[1-6]/i.test(headingTags);

      var stack = [];
      stack.push({
        headingTagLevel: getLevel(levelByHeading, headings[0], topElement, headingTags),
        chapterList: appendTocList(tocContainer)
      });
      
      for (var i = 0; i < headings.length; i++) {
        var heading = headings[i];
        var currentList = peek(stack);
        var currentLevel = currentList.headingTagLevel;
        var headingLevel = getLevel(levelByHeading, heading, topElement, headingTags);
        
        if (headingLevel === currentLevel + 1) {
          stack.push({
            headingTagLevel: headingLevel,
            chapterList: appendTocList(getLastListElement(currentList.chapterList))
          });
        }
        else if (headingLevel < currentLevel) {
          for (; headingLevel < currentLevel; currentLevel--)
            stack.pop();
        }
        else if (headingLevel > currentLevel + 1) {
          throw new Error("Inconsistent document structure, incrementing level from "+currentLevel+" to "+headingLevel);
        }
        
        appendTocItem(levelByHeading, stack, heading);
      }
    };

The function parameters are (1) the element below which to search for chapter elements, (2) the chapter tags, like "h2, h3, h4, h5, h6" or "div, section", and (3) the element to which the generated TOC is appended.

The function then searches the chapters and pushes the initial state onto the stack engine, representing the first chapter element, called heading here. That state contains the TOC list of first level, and the first level number.

The loop then pushes or pops states whenever the tree level changes, and it appends items to the current TOC list by calling appendTocItem(). That is where the chapter numbering and title extraction happen.

    var appendTocItem = function(levelByHeading, stack, element) {
      var nextChapterNumber = createNextChapterNumber(stack);
      
      var headingText = levelByHeading ? element.innerHTML : getTextContent(element);
      var headingElement = (levelByHeading || headingText) ? element : that.getHeadingFromNestedElement(element);
      if ( ! headingText )
        headingText = headingElement.innerHTML;
      
      var thisHeadingId = "h"+nextChapterNumber.replace(/\./g, '_');
      headingElement.setAttribute("id", thisHeadingId);
      if (levelByHeading)
        headingElement.innerHTML = nextChapterNumber+"&nbsp;"+headingText;
      else
        headingElement.insertBefore(document.createTextNode(nextChapterNumber+" "), headingElement.childNodes[0]);
      
      var tocItem = that.createTocItem();
      var chapterNumber = document.createElement("span");
      chapterNumber.innerHTML = nextChapterNumber+"&nbsp;";
      tocItem.appendChild(chapterNumber);
      
      var hyperLink = that.createHyperlink(headingText, thisHeadingId);
      tocItem.appendChild(hyperLink);
      
      var currentList = peek(stack);
      currentList.chapterList.appendChild(tocItem);
    };

The appendTocItem function does all the work necessary for one item in the table-of-contents. It builds a dotted chapter number, retrieves the chapter title text, prepends the number to the title and to its TOC duplicate, and links the two through an #id fragment in the hyperlink. Finally it appends the finished item to the current chapter list.
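
The chapter loop calls getLastListElement(), which is not listed among the snippets in this text. Judging from how it is used, it presumably returns the last item of a TOC list, so that a sub-list can be nested inside that item. A minimal sketch under that assumption, with stand-in objects instead of real <ul> and <li> elements:

```javascript
// Assumed behaviour: return the last item of the given TOC list.
var getLastListElement = function(list) {
  return list.children[list.children.length - 1];
};

// stand-in objects instead of real list elements
var firstItem = { text: "1 One" };
var secondItem = { text: "2 Two" };
var tocList = { children: [ firstItem, secondItem ] };

var parentOfSubList = getLastListElement(tocList);  // secondItem
```

In the loop a level is only pushed after at least one item was appended to the current list, so the list is never empty at that point.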

Dotted Number Generator

With a stack that holds a list for each tree level, it is really simple to build a dotted chapter number, because the length of each list is the current order number on that level.

    var createNextChapterNumber = function(stack) {
      var number = "";
      for (var i = 0; i < stack.length; i++) {
        var currentList = stack[i];
        var currentNumber = currentList.chapterList.children.length;
        if (i === stack.length - 1)
          currentNumber++;
        currentNumber = that.convertChapterNumber(currentNumber);
        number = number+(number === "" ? "" : ".")+currentNumber;
      }
      return number;
    };

This loops over the stack from level 1 to the end level and appends the list length of each level to the dotted number. On the last level it adds one, because the next number is requested. It calls an overridable function convertChapterNumber() to enable conversion into other numbering systems.
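
As an example of such a conversion, here is a sketch that turns the order number into a Roman numeral. It assumes that convertChapterNumber() receives one positive integer and returns the text to show for it; that signature is inferred from the call above, not confirmed by the full source. The name toRoman is invented.

```javascript
var toRoman = function(number) {
  var symbols = [
    [ 1000, "M" ], [ 900, "CM" ], [ 500, "D" ], [ 400, "CD" ],
    [ 100, "C" ], [ 90, "XC" ], [ 50, "L" ], [ 40, "XL" ],
    [ 10, "X" ], [ 9, "IX" ], [ 5, "V" ], [ 4, "IV" ], [ 1, "I" ]
  ];
  var result = "";
  // greedily subtract the largest symbol value that still fits
  for (var i = 0; i < symbols.length; i++) {
    while (number >= symbols[i][0]) {
      result += symbols[i][1];
      number -= symbols[i][0];
    }
  }
  return result;
};

var four = toRoman(4);       // "IV"
var fourteen = toRoman(14);  // "XIV"
```

Assigned as the override, chapter 2.4 would then appear as II.IV.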

Extracting Chapter Titles

Almost all problems have been solved. Just the extraction of the chapter title text remains. This could be difficult, so it is done in an overridable function (that.xxx = function() ...) to facilitate adapters for all kinds of documents.

First we try to get title text from the chapter element by finding leading text nodes.

    var getTextContent = function(nestingElement) {
      var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
          nestingElement.childNodes[0].textContent :
          undefined;
      return (title && (title = title.trim())) ? title : undefined;
    };

This returns undefined in situations like this (there is no text node between the <div> chapter element and its first child, the <h3> element) ...

    <div><h3>One</h3>
      <p>This is chapter 1.</p>

... and would return some text in situations like this

    <div>One - One
      <p>This is chapter 1.1.</p>

In most documents the first case applies. So there is an overridable function that finds the element holding the title text inside the chapter element.

    that.getHeadingFromNestedElement = function(nestedElement) {
      while (nestedElement.children.length === 1)
        nestedElement = nestedElement.children[0];
      return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
    };

This descends through single-child containers and then returns the first child of the "peeled" chapter element, or the element itself when it has no children. Override it if another element contains the desired chapter title text.
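
Both title functions can be tried without a browser on stand-in objects that carry just the properties the functions read (nodeName, textContent, childNodes, children). The objects below are invented for illustration and mirror the two HTML situations shown above.

```javascript
var getTextContent = function(nestingElement) {
  var first = nestingElement.childNodes[0];
  var title = (first && first.nodeName === "#text") ? first.textContent.trim() : "";
  return title ? title : undefined;
};

var getHeadingFromNestedElement = function(nestedElement) {
  while (nestedElement.children.length === 1)
    nestedElement = nestedElement.children[0];
  return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
};

// stand-in for: <div><h3>One</h3><p>This is chapter 1.</p></div>
var h3 = { nodeName: "H3", childNodes: [], children: [] };
var p = { nodeName: "P", childNodes: [], children: [] };
var firstCase = { nodeName: "DIV", childNodes: [ h3, p ], children: [ h3, p ] };

// stand-in for: <div>One - One <p>This is chapter 1.1.</p></div>
var text = { nodeName: "#text", textContent: "One - One\n      " };
var secondCase = { nodeName: "DIV", childNodes: [ text, p ], children: [ p ] };

var titleOfFirst = getTextContent(firstCase);                 // undefined, no leading text
var headingOfFirst = getHeadingFromNestedElement(firstCase);  // the h3 stand-in
var titleOfSecond = getTextContent(secondCase);               // "One - One"
```

Note the difference between childNodes, which includes text nodes, and children, which contains elements only.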

Full Source Code

Here comes the full source code, including comments. A lot of details not mentioned above are implemented here; look at the parameters of the tableOfContents() function. Simply put the script into a <script> tag at the end of your HTML document body, and then see whether everything is done automatically.

If not, it is time to read this manual and write an override :-)

  Click to see source code

You can also go to my homepage to see the current development state of this utility.




