Ever since we struggled with the table-of-contents of our first W*****d document,
we have had a dynamic TOC in mind: one that appears automatically,
numbers the chapters, generates a hierarchical overview on top of the document,
and provides links that scroll directly to the clicked chapter.
Here it is, for HTML documents, implemented in our favourite toy-language called JavaScript.
The script is just about 230 lines (without comments), 3 KB when minified.
No external library is used, so you need an up-to-date browser that supports querySelectorAll().
You find the full and commented source code at the bottom of this Blog.
The text in between explains how it works.
The table-of-contents below is the proof of concept, because it was generated by the script itself :-)
Conventions
I recommend functional inheritance for JavaScript.
In the following code snippets public functions are written like tableOfContents
...
var tocCreator = function()
{
var that = {};
that.tableOfContents = function() {
....
};
... while private functions are written like appendTocItem
...
var appendTocItem = function() {
....
};
return that;
};
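Public functions are properties of the that object and can be overwritten by the caller.
Private functions are local variables of the factory function and cannot be overwritten from outside.
Both can access the that instance through the closure.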
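To make the convention concrete, here is a minimal sketch of such a factory. The names counterCreator, prefix, next and format are made up for illustration and are not part of the TOC script. Note that the private function reaches the public one through that, so an override takes effect.

```javascript
// Factory in the functional-inheritance style: no "new", no prototypes.
var counterCreator = function() {
    var that = {};
    var count = 0;   // private state, reachable only through the closure

    // private function: not a property of "that", so callers cannot replace it
    var format = function(n) {
        return that.prefix() + n;
    };

    // public functions: callers may replace them on the returned object
    that.prefix = function() {
        return "#";
    };
    that.next = function() {
        count++;
        return format(count);
    };

    return that;
};

var counter = counterCreator();
var first = counter.next();       // "#1"

counter.prefix = function() {     // override a public function
    return "No. ";
};
var second = counter.next();      // "No. 2"
```

The same mechanism is what lets you replace functions like getNestingLevel() on the TOC object later in this text.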
Preconditions
Now let's jump into the problems of chapter numbering and TOC generation.
Somehow the chapter structure must be represented by the document's HTML tags:
- either using headings <h1> to <h6>,
- or by nesting elements like <div> or <section> into each other
Here is an example for case 1, which I would like to call ...
Structured by Headings
<h1>Heading Title</h1>
<h2>One</h2>
<p>This is chapter 1.</p>
<h3>One - One</h3>
<p>This is chapter 1.1.</p>
<h4>One - One - One</h4>
<p>This is chapter 1.1.1</p>
<h4>One - One - Two</h4>
<p>This is chapter 1.1.2</p>
<h2>Two</h2>
<p>This is chapter 2</p>
As we can see, the hierarchical structure of the chapters is represented by the type of h[1-6] heading tag.
That means we do not have chapters nested into each other, like a <div> for chapter 1.1
that contains another <div> for chapter 1.1.1. Instead there is a continuous flow of chapters
that get their hierarchical order from the heading number.
And here is an example for case 2, which I would like to call ...
Structured by Nesting
<h1>Nesting Title</h1>
<div><h3>One</h3>
<p>This is chapter 1.</p>
<div>One - One
<p>This is chapter 1.1.</p>
<div>One - One - One
<p>This is chapter 1.1.1</p>
</div>
<div>One - One - Two
<p>This is chapter 1.1.2</p>
</div>
</div>
</div>
<div><h3>Two</h3>
<p>This is chapter 2</p>
</div>
This kind of structuring makes it possible to move or delete big parts of the document
by touching just one element.
On the other hand the hierarchical structure might get complex when the nesting level gets deep.
Other Structure
For any other kind of hierarchy you would have to implement a JS override.
I will explain this in chapter 4.1.3, "Example Override: Hierarchy by CSS-class".
Searching Chapters
The first task in extracting a table-of-contents is finding the chapters.
For Heading Structures this could be done by this:
var searchRoot = topElement || document.body;
var chapters = searchRoot.querySelectorAll("h2, h3, h4, h5, h6");
When no topElement containing the chapters is given, we let it default to document.body.
Then we call the JS function querySelectorAll(cssSelector) on that element
to retrieve all chapter elements in document order.
The result order of this function is specified by the W3C.
For Nesting Structures the chapters could be extracted by this, in case a mix of
<div> and <section> elements was used as nesting elements:
var chapters = searchRoot.querySelectorAll("div, section");
Now we have the chapters, maybe nested into each other, but in document order.
Once we can assign a hierarchy level to each of them, we have all the information needed to extract a table-of-contents.
Assigning Chapter Numbers
Assigning chapter numbers means finding out the hierarchy level of a chapter element, and then building a dotted number for it.
For a chapter on tree level 4 the dotted number has four parts, like 1.2.1.3.
Each part is the order number of the corresponding element relative to its sibling elements on the same level.
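As a quick illustration of that rule, here is a small sketch that turns a plain list of hierarchy levels into dotted numbers. It uses one counter per level, which is a simplification and not how the script itself does it. The function name numberChapters is made up, and the sketch assumes that the first chapter is on the topmost level and that levels never increase by more than one.

```javascript
// Turns hierarchy levels (in document order) into dotted chapter numbers.
var numberChapters = function(levels) {
    var counters = [];   // counters[d] = current order number on depth d
    var baseLevel = levels.length ? levels[0] : 0;
    var numbers = [];
    for (var i = 0; i < levels.length; i++) {
        var depth = levels[i] - baseLevel;   // 0 for top-level chapters
        if (depth < 0 || depth > counters.length)
            throw new Error("Inconsistent level " + levels[i] + " at index " + i);

        counters.length = depth + 1;   // forget the counters of deeper levels
        counters[depth] = (counters[depth] || 0) + 1;
        numbers.push(counters.join("."));
    }
    return numbers;
};

// levels of h2, h3, h4, h4, h2 from the "Structured by Headings" example
var numbers = numberChapters([2, 3, 4, 4, 2]);
// numbers is ["1", "1.1", "1.1.1", "1.1.2", "2"]
```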
Hierarchy Level
The function that finds out the hierarchy level of an element should be overridable.
That way the script is reusable for documents that model their structure differently, e.g. by assigning classes.
Here are two default implementations for the default cases "Heading" and "Nesting".
Headings
var that = {};
that.getHeadingLevel = function(heading) {
    // "H3" -> 3; the explicit radix keeps parseInt decimal in all browsers
    return parseInt(heading.tagName.substring(1), 10);
};
This implementation assumes that headings are h1 - h6, and thus simply returns the trailing number from the tag.
Nesting
that.getNestingLevel = function(element, topElement, nestingTags) {
    var parent = element.parentNode;
    var i = 0;
    while (parent !== topElement) {
        if (parent.tagName && new RegExp(parent.tagName, "i").test(nestingTags))
            i++;
        parent = parent.parentNode;
    }
    return i;
};
This implementation counts the number of parents conforming to the set of nesting tags given as parameter nestingTags.
The case-insensitive RegExp is necessary to find "DIV" within the CSS selector "section, article, div".
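One thing to keep in mind: the RegExp test is a substring match, so a short tag name like "A" is also found inside "article". That only matters when such elements occur among the parents of a chapter element. The sketch below shows the effect and a possible stricter check. Both function names are made up, and the strict variant assumes the selector is a plain comma-separated list of tag names.

```javascript
// The test used by getNestingLevel(): true when the tag name occurs anywhere in the selector.
var matchesBySubstring = function(tagName, nestingTags) {
    return new RegExp(tagName, "i").test(nestingTags);
};

// Hypothetical stricter variant: compare against each comma-separated entry.
var matchesExactly = function(tagName, nestingTags) {
    var tags = nestingTags.split(",");
    for (var i = 0; i < tags.length; i++)
        if (tags[i].trim().toLowerCase() === tagName.toLowerCase())
            return true;
    return false;
};

var selector = "section, article, div";
var divBySubstring = matchesBySubstring("DIV", selector);   // true
var aBySubstring = matchesBySubstring("A", selector);       // true, "article" contains an "a"
var divExactly = matchesExactly("DIV", selector);           // true
var aExactly = matchesExactly("A", selector);               // false
```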
Example Override: Hierarchy by CSS-class
Assuming you assigned CSS classes to your chapter elements, you could override the getNestingLevel() function.
Look at this HTML:
<header class="h2">One</header>
<p>This is chapter 1.</p>
<header class="h3">One - One</header>
<p>This is chapter 1.1.</p>
The script can be reused for this document structure by the following JS override:
var tocCreator = ....
tocCreator.getNestingLevel = function(element, topElement, nestingTags) {
return parseInt(element.className.substring(1)) - 1;
};
tocCreator.tableOfContents();
The JS module is loaded, wherever it comes from (AMD or name-spaces ...).
Then the exposed function getNestingLevel() is overwritten by an implementation
that accesses the element's class and reads the hierarchical level from it.
Finally the module is called to generate a TOC for the document.
That's all, try out the power of code reuse!
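The override above expects the class attribute to consist of exactly one class like "h2". If your elements carry several classes, a slightly more tolerant parser may help. The following is only a sketch under that assumption, and the name levelFromClassName is made up.

```javascript
// Finds a class h1 ... h6 among several classes and returns its zero-based level.
var levelFromClassName = function(className) {
    var match = /(?:^|\s)h([1-6])(?:\s|$)/.exec(className);
    if ( ! match)
        throw new Error("No h1 - h6 class found in '" + className + "'");
    return parseInt(match[1], 10) - 1;
};

var levelOfH2 = levelFromClassName("h2");                       // 1
var levelOfH3 = levelFromClassName("intro h3 highlighted");     // 2
```

It could be used inside the override as levelFromClassName(element.className).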
Building Dotted Numbers
When we can assign a tree level to each chapter element, and we have the elements in document order,
we can build a stack engine that dispatches the chapter elements and gives us the opportunity
to generate chapter numbers and a table-of-contents at the same time with just one pass.
The Stack Engine
We want to loop over the chapter elements and push some state onto the stack when the tree level
increases (never by more than one), and pop from the stack when it decreases (by any number).
The state we push must contain the current chapter list and its tree level.
With the stack we can build a dotted chapter number, iterating it from start to end
and taking the current number of each level. The chapter lists represent the table-of-contents.
For every dispatched chapter element we will generate a list item for the TOC, and we generate
a chapter number and prepend it to both the chapter and the TOC item. At the same time we
can also link the item to the chapter.
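Before going into the DOM details, here is the same idea on plain objects, as a sketch. It takes chapters as { level, title } records instead of HTML elements and builds a nested structure with dotted numbers in one pass. The name buildToc and the record layout are assumptions made for this illustration only.

```javascript
// Builds a nested TOC from chapters given as { level, title } in document order.
var buildToc = function(chapters) {
    var root = [];
    if ( ! chapters.length)
        return root;

    // every stack entry holds a tree level and the item list of that level
    var stack = [ { level: chapters[0].level, items: root } ];

    for (var i = 0; i < chapters.length; i++) {
        var chapter = chapters[i];
        var top = stack[stack.length - 1];

        if (chapter.level === top.level + 1) {
            // one level deeper: open a sub-list below the last item
            var parentItem = top.items[top.items.length - 1];
            stack.push({ level: chapter.level, items: parentItem.children });
        }
        else if (chapter.level > top.level + 1) {
            throw new Error("Level jumps from " + top.level + " to " + chapter.level);
        }
        else {
            // same level or further up: pop until the levels match
            while (stack.length > 1 && stack[stack.length - 1].level > chapter.level)
                stack.pop();
        }

        // dotted number: item count of every stack level, plus one on the last level
        var parts = [];
        for (var s = 0; s < stack.length; s++)
            parts.push(stack[s].items.length + (s === stack.length - 1 ? 1 : 0));

        stack[stack.length - 1].items.push({
            number: parts.join("."),
            title: chapter.title,
            children: []
        });
    }
    return root;
};

var toc = buildToc([
    { level: 2, title: "One" },
    { level: 3, title: "One - One" },
    { level: 4, title: "One - One - One" },
    { level: 4, title: "One - One - Two" },
    { level: 2, title: "Two" }
]);
// toc[0].number is "1", toc[0].children[0].children[1].number is "1.1.2", toc[1].number is "2"
```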
Utilities
Here are some functions we will need for the stack engine.
that.findHeadingsInOrder = function(topElement, headingTags) {
    return topElement.querySelectorAll(headingTags);
};

that.createDefaultTocContainer = function(topElement) {
    var toc = document.createElement("div");
    var headings1 = topElement.getElementsByTagName("h1");
    var heading1 = (headings1 && headings1.length === 1) ? headings1[0] : undefined;
    if (heading1)
        heading1.parentElement.insertBefore(toc, heading1.nextSibling);
    else
        topElement.insertBefore(toc, topElement.children[0]);
    return toc;
};

var getLevel = function(levelByHeading, heading, topElement, headingTags) {
    return levelByHeading ?
        that.getHeadingLevel(heading) :
        that.getNestingLevel(heading, topElement, headingTags);
};

var peek = function(stack) {
    return stack[stack.length - 1];
};

that.createTocList = function() {
    var list = document.createElement("ul");
    list.style.cssText = "list-style-type: none;";
    return list;
};

var appendTocList = function(parent) {
    var list = that.createTocList();
    parent.appendChild(list);
    return list;
};

that.createHyperlink = function(headingHtml, thisHeadingId) {
    var hyperLink = document.createElement("a");
    hyperLink.innerHTML = headingHtml;
    hyperLink.setAttribute("href", "#"+thisHeadingId);
    return hyperLink;
};
The findHeadingsInOrder() function finds all chapter elements for a given CSS selector.
The createDefaultTocContainer() function creates a container for the TOC when none was given by the caller.
When a single <h1> is present, it is assumed to be the document title, so the container is placed below it.
Otherwise the container is placed as first child of the top-element.
The getLevel() function gives us the tree level of each chapter element.
It calls one of two delegates according to the structuring used, which is determined by
levelByHeading = /h[1-6]/i.test(headingTags), which reads as
"true when the given headingTags contain h1, h2, ... or h6, case-insensitive".
The peek(stack) function returns the last element of the given stack, without removing it.
The createTocList() function creates a new TOC chapter list to append chapter items to, and
the appendTocList() function appends it to the TOC for a certain tree level.
The createHyperlink() function creates the hyperlink in the TOC.
Chapter Loop
With these utilities (and another one that will follow) we can implement the following stack engine
that loops over the chapter elements and gives them the tree structure originating from getLevel().
Here we are at the main function of the TOC generator.
that.tableOfContents = function(topElement, headingTags, tocContainer) {
    topElement = topElement || document.body;
    var headings = that.findHeadingsInOrder(topElement, headingTags);
    tocContainer = tocContainer || that.createDefaultTocContainer(topElement);
    var levelByHeading = /h[1-6]/i.test(headingTags);

    // initial state: the first-level list and the level of the first chapter
    var stack = [];
    stack.push({
        headingTagLevel: getLevel(levelByHeading, headings[0], topElement, headingTags),
        chapterList: appendTocList(tocContainer)
    });

    for (var i = 0; i < headings.length; i++) {
        var heading = headings[i];
        var currentList = peek(stack);
        var currentLevel = currentList.headingTagLevel;
        var headingLevel = getLevel(levelByHeading, heading, topElement, headingTags);

        if (headingLevel === currentLevel + 1) {
            // one level deeper: open a sub-list below the last TOC item
            stack.push({
                headingTagLevel: headingLevel,
                chapterList: appendTocList(getLastListElement(currentList.chapterList))
            });
        }
        else if (headingLevel < currentLevel) {
            // back up: pop one state per level
            for (; headingLevel < currentLevel; currentLevel--)
                stack.pop();
        }
        else if (headingLevel > currentLevel + 1) {
            throw new Error("Inconsistent document structure, incrementing level from "+currentLevel+" to "+headingLevel);
        }

        appendTocItem(levelByHeading, stack, heading);
    }
};
The function parameters are
(1) the element below which to search for chapter elements,
(2) the chapter tags, like "h2, h3, h4, h5, h6" or "div, section",
and (3) the element where to append the generated TOC.
The function then searches the chapters and pushes the initial state onto the stack,
representing the first chapter element, called heading here. That state
contains the first-level TOC list and the first level number.
The loop then pushes or pops states whenever the tree level changes, and it appends
items to the current TOC list by calling appendTocItem(). This is where the chapter numbering
and title extraction happen.
var appendTocItem = function(levelByHeading, stack, element) {
    var nextChapterNumber = createNextChapterNumber(stack);

    // find the title text and the element that carries it
    var headingText = levelByHeading ? element.innerHTML : getTextContent(element);
    var headingElement = (levelByHeading || headingText) ? element : that.getHeadingFromNestedElement(element);
    if ( ! headingText )
        headingText = headingElement.innerHTML;

    // give the chapter an id that the TOC link can point to, e.g. "h1_2" for "1.2"
    var thisHeadingId = "h"+nextChapterNumber.replace(/\./g, '_');
    headingElement.setAttribute("id", thisHeadingId);

    // prepend the chapter number to the chapter title
    if (levelByHeading)
        headingElement.innerHTML = nextChapterNumber+" "+headingText;
    else
        headingElement.insertBefore(document.createTextNode(nextChapterNumber+" "), headingElement.childNodes[0]);

    // build the TOC item: number, then link
    var tocItem = that.createTocItem();
    var chapterNumber = document.createElement("span");
    chapterNumber.innerHTML = nextChapterNumber+" ";
    tocItem.appendChild(chapterNumber);
    var hyperLink = that.createHyperlink(headingText, thisHeadingId);
    tocItem.appendChild(hyperLink);

    var currentList = peek(stack);
    currentList.chapterList.appendChild(tocItem);
};
The appendTocItem() function does all the work necessary for one item in the table-of-contents.
It builds a dotted chapter number, retrieves the chapter title text, attaches the number to the title text and
its TOC duplicate, and links them using an #id fragment in the hyperlink.
Finally it appends the ready-made item to the current chapter list.
Dotted Number Generator
With a stack that holds a list for each tree level, it is really simple to build a dotted chapter number,
because the length of each list is the current order number on its level.
var createNextChapterNumber = function(stack) {
    var number = "";
    for (var i = 0; i < stack.length; i++) {
        var currentList = stack[i];
        var currentNumber = currentList.chapterList.children.length;
        if (i === stack.length - 1)
            currentNumber++;
        currentNumber = that.convertChapterNumber(currentNumber);
        number = number+(number === "" ? "" : ".")+currentNumber;
    }
    return number;
};
This loops over the stack from level 1 to the last level and appends the list length of each level to the dotted number.
On the last level it adds one, because the next number is requested.
It calls an overridable function convertChapterNumber() to enable conversion into other numbering systems.
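As an example of such a conversion, here is a sketch of a Roman numeral converter that could serve as an override. The function name toRoman is made up, and the sketch assumes positive integers, which is what the chapter counter delivers.

```javascript
// Converts a positive integer to a Roman numeral, e.g. 14 -> "XIV".
var toRoman = function(number) {
    var values = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1];
    var symbols = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"];
    var result = "";
    for (var i = 0; i < values.length; i++) {
        // take the biggest symbol as often as it fits, then go on with the rest
        while (number >= values[i]) {
            result += symbols[i];
            number -= values[i];
        }
    }
    return result;
};

var four = toRoman(4);                            // "IV"
var dotted = [2, 4].map(toRoman).join(".");       // "II.IV"
```

Assigning it to convertChapterNumber on the object returned by tocCreator(), before calling tableOfContents(), should then give chapter numbers like II.IV instead of 2.4.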
Extracting Chapter Titles
Almost all problems have been solved. Just the extraction of the chapter title text remains.
This could be difficult, so it is done in an overridable function (that.xxx = function() ...)
to facilitate adapters for all kinds of documents.
First we try to get the title text from the chapter element by looking for a leading text node.
var getTextContent = function(nestingElement) {
    var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
        nestingElement.childNodes[0].textContent :
        undefined;
    return (title && (title = title.trim())) ? title : undefined;
};
This returns undefined for situations like this
(there is no text node between the <div> chapter element and its <h3> child element) ...
<div><h3>One</h3>
<p>This is chapter 1.</p>
... and would return some text in situations like this
<div>One - One
<p>This is chapter 1.1.</p>
Most documents will look like the first case.
So there is an overridable function that finds the element holding the title text within the chapter element.
that.getHeadingFromNestedElement = function(nestedElement) {
    while (nestedElement.children.length === 1)
        nestedElement = nestedElement.children[0];
    return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
};
This descends through single-child containers and then returns the first child of the "peeled" chapter element.
Nothing is removed from the document.
Override it if another element contains the desired chapter title text.
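To see what the "peeling" does, the function can be tried on plain objects that only have a children array. The node() helper and the name property are made up for this sketch, real DOM elements are not needed here.

```javascript
// Same logic as above, as a standalone function for trying it out.
var getHeadingFromNestedElement = function(nestedElement) {
    while (nestedElement.children.length === 1)
        nestedElement = nestedElement.children[0];
    return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
};

// Hypothetical stand-in for a DOM element.
var node = function(name, children) {
    return { name: name, children: children || [] };
};

// like <div><div><h3>..</h3><p>..</p></div></div>: the outer wrapper is skipped
var wrapped = node("div", [ node("div", [ node("h3"), node("p") ]) ]);
var fromWrapped = getHeadingFromNestedElement(wrapped).name;   // "h3"

// like <div><h3>..</h3></div>: the loop ends on the h3, which has no children
var single = node("div", [ node("h3") ]);
var fromSingle = getHeadingFromNestedElement(single).name;     // "h3"
```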
Full Source Code
Here comes the full source code, including comments.
A lot of details not mentioned above are implemented here; look at the parameters of the tableOfContents() function.
Simply put the script into a <script> tag at the end of your HTML document body,
and then see whether everything was done automatically.
If not, it is time to read this manual and write some overrides :-)
var tocCreator = function()
{
    var that = {};

    /**
     * Reads headings, numbers them, extracts them to a linked table-of-contents
     * and inserts that TOC below a single h1 heading, or as first child of given top-element.
     * @param topElement optional, the HTML element to search for headings,
     *     defaults to document body.
     * @param headingTags optional, the CSS selector to use for search of structuring elements,
     *     defaults to "h2, h3, h4, h5, h6".
     * @param tocContainer optional, the container where to insert the created TOC,
     *     defaults to topElement below a single H1, or on top.
     * @param doTocNumbering optional, chapter numbers are prepended to TOC when true,
     *     defaults to true.
     * @param doHeadingNumbering optional, chapter numbers are prepended to headings when true,
     *     defaults to true.
     * @param chapterNumberSeparator optional, separator for chapter numbers,
     *     defaults to ".".
     * @param doTrailingSeparator optional, when true chapter numbers will be
     *     "1.2." instead of "1.2", defaults to false.
     * @return the given or created container of the table-of-contents.
     */
    that.tableOfContents = function(
            topElement,
            headingTags,
            tocContainer,
            doTocNumbering,
            doHeadingNumbering,
            chapterNumberSeparator,
            doTrailingSeparator)
    {
        topElement = topElement || document.body;
        headingTags = headingTags || guessHeadingTags(topElement);
        doTocNumbering = (doTocNumbering === undefined) ? true : doTocNumbering;
        doHeadingNumbering = (doHeadingNumbering === undefined) ? true : doHeadingNumbering;
        chapterNumberSeparator = chapterNumberSeparator || ".";
        doTrailingSeparator = (doTrailingSeparator === undefined) ? false : doTrailingSeparator;

        var headings = that.findHeadingsInOrder(topElement, headingTags);
        if ( ! headings || ! headings.length )
            return;

        tocContainer = tocContainer || that.createDefaultTocContainer(topElement);
        var levelByHeading = /h[1-6]/i.test(headingTags);

        var stack = [];
        stack.push({
            headingTagLevel: getLevel(levelByHeading, headings[0], topElement, headingTags),
            chapterList: appendTocList(tocContainer)
        });

        for (var i = 0; i < headings.length; i++) {
            var heading = headings[i];
            var currentList = peek(stack);
            var currentLevel = currentList.headingTagLevel;
            var headingLevel = getLevel(levelByHeading, heading, topElement, headingTags);

            if (headingLevel === currentLevel + 1) {
                stack.push({
                    headingTagLevel: headingLevel,
                    chapterList: appendTocList(getLastListElement(currentList.chapterList))
                });
            }
            else if (headingLevel < currentLevel) {
                for (; headingLevel < currentLevel; currentLevel--)
                    stack.pop();
            }
            else if (headingLevel > currentLevel + 1) {
                throw new Error("Inconsistent document structure, incrementing level from "+currentLevel+" to "+headingLevel);
            }

            appendTocItem(
                levelByHeading,
                stack,
                heading,
                doTocNumbering,
                doHeadingNumbering,
                chapterNumberSeparator,
                doTrailingSeparator);
        }

        return tocContainer;
    };

    var guessHeadingTags = function(topElement) {
        var defaultHeadingTags = "h2, h3, h4, h5, h6";
        var defaultHeadings = that.findHeadingsInOrder(topElement, defaultHeadingTags);
        var firstTagName;
        for (var i = 0; i < defaultHeadings.length; i++) {
            if ( ! firstTagName )
                firstTagName = defaultHeadings[i].tagName;
            else if (firstTagName !== defaultHeadings[i].tagName)
                return defaultHeadingTags; // found at least two different heading tags
        }

        if (hasMoreThanOne(topElement, "div"))
            return "div";
        if (hasMoreThanOne(topElement, "section"))
            return "section";
        if (hasMoreThanOne(topElement, "article"))
            return "article";

        return defaultHeadingTags;
    };

    var hasMoreThanOne = function(topElement, cssSelector) {
        var headings = that.findHeadingsInOrder(topElement, cssSelector);
        return headings && headings.length > 1;
    };

    /**
     * Reads all headings to extract and number, in document order:
     * querySelectorAll() does it, see
     * http://www.w3.org/TR/selectors-api/#queryselectorall.
     * @param topElement the HTML element to search for headings.
     * @param headingTags the CSS selector to use for search.
     * @return something that has a length and can be iterated with a for loop.
     */
    that.findHeadingsInOrder = function(topElement, headingTags) {
        return topElement.querySelectorAll(headingTags);
    };

    /**
     * Searches for a h1, places TOC as div below it when found, else
     * places TOC as first child of given HTML element or body.
     * @return what is to be used as container for the table of contents.
     */
    that.createDefaultTocContainer = function(topElement) {
        var toc = document.createElement("div");
        var headings1 = topElement.getElementsByTagName("h1");
        var heading1 = (headings1 && headings1.length === 1) ? headings1[0] : undefined;
        if (heading1)
            heading1.parentElement.insertBefore(toc, heading1.nextSibling);
        else
            topElement.insertBefore(toc, topElement.children[0]);
        return toc;
    };

    var getLevel = function(levelByHeading, heading, topElement, headingTags) {
        return levelByHeading ?
            that.getHeadingLevel(heading) :
            that.getNestingLevel(heading, topElement, headingTags);
    };

    /** @return the given heading's number, e.g. "3" from "h3". */
    that.getHeadingLevel = function(heading) {
        return parseInt(heading.tagName.substring(1), 10);
    };

    /** @return the 0-n level of given element, relative to given topElement. */
    that.getNestingLevel = function(element, topElement, nestingTags) {
        var parent = element.parentNode;
        var i = 0;
        while (parent !== topElement) {
            if (parent.tagName && new RegExp(parent.tagName, "i").test(nestingTags))
                i++; /* when tag is in nesting-tags, count this as level */
            parent = parent.parentNode;
        }
        return i;
    };

    var peek = function(stack) {
        return stack[stack.length - 1];
    };

    var getLastListElement = function(list) {
        return list.children[list.children.length - 1];
    };

    /**
     * This implementation returns the input number unchanged.
     * @return a chapter number converted to arbitrary numbering system (Roman, letter, ...).
     */
    that.convertChapterNumber = function(number) {
        return number;
    };

    var createNextChapterNumber = function(stack, chapterNumberSeparator) {
        var number = "";
        for (var i = 0; i < stack.length; i++) {
            var currentList = stack[i];
            var currentNumber = currentList.chapterList.children.length;
            if (i === stack.length - 1)
                currentNumber++;
            currentNumber = that.convertChapterNumber(currentNumber);
            number = number+(number === "" ? "" : chapterNumberSeparator)+currentNumber;
        }
        return number;
    };

    /**
     * This implementation creates an UL element without list-style-type (no bullets).
     * @return a list that will hold chapter headings.
     */
    that.createTocList = function() {
        var list = document.createElement("ul");
        list.style.cssText = "list-style-type: none;";
        return list;
    };

    var appendTocList = function(parent) {
        var list = that.createTocList();
        parent.appendChild(list);
        return list;
    };

    /**
     * This implementation creates a LI element.
     * @return a TOC item element that will receive a chapter heading link.
     */
    that.createTocItem = function() {
        return document.createElement("li");
    };

    /**
     * This implementation creates and links an A element.
     * @return a TOC link element that will receive a chapter heading text.
     */
    that.createHyperlink = function(headingHtml, thisHeadingId) {
        var hyperLink = document.createElement("a");
        hyperLink.innerHTML = headingHtml;
        hyperLink.setAttribute("href", "#"+thisHeadingId);
        return hyperLink;
    };

    var getTextContent = function(nestingElement) {
        var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
            nestingElement.childNodes[0].textContent :
            undefined;
        return (title && (title = title.trim())) ? title : undefined;
    };

    /**
     * @return this is called when the document is structured by
     *     nested elements, and it is expected to return the heading-element
     *     for given nested container, e.g. the heading in a section.
     *     This implementation returns the first child element that
     *     is not a single child.
     */
    that.getHeadingFromNestedElement = function(nestedElement) {
        while (nestedElement.children.length === 1)
            nestedElement = nestedElement.children[0]; /* in case there are wrappers */
        return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
    };

    var appendTocItem = function(
            levelByHeading,
            stack,
            element,
            doTocNumbering,
            doHeadingNumbering,
            chapterNumberSeparator,
            doTrailingSeparator)
    {
        var nextChapterNumber = createNextChapterNumber(stack, chapterNumberSeparator);
        if (doTrailingSeparator)
            nextChapterNumber = nextChapterNumber + chapterNumberSeparator;

        var headingText = levelByHeading ? element.innerHTML : getTextContent(element);
        var headingElement = (levelByHeading || headingText) ? element : that.getHeadingFromNestedElement(element);
        if ( ! headingText )
            headingText = headingElement.innerHTML;

        var thisHeadingId = "h"+nextChapterNumber.replace(/\./g, '_');
        headingElement.setAttribute("id", thisHeadingId);

        if (doHeadingNumbering) {
            if (levelByHeading)
                headingElement.innerHTML = nextChapterNumber+" "+headingText;
            else
                headingElement.insertBefore(document.createTextNode(nextChapterNumber+" "), headingElement.childNodes[0]);
        }

        var tocItem = that.createTocItem();
        if (doTocNumbering) {
            var chapterNumber = document.createElement("span");
            chapterNumber.innerHTML = nextChapterNumber+" ";
            tocItem.appendChild(chapterNumber);
        }
        var hyperLink = that.createHyperlink(headingText, thisHeadingId);
        tocItem.appendChild(hyperLink);

        var currentList = peek(stack);
        currentList.chapterList.appendChild(tocItem);
    };

    return that;
};

tocCreator().tableOfContents();
You can also go to my homepage to see the current development state of this utility.