Blog Archive

Wednesday, April 29, 2015

vi Manual

You have surely tried a lot of text editors in your life. But do you know one of the oldest that is still in use? If you are using UNIX or LINUX, or WINDOWS with CYGWIN, you already have it on your machine. If not, here is its download page:

Vim

This is the most common current implementation of vi. There is even a Java port of it: jvi.

But why should we care about such an exotic, old and complicated editor?

  • Because with vi you can edit a MySQL dump file of 200 MB or more (your favorite editor reported memory problems after a minute when trying to load it).
  • Because you want to use several text buffers for cut / copy & paste, not always the one-and-only system clipboard.
  • Because you need to configure some things on a LINUX server that has no graphical environment installed.

Might be useful in some situations ...

vi is a UNIX editor that runs in terminal mode, meaning you do not need a graphical environment to use it, just the curses library. Mind that you won't ever see a scrollbar with vi, it does not have a menu, and it (most likely) won't respond to mouse clicks!

vi has two modes:

  1. command mode
  2. edit mode

In command mode you cannot enter or change text, you can only launch commands, including those that start editing.
In edit mode you can write new text.
You get into edit mode by pressing certain keys like 'i' or 'a' (insert, append, see reference below).
From there you can return to command mode by pressing ESCAPE.

So when you use vi and pause to think about what you have just written, press ESCAPE. Otherwise you will forget that you are in edit mode and use the cursor keys, which in some vi versions leads to control characters in the text!

To save data and quit the editor, you need ex-commands. These start with a colon ':' typed in command mode, and are executed by pressing ENTER. ex was the predecessor of vi.
For example, you normally quit vi by typing :x ENTER. But you can also launch any shell command, e.g. list the current directory, by :!ls -l ENTER.

Starting and Terminating

Start to edit one or more files by

vi filepath [filepath2 filepath3 ...]

Initially you will be in command mode. Then use the following ex-commands.

Input Semantic
:q! Dismiss data and quit editor
(quit without ! would be denied when data have been modified)
:x Save and quit
:w Save text, but do not quit
:w filepath Write text to given filepath
:f Display name of currently visible file
:n Go to next loaded file
:rew Return to first loaded file (rewind)
:e filepath Load another file
:set all Display all current settings
:set encoding=Cp1252 Set the encoding to Cp1252
:set number Show line numbers
:set nonumber Hide line numbers
:set autoindent Switch on automatic indentation
:set noautoindent Switch off automatic indentation

Moving the Cursor

Normally the cursor keys and Page-Up / Page-Down should work.
If not, use the following keys from command mode (press ESCAPE first).

Input Semantic
h Left one character
j Down one line
k Up one line
l Right one character

Here are further commands to move around in file (press ESCAPE before).

Input Semantic
w Move one word forward
b Move one word backward
^ Move to first non-blank character of line (0 = move to first column)
$ Move to end of line
1G Move to first line (2G = move to 2nd line, ...)
G Move to last line
Ctrl-f Page down (forward)
Ctrl-b Page up (backward)

Editing

Inserting

All the following keys are typed in command mode, and they lead to edit mode.
When you press ENTER in edit mode, you continue inserting text on a new line.
Unlike ESCAPE, ENTER does not terminate edit mode.

Input Semantic
i Insert at current cursor position
I Insert at start of current line
a Append after current cursor position
A Append at end of current line
o Create empty line below current line
O Create empty line above current line
ESCAPE Finish edit mode, return to command mode

Modifying

Like insert and append, the following keys are typed in command mode and lead to edit mode.

Input Semantic
cw Change word at current cursor position
c$ Change text from current cursor position until end of line
cc Change the whole line

Useful modification commands that do not lead to edit mode:

r Replace character at cursor position with next input character
J Join current line with line below.
To split a line, go to the split position and type i ENTER ESCAPE.
>> Indent current line by one shift width (often one tab)
<< Un-indent current line by one shift width

Deleting

The following commands delete text in command mode, and they stay in command mode.

Input Semantic
x Delete character at cursor position (2x = delete 2 chars, ...)
dw Delete word at cursor position
d$ Delete text from cursor position until end of line
dd Delete current line (2dd = delete two lines, ....)
:.,$d Delete all lines from current to last line (ex-command)
:1,.d Delete all lines from first to current line (ex-command)

Undo and Redo

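In classic vi, typing u is Undo, and typing u again is Redo :-) In Vim (unless it runs in vi-compatible mode), typing u again undoes one more change, and Ctrl-r is Redo.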

Input Semantic
u Undo the recent change
:undo Undo the recent change
:redo Redo the recent change
. Repeat the previous command at current cursor position, whatever it was

Cut, Copy & Paste

Using the Default Buffer

When you delete text, the deleted text is automatically copied into the default buffer.
You can insert the content of that buffer anywhere by typing p (put).
To copy text into the default buffer, use y (yank).

Copy & Paste of a word:

 (move cursor to start of word to copy)
 yw
 (move to insert position)
 p
 

Cut & Paste of a word:

 (move cursor to start of word to cut)
 dw
 (move to insert position)
 p
 

Copy & Paste of a text line:

 (move to line to copy)
 yy
 (move above insert line)
 p
 

Cut & Paste of a text line:

 (move to line to cut)
 dd
 (move above insert line)
 p
 
Input Semantic
yw Copy word at cursor position into default buffer (yank word)
yy Copy current line into default buffer (2yy = copy two lines, ....)
:.,$y Copy (yank) all lines from current to last line (ex-command)
:1,.y Copy all lines from first to current line (ex-command)
p Insert default buffer contents at current cursor position or line (put)

Copy & Paste of a text block:

 (move cursor to start of block to copy)
 ma
 (move to end of block to copy)
 y'a
 (move to line above target position)
 p
 

Cut & Paste of a text block:

 (move cursor to start of block to cut)
 ma
 (move to end of block to cut)
 d'a
 (move to line above target position)
 p
 
Input Semantic
ma Set the mark with name 'a' onto current line, which can be accessed then by 'a
y'a Copy lines between current line and mark 'a' into default buffer
d'a Cut lines between current line and mark 'a' into default buffer

Using Named Buffers

Unlike most modern editors, vi provides more than one text buffer for Copy & Paste.
You can give them names, fill them with text, and then use them by prepending "b to commands (assuming the buffer was named 'b').

Copy & Paste of a text block using a named buffer:

 (move cursor to start of block to copy)
 ma
 (move to end of block to copy)
 "by'a
 (move to line above target position)
 "bp
 

Cut & Paste of a text block using a named buffer:

 (move cursor to start of block to cut)
 ma
 (move to end of block to cut)
 "bd'a
 (move to line above target position)
 "bp
 
Input Semantic
"byw Copy current word into buffer with name 'b'
"bdw Delete current word into buffer with name 'b'
"byy Copy current line into buffer with name 'b'
"bdd Delete current line into buffer with name 'b'
"by'a Copy lines between current line and mark 'a' into buffer with name 'b'
"bd'a Delete lines between current line and mark 'a' into buffer with name 'b'
"bp Insert contents of buffer 'b' at current cursor position or line

Search and Replace

Searching

Forward search is done by typing '/' in command mode, backward search by '?'.
You can use regular expressions in the search pattern.

Input Semantic
/name Downward-search for 'name' in current file, starting from current line
?name Upward-search for 'name'
n Goes to next occurrence of recent search pattern
:set ignorecase Makes the search case-insensitive
:set noignorecase Makes the search case-sensitive

Replacing

Search and replace is done with an ex-command.
You declare a line range for the replacement: 1,$ means the whole file; without a range only the current line is processed.
You can use regular expressions in the search pattern (the text after the first slash).

Input Semantic
:1,$s/word/WORLD/g From line 1 to last line, search 'word' and replace it by 'WORLD' (g = globally, all occurrences in a line)
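
For readers who think in JavaScript rather than ex, the effect of the trailing g can be sketched like this. It is only an analogy under my own assumptions: substituteLines and the sample lines are made up, and vi's regular expression syntax differs from JavaScript's in the details.

```javascript
// Hypothetical helper imitating :1,$s/pattern/replacement/ with or without g.
// ex substitutes line by line; without g only the first match
// in each line is replaced.
var substituteLines = function(lines, pattern, replacement, global) {
  return lines.map(function(line) {
    var regExp = new RegExp(pattern, global ? "g" : "");
    return line.replace(regExp, replacement);
  });
};

var lines = ["word by word", "no match here", "one word"];

var firstOnly = substituteLines(lines, "word", "WORLD", false);
var allInLine = substituteLines(lines, "word", "WORLD", true);

console.log(firstOnly[0]);  // WORLD by word
console.log(allInLine[0]);  // WORLD by WORLD
```

Lines with a single match come out the same either way, only lines with several matches show the difference.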

Epilogue

vi is an editor for the skilled. Once you have used it for a while you will appreciate how quickly you can work with it. Unfortunately good documentation about this great editor is not easy to find.




Thursday, April 16, 2015

JS Table of Contents

Since the times when we struggled with the table-of-contents of our first W*****d document we had a dynamic TOC in mind that appears automatically, doing chapter numbering and generating a hierarchical content representation on top, having links that directly scroll to the clicked chapter.

Here it is, for HTML documents, implemented in our favourite toy language called JavaScript. The script is just about 230 lines (without comments), 3 KB when minified. No external library is used, so you need an up-to-date browser that supports querySelectorAll(). You find the full and commented source code at the bottom of this Blog. The text in between explains how it works. The table-of-contents below is the proof of concept, because it was generated by the script itself :-)

Conventions

I advise functional inheritance for JavaScript. In the following code snippets public functions are written like tableOfContents ...

  var tocCreator = function()
  {
    var that = {};
    
    that.tableOfContents = function() {
      ....
    };

... while private functions are written like appendTocItem ...

    var appendTocItem = function() {
      ....
    };
    
    return that;
  };

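Public functions can be overwritten by callers, private functions cannot. Public functions are properties of the that instance; private functions are local variables of the enclosing function and reach that through the closure.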
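
A minimal sketch of this convention, outside of the TOC script, could look as follows. The names greeter, format, greet and decorate are invented for this example only.

```javascript
// Functional inheritance in miniature; all names here are hypothetical.
var greeter = function() {
  var that = {};

  // private: a local variable, only reachable through the closure
  var decorate = function(text) {
    return "*" + text + "*";
  };

  // public: a property of that, so callers can replace it
  that.format = function(name) {
    return "Hello " + name;
  };

  that.greet = function(name) {
    // calling through that picks up an override of format()
    return decorate(that.format(name));
  };

  return that;
};

var standard = greeter();
var custom = greeter();
custom.format = function(name) {
  return "Hi " + name;
};

console.log(standard.greet("vi"));  // *Hello vi*
console.log(custom.greet("vi"));    // *Hi vi*
```

The override works only because greet() calls that.format() and not a local variable. The private decorate() stays untouched whatever the caller does.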

Preconditions

Now let's jump into the problems of chapter numbering and TOC generation.
Somehow the chapter structure must be represented by the document's HTML tags:

  1. either using headings <h1> to <h6>,
  2. or by nesting elements like <div> or <section> into each other

Here is an example for case 1, which I would like to call ...

Structured by Headings

    <h1>Heading Title</h1>
    
    <h2>One</h2>
    <p>This is chapter 1.</p>

    <h3>One - One</h3>
    <p>This is chapter 1.1.</p>

    <h4>One - One - One</h4>
    <p>This is chapter 1.1.1</p>

    <h4>One - One - Two</h4>
    <p>This is chapter 1.1.2</p>

    <h2>Two</h2>
    <p>This is chapter 2</p>

As we can see, the hierarchical structure of the chapters is represented by the type of h[1-6] heading tag. That means we do not have chapters nested into each other, like a <div> for chapter 1.1 that contains another <div> for chapter 1.1.1. Instead there is a continuous flow of chapters that get their hierarchical order from the heading number.

And here is an example for case 2, which I would like to call ...

Structured by Nesting

   <h1>Nesting Title</h1>

    <div><h3>One</h3>
      <p>This is chapter 1.</p>

      <div>One - One
        <p>This is chapter 1.1.</p>

        <div>One - One - One
          <p>This is chapter 1.1.1</p>
        </div>

        <div>One - One - Two
          <p>This is chapter 1.1.2</p>
        </div>

      </div>
    </div>

    <div><h3>Two</h3>

This kind of structuring makes it possible to move or delete big parts of the document by touching just one element. On the other hand the hierarchical structure might get complex when the nesting level gets deep.

Other Structure

For any other kind of hierarchy you would have to implement a JS override. I will explain this in chapter 4.1.3, "Example Override: Hierarchy by CSS-class".

Searching Chapters

The first task in extracting a table-of-contents is finding the chapters. For Heading Structures this could be done by this:

var searchRoot = topElement || document.body;
var chapters = searchRoot.querySelectorAll("h2, h3, h4, h5, h6");

When no topElement containing the chapters is given, we default to document.body. Then we call the DOM function querySelectorAll(cssSelector) on that element to retrieve all chapter elements in document order. The result order of this function is specified by the W3C.

For Nesting Structures the chapters could be extracted by this, in case a mix of <div> and <section> elements was used as nesting elements:

var chapters = searchRoot.querySelectorAll("div, section");

Now we have the chapters, maybe nested into each other, but in document order. If we can assign a hierarchy level to each of them, we have all the information needed to extract a table-of-contents.

Assigning Chapter Numbers

Assigning chapter numbers means finding out the hierarchy level of a chapter element, and then building a dotted number for it. For a tree level of 4 the dotted number consists of four numbers. Each of these numbers is the order number of the corresponding element relative to its sibling elements on the same level.
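
To make the idea concrete before any DOM is involved, here is a small sketch that works on bare level numbers. The function numberChapters is a made-up name, and it assumes that levels start at 1 and never increase by more than one from one chapter to the next.

```javascript
// Hypothetical helper: turns hierarchy levels (in document order)
// into dotted chapter numbers.
var numberChapters = function(levels) {
  var counters = [];  // counters[i] is the current order number on level i + 1
  return levels.map(function(level) {
    counters.length = level;  // forget the counters of deeper levels
    counters[level - 1] = (counters[level - 1] || 0) + 1;
    return counters.join(".");
  });
};

// The "Structured by Headings" example: h2, h3, h4, h4, h2
var numbers = numberChapters([1, 2, 3, 3, 1]);
console.log(numbers.join(", "));  // 1, 1.1, 1.1.1, 1.1.2, 2
```

The script described here builds the numbers differently, from the lengths of the TOC lists on a stack, but the resulting numbers follow the same rule.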

Hierarchy Level

The function to find out the hierarchy level of an element should be overridable. By doing that the script will be reusable for documents that model their structure e.g. by assigning classes. Here are two default implementations for the default cases "Heading" and "Nesting".

Headings

    var that = {};

    // heading tags are named h1 ... h6, so the level is the number after the "h"
    that.getHeadingLevel = function(heading) {
      return parseInt(heading.tagName.substring(1), 10);
    };

This implementation assumes that headings are h1 - h6, and thus simply returns the trailing number from the tag.

Nesting

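    that.getNestingLevel = function(element, topElement, nestingTags) {
      var parent = element.parentNode;
      var i = 0;
      // walk up to topElement; stop at the document root if element is not inside topElement
      while (parent && parent !== topElement) {
        // count only ancestors whose tag name occurs in nestingTags
        if (parent.tagName && new RegExp(parent.tagName, "i").test(nestingTags))
          i++;
        parent = parent.parentNode;
      }
      return i;
    };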

This implementation counts the number of parents conforming to the set of nesting tags given as parameter nestingTags. The case-insensitive RegExp is necessary to find "DIV" within the CSS selector "section, article, div".
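
The counting can be tried without a browser. The following sketch repeats the function, with a null check on parent added for safety, and runs it on plain objects that merely imitate the tagName and parentNode properties of DOM elements. The object names are invented for this example.

```javascript
// Nesting-level logic on mock nodes; body, chapter1 etc. are hypothetical.
var getNestingLevel = function(element, topElement, nestingTags) {
  var parent = element.parentNode;
  var level = 0;
  while (parent && parent !== topElement) {
    if (parent.tagName && new RegExp(parent.tagName, "i").test(nestingTags))
      level++;
    parent = parent.parentNode;
  }
  return level;
};

var body = { tagName: "BODY", parentNode: null };
var chapter1 = { tagName: "DIV", parentNode: body };
var wrapper = { tagName: "SPAN", parentNode: chapter1 };  // not a nesting tag
var chapter11 = { tagName: "SECTION", parentNode: wrapper };
var chapter111 = { tagName: "DIV", parentNode: chapter11 };

console.log(getNestingLevel(chapter1, body, "div, section"));    // 0
console.log(getNestingLevel(chapter11, body, "div, section"));   // 1
console.log(getNestingLevel(chapter111, body, "div, section"));  // 2
```

Mind that the RegExp is a substring test. With nestingTags "section, article, div" an <a> ancestor would be counted too, because "A" occurs in "article".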

Example Override: Hierarchy by CSS-class

Assuming you assigned CSS-classes to your chapter elements, you could override the getNestingLevel() function. Look at this HTML:

    <header class="h2">One</header>
    <p>This is chapter 1.</p>

    <header class="h3">One - One</header>
    <p>This is chapter 1.1.</p>

The script can be reused for this document structure by the following JS override:

    var tocCreator = ....

    tocCreator.getNestingLevel = function(element, topElement, nestingTags) {
      // maps class "h2" to level 1, "h3" to level 2, ...
      return parseInt(element.className.substring(1), 10) - 1;
    };

    tocCreator.tableOfContents();

The JS module is loaded, wherever it comes from (AMD or name-spaces ...). Then the exposed function getNestingLevel() is overwritten by an implementation that reads the hierarchy level from the element's class. Finally the module is called to generate a TOC for the document. That's all, try out the power of code reuse!
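
The override shown above relies on the class attribute being exactly "h2", "h3" and so on. If an element carries further classes, a slightly more tolerant variant might look like the following sketch. The name levelFromClass and the fallback level 0 are my assumptions, and the elements are plain objects standing in for DOM elements.

```javascript
// Hypothetical variant: finds a class h1 ... h6 among several classes.
var levelFromClass = function(element) {
  var match = /(?:^|\s)h([1-6])(?:\s|$)/.exec(element.className);
  // assumption: elements without such a class count as level 0
  return match ? parseInt(match[1], 10) - 1 : 0;
};

console.log(levelFromClass({ className: "h2" }));          // 1
console.log(levelFromClass({ className: "chapter h3" }));  // 2
console.log(levelFromClass({ className: "header" }));      // 0
```

It would be assigned to getNestingLevel in the same way as the original override.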

Building Dotted Numbers

When we can assign a tree level to each chapter element, and we have the elements in document order, we can build a stack engine that dispatches the chapter elements and gives us the opportunity to generate chapter numbers and a table-of-contents at the same time with just one pass.

The Stack Engine

We want to loop over the chapter elements, push some state onto the stack when the tree level increases (never by more than one), and pop from the stack when it decreases (by any number of levels).

The state we push must contain the current chapter list and its tree level. With the tree level we can build a dotted chapter number, iterating the stack from start to end and taking the current number of each level. The chapter lists represent the table-of-contents.

For every dispatched chapter element we will generate a list item for the TOC, and we generate a chapter number and prepend this before the chapter and the TOC item. At the same time we also can link the item to the chapter.
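
Stripped of all DOM work, the push and pop logic can be sketched on plain data. This is not the script's code: buildTocTree and the { level, title } objects are assumptions made for this example, and numbering and linking are left out.

```javascript
// Sketch of a stack engine that turns a flat chapter list into a tree.
var buildTocTree = function(chapters) {
  var root = [];
  if (chapters.length === 0)
    return root;

  // initial state: the top list, on the level of the first chapter
  var stack = [{ level: chapters[0].level, list: root }];

  chapters.forEach(function(chapter) {
    var current = stack[stack.length - 1];

    if (chapter.level === current.level + 1) {
      // one level deeper: continue in the sub-list of the last item
      var lastItem = current.list[current.list.length - 1];
      stack.push({ level: chapter.level, list: lastItem.children });
    }
    else if (chapter.level > current.level + 1) {
      throw new Error("Inconsistent structure: level " + current.level + " to " + chapter.level);
    }
    else {
      // same level, or back up by any number of levels
      while (stack.length > 1 && stack[stack.length - 1].level > chapter.level)
        stack.pop();
    }

    stack[stack.length - 1].list.push({ title: chapter.title, children: [] });
  });

  return root;
};

var toc = buildTocTree([
  { level: 2, title: "One" },
  { level: 3, title: "One - One" },
  { level: 4, title: "One - One - One" },
  { level: 4, title: "One - One - Two" },
  { level: 2, title: "Two" }
]);

console.log(JSON.stringify(toc, null, 2));
```

The input corresponds to the "Structured by Headings" example. The result has two top-level items, and the two level-4 chapters end up as children of "One - One".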

Utilities

Here are some functions we will need for the stack engine.

    that.findHeadingsInOrder = function(topElement, headingTags) {
      return topElement.querySelectorAll(headingTags);
    };

    that.createDefaultTocContainer = function(topElement) {
      var toc = document.createElement("div");
      var headings1 = topElement.getElementsByTagName("h1");
      var heading1 = (headings1 && headings1.length === 1) ? headings1[0] : undefined;
      if (heading1)
        heading1.parentElement.insertBefore(toc, heading1.nextSibling);
      else
        topElement.insertBefore(toc, topElement.children[0]);
      return toc;
    };

    var getLevel = function(levelByHeading, heading, topElement, headingTags) {
      return levelByHeading ?
          that.getHeadingLevel(heading) :
          that.getNestingLevel(heading, topElement, headingTags);
    };

    var peek = function(stack) {
      return stack[stack.length - 1];
    };

    that.createTocList = function() {
      var list = document.createElement("ul");
      list.style.cssText = "list-style-type: none;";
      return list;
    };
    
    var appendTocList = function(parent) {
      var list = that.createTocList();
      parent.appendChild(list);
      return list;
    };

    that.createHyperlink = function(headingHtml, thisHeadingId) {
      var hyperLink = document.createElement("a");
      hyperLink.innerHTML = headingHtml;
      hyperLink.setAttribute("href", "#"+thisHeadingId);
      return hyperLink;
    };

The findHeadingsInOrder() function finds all chapter elements for a given CSS selector.

The createDefaultTocContainer() function creates a container for the TOC when none was given by the caller. When exactly one <h1> is present, it is assumed to be the document title, and the container is placed right below it. Otherwise the container becomes the first child of the top element.

The getLevel() function will give us the tree level of each chapter element. It calls two delegates according to the used structuring, which is determined by levelByHeading = /h[1-6]/i.test(headingTags), which reads as "true when the given headingTags contain h1, h2, ... or h6, case-insensitive".

The peek(stack) function returns the last element from the given stack, without removing it.

The createTocList() function creates a new TOC chapter list to append chapter items to, and the appendTocList() function appends it for a certain tree level to the TOC.

The createHyperlink() function will create the hyperlink in TOC.
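
Of the utilities above, the levelByHeading test is the one where a selector can surprise you, so here is a quick check of the regular expression. The wrapper usesHeadingStructure is a made-up name.

```javascript
// Hypothetical wrapper around the test used for levelByHeading.
var usesHeadingStructure = function(headingTags) {
  return /h[1-6]/i.test(headingTags);
};

console.log(usesHeadingStructure("h2, h3, h4, h5, h6"));  // true
console.log(usesHeadingStructure("H2"));                  // true
console.log(usesHeadingStructure("div, section"));        // false
console.log(usesHeadingStructure("header"));              // false
console.log(usesHeadingStructure("header.h2"));           // true
```

The last call shows the catch: a selector that merely contains "h2", for example as a class name, is also taken for a heading structure.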

Chapter Loop

With these utilities (and another one that will follow) we can implement the following stack engine that loops the chapter elements and gives them the tree structure originating from getLevel().
Here we are at the main function of the TOC generator.

    that.tableOfContents = function(topElement, headingTags, tocContainer) {
      topElement = topElement || document.body;
      var headings = that.findHeadingsInOrder(topElement, headingTags);
      if (headings.length === 0)
        return;  // no chapters found, nothing to number
      tocContainer = tocContainer || that.createDefaultTocContainer(topElement);
      var levelByHeading = /h[1-6]/i.test(headingTags);

      var stack = [];
      stack.push({
        headingTagLevel: getLevel(levelByHeading, headings[0], topElement, headingTags),
        chapterList: appendTocList(tocContainer)
      });
      
      for (var i = 0; i < headings.length; i++) {
        var heading = headings[i];
        var currentList = peek(stack);
        var currentLevel = currentList.headingTagLevel;
        var headingLevel = getLevel(levelByHeading, heading, topElement, headingTags);
        
        if (headingLevel === currentLevel + 1) {
          stack.push({
            headingTagLevel: headingLevel,
            chapterList: appendTocList(getLastListElement(currentList.chapterList))
          });
        }
        else if (headingLevel < currentLevel) {
          for (; headingLevel < currentLevel; currentLevel--)
            stack.pop();
        }
        else if (headingLevel > currentLevel + 1) {
          throw new Error("Inconsistent document structure, incrementing level from "+currentLevel+" to "+headingLevel);
        }
        
        appendTocItem(levelByHeading, stack, heading);
      }
    };

The function parameters are (1) the element below which to search for chapter elements, (2) the chapter tags, like "h2, h3, h4, h5, h6" or "div, section", and (3) the element where to append the generated TOC.

The function then searches the chapters and pushes the initial state onto the stack engine, representing the first chapter element, called heading here. That state contains the TOC list of first level, and the first level number.

The loop then pushes or pops states whenever the tree level changes, and it appends items to the current TOC list by calling appendTocItem(). This is where the chapter numbering and title extraction happen.

    var appendTocItem = function(levelByHeading, stack, element) {
      var nextChapterNumber = createNextChapterNumber(stack);
      
      var headingText = levelByHeading ? element.innerHTML : getTextContent(element);
      var headingElement = (levelByHeading || headingText) ? element : that.getHeadingFromNestedElement(element);
      if ( ! headingText )
        headingText = headingElement.innerHTML;
      
      var thisHeadingId = "h"+nextChapterNumber.replace(/\./g, '_');
      headingElement.setAttribute("id", thisHeadingId);
      if (levelByHeading)
        headingElement.innerHTML = nextChapterNumber+"&nbsp;"+headingText;
      else
        headingElement.insertBefore(document.createTextNode(nextChapterNumber+" "), headingElement.childNodes[0]);
      
      var tocItem = that.createTocItem();
      var chapterNumber = document.createElement("span");
      chapterNumber.innerHTML = nextChapterNumber+"&nbsp;";
      tocItem.appendChild(chapterNumber);
      
      var hyperLink = that.createHyperlink(headingText, thisHeadingId);
      tocItem.appendChild(hyperLink);
      
      var currentList = peek(stack);
      currentList.chapterList.appendChild(tocItem);
    };

The appendTocItem() function does all the work necessary for one item in the table-of-contents. It builds a dotted chapter number, retrieves the chapter title text, attaches the number to the title and to its TOC duplicate, and links them using an id on the chapter and a matching #id hyperlink in the TOC. Finally it appends the ready-made item to the current chapter list.
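
The id rule is easy to overlook in the middle of that function, so here it is in isolation. The helper name createHeadingId is invented for this sketch.

```javascript
// Hypothetical helper repeating the id rule from appendTocItem.
var createHeadingId = function(chapterNumber) {
  return "h" + chapterNumber.replace(/\./g, "_");
};

var id = createHeadingId("1.1.2");
console.log(id);        // h1_1_2
console.log("#" + id);  // #h1_1_2, the href of the TOC link
```

A side effect of the leading "h" and the replaced dots is that such an id can be used in a CSS selector like #h1_1_2 without escaping.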

Dotted Number Generator

With a stack that holds a list for each tree level, it is really simple to build a dotted chapter number, because the length of each list is the current order number on its level.

    var createNextChapterNumber = function(stack) {
      var number = "";
      for (var i = 0; i < stack.length; i++) {
        var currentList = stack[i];
        var currentNumber = currentList.chapterList.children.length;
        if (i === stack.length - 1)
          currentNumber++;
        currentNumber = that.convertChapterNumber(currentNumber);
        number = number+(number === "" ? "" : ".")+currentNumber;
      }
      return number;
    };

This loops over the stack from level 1 to the last level and appends the list length of each level to the dotted number. On the last level it adds one, because the next number is requested. It calls an overridable function convertChapterNumber() to enable conversion into other numbering systems.
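
Here is a way to try the generator without a document. The stack holds plain objects that imitate TOC lists, and the default convertChapterNumber() is assumed to return the number unchanged (the real default is in the full source). The override to letters is only an illustration.

```javascript
var that = {};

// Assumed default: leave the number as it is.
that.convertChapterNumber = function(number) {
  return number;
};

var createNextChapterNumber = function(stack) {
  var number = "";
  for (var i = 0; i < stack.length; i++) {
    var currentNumber = stack[i].chapterList.children.length;
    if (i === stack.length - 1)
      currentNumber++;
    currentNumber = that.convertChapterNumber(currentNumber);
    number = number + (number === "" ? "" : ".") + currentNumber;
  }
  return number;
};

// Mock lists: chapters 1 and 2 exist on level 1, chapter 2.1 exists on level 2.
var stack = [
  { chapterList: { children: [{}, {}] } },
  { chapterList: { children: [{}] } }
];

var decimal = createNextChapterNumber(stack);

// Override: letters instead of digits (1 = A, 2 = B, ...).
that.convertChapterNumber = function(number) {
  return String.fromCharCode(64 + number);
};
var letters = createNextChapterNumber(stack);

console.log(decimal);  // 2.2
console.log(letters);  // B.B
```

The override takes effect because createNextChapterNumber() calls the converter through that, not through a local variable.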

Extracting Chapter Titles

Almost all problems have been solved. Just the extraction of the chapter title text remains. This could be difficult, so it is done in an overridable function (that.xxx = function() ...) to facilitate adapters for all kinds of documents.

First we try to get title text from the chapter element by finding leading text nodes.

    var getTextContent = function(nestingElement) {
      var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
          nestingElement.childNodes[0].textContent :
          undefined;
      return (title && (title = title.trim())) ? title : undefined;
    };

This returns undefined for situations like this (there are no text nodes between the <div> chapter element and its <h3> child element) ...

    <div><h3>One</h3>
      <p>This is chapter 1.</p>

... and would return some text in situations like this

    <div>One - One
      <p>This is chapter 1.1.</p>

In most documents the first case would apply. So there is an overridable function that finds the element containing the title text within the chapter element.

    that.getHeadingFromNestedElement = function(nestedElement) {
      while (nestedElement.children.length === 1)
        nestedElement = nestedElement.children[0];
      return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
    };

This descends through single-child containers and then returns the first child of the "peeled" chapter element. Override it if another element contains the desired chapter title text.
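
Both title functions can be checked on plain objects that imitate the childNodes, children, nodeName and textContent properties of DOM nodes. The mock objects below are my own construction and stand for the HTML noted in the comments.

```javascript
var getTextContent = function(nestingElement) {
  var title = (nestingElement.childNodes[0] && nestingElement.childNodes[0].nodeName === "#text") ?
      nestingElement.childNodes[0].textContent :
      undefined;
  return (title && (title = title.trim())) ? title : undefined;
};

var getHeadingFromNestedElement = function(nestedElement) {
  while (nestedElement.children.length === 1)
    nestedElement = nestedElement.children[0];
  return nestedElement.children[0] ? nestedElement.children[0] : nestedElement;
};

var h3 = { nodeName: "H3", innerHTML: "One", childNodes: [], children: [] };
var p = { nodeName: "P", childNodes: [], children: [] };

// <div><h3>One</h3><p>...</p></div>
var headingFirst = { nodeName: "DIV", childNodes: [h3, p], children: [h3, p] };

// <div>One - One <p>...</p></div>
var textFirst = {
  nodeName: "DIV",
  childNodes: [{ nodeName: "#text", textContent: "One - One\n      " }, p],
  children: [p]
};

// <div><div><h3>One</h3><p>...</p></div></div>
var wrapped = { nodeName: "DIV", childNodes: [headingFirst], children: [headingFirst] };

console.log(getTextContent(headingFirst));                         // undefined
console.log(getTextContent(textFirst));                            // One - One
console.log(getHeadingFromNestedElement(headingFirst).innerHTML);  // One
console.log(getHeadingFromNestedElement(wrapped).innerHTML);       // One
```

A leading text node that contains only whitespace also yields undefined, because the trimmed string is empty.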

Full Source Code

Here comes the full source code, including comments. A lot of details not mentioned above are implemented here; look at the parameters of the tableOfContents() function. Simply put that script into a <script> tag at the end of your HTML document body, and then see whether everything is done automatically.

If not, it is time to read this manual and do some overriding :-)


You can also go to my homepage to see the current development state of this utility.