Blog-Archiv

Sonntag, 17. Juli 2016

Refactoring JS List Filters, Part Two

Refactoring my JS list filters has led to more than just encapsulating functions into modules. I found that it makes sense to integrate some general search aspects to have really reusable source code.

What are these aspects? A filter input field ...

  • ... could refer to a certain column of the table to filter, e.g. the "Title" column, or the "Creation Date" column
  • ... could denote an alternative or a restriction (resulting in logical AND / OR conditions)
  • ... must be combinable with others arbitrarily (resulting in parentheses around AND / OR conditions)

A fully fledged implementation of search-filters would end in a "logical expressions" library, in other words, you could write field-conditions using comparators like ==, <, <=, >, >=, contains, ..., attribute them by logical operator NOT, and then concatenate them by using logical operators AND and OR, whereby you would express precedences by enclosing certain parts into parentheses (see examples below).

But I don't want to go that far with my Blog article filters. I just want to be able to perform following ...

Use Cases

I want to ...

... find articles about at least one of several predefined semantics (checkboxes)
... revert that, i.e. display articles that are NOT about any of several predefined semantics
... restrict to articles contained in a "Best Of" list
... restrict to titles containing a search-term, entered at runtime in a text-field
... restrict to a creation date-range (not referring title but creation-date)

Examples would be:

  1. "Find articles about 'HTML' or 'Java', restricted to "Best Of" articles created in year 2015."
    ( article.searchWords.contains("HTML") OR article.searchWords.contains("Java") ) AND bestOfList.contains(article.identity) AND article.creationDate.contains("2015")

  2. "Find articles about none of the predefined search-terms ('HTML', 'CSS', 'JavaScript', 'Java', 'LINUX'), combined with those containing 'LINUX', restricted to articles with titles containing 'script'
    ( NOT ( article.searchWords.contains("HTML") OR article.searchWords.contains("CSS") OR .... ) OR article.searchWords.contains("LINUX") ) AND article.title.contains("script")

Design Decisions

In particular it's even more subtle, because when I turn on the "JavaScript" checkbox, I also must search "JS" and "jQuery" in titles. That means, each such checkbox carries several search terms, not just one.

But anyway, I will not provide parentheses but hardcode them, because in the examples above you always see the same pattern:

  1. collect articles using the alternative filters (OR), in case none were activated, provide all articles
  2. shorten that collection by applying restricting filters (AND)

For every article title, I will check it against all restrictions (if any), and then, if it passes, check against all alternatives (if any).

User Interface Prototype

Title Search

In first layout-row there are alternatives, in second restrictions.
Mind that not all checkboxes are OR-filters, the "Best Of" checkbox is an AND-filter.

You can try out these filters, and see all source code, on my Blog Archive.

JS Implementations

There are two abstract modules, a number of concrete extensions for filter-fields, and a string utility (on bottom of this article).

Filter Input Field Wrappers

Abstract Wrapper

This module encapsulates exactly one

  • INPUT field,
  • its context (table column),
  • its working mode (case-sensitivity),
  • its logical operator (restricting = AND, not restricting = OR),
  • the fact whether it should use the word-array (labelWords) for matching titles
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
    /**
     * Wraps a filter input field and its current state.
     */
    var abstractFilterField = function(inputField, column, caseSensitive, restricting, usesWordArray)
    {
      var self = {};
      
      self.addChangeListener = function(callbackFunction) {
        inputField.addEventListener('change', callbackFunction);
      };
      
      self.matchesOne = function(labelText, labelWords) {
        var searchWords = self.getSearchWords(column);
        
        for (var i = 0; i < searchWords.length; i++) {
          var searchWord = searchWords[i];
          if (usesWordArray && searchWord.indexOf(' ') < 0) {
            if (labelWords.indexOf(searchWord) >= 0)
              return true;
          }
          else if (labelText.indexOf(searchWord) >= 0) {
            return true;
          }
        }
        return false;
      };
      
      self.getSearchWords = function() {
        throw 'Extension must implement getSearchWords() !';
      };
      
      self.isValid = function() {
        throw 'Extension must implement isValid() !';
      };
      
      self.isRestricting = function() {
        return (restricting === undefined) ? false : restricting;
      };
      
      self.isCaseSensitive = function() {
        return (caseSensitive === undefined) ? false : caseSensitive;
      };
      
      self.getColumn = function() {
        return column; 
      };
      
      return self;
    };

Remember that a JS module is a function. We can call that function, and then enrich the returned object with concrete function implementations.

For all INPUT fields, adding a change-listener and matching some article-title is solved here. Mind that matchesOne() implementation does not check isValid(), so the caller must take care of this.

Concrete Wrappers

We have two types of HTML elements: checkboxes and text fields. For checkboxes, we have the alternatives and the "Others" checkbox (that inverts their meaning).

The text-filter can be easily implemented by deriving the abstractFilterField module. I set it to be restricting = true by default, when no such parameter was given on construction.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
    var textFilter = function(inputField, column, caseSensitive, restricting, usesWordArray)
    {
      restricting = (restricting === undefined) ? true : restricting;
      
      var self = abstractFilterField(inputField, column, caseSensitive, restricting, usesWordArray);
      
      self.getSearchWords = function() {
        return inputField.value ?
            stringUtil.toArrayWithoutEmpty(inputField.value, ',', ! caseSensitive) :
            [];
      };
      
      self.isValid = function() {
        return inputField.value && inputField.value.trim().length > 0;
      };
      
      return self;
    };

This module overwrites the exception-throwing functions of the abstraction. Mind that you can also concatenate search words by ',' here, which would make such fields also using OR conditions.

You will see how to use this module soon, first lets implement the checkboxes. There is also an abstraction for the two types of them.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
    var abstractCheckboxFilter = function(inputField, column, caseSensitive, restricting)
    {
      var self = abstractFilterField(inputField, column, caseSensitive, restricting, true);
      
      self.isValid = function() {
        return inputField.checked;
      };
      
      return self;
    };

This encapsulates the fact that a checkbox provides its state in the checked attribute, and hardcodes the usesWordArray parameter to true. Here come concrete derivations.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
    var checkboxFilter = function(checkbox, column, caseSensitive, restricting, SEARCH_WORDS_ATTRIBUTE_NAME)
    {
      SEARCH_WORDS_ATTRIBUTE_NAME = SEARCH_WORDS_ATTRIBUTE_NAME || 'search-words';
      
      var cachedSearchWords = stringUtil.toArrayWithoutEmpty(
          checkbox.getAttribute(SEARCH_WORDS_ATTRIBUTE_NAME),
          ',',
          ! caseSensitive);
      
      var self = abstractCheckboxFilter(checkbox, column, caseSensitive, restricting);
      
      self.getSearchWords = function() {
        return cachedSearchWords;
      };
      
      return self;
    };

This module reads its search words on construction from given attribute of the INPUT element. The getSearchWords() implementation returns them.

Next is the "Others" checkbox, which needs to know all other alternative-fields to invert their result, thus it receives them on construction.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
    var othersCheckboxFilter = function(checkbox, column, otherCheckboxFilters)
    {
      var self = abstractCheckboxFilter(checkbox, column);
      
      var otherSearchWords = [];
      
      for (var c = 0; c < otherCheckboxFilters.length; c++) {
        var searchWords = otherCheckboxFilters[c].getSearchWords();
        for (var w = 0; w < searchWords.length; w++)
            otherSearchWords.push(searchWords[w]);
      }
      
      var superMatchesOne = self.matchesOne;
      
      self.matchesOne = function(labelText, labelWords) {
        return ! superMatchesOne(labelText, labelWords);
      };
      
      self.getSearchWords = function() {
        return otherSearchWords;
      };
      
      return self;
    };

This implementation extracts the search-terms from all given otherCheckboxFilters and stores them to a private variable. These are returned by getSearchWords(), which is called by matchesOne(). The overwrite of matchesOne() calls superMatchesOne, and then simply reverts the result. That's the power of overriding, it makes this complicated looking functionality very simple!

Mind that all these wrappers will be separate object instances, with private field instances, because abstractFilterField always returns a new object when called.

You might wonder how the "Best Of" field can be implemented by these wrappers, because obviously it got no specific module. We can wrap this field by constructing it with restricting = true, and pointing its column to the first table-column, which carries the order number of the article. The search-words of this field then must be a comma-separated list of numbers of "Best Of" articles. That means, the page author must write these numbers.

Remember that this still is a client-side search, we do not access the linked document in any way by AJAX.

List Filter

This is the heart piece. We are going to process conditions from filter fields now. There will be one list filter instance per table to filter, holding all its filter fields. There is only an abstract module, each page must provide adapter code to manage HTML elements of the table or list to filter. For clearness I will show that adapter code now first.

      <div >Title Search</div>
      
      <div>
        <label><input type='checkbox' class="title-alternative" search-words='JS, JavaScript, jQuery'/>JavaScript </label>
        <label><input type='checkbox' class="title-alternative" search-words='CSS, Cascading Style Sheets'/>CSS </label>
        <label><input type='checkbox' class="title-alternative" search-words='HTML, Page'/>HTML </label>
        <label><input type='checkbox' class="title-alternative" search-words='Java, Constructors'/>Java </label>
        <label><input type='checkbox' class="title-alternative" search-words='LINUX, UNIX, vi, AWK, GIMP'/>LINUX </label>
        <label><input type='checkbox' id='others'/>Others </label>
        <br>
        <label><input type='checkbox' id='best-of' search-words='101,94,92,90,79,78,77,75,70,61,58,52,47,45,43,40,37,36,29,23,20,12,6'/>Best Of </label>
        <label style="padding-left: 2em;"> Title <input type='text' id="title-column-filter" placeholder="JS, CSS, HTML" style="width: 8em;"/></label>
        <label> Created <input type='text' id="date-column-filter" placeholder="2015, 2016" style="width: 6em;"/></label>
      </div>
      
      <table>
        <tr><td>106</td><td><a href='Refactoring_JS_List_Filters__Part_One.html'>Refactoring JS List Filters, Part One</a></td><td>2016-06-21</td></tr>
        ....
        <tr><td>1</td><td><a href='Things_Are_Changing.html'>Things Are Changing</a></td><td>2008-02-26</td></tr>
      </table>

This is a fragment of the HTML page. You see how synonyms for search-terms are defined, and how the "Best Of" filter is written. The table below is the filter target, exposing the column structure.

Following JS code, wrapped in a <script type="text/javascript"> tag, makes use of the ids and classes defined in that HTML.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
    (function() {
      /** Concrete filter-implementation for this page */

      /* retrieving filter fields */
      var checkboxes = document.body.querySelectorAll('input.title-alternative');
      var others = document.getElementById('others');
      var bestOf = document.getElementById('best-of');
      var titleTextField = document.getElementById('title-column-filter');
      var dateTextField = document.getElementById('date-column-filter');
      
      /* map fields to list-columns */
      var columns = [ "number", "title", "creation-date" ];
      var filters = [];
      
      for (var i = 0; i < checkboxes.length; i++)
        filters.push(checkboxFilter(checkboxes[i], columns[1]));
      
      filters.push(othersCheckboxFilter(others, columns[1], filters));
      
      filters.push(checkboxFilter(bestOf, columns[0], false, true));
      filters.push(textFilter(titleTextField, columns[1], false));
      filters.push(textFilter(dateTextField, columns[2], false));
      
      /* build a list-filter */
      var listFilter = abstractListFilter(filters);

      /* tell list-filter where the rows to filter are */
      listFilter.getRows = function() {
        var table = document.getElementsByTagName('table')[0];
        var tbody = table.children[0];
        return tbody.children;
      };

      /* tell list-filter how to get text from a row and column */
      listFilter.getColumns = function() {
        return columns;
      };
      
      /* tell list-filter how to get text from a row and column */
      listFilter.getText = function(row, column) {
        if (column === columns[0])
          return row.children[0].childNodes[0].textContent;
        
        if (column === columns[1])
          return row.children[1].children[0].childNodes[0].textContent;
        
        if (column === columns[2])
          return row.children[2].childNodes[0].textContent;
        
        throw "Not implemented column: "+column;
      };
      
    }());

Starting at line 5, the filter fields are retrieved. This implementation anticipates that there is just one table to filter on this page.

Then, on line 12, columns of the table are named. These names are arbitrary, but they must be used at construction of filter fields (lines 16-22) and the getColumns() and getText() overrides (line 35 and 40) consistently.

From line 13-22, fields are wrapped by constructing adequate objects for them. They receive their input-field, column, and restricting flag.

A list of them is then passed to the construction of an abstract list-filter on line 25. That filter is concretized by overriding three of its functions (that most likely would throw an exception). The getColumns() function returns the list of column names, the getRows() function knows how to retrieve table rows from the HTML document, and finally the getText function can return a text for a specific table cell, addressed by row and column-name.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
    /**
     * Abstract implementation of a list filter.
     * @param filters list of fields representing abstractFilterField instances.
     */
    var abstractListFilter = function(filters)
    {
      /* private */
      
      var matchFilters = function(restricting, filters, labelText, labelTextUpperCase, labelWords, labelWordsUpperCase) {
        for (var f = 0; f < filters.length; f++) {
          var filter = filters[f];
          var labelTextParam = filter.isCaseSensitive() ? labelText : labelTextUpperCase;
          var labelWordsParam = filter.isCaseSensitive() ? labelWords : labelWordsUpperCase;

          if (filter.matchesOne(labelTextParam, labelWordsParam)) {
            if ( ! restricting )
              return true; /* short OR-cycle */
          }
          else if (restricting) {
            return false; /* short AND-cycle */
          }
        }
        return restricting;
      };
      
      var matchRowColumn = function(restricting, row, column, validColumnFilters) {
        var labelText = self.getText(row, column).trim();
        var labelTextUpperCase = labelText.toUpperCase();
        var labelWords = stringUtil.toArrayWithoutEmpty(labelText, ' ');
        var labelWordsUpperCase = stringUtil.toArrayWithoutEmpty(labelTextUpperCase, ' ');
        
        return matchFilters(restricting, validColumnFilters, labelText, labelTextUpperCase, labelWords, labelWordsUpperCase);
      };
      
      var isValidFilter = function(filter, restricting, column) {
        return filter.isValid() &&
                filter.isRestricting() === restricting &&
                filter.getColumn() === column;
      };
      
      var getValidFilters = function(restricting, column) {
        var validColumnFilters = [];
        for (var f = 0; f < filters.length; f++)
          if (isValidFilter(filters[f], restricting, column))
            validColumnFilters.push(filters[f]);
        return validColumnFilters;
      };
      
      var matchRowColumns = function(restricting, row, columns) {
        var filtersExist = false;
        
        for (var c = 0; c < columns.length; c++) {
          var validColumnFilters = getValidFilters(restricting, columns[c]);
          
          if (validColumnFilters.length > 0) {
            filtersExist = true;
            
            if (matchRowColumn(restricting, row, columns[c], validColumnFilters)) {
              if ( ! restricting )
                return true; /* short OR-cycle */
            }
            else if (restricting) {
              return false; /* short AND-cycle */
            }
          }
        }
        return filtersExist ? restricting : true; /* no filters means: accept */
      };
      
      var filterRow = function(row, columns) {
        var displayed = true;
        
        if ( ! matchRowColumns(true, row, columns) ||
             ! matchRowColumns(false, row, columns) )
          /* one denied, or none agreed */
          displayed = false;
        
        self.setDisplayed(row, displayed);
      };
      
      var self = {};

      /* public */
      
      /**
       * The filter-listener callback, carried out when the user
       * clicks a checkbox or leaves a textfield, to be installed
       * by concrete list-filter implementations.
       */
      self.filter = function() {
        var rows = self.getRows();
        var columns = self.getColumns();
        
        for (var r = 0; r < rows.length; r++)
          filterRow(rows[r], columns);
      };
      
      /**
       * Concrete sub-classes CAN override this to support several table columns.
       * @return array of table column identifiers. This implementation
       *    returns an array containing abstractListFilter.DUMMY_COLUMN.
       */
      self.getColumns = function() {
        return [ abstractListFilter.DUMMY_COLUMN ];
      };
      
      /**
       * Concrete sub-classes MUST implement this.
       * @returns array of rows to filter.
       */
      self.getRows = function() {
        throw 'Extension must implement getRows() !';
      };

      /**
       * Concrete sub-classes MUST implement this.
       * @param row required, a row as returned from getRows().
       * @param column optional, a column of the row to get text from,
       *    as returned from getColumns().
       * @returns the text from given row in given column,
       *    used to compare with search-words.
       */
      self.getText = function(row, column) {
        throw 'Extension must implement getText() !';
      };

      /**
       * Concrete sub-classes CAN override this.
       * This implementation sets the *display* style-property of the row.
       * @param row required, the TABLE row to set filtered-out or filtered-in.
       * @param displayed required, when true, the row matches the filters, false when not.
       */
      self.setDisplayed = function(row, displayed) {
        row.style.display = displayed ? '' : 'none';
      };
      
      
      for (var i = 0; i < filters.length; i++)
        filters[i].addChangeListener(self.filter);
      
      
      return self;
    };

    /** The single-column-identifier when getColumns() was not overridden. */
    abstractListFilter.DUMMY_COLUMN = "dummy-column";

Read this from bottom to top. On bottom you find the public part, while private implementations are on top.

On the very bottom (line 138) you see that an event listener function is installed into all filter fields. The filter() function will be called by them all.

The public part contains three functions that are meant to be overwritten: getColumns(), getRows(), getText(). The getColumns() provides a default implementation in case there is just one column, but the other two MUST Be overwritten, else exceptions will be thrown.

The setDisplayed() is public for convenience-overrides that want e.g. to gray-out non-matching rows instead of making them invisible.

The private part contains the real complexity. The public filter() function, called on each change-event, retrieves all table rows and loops over them, calling filterRow(). The responsibility of that private function is to set the row either displayed or invisible by calling setDisplayed().

To find out whether a row is displayed or not, the filterRow() function calls matchRowColumns() twice, once asking for all restricting filters, once for all non-restricting. Should none of them match, the row is regarded to be not displayed.

The matchRowColumns() function lopps over column-names. It retrieves all valid filters assigned to one column and conforming to the restricting flag, and delegates to matchRowColumn() to find out whether the filters match.
Should one column match, and restricting is false, it performs a "short cycle" and immediatley returns true.
Should one column not match, and restricting is true, it also immediatley returns false.
Further this function finds out whether filters existed for any column, and returns true when not. Thus all rows would be displayed if no filter is present or valid.

The getValidFilters() and isValidFilter() functions are responsible for retrieving filter fields connected to a column-name and according to a restricting-flag.

The matchRowColumn() column now decides for exactly one table cell whether its filters match it. It retrieves the text from the cell, provides different representations of it, and then passes that to matchFilters().

There, finally, the filter fields are looped to match the cell text, in one of its representations. Here the matchesOne() function of abstractFilterField is called. Again a short cycle is performed when one of the match-results allows an immediate return.

String Util

Here are the string functions used in above implementations, wrapped into a singleton module to be easily reachable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
    /**
     * String utility singleton.
     */
    var stringUtil = (function()
    {
      var PUNCTUATION_CHARACTERS = ',:?!.;()';

      /** @returns true when given character is a punctuation character. */
      var isPunctuation = function(character) {
        for (var i = 0; i < PUNCTUATION_CHARACTERS.length; i++)
          if (PUNCTUATION_CHARACTERS.charAt(i) === character)
            return true;
        return false;
      };

      /** 
       * @param the string from which a character should be read.
       * @param first true when to read first character, false for last.
       * @returns the first or last character from given string,
       *    undefined when string was empty.
       */
      var getTrimChar = function(string, first) {
        if ( ! string )
          return undefined;
        return string.charAt(first ? 0 : string.length - 1);
      };

      /** 
       * @param the string from which a character should be removed.
       * @param first true when to remove first character, false for last.
       * @returns the trimmed string, undefined when string was empty.
       */
      var removeTrimChar = function(string, first) {
        if ( ! string )
          return undefined;
        return first ? string.substring(1) : string.substring(0, string.length - 1);
      };
      
      var self = {};
      
      /**
       * @param toSplit the string containing words to scan.
       * @param splitChar the character that separates the words in toSplit.
       * @param toUpper optional, when true, all strings are converted to upper-case.
       * @returns an array of words scanned from given toSplit,
       *    trimmed, punctuations removed, upper-case, no empty strings being in array.
       */
      self.toArrayWithoutEmpty = function(toSplit, splitChar, toUpper) {
        var rawWords = toSplit.split(splitChar);
        var words = [];

        for (var i = 0; i < rawWords.length; i++) {
          var rawWord = rawWords[i].trim();

          for (var l = 0, locations = [ true, false ]; l < locations.length; l++) {
            var location = locations[l];
            while (rawWord.length > 0 && isPunctuation(getTrimChar(rawWord, location)))
              rawWord = removeTrimChar(rawWord, location);
          }

          rawWord = rawWord.trim();
          if (rawWord.length > 0)
            words.push(toUpper ? rawWord.toUpperCase() : rawWord);
        }
        return words;
      };
     
      return self;
      
    }());

Summary

You can watch this working in my Blog Archive on my homepage, expand the title, then expand "Semantic Search". If you use your browser's "View Page Source" context menu item, or press Ctrl-U, you will see all source code that was described here.




Keine Kommentare: