
Tuesday, 3 May 2022

Formatting Text into a Newspaper Column in Java

This article is about an experimental formatter that fits unformatted text into a narrow "newspaper" column. It performs word-wrapping, which means the right edge of the column is ragged: there may be empty space at the end of a line.

The problem with word-wrapping is calculating the width of a word. Every letter may have a different width, and every font can have different proportions for its letters. Monospaced is a special font in which all letters have the same width, so 'i' and 'm' take the same horizontal space, but this is the exception, not the rule: most fonts use different widths for different characters. Thus you can only format a column correctly if you know the font that will display it.
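To make this concrete, here is a small sketch that measures two ten-letter strings using a few of the 12-point widths listed in the PortableFont class further down. The class WidthDemo and its reduced width table are hypothetical, made up only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo: the width of a string depends on its letters, not on their count. */
public class WidthDemo {
    // A few advance widths (in points) copied from the PortableFont table
    private static final Map<Character, Float> WIDTHS = new HashMap<>();
    static {
        WIDTHS.put('i', 3.3339844f);
        WIDTHS.put('l', 3.3339844f);
        WIDTHS.put('m', 11.689453f);
        WIDTHS.put('w', 9.814453f);
    }

    /** Sums the widths of all characters; characters missing from this reduced table count as 0. */
    public static float stringWidth(String text) {
        float width = 0f;
        for (char c : text.toCharArray())
            width += WIDTHS.getOrDefault(c, 0f);
        return width;
    }

    public static void main(String[] args) {
        String narrow = "iiiiilllll"; // 10 narrow letters
        String wide = "mmmmmwwwww";   // 10 wide letters
        System.out.println(narrow + ": " + stringWidth(narrow));
        System.out.println(wide + ": " + stringWidth(wide));
    }
}
```

Both strings have ten characters, yet the second one is more than three times as wide, so counting characters alone cannot tell where a line must break.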

Here is the formatter. Enter some text into the left (or top) text area, then click the "Format" button; the resulting newspaper column is rendered in the right (or bottom) text area. Use the number field to change the column width, which is given in average characters.


I created this page from Java source code, using https://www.jsweet.org/jsweet-live-sandbox for translation to JavaScript.

Algorithm in Java

You create a TextColumn from a font and a column width, pass the text to its splitToLines() method, and get back the lines of the formatted column. The font must implement the inner interface TextColumn.Font.

import java.util.ArrayList;
import java.util.List;

/**
 * Formats a text to fit into a given column width.
 * Tries not to break words across lines (it splits at spaces),
 * but also breaks long words (like URLs) that would
 * otherwise overflow the column.
 * Single newlines become spaces; runs of two or more newlines are preserved.
 */
public class TextColumn
{
    /**
     * Users of this class must implement this interface
     * to calculate letter- and word-widths.
     */
    public interface Font
    {
        float stringWidth(String text);
        float widthCorrectionForLongWords();
    }
    
    
    private final Font font;
    private final float width;
    
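    /**
     * @param characterCount column width in "average" characters; one average
     *        character is the mean of the widths of 'i' (narrow) and 'm' (wide).
     */
    public TextColumn(Font font, int characterCount) {
        this(font, ((font.stringWidth("i") + font.stringWidth("m")) / 2) * characterCount);
    }
    
    /** @param deviceWidth column width in the units returned by Font.stringWidth(). */
    public TextColumn(Font font, float deviceWidth) {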
        this.font = font;
        this.width = deviceWidth;
    }
    
    public List<String> splitToLines(String text) {
        text = text.replaceAll("\t", " ");
        text = text.replaceAll("\r\n", "\n"); // replace WINDOWS newlines by simple ones
        text = replaceSingleNewlinesBySpace(text);
        text = text.replaceAll("[ ]{2,}", " "); // normalize spaces
        
        final List<String> lines = new ArrayList<String>();
        for (String paragraph : text.split("\n")) // a paragraph is a chunk of text that has no newlines.
            for (String line : splitParagraphToFitWidth(paragraph))
                lines.add(line);
        
        return lines;
    }
    
    private String replaceSingleNewlinesBySpace(String text) {
        final StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            if (c == '\n' &&
                    i > 0 && text.charAt(i - 1) != '\n' &&
                    i < text.length() - 1 && text.charAt(i + 1) != '\n')
                result.append(' ');
            else
                result.append(c);
        }
        return result.toString();
    }

    private List<String> splitParagraphToFitWidth(String text) {
        final List<String> lines = new ArrayList<String>();
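        text = text.trim();   // a leading space would otherwise end up in the first line, or yield an empty line if the first word is too long
        if (text.length() <= 0) {   // empty paragraph: preserve the newline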
            lines.add(text);  // will serve as empty line
            return lines;
        }
        
        int previousSpaceIndex = -1;
        
        while (text.length() > 0) {
            final int nextSpaceIndex = nextSpaceIndex(text, previousSpaceIndex + 1);  // search first space
            final String line = text.substring(0, nextSpaceIndex).trim();
            final float lineWidth = font.stringWidth(line);
            
            if (lineWidth > width) {   // must split at preceding space if any
                if (previousSpaceIndex < 0) // no preceding space, split by width
                    previousSpaceIndex = nextSplitIndex(text);
                
                lines.add(text.substring(0, previousSpaceIndex));
                text = text.substring(previousSpaceIndex).trim();
                previousSpaceIndex = -1;
            }
            else if (nextSpaceIndex == text.length()) { // reached end of text
                lines.add(line);
                text = "";
            }
            else    {    // search further until width is trespassed
                previousSpaceIndex = nextSpaceIndex;    // remember latest space for split
            }
        }
        
        return lines;
    }

    private int nextSpaceIndex(String text, int searchStartIndex) {
        final int length = text.length();
        for (int i = searchStartIndex; i < length; i++)
            if (Character.isWhitespace(text.charAt(i)))
                return i;
        return length;
    }

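    /** Returns the longest prefix length that fits into the column, at least 1 so the split loop always advances. */
    private int nextSplitIndex(String text) {
        final float width = this.width - font.widthCorrectionForLongWords();
        for (int i = 1; i <= text.length(); i++) {
            final String line = text.substring(0, i).trim();
            final float lineWidth = font.stringWidth(line);
            if (lineWidth > width)   // a prefix of i characters is too wide, so i - 1 characters still fit
                return Math.max(1, i - 1);
        }
        return text.length();
    }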

}
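Stripped of newline handling and the splitting of long words, splitParagraphToFitWidth() is a greedy loop: keep appending words while the measured line still fits, and start a new line when the next word would overflow. The self-contained sketch below shows just that idea with a pluggable measuring function. The class GreedyWrapSketch and its use of ToDoubleFunction are hypothetical and not part of TextColumn; unlike TextColumn, it leaves an over-long word unsplit on a line of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical, simplified sketch of greedy word wrapping by measured width.
 * It ignores newlines and does not split words that are wider than the column.
 */
public class GreedyWrapSketch {

    public static List<String> wrap(String paragraph, double maxWidth, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : paragraph.trim().split("\\s+")) {
            if (word.isEmpty())
                continue; // split() yields one empty token for an empty paragraph
            String candidate = (line.length() == 0) ? word : line + " " + word;
            if (measure.applyAsDouble(candidate) <= maxWidth || line.length() == 0) {
                // still fits (or is a lone over-long word): keep it on the current line
                line.setLength(0);
                line.append(candidate);
            } else {
                // the word would overflow: close the current line and start a new one with it
                lines.add(line.toString());
                line.setLength(0);
                line.append(word);
            }
        }
        if (line.length() > 0)
            lines.add(line.toString());
        return lines;
    }

    public static void main(String[] args) {
        // Monospaced measure: every character is 1 unit wide
        List<String> lines = wrap("the quick brown fox jumps over the lazy dog", 10, String::length);
        lines.forEach(System.out::println);
    }
}
```

With a width of 10 monospaced units this prints the lines "the quick", "brown fox", "jumps over", "the lazy" and "dog". Passing a proportional measure such as a font's stringWidth() instead of String::length is what makes the column fit a real font.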

This implementation replaces single newlines with spaces; only runs of two or more newlines are preserved. Between two paragraphs, a run of n newlines yields n - 1 empty lines.
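The newline rule is easy to check in isolation. The following sketch copies the relevant steps of splitToLines() (Windows newline conversion, the logic of replaceSingleNewlinesBySpace(), then the split into paragraphs) into a hypothetical class NewlineRuleDemo; the tab and space normalization steps are left out. Each empty string in the result later becomes an empty output line.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-alone copy of the newline rules in TextColumn.splitToLines():
 * single newlines become spaces, remaining newlines separate paragraphs.
 */
public class NewlineRuleDemo {

    public static String[] paragraphs(String text) {
        text = text.replace("\r\n", "\n"); // Windows newlines to simple ones
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // a newline is "single" when neither neighbor is a newline
            boolean single = c == '\n'
                    && i > 0 && text.charAt(i - 1) != '\n'
                    && i < text.length() - 1 && text.charAt(i + 1) != '\n';
            result.append(single ? ' ' : c);
        }
        // every remaining newline ends a paragraph; empty strings stand for empty lines
        return result.toString().split("\n");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(paragraphs("one\ntwo\n\nthree\n\n\nfour")));
    }
}
```

This prints `[one two, , three, , , four]`: the single newline joined "one" and "two", two newlines produced one empty line, and three newlines produced two.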

A platform-independent class named PortableFont implements TextColumn.Font. It carries hardcoded character widths, taken from the Java AWT "Dialog" font at 12 points; characters missing from the table are measured with the width of 'n'.

import java.util.Hashtable;
import java.util.Map;

public class PortableFont implements TextColumn.Font
{
    @Override
    public float stringWidth(String text) {
        float width = 0f;
        for (char c : text.toCharArray())    {
            final Float mappedWidth = characterWidthMap.get(c);
            // characters missing from the table (including the space character) are measured like 'n'
            width += (mappedWidth != null) ? mappedWidth : characterWidthMap.get('n');
        }
        return width;
    }

    @Override
    public float widthCorrectionForLongWords() {
        return 0;
    }

    private static final Map<Character,Float> characterWidthMap = new Hashtable<>(99);
    static {
        characterWidthMap.put('a', 7.3535156f);
        characterWidthMap.put('b', 7.6171875f);
        characterWidthMap.put('c', 6.5976562f);
        characterWidthMap.put('d', 7.6171875f);
        characterWidthMap.put('e', 7.3828125f);
        characterWidthMap.put('f', 4.2246094f);
        characterWidthMap.put('g', 7.6171875f);
        characterWidthMap.put('h', 7.6054688f);
        characterWidthMap.put('i', 3.3339844f);
        characterWidthMap.put('j', 3.3339844f);
        characterWidthMap.put('k', 6.9492188f);
        characterWidthMap.put('l', 3.3339844f);
        characterWidthMap.put('m', 11.689453f);
        characterWidthMap.put('n', 7.6054688f);
        characterWidthMap.put('o', 7.341797f);
        characterWidthMap.put('p', 7.6171875f);
        characterWidthMap.put('q', 7.6171875f);
        characterWidthMap.put('r', 4.9335938f);
        characterWidthMap.put('s', 6.251953f);
        characterWidthMap.put('t', 4.705078f);
        characterWidthMap.put('u', 7.6054688f);
        characterWidthMap.put('v', 7.1015625f);
        characterWidthMap.put('w', 9.814453f);
        characterWidthMap.put('x', 7.1015625f);
        characterWidthMap.put('y', 7.1015625f);
        characterWidthMap.put('z', 6.298828f);
        characterWidthMap.put('A', 8.208984f);
        characterWidthMap.put('B', 8.232422f);
        characterWidthMap.put('C', 8.378906f);
        characterWidthMap.put('D', 9.240234f);
        characterWidthMap.put('E', 7.5820312f);
        characterWidthMap.put('F', 6.9023438f);
        characterWidthMap.put('G', 9.298828f);
        characterWidthMap.put('H', 9.0234375f);
        characterWidthMap.put('I', 3.5390625f);
        characterWidthMap.put('J', 3.5390625f);
        characterWidthMap.put('K', 7.8691406f);
        characterWidthMap.put('L', 6.685547f);
        characterWidthMap.put('M', 10.353516f);
        characterWidthMap.put('N', 8.9765625f);
        characterWidthMap.put('O', 9.4453125f);
        characterWidthMap.put('P', 7.236328f);
        characterWidthMap.put('Q', 9.4453125f);
        characterWidthMap.put('R', 8.337891f);
        characterWidthMap.put('S', 7.6171875f);
        characterWidthMap.put('T', 7.330078f);
        characterWidthMap.put('U', 8.783203f);
        characterWidthMap.put('V', 8.208984f);
        characterWidthMap.put('W', 11.865234f);
        characterWidthMap.put('X', 8.220703f);
        characterWidthMap.put('Y', 7.330078f);
        characterWidthMap.put('Z', 8.220703f);
        characterWidthMap.put('0', 7.6347656f);
        characterWidthMap.put('1', 7.6347656f);
        characterWidthMap.put('2', 7.6347656f);
        characterWidthMap.put('3', 7.6347656f);
        characterWidthMap.put('4', 7.6347656f);
        characterWidthMap.put('5', 7.6347656f);
        characterWidthMap.put('6', 7.6347656f);
        characterWidthMap.put('7', 7.6347656f);
        characterWidthMap.put('8', 7.6347656f);
        characterWidthMap.put('9', 7.6347656f);
        characterWidthMap.put('_', 6.0f);
        characterWidthMap.put('-', 4.330078f);
        characterWidthMap.put('.', 3.8144531f);
        characterWidthMap.put(',', 3.8144531f);
        characterWidthMap.put(';', 4.0429688f);
        characterWidthMap.put(':', 4.0429688f);
        characterWidthMap.put('#', 10.0546875f);
        characterWidthMap.put('\'', 3.2988281f);
        characterWidthMap.put('"', 5.5195312f);
        characterWidthMap.put('+', 10.0546875f);
        characterWidthMap.put('*', 6.0f);
        characterWidthMap.put('~', 10.0546875f);
        characterWidthMap.put('`', 6.0f);
        characterWidthMap.put('?', 6.3691406f);
        characterWidthMap.put('\\', 4.0429688f);
        characterWidthMap.put('=', 10.0546875f);
        characterWidthMap.put('(', 4.6816406f);
        characterWidthMap.put(')', 4.6816406f);
        characterWidthMap.put('[', 4.6816406f);
        characterWidthMap.put(']', 4.6816406f);
        characterWidthMap.put('{', 7.6347656f);
        characterWidthMap.put('}', 7.6347656f);
        characterWidthMap.put('/', 4.0429688f);
        characterWidthMap.put('&', 9.357422f);
        characterWidthMap.put('%', 11.402344f);
        characterWidthMap.put('$', 7.6347656f);
        characterWidthMap.put('!', 4.810547f);
        characterWidthMap.put('@', 12.0f);
    }
}
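A table like this can be produced by measuring each character with an AWT font and printing one `put` statement per character. The following is a hypothetical generator, not necessarily how the table above was made. It assumes a JDK with the java.desktop module and installed fonts; because the logical "Dialog" font is mapped to a physical font per platform and JDK, the printed widths may differ from the ones above.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.geom.AffineTransform;

/**
 * Hypothetical generator that prints characterWidthMap.put(...) lines
 * for a PortableFont-like class by measuring characters with an AWT font.
 */
public class WidthTableGenerator {

    /** Formats one Java source line; quote and backslash characters are escaped. */
    public static String toPutStatement(char c, float width) {
        String literal = (c == '\'' || c == '\\') ? "\\" + c : String.valueOf(c);
        return "characterWidthMap.put('" + literal + "', " + width + "f);";
    }

    public static void main(String[] args) {
        Font font = new Font("Dialog", Font.PLAIN, 12);
        // same render context settings as the AwtFont class (antialiasing and fractional metrics on)
        FontRenderContext context = new FontRenderContext(new AffineTransform(), true, true);
        String chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
                + "_-.,;:#'\"+*~`?\\=()[]{}/&%$!@";
        for (char c : chars.toCharArray()) {
            float width = (float) font.getStringBounds(String.valueOf(c), context).getWidth();
            System.out.println(toPutStatement(c, width));
        }
    }
}
```

Adding a space to the character string would also give the space its own width instead of the fallback to 'n'.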

Here is how to test and use these classes:

public class Main
{
    /** Test main */
    public static void main(String[] args) {
        final String text = 
                "By default when you write a long text using System.println() it is printed on a single line. "+
                "We need to calculate how many words will fit on a single line and then write the text to the document. "+
                "Here comes a single newline:"+
                "\n"+
                "Here comes a long text without spaces: "+
                "Loremipsumdolorsitametconsecteturadipiscingelitseddoeiusmodtemporincididuntutlaboreetdoloremagnaaliqua. "+
                "\n"+
                "Several newlines are preserved:"+
                "\n"+
                "\n"+
                "Redundant        spaces         get removed,       this is called normalizing.";
        
        for (String line : new TextColumn(new PortableFont(), 20).splitToLines(text))
        	System.out.println(line);
    }
}
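The 20 in this test is a character count, not a device width: because it is an int, Java picks the TextColumn(Font, int) constructor, which multiplies it by the mean width of a narrow 'i' and a wide 'm'. The hypothetical helper ColumnWidthDemo below just repeats that arithmetic with the PortableFont values.

```java
/**
 * Hypothetical helper reproducing the arithmetic of TextColumn(Font, int)
 * with the PortableFont widths of 'i' and 'm'.
 */
public class ColumnWidthDemo {
    static final float WIDTH_I = 3.3339844f;  // PortableFont width of 'i'
    static final float WIDTH_M = 11.689453f;  // PortableFont width of 'm'

    /** Column width in points for the given number of "average" characters. */
    public static float columnWidth(int characterCount) {
        return ((WIDTH_I + WIDTH_M) / 2) * characterCount;
    }

    public static void main(String[] args) {
        System.out.println(columnWidth(20)); // about 150.23 points
    }
}
```

So each line of the test output may be about 150.23 points wide: that holds 45 letters 'i' but only 12 letters 'm'. Passing `20f` instead would select the float constructor and mean a width of just 20 points.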

To translate this to JavaScript, paste both the TextColumn and the PortableFont class into the left-side input text area of the JSweet online translator, click "Transpile", and copy the JavaScript from the right-side result text area. Here is the JavaScript code that applies the result to the page:

const format = function() {
    const text = inputTextarea.value;
    const width = parseInt(formatWidth.value, 10); // column width in average characters
    outputTextarea.value = "";
    
    const lines = new TextColumn(new PortableFont(), width).splitToLines(text);
    let formattedLines = "";
    for (let i = 0; i < lines.length; i++) {
        formattedLines += lines[i] + "\n";
    }
    outputTextarea.value = formattedLines;
};

const inputTextarea = document.getElementById("inputTextarea");
const outputTextarea = document.getElementById("outputTextarea");
const formatButton = document.getElementById("formatButton");
const formatWidth = document.getElementById("formatWidth");

formatButton.addEventListener("click", format);
formatWidth.addEventListener("change", format);

Here is an alternative implementation of TextColumn.Font that depends on Java AWT ("Abstract Window Toolkit"):

import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.geom.AffineTransform;

public class AwtFont implements TextColumn.Font
{
    private final Font font;
    private final FontRenderContext renderContext = new FontRenderContext(new AffineTransform(), true, true);
    
    public AwtFont(String fontId, int pointSize) {
        this.font = new Font(fontId, Font.PLAIN, pointSize);
    }
    
    @Override
    public float stringWidth(String text) {
        return (float) font.getStringBounds(text, renderContext).getWidth();
    }
    
    @Override
    public float widthCorrectionForLongWords()    {
        return 0f;
    }

}

You could write a similar implementation for PDFBox fonts, which makes TextColumn quite reusable.

Conclusion

I use this utility to generate text columns for videos, where the column is supposed to scroll from bottom to top over the video image. You can create such an effect with ffmpeg, see my related article. It is nice to have the text formatter in the browser, because running it as a Java application is somewhat tedious.