This article is about an experimental formatter that lets you shrink unformatted text into a "newspaper" column. It uses word-wrap, which means the right edge of the column may be ragged, with empty space at the end of some lines.
The problem with word-wrap is calculating the width of a word. Every letter may have a different width, and every font may have different proportions for its letters. Monospaced fonts are a special case where all letters have the same width, so 'i' and 'm' take the same horizontal space, but this is the exception and not the rule: most fonts have different widths for different characters. Thus you can format a column only if you know the font being used.
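To make the width problem concrete, here is a minimal sketch that sums per-character widths from a tiny table. The class name WidthDemo is hypothetical and exists only for this illustration; the three width values and the fallback to the width of 'n' are taken from the PortableFont class shown later in this article.

```java
import java.util.HashMap;
import java.util.Map;

final class WidthDemo
{
    // Character widths in points, copied from the PortableFont table of this article
    private static final Map<Character,Float> WIDTHS = new HashMap<>();
    static {
        WIDTHS.put('i', 3.3339844f);
        WIDTHS.put('m', 11.689453f);
        WIDTHS.put('n', 7.6054688f);
    }

    /** Sums the widths of all characters; unknown characters count as 'n'. */
    static float stringWidth(String text) {
        float width = 0f;
        for (char c : text.toCharArray())
            width += WIDTHS.getOrDefault(c, WIDTHS.get('n'));
        return width;
    }

    public static void main(String[] args) {
        // Same number of letters, very different horizontal space
        System.out.println("iiii: " + stringWidth("iiii"));
        System.out.println("mmmm: " + stringWidth("mmmm"));
    }
}
```

With these values, four 'm' take more than three times the space of four 'i', so counting characters alone cannot tell whether a line fits.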
Here is the formatter. Enter some text in the left (or top) textarea, then click the "Format" button; the resulting newspaper column is rendered in the right (or bottom) textarea. Use the number field to change the column width.
I created this page from Java source code, using https://www.jsweet.org/jsweet-live-sandbox for translation to JavaScript.
Algorithm in Java
This works by passing a text and a font to a class named TextColumn and getting back the formatted text column. The font must implement the inner interface TextColumn.Font.
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Formats a text to fit into a given column width.
     * Puts line-breaks also into long words (like URLs)
     * that would break the column, but generally tries
     * to not break words over lines (searches spaces).
     * Preserves newlines when they are more than one.
     */
    public class TextColumn
    {
        /**
         * Users of this class must implement this interface
         * to calculate letter- and word-widths.
         */
        public interface Font
        {
            float stringWidth(String text);

            float widthCorrectionForLongWords();
        }

        private final Font font;
        private final float width;

        /** Derives the column width from the average width of 'i' and 'm'. */
        public TextColumn(Font font, int characterCount) {
            this(font, ((font.stringWidth("i") + font.stringWidth("m")) / 2) * characterCount);
        }

        public TextColumn(Font font, float deviceWidth) {
            this.font = font;
            this.width = deviceWidth;
        }

        public List<String> splitToLines(String text) {
            text = text.replaceAll("\t", " ");
            text = text.replaceAll("\r\n", "\n");  // replace WINDOWS newlines by simple ones
            text = replaceSingleNewlinesBySpace(text);
            text = text.replaceAll("[ ]{2,}", " "); // normalize spaces

            final List<String> lines = new ArrayList<String>();
            for (String paragraph : text.split("\n"))   // a paragraph is a chunk of text that has no newlines.
                for (String line : splitParagraphToFitWidth(paragraph))
                    lines.add(line);

            return lines;
        }

        private String replaceSingleNewlinesBySpace(String text) {
            final StringBuilder result = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                final char c = text.charAt(i);
                if (c == '\n' &&
                        i > 0 && text.charAt(i - 1) != '\n' &&
                        i < text.length() - 1 && text.charAt(i + 1) != '\n')
                    result.append(' ');
                else
                    result.append(c);
            }
            return result.toString();
        }

        private List<String> splitParagraphToFitWidth(String text) {
            final List<String> lines = new ArrayList<String>();

            if (text.length() <= 0) {   // preserve newlines
                lines.add(text);    // will serve as empty line
                return lines;
            }

            int previousSpaceIndex = -1;
            while (text.length() > 0) {
                final int nextSpaceIndex = nextSpaceIndex(text, previousSpaceIndex + 1); // search first space
                final String line = text.substring(0, nextSpaceIndex).trim();
                final float lineWidth = font.stringWidth(line);

                if (lineWidth > width) { // must split at preceding space if any
                    if (previousSpaceIndex < 0) // no preceding space, split by width
                        previousSpaceIndex = nextSplitIndex(text);

                    lines.add(text.substring(0, previousSpaceIndex));
                    text = text.substring(previousSpaceIndex).trim();
                    previousSpaceIndex = -1;
                }
                else if (nextSpaceIndex == text.length()) { // reached end of text
                    lines.add(line);
                    text = "";
                }
                else { // search further until width is trespassed
                    previousSpaceIndex = nextSpaceIndex; // remember latest space for split
                }
            }
            return lines;
        }

        private int nextSpaceIndex(String text, int searchStartIndex) {
            final int length = text.length();
            for (int i = searchStartIndex; i < length; i++)
                if (Character.isWhitespace(text.charAt(i)))
                    return i;
            return length;
        }

        private int nextSplitIndex(String text) {
            final float width = this.width - font.widthCorrectionForLongWords();
            for (int i = 1; i < text.length(); i++) {
                final String line = text.substring(0, i).trim();
                final float lineWidth = font.stringWidth(line);
                if (lineWidth >= width)
                    return i;
            }
            return text.length();
        }
    }
This implementation replaces single newlines with a space; only multiple consecutive newlines are preserved.
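The newline rule can be tried in isolation. The following sketch copies the condition from replaceSingleNewlinesBySpace() into a hypothetical standalone class named NewlineDemo, which is not part of the original sources, and shows what the subsequent split at newlines yields.

```java
final class NewlineDemo
{
    /** Replaces a newline by a space when neither neighbour is a newline. */
    static String replaceSingleNewlinesBySpace(String text) {
        final StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            if (c == '\n' &&
                    i > 0 && text.charAt(i - 1) != '\n' &&
                    i < text.length() - 1 && text.charAt(i + 1) != '\n')
                result.append(' ');
            else
                result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        final String normalized = replaceSingleNewlinesBySpace("one\ntwo\n\nthree");
        // The single newline became a space, the double newline survived
        for (String paragraph : normalized.split("\n"))
            System.out.println("[" + paragraph + "]");
    }
}
```

The input "one\ntwo\n\nthree" becomes "one two\n\nthree", and splitting that at newlines gives three paragraphs, the middle one empty. That empty paragraph is what ends up as a blank line in the column. Note that a newline at the very start or very end of the text is left untouched by this condition.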
A platform-independent "PortableFont" class implements TextColumn.Font. It carries hardcoded character widths, taken from a Java AWT "Dialog" font of 12 points.
    import java.util.Hashtable;
    import java.util.Map;

    public class PortableFont implements TextColumn.Font
    {
        @Override
        public float stringWidth(String text) {
            float width = 0f;
            for (char c : text.toCharArray()) {
                final Float mappedWidth = characterWidthMap.get(c);
                // characters missing in the map (including space) count as 'n'
                width += (mappedWidth != null) ? mappedWidth : characterWidthMap.get('n');
            }
            return width;
        }

        @Override
        public float widthCorrectionForLongWords() {
            return 0;
        }

        private static final Map<Character,Float> characterWidthMap = new Hashtable<>(99);

        static {
            characterWidthMap.put('a', 7.3535156f);
            characterWidthMap.put('b', 7.6171875f);
            characterWidthMap.put('c', 6.5976562f);
            characterWidthMap.put('d', 7.6171875f);
            characterWidthMap.put('e', 7.3828125f);
            characterWidthMap.put('f', 4.2246094f);
            characterWidthMap.put('g', 7.6171875f);
            characterWidthMap.put('h', 7.6054688f);
            characterWidthMap.put('i', 3.3339844f);
            characterWidthMap.put('j', 3.3339844f);
            characterWidthMap.put('k', 6.9492188f);
            characterWidthMap.put('l', 3.3339844f);
            characterWidthMap.put('m', 11.689453f);
            characterWidthMap.put('n', 7.6054688f);
            characterWidthMap.put('o', 7.341797f);
            characterWidthMap.put('p', 7.6171875f);
            characterWidthMap.put('q', 7.6171875f);
            characterWidthMap.put('r', 4.9335938f);
            characterWidthMap.put('s', 6.251953f);
            characterWidthMap.put('t', 4.705078f);
            characterWidthMap.put('u', 7.6054688f);
            characterWidthMap.put('v', 7.1015625f);
            characterWidthMap.put('w', 9.814453f);
            characterWidthMap.put('x', 7.1015625f);
            characterWidthMap.put('y', 7.1015625f);
            characterWidthMap.put('z', 6.298828f);
            characterWidthMap.put('A', 8.208984f);
            characterWidthMap.put('B', 8.232422f);
            characterWidthMap.put('C', 8.378906f);
            characterWidthMap.put('D', 9.240234f);
            characterWidthMap.put('E', 7.5820312f);
            characterWidthMap.put('F', 6.9023438f);
            characterWidthMap.put('G', 9.298828f);
            characterWidthMap.put('H', 9.0234375f);
            characterWidthMap.put('I', 3.5390625f);
            characterWidthMap.put('J', 3.5390625f);
            characterWidthMap.put('K', 7.8691406f);
            characterWidthMap.put('L', 6.685547f);
            characterWidthMap.put('M', 10.353516f);
            characterWidthMap.put('N', 8.9765625f);
            characterWidthMap.put('O', 9.4453125f);
            characterWidthMap.put('P', 7.236328f);
            characterWidthMap.put('Q', 9.4453125f);
            characterWidthMap.put('R', 8.337891f);
            characterWidthMap.put('S', 7.6171875f);
            characterWidthMap.put('T', 7.330078f);
            characterWidthMap.put('U', 8.783203f);
            characterWidthMap.put('V', 8.208984f);
            characterWidthMap.put('W', 11.865234f);
            characterWidthMap.put('X', 8.220703f);
            characterWidthMap.put('Y', 7.330078f);
            characterWidthMap.put('Z', 8.220703f);
            characterWidthMap.put('0', 7.6347656f);
            characterWidthMap.put('1', 7.6347656f);
            characterWidthMap.put('2', 7.6347656f);
            characterWidthMap.put('3', 7.6347656f);
            characterWidthMap.put('4', 7.6347656f);
            characterWidthMap.put('5', 7.6347656f);
            characterWidthMap.put('6', 7.6347656f);
            characterWidthMap.put('7', 7.6347656f);
            characterWidthMap.put('8', 7.6347656f);
            characterWidthMap.put('9', 7.6347656f);
            characterWidthMap.put('_', 6.0f);
            characterWidthMap.put('-', 4.330078f);
            characterWidthMap.put('.', 3.8144531f);
            characterWidthMap.put(',', 3.8144531f);
            characterWidthMap.put(';', 4.0429688f);
            characterWidthMap.put(':', 4.0429688f);
            characterWidthMap.put('#', 10.0546875f);
            characterWidthMap.put('\'', 3.2988281f);
            characterWidthMap.put('"', 5.5195312f);
            characterWidthMap.put('+', 10.0546875f);
            characterWidthMap.put('*', 6.0f);
            characterWidthMap.put('~', 10.0546875f);
            characterWidthMap.put('`', 6.0f);
            characterWidthMap.put('?', 6.3691406f);
            characterWidthMap.put('\\', 4.0429688f);
            characterWidthMap.put('=', 10.0546875f);
            characterWidthMap.put('(', 4.6816406f);
            characterWidthMap.put(')', 4.6816406f);
            characterWidthMap.put('[', 4.6816406f);
            characterWidthMap.put(']', 4.6816406f);
            characterWidthMap.put('{', 7.6347656f);
            characterWidthMap.put('}', 7.6347656f);
            characterWidthMap.put('/', 4.0429688f);
            characterWidthMap.put('&', 9.357422f);
            characterWidthMap.put('%', 11.402344f);
            characterWidthMap.put('$', 7.6347656f);
            characterWidthMap.put('!', 4.810547f);
            characterWidthMap.put('@', 12.0f);
        }
    }
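With the 'i' and 'm' widths from the table above, the character-count constructor of TextColumn turns a number of characters into a device width. The sketch below repeats just that arithmetic; the class and method names (ColumnWidthDemo, columnWidth) are hypothetical and used only for illustration.

```java
final class ColumnWidthDemo
{
    /** Same formula as in the TextColumn(Font, int) constructor. */
    static float columnWidth(float narrowLetterWidth, float wideLetterWidth, int characterCount) {
        // average of a narrow and a wide letter, multiplied by the wanted character count
        return ((narrowLetterWidth + wideLetterWidth) / 2) * characterCount;
    }

    public static void main(String[] args) {
        final float widthOfI = 3.3339844f;  // PortableFont width of 'i'
        final float widthOfM = 11.689453f;  // PortableFont width of 'm'
        System.out.println(columnWidth(widthOfI, widthOfM, 20));
    }
}
```

For 20 characters this gives roughly 150.23 points. Because the average letter in real text is usually not exactly halfway between 'i' and 'm', a line in such a column may hold more or fewer than 20 characters.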
Here is how to test and use these classes:
    public class Main
    {
        /** Test main */
        public static void main(String[] args) {
            final String text =
                "By default   when you write a long text using System.println() it is printed on a single line. "+
                "We need to calculate how many words will fit on a single line and then write the text to the document. "+
                "Here comes a single newline:"+
                "\n"+
                "Here comes a long text without spaces: "+
                "Loremipsumdolorsitametconsecteturadipiscingelitseddoeiusmodtemporincididuntutlaboreetdoloremagnaaliqua. "+
                "\n"+
                "Several newlines are preserved:"+
                "\n"+
                "\n"+
                "Redundant    spaces    get    removed, this is called normalizing.";

            for (String line : new TextColumn(new PortableFont(), 20).splitToLines(text))
                System.out.println(line);
        }
    }
To translate this to JavaScript, copy and paste both the TextColumn and PortableFont classes into the left-side input textarea of the JSweet online translator. Then click "Transpile" and take the JavaScript from the right-side result textarea.
Here is JavaScript code to apply the result:
    const format = function() {
        const text = inputTextarea.value;
        const width = parseInt(formatWidth.value);
        outputTextarea.value = "";

        const lines = new TextColumn(new PortableFont(), width).splitToLines(text);

        var formattedLines = "";
        for (var i = 0; i < lines.length; i++) {
            formattedLines += lines[i] + "\n";
        }
        outputTextarea.value = formattedLines;
    };

    const inputTextarea = document.getElementById("inputTextarea");
    const outputTextarea = document.getElementById("outputTextarea");
    const formatButton = document.getElementById("formatButton");
    const formatWidth = document.getElementById("formatWidth");

    formatButton.addEventListener("click", format);
    formatWidth.addEventListener("change", format);
Here is an alternative implementation for TextColumn.Font
that depends on Java AWT ("Abstract Windowing Toolkit"):
    import java.awt.Font;
    import java.awt.font.FontRenderContext;
    import java.awt.geom.AffineTransform;

    public class AwtFont implements TextColumn.Font
    {
        private final Font font;
        private final FontRenderContext renderContext =
                new FontRenderContext(new AffineTransform(), true, true);

        public AwtFont(String fontId, int pointSize) {
            this.font = new Font(fontId, Font.PLAIN, pointSize);
        }

        @Override
        public float stringWidth(String text) {
            return (float) font.getStringBounds(text, renderContext).getWidth();
        }

        @Override
        public float widthCorrectionForLongWords() {
            return 0f;
        }
    }
You could also write such an implementation for PDFBox fonts, so TextColumn is quite a reusable class.
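As a further illustration of how little a font implementation needs, here is a sketch of a fixed-width font. To compile on its own, it declares a stand-in interface with the same two methods as TextColumn.Font; in a real project the class would implement TextColumn.Font directly. The names MonospacedFontDemo and MonospacedFont are hypothetical and not part of the original sources.

```java
final class MonospacedFontDemo
{
    /** Stand-in with the same methods as TextColumn.Font, only here to keep the sketch self-contained. */
    interface Font
    {
        float stringWidth(String text);

        float widthCorrectionForLongWords();
    }

    /** Hypothetical font where every character has the same width. */
    static final class MonospacedFont implements Font
    {
        private final float characterWidth;

        MonospacedFont(float characterWidth) {
            this.characterWidth = characterWidth;
        }

        @Override
        public float stringWidth(String text) {
            // all characters are equally wide, so the length decides
            return text.length() * characterWidth;
        }

        @Override
        public float widthCorrectionForLongWords() {
            return 0f;
        }
    }

    public static void main(String[] args) {
        final Font font = new MonospacedFont(7f);
        System.out.println(font.stringWidth("i") == font.stringWidth("m"));
    }
}
```

With such a font the average of 'i' and 'm' equals the width of any character, so the character-count constructor of TextColumn would yield a width of exactly that many characters.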
Summary
I use this utility to generate text columns for videos. Such a column is supposed to scroll from bottom to top over the video image. You can create such an effect with ffmpeg; see my related article. It is nice to have the text formatter in the browser, because running it as a Java application is somewhat tiresome.