Blog-Archiv

Donnerstag, 16. Mai 2019

The SQL Count Query Trick

Whenever you think you know everything about SQL you will soon find out that you don't. That's what I experienced recently when learning how you can turn any query, of any complexity, into a count-query that gives you just the number of records the original query would deliver.

In real life you need a count-query for pagination, i.e. for finding out how many database records are available in total. Only by means of such a count you can calculate the number of pages that a user can select from. The problem arises when the query is a huge join that may return a different result when you simply replace the attributes inside the select clause by a count(*). Besides, you don't want to duplicate a big complex query into a count-sister, do you?

So how to do this? It is a trick. But it works.

→ You make your big complex query a sub-select, that means you enclose it in parentheses. Then you prepend a select count(*) from, and append an arbitrary alias name, like e.g. X:
select count(*) from (
  -- any complex query goes here
) X    -- the alias used to count

Assuming you have the following query ....

  select e.name, c.name 
  from employee e inner join company c on c.employee_id = e.id 
  where c.deleted is null and c.state_id in
    (select s.id from state s where s.reachable = 1)

.... and you want to know the total count of its possible result.
Then your count query looks like this (don't forget the trailing alias, it is needed):

select count(*) from (
  select e.name, c.name
  from employee e inner join company c on c.employee_id = e.id
  where c.deleted is null and c.state_id in
    (select s.id from state s where s.reachable = 1)
) X

This is now counting eggs instead of delivering records :-)




Sonntag, 5. Mai 2019

Writing PDF with Java PDFBox

The document-format PDF ("Portable Document Format") has been around for 26 years now. Developed by Adobe, an open format since 2008, based on the printer language PostScript. This is a page-oriented "move-to / line-to" language.

Apache PDFBox is an open-source Java project that makes it possible to read and write PDF-documents. In this Blog I want to sneak into the way how to write a PDF document programmatically from Java.

What You Need to Know About PDFBox

  • Its 0/0 coordinate lies on bottom-left of the page. The coordinate system is bound to just one page.

  • It doesn't do automatic page breaks.

  • If you want to draw some text, you need to move to the font's baseline, not to top left corner.

  • It doesn't support components (or document-elements) like paragraph, table, list etc.

  • There is no predefined layout.

  • There is no underscore text-attribute, you need to draw a line to get the underscore effect. There is no bold text-attribute, you need to change the font.

  • The PDPageContentStream writer is a statful object that requires calls in a certain order, else it will throw exceptions.

All of these things you need to implement when you want to use PDFBox. There are some github-projects that build on PDFBox and provide more, but they cover just pieces of the missing things.

Mind that you can also use the "Open HTML to PDF" library to generate PDF from HTML (its two required JAR files have about 1.5 MB). If you can facilitate this for your project, it may save you from the complexity of PDFBox.

Hello PDF World

Following source will create a PDF file like this:

Here is the Java code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class HelloWorld
{
    public static void main(String[] args) throws IOException {
        final PDDocument document = new PDDocument();
        
        final PDFont font = PDType1Font.HELVETICA;
        final float fontSize = 10;
        final float ascent = fontSize * font.getFontDescriptor().getAscent() / 1000;
        final float descent = fontSize * -font.getFontDescriptor().getDescent() / 1000;
        final float widthOfM = fontSize * font.getStringWidth("M") / 1000;
        System.out.println("Font size: "+fontSize+", ascent: "+ascent+", descent: "+descent+", width of 'M': "+widthOfM);
        
        final PDRectangle pageSize = PDRectangle.A4;
        final PDPage page = new PDPage(pageSize);
        document.addPage(page);
        
        final PDRectangle pageBox = page.getMediaBox();
        final float width = pageBox.getWidth();
        final float height = pageBox.getHeight();
        System.out.println("PDF page w="+Math.round(width)+"/h="+Math.round(height));
        
        final float lineHeight = (ascent + descent) * 1.5f; // standard line height is 150% of font height
        float startX = pageBox.getLowerLeftX();
        float startY = pageBox.getUpperRightY();
        
        final PDPageContentStream contentStream = new PDPageContentStream(document, page);
        contentStream.beginText();
        contentStream.setFont(font, fontSize);
        contentStream.newLineAtOffset(startX, startY - lineHeight + descent); // set position to baseline of text
        contentStream.setLeading(lineHeight);
        contentStream.showText("First Line");
        contentStream.newLine();
        contentStream.showText("Second Line");
        contentStream.endText();
        
        contentStream.close();
        
        final String fileName = "HelloWorld.pdf";
        document.save(fileName);
        document.close();
        System.out.println("Wrote file: "+fileName);
    }

}

Mind that the document must be saved before it gets closed!
Output of this program is:

Font size: 10.0, ascent: 7.18, descent: 2.07, width of 'M': 8.33
PDF page w=595/h=842
Wrote file: HelloWorld.pdf

What is "Leading"?

Some say this is the space between text-lines, i.e. what is below the lower end of a 'g' but still above the upper end of an 'G' that is directly below it. This is the older meaning of the word.

Some say this is the line height, i.e. what is below the lower end of an 'a' and the lower end of an 'a' that is directly below it, thus it includes line spacing. This is the modern meaning that software goes with.

Don't use this term, it is mis-leading :-)

Recipes

This is how you can get a font's geometry. Mind that with font.getStringWidth(string) you can measure all kinds of strings except those that contain newlines.


        final PDFont font = PDType1Font.HELVETICA;
        final float fontSize = 10;
        final float ascent = fontSize * font.getFontDescriptor().getAscent() / 1000f;
        final float descent = fontSize * -font.getFontDescriptor().getDescent() / 1000f;
        // standard line height is 150% of font height
        final float lineHeight = (ascent + descent) * 1.5f;  // also called "leading" sometimes

The descent is the part that hangs down from the text baseline for 'g', 'y', 'p', 'j', 'q'. Don't forget to subtract it after a text block, else these letters may be partially covered when being on last line.

Following is the way to get a PDPageContentStream to write text or draw lines:


        final PDDocument document = new PDDocument();
        final PDRectangle pageSize = PDRectangle.A4;
        final PDPage page = new PDPage(pageSize);
        document.addPage(page);
        final PDPageContentStream contentStream = new PDPageContentStream(document, page);

Here is how to find out the page geometry:


        final PDRectangle pageBox = page.getMediaBox();
        final float width = pageBox.getWidth();
        final float height = pageBox.getHeight();
        final float startX = pageBox.getLowerLeftX();
        final float startY = pageBox.getUpperRightY();

When you advance from startY at top of page towards bottom, you need to subtract the line height, not add it: nextY = currentY - lineHeigth!

Resume

Lots of work to do with PDFBox. Hard parts are pagination (text across multiple pages) and tables, and the combination of the two.