Florida2K – Scanner Settings

Scanning in Binary, Mono or Color?

Here is the top right corner of a very common card type for us:

../../../_images/fig01.png

Since a normal card reader only cares about the holes going all the way through the card, people printed all sorts of stuff on punched cards, the company logo being a very popular, almost default choice.

I have scanned this card three different ways: as color (“RGB”), as monochrome (“Gray”) and as binary (“Black&White”). This is how the three images look to Florida2K:

../../../_images/fig02.png

If we start at the top: Florida2K processes color images by using the brightest of the three color channels for each pixel, which means that a brightly colored logo ends up being white. Florida2K does not use the color information in any other way.
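To make the "brightest of the three colors" rule concrete, here is a minimal sketch in Python. It is not Florida2K's own code; the function name and the sample pixel values are made up for illustration.

```python
def brightest_channel(rgb_image):
    """Collapse rows of (r, g, b) pixels to one brightness value per pixel
    by keeping the largest of the three channels."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


# Hypothetical 8-bit sample values, not measured from a real scan.
paper = (230, 225, 210)    # cream card stock
red_logo = (225, 40, 35)   # bright red printing
hole = (20, 20, 25)        # dark background seen through a punched hole

gray_row = brightest_channel([[paper, red_logo, hole]])[0]
print(gray_row)  # [230, 225, 25]
```

The red logo ends up almost as bright as the paper, while the hole stays dark, which is why printing in a bright color does not get in the way of hole detection.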

If the scanner is set to monochrome, as in the middle, the logo becomes whatever shade of gray your scanner decides.

If you set the scanner to binary, it will generally try to make sure that everything you can see on the paper is also present in the scanned image, so the red logo comes out full black.

There is no way we can tell reliably from the binary image if there is a punched hole in the middle of the logo or not.
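A small sketch of why the binary image loses the information. It assumes a simple fixed threshold, which is only a stand-in for whatever your scanner really does, and the gray values are hypothetical.

```python
def to_binary(gray_row, threshold=128):
    """Map 8-bit gray values to pure white (255) or pure black (0).
    The fixed threshold is an assumption, real scanners may be smarter."""
    return [255 if value >= threshold else 0 for value in gray_row]


# Hypothetical gray values: paper, printed logo, punched hole.
gray_row = [230, 110, 20]
binary_row = to_binary(gray_row)
print(binary_row)  # [255, 0, 0]
```

In the gray scan the logo (110) and the hole (20) are easy to tell apart. After thresholding both are 0, and nothing in the image says which one was which.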

So all in all: Scan your cards in mono/gray or color. Use binary only if you absolutely have to, and if your cards do not have any printing on them at all.

Scanner Resolution

The image processing works better at higher resolution, but the processing time increases super-quadratically with the resolution of the scanned images. (This means that a 200 DPI image takes more than four times as long to process as a 100 DPI image.)
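The factor of four comes from the pixel count alone. As a rough worked example, assume the image is cropped to a standard 80-column card of 7 3/8 by 3 1/4 inches (an assumption, a real scan usually includes some margin):

```python
CARD_WIDTH_IN = 7.375   # nominal 80-column card, assumed here
CARD_HEIGHT_IN = 3.25


def pixel_count(dpi):
    """Number of pixels in a scan cropped exactly to the card."""
    return (CARD_WIDTH_IN * dpi) * (CARD_HEIGHT_IN * dpi)


for dpi in (100, 150, 200):
    ratio = pixel_count(dpi) / pixel_count(100)
    print(f"{dpi} DPI: {ratio:.2f} times the pixels of a 100 DPI scan")
# 100 DPI: 1.00 times the pixels of a 100 DPI scan
# 150 DPI: 2.25 times the pixels of a 100 DPI scan
# 200 DPI: 4.00 times the pixels of a 100 DPI scan
```

Any work which grows faster than the number of pixels comes on top of these factors.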

Most of the time 100 DPI works just fine, 150 DPI works a bit better, and 200 DPI is normally not an improvement over 150 DPI. Anything above 200 DPI is just a waste of computer time.

If you want to preserve the scanned images, for instance to record what is printed, written or scribbled on the cards, 200 DPI is a good choice, because often the ink ribbon in the printer was halfway dry, as you can see on the faint “00001” in the picture above.

Other Scanner Settings

We always scan to TIFF files in a lossless format, but as long as the compression is not too heavy-handed, JPEG compression will probably work as well.

Florida2K – Image Processing

The first thing Florida2K does is to find everything in the image which looks like a hole, and this is done with a two-dimensional correlation against a rectangular window matching the hole size.

This amounts to the program placing a hole-sized “window” centered on every single pixel in the image in turn, calculating how dark the pixels under the window are, and assigning a single score, called “goodness”, depending on how much it looks like a hole.

A perfect “goodness” is +0.5, and the parameter goodness by default disqualifies any pixel with a goodness below +0.1 from being considered as the center of a valid hole.
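The idea can be sketched in a few lines of Python. This is not Florida2K's code: it assumes brightness values from 0.0 (black) to 1.0 (white), gives every pixel under the window the same weight, and scores the window as 0.5 minus the mean brightness, which is one simple way to arrive at +0.5 for a perfectly dark hole. The names window_score and GOODNESS_LIMIT are hypothetical.

```python
GOODNESS_LIMIT = 0.1  # mirrors the default described above


def window_score(image, cx, cy, half_w, half_h):
    """Return 0.5 minus the mean brightness under a window centered on
    (cx, cy). Pixels outside the image are skipped. Uniform weights are
    an assumption made to keep the sketch short."""
    values = [
        image[y][x]
        for y in range(cy - half_h, cy + half_h + 1)
        for x in range(cx - half_w, cx + half_w + 1)
        if 0 <= y < len(image) and 0 <= x < len(image[0])
    ]
    return 0.5 - sum(values) / len(values)


# 7x7 patch of white paper with a 3x5 black hole in the middle.
image = [[1.0] * 7 for _ in range(7)]
for y in range(1, 6):
    for x in range(2, 5):
        image[y][x] = 0.0

on_hole = window_score(image, 3, 3, 1, 2)    # window exactly on the hole
on_paper = window_score(image, 0, 0, 1, 2)   # window on plain paper
print(on_hole, on_hole >= GOODNESS_LIMIT)    # 0.5 True
print(on_paper, on_paper >= GOODNESS_LIMIT)  # -0.5 False
```

A real implementation would compute this for every pixel at once with a correlation routine rather than with nested Python loops.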

Here is how the default window looks at 100 DPI:

../../../_images/fig04.png

The red rectangles indicate properly sized and spaced holes in a punched card, and the shade of gray indicates how much weight is assigned to each pixel.

As can be seen, the “window” is a whole number of pixels and does not fit perfectly inside the hole, and depending on the paper and the scanner, this may reduce the quality of the hole detection.

Two parameters, slim_x and slim_y, reduce the size of the window by some number of pixels, and setting both of these to one results in a window which stays firmly inside the hole:

../../../_images/fig05.png
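The misfit is easy to see with a little arithmetic. The sketch below assumes the nominal rectangular hole of an 80-column card, 0.055 by 0.125 inches, and a made-up rule for the window size: round the hole up to whole pixels, then subtract slim_x and slim_y. How Florida2K actually sizes its window is not stated here, so treat window_size as hypothetical.

```python
# Nominal hole size in thousandths of an inch (assumed, 80-column card).
HOLE_WIDTH_MIL = 55
HOLE_HEIGHT_MIL = 125


def ceil_div(a, b):
    """Integer division rounding up, avoids floating point surprises."""
    return -(-a // b)


def window_size(dpi, slim_x=0, slim_y=0):
    """Hypothetical rule: hole size rounded up to whole pixels,
    then shrunk by slim_x and slim_y pixels."""
    width = ceil_div(HOLE_WIDTH_MIL * dpi, 1000) - slim_x
    height = ceil_div(HOLE_HEIGHT_MIL * dpi, 1000) - slim_y
    return width, height


# At 100 DPI the hole itself is 5.5 by 12.5 pixels.
print(HOLE_WIDTH_MIL * 100 / 1000, HOLE_HEIGHT_MIL * 100 / 1000)  # 5.5 12.5
print(window_size(100))                    # (6, 13)
print(window_size(100, slim_x=1, slim_y=1))  # (5, 12)
```

Under these assumptions the unslimmed window sticks out half a pixel past the hole in each direction, while the slimmed window fits inside it.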

We also calculate the “opposite” window, to assign “badness” if all the pixels around the hole are dark, typically because something was printed on the card:

../../../_images/fig03.png

A typical “badness” is -0.3 and the parameter badness by default will disqualify any pixel with a badness above zero from being considered the center of a hole.
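A sketch of the “opposite” window, under the same kind of assumptions as before: brightness runs from 0.0 (black) to 1.0 (white), the surround is a one pixel wide ring with uniform weights, and the score is 0.5 minus the mean brightness of the ring. The names and the sample brightness values are hypothetical, chosen so that plain paper lands near the typical -0.3.

```python
BADNESS_LIMIT = 0.0  # mirrors the default described above


def surround_score(image, cx, cy, half_w, half_h, margin=1):
    """Return 0.5 minus the mean brightness of the ring of pixels just
    outside a hole-sized window centered on (cx, cy)."""
    ring = []
    for y in range(cy - half_h - margin, cy + half_h + margin + 1):
        for x in range(cx - half_w - margin, cx + half_w + margin + 1):
            inside_hole = abs(x - cx) <= half_w and abs(y - cy) <= half_h
            in_image = 0 <= y < len(image) and 0 <= x < len(image[0])
            if in_image and not inside_hole:
                ring.append(image[y][x])
    return 0.5 - sum(ring) / len(ring)


def patch(surround_brightness):
    """9x9 patch with a 3x5 black rectangle in the middle."""
    image = [[surround_brightness] * 9 for _ in range(9)]
    for y in range(2, 7):
        for x in range(3, 6):
            image[y][x] = 0.0
    return image


hole_in_paper = surround_score(patch(0.8), 4, 4, 1, 2)  # light card stock
dark_in_print = surround_score(patch(0.2), 4, 4, 1, 2)  # heavy printing
print(hole_in_paper > BADNESS_LIMIT)  # False, candidate survives
print(dark_in_print > BADNESS_LIMIT)  # True, candidate disqualified
```

Both patches look identical to the goodness window, since the middle is black in both. Only the surround tells a hole in paper apart from a dark spot inside a printed area.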

Be aware that the goodness and badness parameters are not alone: there are other, and in many ways much more discriminating, checks later on. The important thing to remember, if you need to tune them, is that they should err on the side of tolerance and never reject an actual hole.

There are two further parameters, peak_x and peak_y, which help make sure we only get one pixel nominated as the center of a hole for each actual hole in the card, but these will only need tweaking if the holes in the card are severely out of alignment. (This is untested.)
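One common way to get a single nominee per hole is to keep only local maxima of the score. The sketch below assumes that peak_x and peak_y give the neighborhood, in pixels, within which a candidate must beat every other score. That reading of the two parameters is a guess, and local_peaks is a hypothetical name.

```python
def local_peaks(scores, peak_x, peak_y, limit=0.1):
    """Return (x, y) for every pixel whose score is at least limit and
    strictly greater than all scores within peak_x columns and peak_y
    rows of it. Exact ties produce no peak in this simple version."""
    height, width = len(scores), len(scores[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = scores[y][x]
            if score < limit:
                continue
            neighbours = [
                scores[ny][nx]
                for ny in range(max(0, y - peak_y), min(height, y + peak_y + 1))
                for nx in range(max(0, x - peak_x), min(width, x + peak_x + 1))
                if (nx, ny) != (x, y)
            ]
            if all(score > other for other in neighbours):
                peaks.append((x, y))
    return peaks


# Made-up goodness scores around a single hole.
scores = [
    [-0.5, -0.5, -0.5, -0.5, -0.5],
    [-0.5,  0.2,  0.3,  0.2, -0.5],
    [-0.5,  0.3,  0.5,  0.3, -0.5],
    [-0.5,  0.2,  0.3,  0.2, -0.5],
    [-0.5, -0.5, -0.5, -0.5, -0.5],
]
print(local_peaks(scores, peak_x=1, peak_y=1))  # [(2, 2)]
```

Nine pixels pass the goodness limit here, but only the one in the middle is nominated as the center of the hole.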

phk