Utilizing a JBIG2 Encoder with No Information Loss
The JBIG2 specifications caution against using halftoning since this operation can seriously degrade the image. Similarly, the specs caution against using lossy JBIG2, which can introduce mismatches that degrade document quality, readability, and recognition rates.
The best way to understand such cautions with respect to JBIG2 is that the quality of the JBIG2 encoder is crucial. After all, image thresholding is potentially much more degrading than either font learning or halftoning, since the part of the image that needs to be retained (e.g., a signature) may disappear entirely. Yet most corporations capture their documents to black and white because they trust that the thresholding function is reliable and that essential information in the image document will be preserved. They probably also test this assumption before putting their document imaging system into production.
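To make the thresholding risk concrete, here is a minimal sketch of global thresholding on a toy grayscale page. The pixel values, the `threshold` helper, and the cutoffs are all assumptions chosen for illustration; real capture software typically uses more sophisticated (often adaptive) methods.

```python
# Toy illustration only: a single global cutoff applied to 8-bit grayscale.
# All values and names here are hypothetical, not taken from any scanner.

def threshold(gray, cutoff=128):
    """Map grayscale rows to 1 (ink) / 0 (paper) using one global cutoff."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]

PAPER, INK, FAINT = 250, 30, 170  # assumed brightness levels

page = [
    [PAPER, INK, INK, INK, PAPER, PAPER],          # printed text (dark)
    [PAPER, PAPER, PAPER, PAPER, PAPER, PAPER],    # blank line
    [PAPER, FAINT, FAINT, FAINT, FAINT, PAPER],    # faint signature
]

def ink_pixels(bitmap, row):
    """Count the pixels classified as ink in one row."""
    return sum(bitmap[row])

strict = threshold(page, cutoff=128)   # faint strokes fall on the paper side
lenient = threshold(page, cutoff=200)  # faint strokes survive

print(ink_pixels(strict, 2), ink_pixels(lenient, 2))
```

With the stricter cutoff the printed text is kept but the signature row comes out blank, which is the kind of loss the paragraph above describes.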
It is important to understand that effective JBIG2 compression, with file sizes 5x-10x smaller than TIFF G4 (or standard G4-based PDF), cannot be achieved with lossless JBIG2. Moreover, once a lossy JBIG2 encoder is utilized, it becomes essential for the IT manager or project leader responsible for document integrity to ensure that the JBIG2 converter supports perceptually lossless conversion.
TODO: CHANGE COMPRESSION ALGORITHM
I think the point is there may be a time in the future when it's not just compress-then-decompress but some sort of guessing algorithm that actually tries to make your decompressed voice sound clearer by basically making highly statistically supported guesses at what words you were trying to say. And so someone who, for all intents and purposes IS you, is then saying words that you never said because a computer was in the middle of the transaction, "helping."
"[it's] just conflating small connected regions whose differences fall below a threshold. Which is the core algorithm of lossy JBIG2.... I love it: The wikipedia page for JBIG2 is already updated with this issue."
"When you consider this particular issue stems from something close to a 8px x 16px glyph and the difference between correct and incorrect glyph is roughly 4px it isn't so hard to see how the lossy algorithm gets confused. Still, definitely a warning to know the tools you're using."
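The failure mode described in the two comments above can be sketched in a few lines of Python. This is a deliberately simplified model of threshold-based symbol matching, not the JBIG2 standard's procedure and not any vendor's actual encoder; the glyph shapes, `encode_symbols`, and `max_diff` are hypothetical names and values chosen to mirror the "8px x 16px glyph, roughly 4px difference" figures quoted above.

```python
# Simplified sketch of lossy symbol substitution (assumed model, not real JBIG2).
# Glyphs are 8 wide x 16 tall, stored as lists of '0'/'1' strings.

def make_glyph(closed):
    """Build an '8'-like shape; if not closed, open the upper right ('6'-like)."""
    rows = ["01111110" if y in (0, 7, 15) else "01000010" for y in range(16)]
    if not closed:
        for y in (2, 3, 4, 5):
            rows[y] = "01000000"  # remove one pixel in each of four rows
    return rows

def hamming(a, b):
    """Number of pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode_symbols(glyphs, max_diff):
    """Reuse the first dictionary entry within max_diff pixels of each glyph."""
    dictionary, indices = [], []
    for g in glyphs:
        for i, ref in enumerate(dictionary):
            if hamming(g, ref) <= max_diff:
                indices.append(i)      # glyph is replaced by the stored one
                break
        else:
            dictionary.append(g)       # no close match: store a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices

SIX, EIGHT = make_glyph(closed=False), make_glyph(closed=True)

lossy_dict, lossy_idx = encode_symbols([SIX, EIGHT], max_diff=5)
exact_dict, exact_idx = encode_symbols([SIX, EIGHT], max_diff=0)
print(hamming(SIX, EIGHT), lossy_idx, exact_idx)
```

In this toy setup the two glyphs differ by only four pixels, so a tolerance of five collapses them into one dictionary entry and the decoded page shows the "6" shape twice. With a tolerance of zero both symbols are kept, at the cost of a larger dictionary.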
The reader raised the quality from “normal” to a higher setting, which, counter-intuitively, reduced the readability of the scanned document but drastically reduced the number of mangled numbers (maybe even to zero).
Yep, if I had any Xerox stock right now I wouldn't for long.