Selection Isn't A Box, And Text Goes In All The Directions
October 7, 2019 6:08 AM

Text Rendering Hates You. Rendering text, how hard could it be? As it turns out, incredibly hard! To my knowledge, literally no system renders text "perfectly". It's all best-effort, although some efforts are more important than others... The overarching theme here will be: there are no consistent right answers, everything is way more important than you think, and everything affects everything else.
posted by Wolfdog (35 comments total) 27 users marked this as a favorite
 
I mean, who really cares if "æ" is written as "ae"?

Doesn't everyone? They're completely different!
posted by grumpybear69 at 6:19 AM on October 7, 2019 [14 favorites]


Yeah, æ is literally its own letter in some Scandinavian languages. It's not just ae written funny.
posted by Dysk at 6:27 AM on October 7, 2019 [1 favorite]


That's literally the point the article is making about ae and other ligatures.
posted by signal at 6:44 AM on October 7, 2019 [7 favorites]


I click the link about text rendering (and presumably the larger typographical issues involved), and the first thing I see, like a punch in the mouth, is a nested list that has both bullets and numbers. BOTH. Hgrhrhrhhhrhghghghrhr...
posted by fatbird at 6:57 AM on October 7, 2019 [5 favorites]


> That's literally the point the article is making about ae and other ligatures.

It makes that point about other languages, but leaves æ - which is no more a ligature than w - as an example of "who cares?" as contrasted with languages that are basically all ligatures.
posted by Dysk at 7:06 AM on October 7, 2019


No, it's set up as ironic/counterfactual, the next sentence starts with "Well, as it turns out…" as in, you should care.
posted by signal at 7:18 AM on October 7, 2019 [9 favorites]


Snark aside, that's actually a really good overview of the text-rendering pipeline and its tortured interdependencies.

Just this morning I was reading the judgement against Trump in NY State, and noticed that the document is a weird pastiche of old and new. It's in Courier, like a typewriter, but it's got justified margins. The footnote numerals are properly superscripted. Underlining is used for emphasis despite italics being available, but some headers are bolded. The box around the plaintiff's name and case is drawn with hyphens for horizontal lines, colons for vertical lines, and capital Xs for corners. The quotes are smart. Typographically, it's Frankenstein's monster, and it's easy to see how it came to be that way: the old, purely typewriter driven format was ported directly to a word processor that was allowed to add its bells and whistles.

Reread that document and notice how many of the items it highlights come entirely from the low resolution of monitors that were the standard for a long time.
posted by fatbird at 7:22 AM on October 7, 2019 [17 favorites]


I had the same reaction as signal initially, but after thinking about it I've come around to the opposite side - 'æ' just happens to be a spectacularly poor example of a ligature, because in some languages that otherwise don't use ligatures, it's considered a full letter. It's in their alphabet and everything. You could, very easily, treat text in those languages as individual characters so long as you include æ, ø and å as characters, yet here it's being used as an example of why you can't.
posted by Merus at 7:47 AM on October 7, 2019 [4 favorites]
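
For what it's worth, Unicode seems to side with the "it's a letter" reading. A minimal sketch using only Python's standard `unicodedata` module (the `describe` helper is a name made up for this example): æ is named a LETTER and has no decomposition, while a typographic ligature like ﬁ is named a LIGATURE and folds back into its two letters under compatibility normalization.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return a character's Unicode name and its NFKD (compatibility) form."""
    return unicodedata.name(ch), unicodedata.normalize("NFKD", ch)


# U+00E6: the Scandinavian letter. U+FB01: the "fi" presentation ligature.
ae_name, ae_nfkd = describe("\u00e6")
fi_name, fi_nfkd = describe("\ufb01")

print(ae_name, repr(ae_nfkd))  # the name says LETTER; the form is unchanged
print(fi_name, repr(fi_nfkd))  # the name says LIGATURE; the form is two letters
```

This only shows how the encoding classifies the two characters. Whether a font should draw "ae" as a joined glyph in English text is a separate, rendering-time decision.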


> like a punch in the mouth, is a nested list that has both bullets and numbers.

I don't understand what your problem is with this. That list is intuitive and makes sense; I didn't have to give it a second thought until you mentioned it like it was some kind of sin.
posted by GoblinHoney at 7:57 AM on October 7, 2019 [3 favorites]


I've always been tickled by the story of the initial development of TeX, a computerized typesetting system. It was originally developed when Donald Knuth, a computer scientist, received galleys of a (revision of a) book he had written. He was so disappointed with the typesetting that he spent his next sabbatical (and many years afterward) developing a satisfactory typesetting system that is now a worldwide standard.
posted by ElKevbo at 8:27 AM on October 7, 2019 [11 favorites]


now, can we try Esperanto?
posted by philip-random at 8:56 AM on October 7, 2019 [1 favorite]


> Yeah, æ is literally its own letter in some Scandinavian languages…

I'd just like to point out that using 'literally' to make comments about letters is one of the few literal uses of literal in its original literal sense, and that most people who have a problem with using literally figuratively are using literally metaphorically.
Like, literally.
posted by signal at 9:11 AM on October 7, 2019 [22 favorites]


> I don't understand what your problem is with this.

I could go into great theoretical detail, but basically it's redundant and contradictory to have both. Bullets are implicitly unordered--you should be able to reorder the points with no change to meaning, which isn't the case here.
posted by fatbird at 9:15 AM on October 7, 2019 [4 favorites]


They specifically contextualize the bit about ae as being in English. In English, ae is not a letter, it is two letters that should be rendered with a ligature when they are next to each other. It should not have to be typed as æ because æ is a single character. But whether you use ae or æ, in either case, it seems to be fairly conventional and readable to English speakers, so we tend not to be fussy about that. Other languages display everything contextually, not just a few things in some modes of typesetting, and that's why ligature handling is hard but necessary.
posted by Sequence at 9:18 AM on October 7, 2019 [1 favorite]


Although I'm not a programmer, I love this kind of thing. I've been gradually working my way through The Raster Tragedy, which goes into the problems of glyph rendering in more detail.
posted by adamrice at 9:32 AM on October 7, 2019 [5 favorites]


> I've always been tickled by the story of the initial development of TeX

Many scientists - I think in the physical sciences without exception - are regular users of TeX (well, LaTeX, usually) for journal article submissions. For scientific text rendering, TeX is still unparalleled. Take the justification of paragraphs: as someone used to TeX, I can't bear to look at what Microsoft Word thinks is acceptable for justified text.

But TeX also has its weird and annoying quirks and bizarre rendering issues - for example, our university letterhead template has a wide left margin on page 1, to leave a blank column below the logo, but regular margins from page 2 and on. Implementing that in the LaTeX template turns out to be essentially impossible, because TeX renders all the text in a beautiful single justified scroll with fixed margins before putting in page breaks. (Yes, I implemented a hack.)

(But then, what about footnotes? Yes, you can get into weird cycles with footnote markers on the last line getting pushed to the next page by the footnote text - which then needs to move to the next page, and allows the footnote marker to move back to the previous page, which ...)

Anyway, text rendering. Like dealing with time - best not to look too hard at the details.
posted by RedOrGreen at 9:49 AM on October 7, 2019 [4 favorites]


> Cursive Script: Any script where glyphs touch and flow into each other (like Arabic).

I was talking to a 90-year-old aunt this weekend, from a generation with beautiful cursive English handwriting, and this sentence from the article makes me think that font rendering is finally at a point where we could, in fact, support proper cursive English in WYSIWYG text editing rather than just fonts which use hacks to approximate handwriting.
posted by clawsoon at 10:06 AM on October 7, 2019 [2 favorites]


We probably could support cursive, but there's no real need to. Cursive English was for speed in writing; it's a drop in legibility compared to a proper serif font, especially at smaller text sizes.

I'd still love to see it done right, though.
posted by fatbird at 10:18 AM on October 7, 2019 [2 favorites]


Yeah, it'd be more about the beauty than the practicality.
posted by clawsoon at 10:21 AM on October 7, 2019


There are a lot of cursive Latin fonts but there's no Unicode encoding for cursive Latin characters. Unicode has selected script letters from the Latin alphabet (ɡℋℐℒℓ℘ℛℯℰℱℳℴ) and two Mathematical Script letter sets (regular and bold versions of U&lc sets) but neither is intended for text. The latter can't be repurposed because they contain only 26 letters, a-z without elaboration (for example, there is no Mathematical Script ü) and they don't form ligatures (𝒷𝓊𝒻𝒻𝓁𝓎, for example).

So Unicode Latin script would have to start with a proposal from someone to elaborate the standard western European alphabets with script versions. My rough count says that's over 700 different glyphs across the breadth of Latin alphabets. Have fun!
posted by ardgedee at 10:45 AM on October 7, 2019 [2 favorites]
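
A rough way to check both claims from Python, using only the standard `unicodedata` module. The helper names and the `SCRIPT_BUFFLY` constant are made up for this sketch, and the code points are my reading of the sample word above. The Mathematical Script letters carry compatibility mappings back to plain a-z, and a script "u" plus a combining diaeresis has no precomposed form to collapse into.

```python
import unicodedata

# "buffly" spelled with Mathematical Script small letters, written as
# escapes so the example does not depend on font or editor support.
SCRIPT_BUFFLY = "\U0001D4B7\U0001D4CA\U0001D4BB\U0001D4BB\U0001D4C1\U0001D4CE"


def fold_to_plain(text: str) -> str:
    """Apply NFKC, which maps the math alphabets back to ordinary letters."""
    return unicodedata.normalize("NFKC", text)


def compose(text: str) -> str:
    """Apply NFC, which merges base + combining mark when a precomposed form exists."""
    return unicodedata.normalize("NFC", text)


plain_u_umlaut = compose("u\u0308")            # plain u + combining diaeresis
script_u_umlaut = compose("\U0001D4CA\u0308")  # script u + combining diaeresis

print(unicodedata.name(SCRIPT_BUFFLY[0]))
print(fold_to_plain(SCRIPT_BUFFLY))
print(len(plain_u_umlaut), len(script_u_umlaut))
```

The plain pair composes into one code point while the script pair stays as two, which is the "no Mathematical Script ü" point in encoding terms. Joining the letters into ligatures would be up to the font, and this sketch says nothing about that.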


> Bullets are implicitly unordered--you should be able to reorder the points with no change to meaning, which isn't the case here.

In HTML, ol and ul are the elements for lists; the first is ordered and the second is unordered. This article goes a step further: it writes everything in that index using ul and then writes the numbers out by hand. This is a bad practice.
posted by axiom at 10:49 AM on October 7, 2019 [9 favorites]


> Have fun!

Someone should mention it to that dev at Mozilla who apparently goes super deep into ligature handling. "Yeah, bet you can't fix it for English cursive..."

I suspect the reason the author did the belt and suspenders thing with the bullets is that it's not easy to do cumulative numbering in ordered lists. If I were in their position I'd have faked the list with indentation.
posted by fatbird at 10:54 AM on October 7, 2019 [1 favorite]
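
The cumulative numbering itself is easy to generate ahead of time, which may be why authors end up typing it into the text. A minimal sketch, assuming the outline is a nested list of (title, children) pairs; `number_outline` and the section titles are placeholders, not the article's actual index.

```python
def number_outline(items, prefix=""):
    """Flatten a nested outline into lines labelled 1, 1.1, 1.2, 2, ..."""
    lines = []
    for position, (title, children) in enumerate(items, start=1):
        label = f"{prefix}{position}"
        lines.append(f"{label} {title}")
        # Children inherit the parent's label as their prefix.
        lines.extend(number_outline(children, prefix=f"{label}."))
    return lines


# Placeholder outline for illustration only.
outline = [
    ("Terminology", []),
    ("Layout", [("Shaping", []), ("Line breaking", [])]),
    ("Rasterization", []),
]

for line in number_outline(outline):
    print(line)
```

Generating the labels this way keeps them in sync with the structure, so reordering a section renumbers everything below it.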


> Someone should mention it to that dev at Mozilla who apparently goes super deep into ligature handling. "Yeah, bet you can't fix it for English cursive..."

I would never recommend underestimating the dedication of an otaku with an axe to grind.
posted by ardgedee at 11:04 AM on October 7, 2019 [2 favorites]


Underestimate it? I'm counting on it!
posted by fatbird at 11:07 AM on October 7, 2019 [3 favorites]


In case you want to go really deep into the sub-pixel rendering portion, it's always worth reading through The Raster Tragedy at Low Resolution. Extremely informative!
posted by BlackLeotardFront at 11:41 AM on October 7, 2019 [3 favorites]


The key performance bottleneck in browser speed is text measuring. Not even drawing, measuring.
This was also the cause of most app speed disparities back in the golden age of Mac vs Windows (Windows was faster at measuring, possibly because the typographical system was more basic).
posted by w0mbat at 12:40 PM on October 7, 2019 [3 favorites]


Aw c'mon. Next you'll be saying that rendering a megathread page correctly with 4000+ comments (which would be twenty feet long if printed out) takes actual minutes of time on a mobile device. Crazy talk!
posted by benzenedream at 12:43 PM on October 7, 2019 [1 favorite]


So the megathreads would've been faster with a fixed-width font option?
posted by clawsoon at 12:45 PM on October 7, 2019 [2 favorites]


A fixed width of 0 would've been favorite.
posted by Wolfdog at 1:08 PM on October 7, 2019 [2 favorites]


> So the megathreads would've been faster with a fixed-width font option?

Probably not on a graphical display, since the browser still has to calculate font-size. But on a columnar text terminal display (e.g., the Lynx browser), rendering would be cheap.

Usually what speeds up renders of large sheets of text the most is using native fonts rather than imported fonts. Most of the core serif and sans-serif fonts on macOS and Windows are optimized for fast on-screen rendering, including hinting information that web fonts frequently leave out since it inflates file sizes and download overhead.
posted by ardgedee at 3:05 PM on October 7, 2019 [1 favorite]
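
To make the terminal case concrete, here is a toy sketch of why line breaking depends on measuring. All names and the advance-width table are hypothetical, and the fixed-width version assumes one cell per code point, which real terminals cannot assume for wide or combining characters. The wrapping loop is identical in both cases; only the cost of the `measure` call changes.

```python
def measure_fixed(text, cell=1):
    """Terminal-style width: every character occupies one cell (a simplification)."""
    return len(text) * cell


def measure_proportional(text, advances, default=0.5):
    """Sum per-glyph advance widths; real engines also shape and kern."""
    return sum(advances.get(ch, default) for ch in text)


def wrap(words, max_width, measure):
    """Greedy line breaking: each candidate line has to be measured."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Made-up advance widths in em units, for illustration only.
ADVANCES = {"i": 0.25, "m": 1.0}

print(wrap("the quick brown fox".split(), 10, measure_fixed))
print(measure_proportional("mi", ADVANCES))
```

In a real browser the measuring step involves font lookup, shaping, and fallback, so this only illustrates where the cost sits, not how large it is.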


ardgedee, why in the world would we want separate code points for cursive characters?
posted by thedward at 6:48 PM on October 7, 2019 [1 favorite]


Somehow this supports accessibility of digital text, right? There are plenty of people who need enlarged text, as well as text-to-speech functionality. In all languages. Print disability does not discriminate. It can be brought on by neurological trauma, infection from mosquitoes/ticks etc., age, genetics and more. The ae matters for people who depend on technology to parse language and communication. Designing and coding fonts and text for people of all abilities is important.
posted by childofTethys at 7:08 PM on October 7, 2019


Selection of text in the browser is some kind of circle of hell, and an admission of our failure as a civilization. I don't mean selection inside your mixed French / Arabic with embedded numerals, which is tricky on its face (but which browsers do quite well actually!) but just selection of text within an HTML carceral structure.
posted by away for regrooving at 10:26 PM on October 7, 2019 [1 favorite]
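
As a small illustration of why mixed French / Arabic text with numerals is tricky on its face: each character carries a bidirectional class, and the stored (logical) order is not the order drawn on screen. A minimal sketch with Python's standard `unicodedata` module; `bidi_runs` and the sample string are made up for this example, and this is only the classification step, not the full Unicode Bidirectional Algorithm.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text):
    """Group consecutive characters that share a Unicode bidirectional class."""
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]


# A French word, an Arabic word (written as escapes), and a number.
SAMPLE = "Prix \u0633\u0639\u0631 42"

for bidi_class, run in bidi_runs(SAMPLE):
    print(bidi_class, repr(run))
```

A selection that is contiguous in this logical order can cover several separate stretches once the Arabic run is laid out right to left, which is one reason a selection is not simply a box.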


Selection of plain text usually isn't bad. The problems come when you're asking the OS to perform hard AI and distinguish: between illustrations and background or decorative elements; between headlines and pull quotes; between old-timey tables used to lay separate material out, versus tables used to display columnar text; etc. And don't get me started on questions like "how should the OS handle footnotes and endnotes", especially when the designer just used little numbers to identify notes rather than wrapping them in markup language. There are answers to all of these, but they represent decisions made by programmers, not the user just selecting text and the OS magically intuiting the right thing to do.
posted by Joe in Australia at 12:05 AM on October 8, 2019


Nice article, appropriately exasperated. The next time people rant at me that AI will replace us soon I will point them at this article. On the other hand, maybe the Basilisk will keep us on to help it render bits of text.
posted by dmh at 1:03 AM on October 8, 2019 [2 favorites]



