LaTeX to HTML5 at scale
February 4, 2022 3:21 AM

The ar5iv.org project makes articles from arXiv.org accessible as responsive HTML5 web pages. Sample paper: A Simple Proof of the Quadratic Formula (1910.06709)

Twitter thread announcing the project, explaining features and design choices. Most arXiv papers can be accessed by replacing the X with 5 in the URL:
  • from: https://arxiv.org/abs/1910.06709
  • to: https://ar5iv.org/abs/1910.06709
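
For anyone who wants to script that swap, here is a minimal Python sketch. It assumes the hostname is the only thing that changes, as in the example above; the function name `to_ar5iv` is made up for illustration and is not part of arXiv or ar5iv.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Rewrite an arxiv.org URL to the matching ar5iv.org URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Only touch URLs that actually point at arXiv; anything else is rejected.
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv URL: {url}")
    # Swap the host and keep the scheme, path, query, and fragment unchanged.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


print(to_ar5iv("https://arxiv.org/abs/1910.06709"))
# prints https://ar5iv.org/abs/1910.06709
```

Parsing the URL instead of doing a plain string replace avoids accidentally rewriting an "x" somewhere else in the address.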
posted by kmt (18 comments total) 32 users marked this as a favorite
     
    I approve. Give it an A but not A+. Don't like the choice of hover vs click pop-up, annoyed that references get highlighted but don't link back to the place that referenced them. Those sorts of reference things should go both directions and be consistent.
    posted by zengargoyle at 4:14 AM on February 4, 2022 [1 favorite]


    Yah, super cool. I wish the developers best of luck, because I'd love to see this integrated into arXiv.
    posted by Alex404 at 4:22 AM on February 4, 2022


    Innnnnnnnnnnteresting. Wish I had time to dive into the technical details.

    --former journal typesetter and SGML/XML monkey
    posted by humbug at 6:31 AM on February 4, 2022 [2 favorites]


    Good luck with all those scanned PDFs of papers from the mid-90's & earlier :)

    EDIT: Disregard, apparently can't read this early in the morning. I saw "LaTeX to HTML5" and somehow thought "Arbitrary PDFs to HTML5", which is a much different problem.
    posted by genpfault at 6:32 AM on February 4, 2022 [3 favorites]


    I’m totally spoiled by the output of MathJax, which is superior to the MathML output here. I assume this was a considered decision based on converting the most documents.
    posted by fantabulous timewaster at 7:19 AM on February 4, 2022 [4 favorites]


    This is cool. (And it doesn't need user tweaking in the TeX source, which is great!) As someone who rarely writes things without lots of big figures and weird tables, the results for my own stuff are pretty flawed, but that's a very hard problem to solve. Neat!
    posted by eotvos at 8:09 AM on February 4, 2022


    The quadratic formula paper is actually really interesting!
    posted by mpark at 8:59 AM on February 4, 2022 [5 favorites]


    the quadratic formula paper is making me feel called out for singing the pop-goes-the-weasel song when I have to remember it.

    (I also sing it to my toddler because I want her to learn early that her dad is a tremendous dork)
    posted by dismas at 9:42 AM on February 4, 2022 [2 favorites]


    I need some additional context here. My understanding is the ar5iv project is turning pdfs into html5 webpages...but why? Is this an accessibility project? Is this a general movement away from pdfs?
    posted by ockmockbock at 10:13 AM on February 4, 2022 [1 favorite]


    I need some additional context here.

    arXiv accepts papers in a typesetting language called LaTeX and papers that are only PDFs. LaTeX can be compiled into PDFs. The ar5iv project displays HTML5 built from the LaTeX of an arXiv paper. The author of ar5iv says this is for his interest in ML but it will improve accessibility and readability. arXiv will continue to have the source and PDFs for the foreseeable future. The PDF-only papers don't seem to be in ar5iv, e.g. https://ar5iv.org/html/cs/0004016
    posted by bdc34 at 10:36 AM on February 4, 2022 [2 favorites]


    Is this an accessibility project?

    Ha, in a way. Serious math formatting on the web is not great. Most any site with math notation, even the simplest sub- or superscripts, let alone complex integrals, is a pastiche of small images, somewhat defeating the purpose of HTML.

    To quote a famous (oh, never mind) "Math is hard". And much harder if the tricky equation is all messed up or displayed as a non-zoomable image.
    posted by sammyo at 4:58 PM on February 4, 2022


    The technology is kind of cool, but it already fails for some of the fairly simple notation that I use in my articles. The symbols by themselves look good enough (with some exceptions), but the relative sizes are all over the place, which makes things really hard to read.

    Much less cool is that this completely ignores the licenses under which the articles are published on arXiv. Moreover, it only takes the first uploaded version, not the most recent one. So, while the original arXiv URL of one of my articles takes you to the most recent version that also has additional information, the ar5iv URL leads to an outdated version that looks kind of shitty.

    Considering that I did not grant permission for any of that, I am a bit miffed. It also doesn't help that the creator of ar5iv does not really seem to understand why some might consider this a problem.
    posted by erdferkel at 6:28 PM on February 4, 2022 [4 favorites]


    I still have ockmockbock's question, though. The LaTeX renders the math great into PDF. So if the goal is to have the math look right... we already have that, right?
    posted by secretseasons at 2:49 PM on February 5, 2022


    Regarding the why – I think the (or at least a) main argument is that HTML works better with small screens and ebook readers, as it doesn't assume a fixed line width. But I prefer PDF for my stuff anyway, so what do I know.
    posted by erdferkel at 4:37 PM on February 5, 2022


    I, for one, hate how everything is being subsumed into HTML. TeX is as good as it is because it was designed from the start for typesetting and many, many years were spent making it work for that purpose and eliminating the bugs.

    That said, even getting 80% of the way there with the designed-by-committee half-assed shit that is HTML is an impressive technical achievement. It may ultimately be a toy, but lots of toys have a lot of good work behind them. Making even passably decent molds for plastic injection molding is some black magic shit. The really good ones are total sorcery.
    posted by wierdo at 7:10 PM on February 5, 2022 [1 favorite]


    MathJax fonts.... you don't have good fonts, the ones that come with your computer and such suck badly. Raise the flag and find yourself some good fonts and learn how to make them work.

    I read MathJax papers/posts regularly and they look fine. On the not-gonna-happen wagon, I'm on the pony request for MJ support here on Metafilter. Not really sure, though, whether it comes out understandable or as gibberish for screen reader users.

    Your fonts suck!
    posted by zengargoyle at 3:08 AM on February 6, 2022


    Is there a way to browse for arxiv articles that are more than a few days old?
    posted by neuron at 11:13 AM on February 6, 2022


    Today, latexml encounters over 10,000 unknown packages and >140,000 *distinct* unknown macros during the conversion.

    In other words, doing pretty well by LaTeX standards.
    posted by credulous at 7:23 PM on February 6, 2022



