An interesting article I read recently, somewhat tangentially related to L&L.
Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding – i.e., parsing the raw HTML of a webpage, with applications to automation of web-based tasks, crawling, and browser-assisted retrieval – have not been fully explored.
We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages.
…Out of the LLMs we evaluate, we show evidence that T5-based models are ideal due to their bidirectional encoder-decoder architecture. To promote further research on LLMs for HTML understanding, we create and open-source a large-scale HTML dataset distilled and auto-labeled from CommonCrawl.
This is more about HTML reading comprehension. For our interests, I suppose we would want the opposite: generating HTML templates from human-language input.
In general, I imagine it will become common to “write code” and develop websites by having a chat with an AI assistant and describing the result we want. But I wonder if such a conversational interface, by text or voice, is any faster or more intuitive than typing code on a keyboard, or “writing code” by GUI, through visual interface and interaction.
It could be that the best way is to integrate them into a multi-modal interface, so people who are more verbal thinkers can build a website by talking to it; visual thinkers can use mouse, touch, pen, to draw websites into existence; and programmers can write them as good ol’ code.
Speaking of GUI, I learned about structure/projectional editors, which sounded like a suitable design for a visual builder for L&L templates.
A structure editor is any document editor that is cognizant of the document’s underlying structure.
Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, chemical formulas, and any other type of content with clear and well-defined structure. In contrast, a text editor is any document editor used for editing plain text files.
The Gutenberg block editor fits this description, but it's too high-level an abstraction for L&L templates - we need more granularity and detail, to be able to edit every tag and attribute.
A projectional editor allows the user to edit the abstract syntax tree (AST) representation of code in an efficient way.
It can mimic the behavior of a textual editor for textual notations, a diagram editor for graphical languages, a tabular editor for editing tables and so on. The user interacts with the code through intuitive on-screen visuals which they can even switch between for multiple displays of the same code.
There’s a direct mapping between the textual and visual representations of code, and these editor modes provide different views into the same data structure.
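To make that mapping concrete, here's a minimal sketch of the idea: one element tree (the "AST"), projected two different ways. The `Node` class and both renderers are my own illustration, not taken from any particular editor - editing the tree once updates every projection consistently.

```python
# Minimal sketch of projectional editing: one tree, two projections.
# The Node class and both renderers are illustrative, not from any real editor.

class Node:
    def __init__(self, tag, attrs=None, children=None, text=""):
        self.tag = tag
        self.attrs = attrs or {}
        self.children = children or []
        self.text = text

def to_html(node):
    """Project the tree as textual HTML markup."""
    attrs = "".join(f' {k}="{v}"' for k, v in node.attrs.items())
    inner = node.text + "".join(to_html(c) for c in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"

def to_outline(node, depth=0):
    """Project the same tree as an indented outline, as a visual editor might."""
    lines = ["  " * depth + node.tag + (f" {node.attrs}" if node.attrs else "")]
    for child in node.children:
        lines.extend(to_outline(child, depth + 1))
    return lines

tree = Node("section", {"class": "hero"}, [
    Node("h1", text="Hello"),
    Node("p", text="World"),
])

# Editing the tree - not the text - updates every projection at once.
tree.attrs["id"] = "top"
print(to_html(tree))
print("\n".join(to_outline(tree)))
```

In a real projectional editor the user never touches the serialized text directly; both views are just renderings of the stored structure, which is why they can't drift apart.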
we create and open-source a large-scale HTML dataset distilled and auto-labeled from CommonCrawl
Curious about this dataset mentioned in the article. Maybe we can use it to stress-test and optimize L&L's HTML parser. (CommonCrawl is an open repository of web crawl data.) I couldn't find any links in the paper, but I did find discussion about it on GitHub - will keep an eye on it.
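The harness for such a stress test could be quite small. A sketch below, using Python's stdlib `html.parser` purely as a stand-in for L&L's own parser (which I haven't benchmarked), with a toy corpus where crawl-derived pages would go:

```python
# Sketch of a parser stress-test harness. html.parser is a stand-in;
# the real target would be L&L's own HTML parser.
import time
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts opening tags as a cheap proxy for 'the document parsed'."""
    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1

def stress_test(documents):
    """Parse each document and time it; returns (tag_count, seconds) pairs."""
    results = []
    for html in documents:
        counter = TagCounter()
        start = time.perf_counter()
        counter.feed(html)
        elapsed = time.perf_counter() - start
        results.append((counter.tags, elapsed))
    return results

# Toy corpus; a CommonCrawl-derived dataset would supply real, messy pages.
corpus = ["<div><p>ok</p></div>", "<ul><li>a<li>b</ul>"]
for tags, secs in stress_test(corpus):
    print(f"{tags} tags parsed in {secs:.6f}s")
```

The second toy document has unclosed `<li>` tags on purpose - real crawl data is full of malformed markup like that, which is exactly what makes it useful for stress testing.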