March 19, 2021

The Quality of Well-Structuredness

In brief:

The goal of structured content isn’t more structure; it’s to put handles on bits of information that live inside of other content so software can do useful stuff with that information on demand.

What do we call the quality that makes it easier for software to do that? Parsability touches on how easily data can be teased out by algorithms, but isn’t quite right. Accessibility feels close, but collides with the broader concept of web accessibility. Availability might work, but it’s important to focus on goals and audience rather than technical criteria. “Who is the data accessible to, and for what purpose?” is a better question to ask than “How accessible is it?”

Discussion transcript

Jeff Eaton:

OK, friends, I have a question for you two.

Ethan Marcotte:

chinhands.gif

Karen McGrane:

Yessssss?

Jeff Eaton:

I’ve been chewing on what to call the quality of the stuff inside of content being parse-able and discoverable in a meaningful way. Like, what separates a blob from a chunk isn’t just structure, because in theory a JPEG is structured data just as much as an HTML file, which is structured just as much as (say) a database. But stuff that is organized and exposed by that structure isn’t what you need from it.

So it’s not… accessible?

Karen McGrane:

Ummmm

Jeff Eaton:

I’ve been thinking that “Accessibility” could be a good word for that, and I worried it collided with the generally understood “web accessibility” stuff but

Karen McGrane:

I think the collision with the commonly understood definition of accessibility makes that a non-starter.

Ethan Marcotte:

(back, i was drafting a shitpost tweet) (about accessibility, weirdly enough)

Jeff Eaton:

I was starting to think that it is related to that kind of accessibility, because a big part of that is making sure critical information for the person is available and extractable, not just inferrable via squinting at the JPEG or knowing that a particular pile of markup is meant to be a street address.

Karen McGrane:

I feel like semantics is also a critically related idea.

“Available” does seem close?

Ethan Marcotte:

Yeah, I was going to offer semantics, if that’s not too neckbeard a term

Ooh, available’s good. I also don’t dislike accessible here, fwiw.

Karen McGrane:

Like, available versus inferrable is a good thing to explain. Also… obtainable?

Ethan Marcotte:

“Accessible to [something]” maybe, rather than “accessibility” as a concept — it feels okayish to me, but depends on the audience, really.

Jeff Eaton:

Yeah, that’s what had me circling around to the connection to web accessibility initially: explaining the difference between a picture of a sign, the text of the sign, markup that makes explicit that the text is a warning, and so on.

I think you’re right, Karen, that the “overloading” of the term feels iffy, at the very least in need of clarification.

Karen McGrane:

I feel like the risk of overloading “accessibility” with this context is that “web accessibility” is entirely about the interaction being accessible to a person, whereas in this context it is more likely to mean that the semantic structure can be parsed by a computer.

Jeff Eaton:

I… am going to be pedantic and pose a question about that, because I have to get it out of my head.

Karen McGrane:

Eaton, you are among friends.

Jeff Eaton:

In the context of web accessibility, isn’t it all about whether the underlying meaning can be parsed by a computer effectively? And then formatted in a way that works best for a given person?

Admittedly, the part where it falls down — and this is where I think the distinction you point out really DOES matter — is that visual display choices affect a person’s ability to take in the information regardless of the computer’s ability to parse it. Like, color choice and use of animation is significant for accessibility but is unrelated to this semantic parsing of meaning thing.

Karen McGrane:

Hmmmm. I guess we have to define what parsing is…

Jeff Eaton:

Hah! This is exactly the rabbit hole I went down; the very first pass, I was using the word ‘parseability.’

Karen McGrane:

I will jump down that hole with you!

Jeff Eaton:

Like: Amazon displays product dimensions if the metadata is available, and if a product has a proper image, it also displays the product visually scaled next to a person. But the picture alone wouldn’t be an “accessible” way to store product dimensions, for either people or computers. And descriptive text that happens to contain the words “40mm x 120mm” would be… more accessible than a picture of the box but less accessible than explicit metadata.
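To make the product-dimensions comparison concrete, here is a minimal sketch in Python. It is only an illustration: the field names `width_mm` and `height_mm`, the sample product, and the regular expression are all hypothetical, and nothing here reflects how Amazon actually stores or extracts this data. It contrasts guessing dimensions from descriptive text with reading them from explicit metadata.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: only matches text shaped like "40mm x 120mm".
DIMENSIONS = re.compile(r"(\d+)\s*mm\s*x\s*(\d+)\s*mm", re.IGNORECASE)


def dimensions_from_text(description: str) -> Optional[Tuple[int, int]]:
    """Guess width and height in millimetres from free text; None if no match."""
    match = DIMENSIONS.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def dimensions_from_metadata(product: dict) -> Tuple[int, int]:
    """Read width and height from explicit, named fields (assumed names)."""
    return product["width_mm"], product["height_mm"]


# Hypothetical product record carrying both forms of the same information.
product = {
    "description": "Compact box, 40mm x 120mm, ships flat.",
    "width_mm": 40,
    "height_mm": 120,
}

print(dimensions_from_text(product["description"]))   # (40, 120)
print(dimensions_from_text("Roughly 4 cm by 12 cm"))  # None
print(dimensions_from_metadata(product))              # (40, 120)
```

The text version works only while the wording happens to fit the pattern; the metadata version does not care how the description is phrased.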

Karen McGrane:

Okay, merriam-webster.com:

Parse: to understand something by considering its parts closely. Synonyms: analyze, dissect, audit… Antonyms: skim, miss, overlook…

Jeff Eaton:

In the context of web accessibility, isn’t it all about whether the underlying meaning can be parsed by a computer effectively? And then formatted in a way that works best for a given person.

Ethan Marcotte:

I think I disagree, but also you started a paradox-grade fight inside of my brain.

Karen McGrane:

Now for Wikipedia:

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech). The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence or word, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Jeff Eaton:

Yeah, when “parsing” comes into play you can call a LOT of really terrible things “parse-able” given a sufficiently complex parser.

Karen McGrane:

If we use an easy example of something like an address or a date, if it’s just text the computer can’t do anything with it. If it has parsable semantics then you can make a map or a calendar or whatever.

Jeff Eaton:

In theory it can spot patterns — like, OSX has baked in “recognizers” for phone numbers and addresses in text that make them clickable even without semantic tagging. But they’re flaky and full of false positives, and miss things if they’re formatted in odd ways.

Karen McGrane:

And that sometimes goes wrong!

Jeff Eaton:

In theory, if you throw machine learning at certain problems you can “extract the important stuff” but it’s monstrously inefficient compared to just (say) entering the date explicitly.

Ethan Marcotte:

There’s a technical component here, like structurally/semantically-correct markup makes things parseable to or understandable by the accessibility object model, which then can be interfaced with by something like a screen reader or a braille interface. But there’s also the folks who want to tab through an interface and have some visual indication of where they are.

Like, color choice and use of animation is significant for accessibility but is unrelated to this semantic parsing of meaning thing. So I guess, coming back to this, it’s about who’s doing the parsing?

Jeff Eaton:

Right, right. When we put information in a JPEG we are saying: “A person can parse this, using the parts of their brain that turn pictures into knowledge”

Karen McGrane:

I still cling to my definition that accessibility is explicitly aimed at people and defined by whether people can use it, even though semantic parsing by computers may be an aspect of the process.

Jeff Eaton:

Hmmmmm. Yeah I think that distinction makes sense, even if only to avoid “human-accessibility” being diluted as a meaningful description.

Karen McGrane:

Whereas the idea of, let’s call it “availability” means that semantic parsing is possible and success is defined entirely by whether the computer can do it, even if the end goal is to make it possible for a person to do a thing.

Jeff Eaton:

So, in the context of (oh let’s say) content modeling, the process of “dechunking” is fundamentally about making meaning more available.

Karen McGrane:

Right, the goal isn’t chunks per se, the goal is to make “handles” that computers can grab in order to do something with the content.

Jeff Eaton:

And sufficiently smart computers with sufficiently consistent information might be perfectly fine doing the heavy lifting automatically, but that’s often error prone and inexact. Like using OCR on faxes as a substitute for sending the original document.

Karen McGrane:

Right. Most website problems are too small to merit an expensive, complex AI solution, unless you are operating at the scale of Google or Amazon, and even then it’s often wrong.

Jeff Eaton:

And! Even Google puts immense energy into convincing the world to use consistent semantic markup, via Schema.org and other initiatives.
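As a rough illustration of what such markup buys you, the sketch below reads an event described with Schema.org's `Event`, `Place`, and `PostalAddress` types in JSON-LD form. The event itself, its values, and the `event_handles` helper are made up for this example; only the vocabulary terms (`startDate`, `location`, `address`, `streetAddress`, `addressLocality`, `postalCode`) come from Schema.org.

```python
import json
from datetime import date

# Hypothetical event record; the values are placeholders, not real data.
EVENT_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Meetup",
  "startDate": "2030-06-01",
  "location": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "123 Example Street",
      "addressLocality": "Springfield",
      "postalCode": "00000"
    }
  }
}
"""


def event_handles(raw: str) -> dict:
    """Pull out the date and address by name, with no pattern guessing."""
    data = json.loads(raw)
    address = data["location"]["address"]
    return {
        "start": date.fromisoformat(data["startDate"]),
        "street": address["streetAddress"],
        "city": address["addressLocality"],
    }


handles = event_handles(EVENT_JSON_LD)
print(handles["start"].isoformat())
print(handles["street"], handles["city"])
```

Because the date and address are labelled, software can hand them straight to a calendar or a map, which is the kind of handle the conversation keeps coming back to.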

Karen McGrane:

I’m glad we had this talk.

Jeff Eaton:

I am too. I’m still… so-so on availability but I’m glad I ran “accessibility” by both of you, because I think the thing I’m talking about is definitely narrower, or at least a very restricted context.

Karen McGrane:

Yeah, I don’t love availability either.

Jeff Eaton:

Karen, the comment you made about “putting handles on the things you need to grab” is an excellent angle, because it’s clear that “handles, more handles, everything should be covered in handles” isn’t the point.

And Ethan, it was back a bit but I think “accessible… to what? to who?” is another key. Accessibility less as a binary, more as a suite of questions to ask yourself.

Ethan Marcotte:

Yes! I apologize for the self-link, but teasing digital accessibility out into “navigable” and “usable” was really very helpful to me, and I blogged about it a while ago: Accessibility is not a feature.

Jeff Eaton:

Never apologize for the self-link. That navigable/usable distinction is really useful, thank you!

Ethan Marcotte:

Ah, thank you!