Image descriptions in the Fediverse
I have learned a lot about describing images according to Mastodon's standards, and I want to share my knowledge, but I haven't learned enough
It must have been two years ago that I learned about the importance of describing images in the Fediverse.
Now, I'm not someone who's easily satisfied with the absolute bare minimum. If I have to do it, I want to do it right. I want to do it the best I can. "Better than nothing" isn't good enough. In fact, this already holds true for the alt-text police and the Mastodon HOA. And if I have to describe my images, I want to be way ahead of them. I don't want my image descriptions to suddenly be sub-standard because Mastodon has kept raising its standards, but I haven't.
So I've spent these past years educating myself about alt-text and image descriptions and researching what Mastodon users require, what Mastodon users want, what Mastodon users don't want. "Mastodon users" because, seriously, Mastodon is pretty much the only place in the Fediverse where image descriptions matter. Or used to be until two months ago when people who were on Mastodon and Instagram suddenly started escaping from Meta Platforms, Inc. and flocking to Pixelfed and brought Mastodon's accessibility rules with them. But that was only two months ago.
Until then, just about nobody outside Mastodon knew or cared about image accessibility. But if your content has a chance of ending up on some Mastodon timeline, it's pretty much mandatory.
For my research, I've had lots of sources of information. Various hashtags on, sometimes also on Mastodon instances targeted at disabled users. A whole number of webpages and blog articles about alt-text and image descriptions, even though they're mostly geared towards static commercial websites or HTML-formatted blogs. These webpages and articles keep contradicting what's happening on Mastodon, and what Mastodon users tend to love, but then again, they also contradict each other, e.g. mentioning a person's race vs never mentioning a person's race because that's racist.
Still, I've learned a lot.
Early on during my research, I learned that Mastodon users love alt-text as sources of additional information on the topic of the image. What really got me to re-think my way of describing images was this toot by @Stormgren:
This mattered. A lot. Because I didn't want to post real-life cat photos.
What I wanted to post images about, and what I actually had already started posting images about, was 3-D virtual worlds. Super-obscure 3-D virtual worlds based on OpenSimulator. Something that only maybe one in 200,000 Fediverse users knows anything about. Everyone else, so I figured, would require tons of explanation to be able to understand my image posts.
Something else I took away from this is that it's better to give people all information they may not have but need to understand your image post on a silver platter right away than to expect them to look up even only some of this information themselves. This goes doubly when you know that they won't even find that information.
And in fact, I also learned that neurodivergent people may require more extensive explanations than neurotypical people. I've actually had a neurodivergent Mastodon user thank me for an absolutely monstrous image description with an absolute info dump of explanations in it.
That early already, I also learned other things. For example, the rule that alt-text must not exceed 200 characters (or even only 125 characters) does not exist on Mastodon. Instead, many Mastodon users love long, extensive, detailed image descriptions. Well, if they want them, they shall have them.
Another thing was that accessibility means that blind or visually-impaired users must have the very same chances to experience an image as sighted users. There's no arguing that, I guess.
Again, my images are about 3-D virtual worlds. A kind of 3-D virtual worlds that have been referred to by the buzzword "metaverse" or "metaverses" as early as 2007, 14 years before Zuckerberg used that word, and I can prove that. What my images show may be referred to as "metaverse". Not an artistic impression of a metaverse, not an AI rendering of a metaverse, but actually existing, living, breathing 3-D virtual worlds that are being referred to as "metaverses", or whose network is being referred to as "the metaverse".
In short: The metaverse exists. And my images show it. They show the actually, really existing metaverse.
Chances are that this has sighted users on the edges of their seats in excitement. What do they do then? Only look at what matters in the image within the context of the post? Of course not! Instead, they go exploring this exciting, recently discovered, whole new universe by taking in all the big and small details in the image.
Now allow me to re-iterate: Accessibility means that blind or visually-impaired users must have the very same chances to experience an image as sighted users. Anything else equals ableism.
In this context, it means that blind or visually-impaired users must have the very same chances to take in all the big and small details in my images just the same as sighted users. But they can't see them. So I have to sit down and describe all the details in the image to them. And explain them, of course, if they don't understand them.
This was when my image descriptions really grew to titanic sizes.
Also, if there is text in an image, it must be transcribed verbatim. My understanding of that is that any and all text that is anywhere within the borders of an image must be transcribed absolutely identically to the original. Now, I'm not talking about what's called flattened copy. I'm talking about signs or posters or logos or box art or the like strewn about the image.
This rule does not cover a number of edge-cases, though. For example, it does not cover text which is unreadable in the image as it is posted, but which whoever posts the image can source and thereby transcribe verbatim nonetheless. I figured that if no exception is explicitly made, then there is no exception for such text. If it can be transcribed, it must be transcribed. So my first long image description ended up with 22 individual text transcripts of various lengths.
The location where an image was taken, so I learned, should be mentioned, too, unless very good reasons speak against it. None of these reasons apply to my images from virtual worlds. Cue me not only mentioning where an image is from, but explaining it in more and more characters so that everyone understands it with no prior special knowledge required.
Now the question was where to put all that information. Into the post itself (which would inflate it to ridiculous lengths)? Into the alt-text like everyone on Mastodon (for which it would be too long at several thousand characters)? Into a reply (which would be inconvenient and stay entirely unnoticed by Mastodon users)?
I actually did a test run in the shape of (content warning: eye contact, alcohol) a thread with four times the same post, but different ways of describing the image in it. I cross-posted it to Lemmy to have people vote on which is the best place to describe an image. Describing the image in the post itself technically won, but the poll wasn't really representative: It only got five votes.
Then I got into an argument with @Deborah, a user with a physical disability that makes it impossible for her to access alt-text. Money quote from way down this comment thread:
Her point was clear: Information only available in alt-text, but neither in the post text body nor in the image itself, is inaccessible and therefore lost to all those who cannot access alt-text. And not everyone can access alt-text.
From elsewhere, I learned that externally linked information is inconvenient and potentially inaccessible. Conclusion: It's generally best to provide all information necessary for understanding a post in the post itself.
Okay, so when I describe and explain my images at the level of detail that I deem necessary (and that level is sky high), the description, complete with included explanations, must go into the post text body.
But then there was the alt-text police as a department of the Mastodon HOA. At least some of them demand every image in the Fediverse have a useful (as in sufficiently detailed and accurate) alt-text. Yes, even if there's already an image description in the post. That is, if they can't see the image description in the post right off the bat because the post is hidden behind a summary and CW, then of course that image description doesn't count.
When I realised that, I started describing all my original images twice. Once with a long and detailed image description in the post itself. Once with a shorter, but still extensive image description in the alt-text. That said, I often had to cut text transcripts because multiple dozen text transcripts wouldn't all fit into a maximum of 1,500 characters, especially not including the descriptions necessary for people to even find them.
Even though Hubzilla doesn't really have a character limit for alt-text, I have to limit myself because I've long since learned that Mastodon cuts alt-texts from outside off at the 1,500-character mark if they're longer than 1,500 characters. I was told that Misskey does the same. And I figured that all their respective forks do that, too. Also, (content warning: eye contact, alcohol) even Hubzilla can only display so many characters of alt-text.
By the way: I've yet to see anyone on Mastodon sanction someone for an alt-text that's too long or too detailed. As long as I don't, I'll suppose it doesn't happen. I'll suppose that Mastodon is perfectly happy with 1,500-character alt-texts.
As for my long descriptions, they've started humongous already. The first one was already most likely the longest image description in the Fediverse. It started out at over 11,000 characters that took me three hours. More research, one edit and another round about two hours later, it stood at (content warning: eye contact, alcohol) over 13,000 characters. Then came (content warning: eye contact, food, tobacco, weapons, elevated point of view) over 37,000 characters for one image. Then came (content warning: eye contact, food) over 40,000 characters for one image. Then came over 60,000 characters for one image which took me two whole days, morning to evening. And I even consider that image obsolete and insufficient nowadays.
My image descriptions have grown so long that they have headlines, often on multiple levels.
I barely get any feedback for these image descriptions, but it doesn't look like I get more criticism than praise.
Still, my learning process continued.
I learned that it's actually good to have both an alt-text and a long image description.
I learned that "picture of", "image of" and "photo of" are very bad style. The photograph, more specifically the digital photograph, can be considered a default nowadays. All other media, however, must be mentioned. So if I have a shaded, but not ray-traced digital 3-D rendering, I have to say so.
I learned that people may want to know about the camera position (its height above the ground in particular) and orientation. And so I mention both if there are enough references in the image to justify them. (For example, it probably isn't worth mentioning that the camera is oriented a few degrees south of west if the background of the image is plain white and absolutely featureless otherwise.)
I learned that technical terms and jargon which not everyone may be familiar with must be avoided if anyhow possible and explained if not. Since I can't constantly write around any and all terms specific to virtual worlds in general and OpenSim in particular in everyday words, this alone added thousands upon thousands of characters of explanations to my long image descriptions.
I learned that abbreviations of any kind must be avoided like the plague if anyhow possible. At the very least, they must be spelled out in full and then associated with their own abbreviation at first. Then, initialisms that are spelled letter by letter must have their letters separated with full stops whereas acronyms that are pronounced like words must not have these full stops.
For example, the proper way to use "OAR" is by first spelling it out: "OpenSimulator Archive," followed by the initialism in parentheses with full stops between the letters, "(O.A.R.)", then explaining what an OAR is without requiring any prior knowledge except for what has already been explained in the image description. Later on, the initialism "O.A.R." may be used unless it is so far down the image description that it has to be spelled out again to remind people of what it means.
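Here's a minimal sketch in Python of how that first mention could be put together. The function names are made up for illustration, and "NASA" is my own example of an acronym that's pronounced like a word; it isn't taken from any guide.

```python
def abbreviate(letters: str, spelled_letter_by_letter: bool) -> str:
    """Format an abbreviation following the rule described above."""
    if spelled_letter_by_letter:
        # Initialism: a full stop after every letter, e.g. O.A.R.
        return "".join(f"{letter}." for letter in letters)
    # Acronym that is pronounced like a word: no full stops.
    return letters


def first_mention(full_name: str, letters: str, spelled_letter_by_letter: bool) -> str:
    """Spell the term out in full, then add the abbreviation in parentheses."""
    return f"{full_name} ({abbreviate(letters, spelled_letter_by_letter)})"


print(first_mention("OpenSimulator Archive", "OAR", True))
# prints: OpenSimulator Archive (O.A.R.)
```

The explanation of what the term means still has to be written by hand; the sketch only takes care of the spelling.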
I learned that not only do the sizes of objects in the image belong in the image description, but they must be explained using references to either what else is in the image or to what people are readily familiar with, like the size of body parts. I only have one image post that actually takes care of this.
I learned that not only do colours belong in the image description, but they must be described using a small handful of basic colours plus brightness plus saturation. After all, how would a blind person know what sepia or fuchsia or olive green or Prussian blue or Burgundy red is?
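To illustrate "basic colour plus brightness plus saturation", here's a rough sketch in Python that turns an RGB value into such a description, using the standard library's `colorsys` module. The hue bands, the thresholds and the wording are all my own assumptions, not an established standard, so take it as a starting point at best.

```python
import colorsys

# Assumed hue bands in degrees (upper limit, basic colour name).
# Both the selection of basic colours and the cut-offs are arbitrary.
HUE_BANDS = [
    (15, "red"), (45, "orange"), (70, "yellow"), (165, "green"),
    (200, "cyan"), (260, "blue"), (330, "purple"), (360, "red"),
]


def describe_colour(r: int, g: int, b: int) -> str:
    """Describe an 8-bit RGB colour as brightness, saturation and basic colour."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    brightness = "dark" if v < 0.4 else "medium-bright" if v < 0.75 else "bright"
    if s < 0.1:
        # Hardly any saturation: treat the colour as black, white or grey.
        if v < 0.15:
            return "black"
        if v > 0.9:
            return "white"
        return f"{brightness} grey"
    saturation = (
        "pale" if s < 0.4
        else "moderately saturated" if s < 0.75
        else "strongly saturated"
    )
    hue_degrees = h * 360
    # Pick the first band whose upper limit lies above the hue.
    base = next((name for limit, name in HUE_BANDS if hue_degrees < limit), "red")
    return f"{brightness}, {saturation} {base}"


print(describe_colour(255, 0, 255))  # fuchsia
print(describe_colour(128, 128, 0))  # olive green
```

A description generated like this still needs a human check, because a handful of hue bands can't capture how a colour actually comes across in the image.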
I learned that, when describing a person or anything in the image akin to a person (avatar, non-player character, animesh figure, static figure etc.), their gender must never be mentioned unless either the gender is clearly demonstrated, or it has been verified, or it is clearly and definitely known otherwise. I do mention the gender of my own avatar because I've created him, and I've defined him as male. I also mention @Juno Rowland's gender because I've created her, too, and I've defined her as female.
Similarly, I learned that, when describing a person (etc. etc.), their race or ethnicity must never be mentioned although some sources say otherwise. Rather, the skin tone must be mentioned, more specifically, one out of five (dark, medium-dark, medium, medium-light, light; I may expand this to nine with another four tones in-between).
Beyond that, I learned that the following may belong in the description of a person:
I prefer portraits nowadays, especially with a background that's as minimalist as possible. It's enough of an effort to describe the avatar; it'd go completely out of hand if I also had to describe the entire surrounding.
Similarly, I still avoid having realistic-looking buildings in my images. And the last non-realistic building required over 40,000 characters of description alone. Granted, it's both gigantic and highly complex, not to mention that it mostly has glass panes for walls so that much of its inside is visible. But if there was a realistic-looking building in one of my images, I'd first have to spend days researching English architectural terms, and then I'd have to explain all these terms for the laypeople who will actually come across the image.
Style-wise, I learned that alt-text must not contain line breaks. Hubzilla and (streams) themselves showed me that using the quotation marks on your keyboard in alt-text is a bad idea, too. I've never done the former, and I've stopped doing the latter.
Other things of which I know that they don't belong in alt-text are hashtags, hyperlinks (both embedded links and plain URLs), emoji, other Unicode characters which screen readers won't necessarily parse as letters, digits or punctuation, image credits and license information (the latter two must be in plain sight if they are required).
I learned that screen readers may or may not misinterpret all-caps. It's actually better to transcribe text in all-caps without the all-caps and mention in the image description that the original text is in all-caps.
I also learned recently that, in fact, extremely long image descriptions are not necessarily bad, not even in social media. Fortunately, I don't have to deal with a character limit for my posts. Only two limits matter: 1,500 characters for alt-text because Mastodon cuts off everything that goes beyond. And the 100,000 characters of post length above which Mastodon probably rejects posts altogether, rendering the image description efforts that have inflated the posts beyond these sizes moot. And yes, I can post over 100,000 characters on Hubzilla.
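Some of the alt-text rules above are mechanical enough to be checked automatically. Here's a minimal sketch in Python of such a check. The function name and the messages are made up, the patterns are simple heuristics (the all-caps check only looks for two or more consecutive all-caps words), and I'm assuming that counting characters with `len()` is close enough to how Mastodon counts them. Emoji, image credits and license information are not covered.

```python
import re

ALT_TEXT_LIMIT = 1500  # the point at which Mastodon cuts alt-text off


def check_alt_text(alt_text: str) -> list[str]:
    """Return a list of rule violations found in the alt-text (empty if none)."""
    problems = []
    if len(alt_text) > ALT_TEXT_LIMIT:
        problems.append(f"longer than {ALT_TEXT_LIMIT} characters")
    if "\n" in alt_text or "\r" in alt_text:
        problems.append("contains line breaks")
    if '"' in alt_text:
        # The plain quotation mark found on the keyboard.
        problems.append("contains straight double quotation marks")
    if re.search(r"https?://\S+", alt_text):
        problems.append("contains a URL")
    if re.search(r"(?<!\S)#\w+", alt_text):
        problems.append("contains a hashtag")
    if re.search(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b", alt_text):
        # Heuristic: a run of at least two words written entirely in capitals.
        problems.append("contains text in all-caps")
    return problems


print(check_alt_text("A sign above the door says GRAND OPENING."))
# prints: ['contains text in all-caps']
```

An empty list only means that none of these particular rules was broken; it says nothing about whether the alt-text is actually useful.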
Whenever I learned something new, I declared all my image descriptions in which I hadn't implemented it yet obsolete.
But I still don't know enough.
I dare say I have learned a whole lot. But it's all more or less basic concepts. What I still don't know enough about is what the general guidelines are when it comes to applying these concepts to such extremely obscure edge-cases as my virtual world images.
What I'm doing is a first. People have posted virtual world images in the Fediverse before, even on Mastodon. It happens all the time. A few have also added basic alt-text. But I'm the first to actually put some thought into how this has to be done if it shall be done properly.
I've still got a lot of unanswered questions. And truth be told, if one person tries to answer them, they're still unanswered. I don't need one answer from one person. I need a general community consensus for an answer.
When I ask a question on how to do a certain thing when describing my virtual world images, I don't want one person to answer. I don't want one person to answer, another person to answer the exact opposite and these two persons not knowing about each other either. But this is Mastodon's standard modus operandi because people generally can't see who has replied what to any given post before or after them.
I want to ask that question, and then I want one or several dozen people to discuss that question. Not only with me, but even more with each other. Mastodon semi-veterans who live and breathe Mastodon's accessibility culture, non-Mastodon Fediverse veterans who can wrap their minds around having no character limit, accessibility experts, actually blind or visually-impaired people, neurodivergent people who need the kind of info dumps that I provide. Plus myself as the only one of the bunch who knows a thing about these virtual worlds.
Alas, this is impossible in the Fediverse. Mastodon is too limited and too much "microblogging" and "social media" for it. And while the Fediverse does have places that are much better for discussions, Mastodon users don't congregate there, and those who do populate these places know nothing about accessibility or Mastodon's culture.
It doesn't help that I rarely post images, and when I do, I rarely get any feedback. The reasons why I rarely post images are that describing them has become such a huge effort, and that many motifs that I'd like to show are too complex to realistically be described appropriately.
So I have to replace a whole lot of detail knowledge with assumptions based on what I know, what I've experienced, what I can deduce from all this and what appears logical to me.
In fact, a lot of what I do in my image descriptions is based on the idea that if I mention something in an image, and a blind or visually-impaired person doesn't know what it looks like, chances are that they want to know it, and that they expect to be told what it looks like. No matter what it might be. However, it's my assumption that this may actually extend to just about everything in an image.
Still, I think I have amassed a whole lot of knowledge about alt-text in particular and image descriptions in general.
Now I'd really like to share this knowledge with others. For one, I want to give them a chance to have a very big edge over the ever-increasing requirements for good enough alt-text. Besides, I actually keep seeing people making the same glaring mistakes over and over and over again.
On top of all that, what few image description guides there are that touch the Fediverse only cover Mastodon. There are none that disregard post character limits, or at least that don't take triple-digit character limits as a given. This is the only guide for social media/social networks that deals with long image descriptions at all. But even that guide doesn't take into account the possibility of being able to post tens of thousands, hundreds of thousands, millions of characters at once. Being able to describe one image in over 60,000 characters and then drop these over 60,000 characters all into the same post as the image itself. Not needing the extra capacity of alt-text for information that doesn't fit into the post itself anymore.
Most other image description guides are only for static websites and/or blogs. However, not only does most of the Fediverse not have any HTML, and not only do SEO keywords make no sense in Fediverse alt-text, but Mastodon's alt-text culture which dominates the whole Fediverse is vastly different from what accessibility experts and Web designers have cooked up for static websites and blogs. On a website, an alt-text of 300 characters is way too long. On Mastodon, it may actually be too short.
So, after studying various alt-text guides as well as Mastodon's alt-text culture, I felt the need to write down what I know so that others can learn from it.
For a while, I have toyed with the idea of starting yet another wiki on this Hubzilla channel of mine. This would be my first wiki about the Fediverse after two wikis about OpenSim, one of which is still very incomplete. The downside might be that it'd be hard to find unless I keep pointing individual people to it.
Then, a few months ago, I discovered a draft for an article on image descriptions in the Join the Fediverse Wiki. So I started expanding it last week with what I know. But as it seems, most of the information I've added isn't even welcome in the wiki. This was probably meant to be a rather simple alt-text guide.
Now I may actually create that wiki on Hubzilla. What I want to write won't fit onto one single page anyway, and I need some more structure.
I'm also wondering what to do with the knowledge I've gathered about content warnings, including a massive list of things that people may be warned about.
