Netzgemeinde Hubzilla
2025-02-22 11:02:35
Anna Maier
annam@nerdculture.de
Discussing using AI to generate alternative texts again. Does anyone have good examples of bad generation?
#Accessibility
#altText
2025-02-22 15:12:13
Jupiter Rowland
jupiter_rowland@hub.netzgemeinde.eu
@Anna Maier
I don't know what constitutes a "good" example in your opinion, but I've got two examples of how bad AI is at describing images with extremely obscure niche content, let alone explaining them.
In both cases, I had the Large Language and Vision Assistant (LLaVA) describe one of my images, always a rendering from within a 3-D virtual world. And then I compared it with my own description of the same image.
That said, I didn't compare the AI description with my short description in the alt-text. I went all the way and compared it with my long description in the post, tens of thousands of characters long, which includes extensive explanations of things that the average viewer is unlikely to be familiar with. This is what I consider the benchmark.
Also, I fed the image at the resolution at which I posted it, 800x533 pixels, to the AI. But I myself didn't describe the image by looking at the image. I described it by looking around in-world. If an AI can't zoom in indefinitely and look around obstacles, and it can't, it's actually a disadvantage on the side of the AI and not an unfair advantage on my side.
So without further ado, exhibit A:
This post contains
an image with an alt-text that I've written myself (1,064 characters, including only 382 characters of description and 681 characters of explanation where the long description can be found),
the image description that I had LLaVA generate for me (558 characters),
my own long and detailed description (25,271 characters).
The immediate follow-up comment dissects and reviews LLaVA's description and reveals where LLaVA was too vague, where LLaVA was outright wrong and what LLaVA didn't mention although it should have.
If you've got some more time, exhibit B:
Technically, all this is in one thread. But for your convenience, I'll link to the individual messages.
Here is the start post with
an image with precisely 1,500 characters of alt-text, including 1,402 characters of visual description and 97 characters mentioning the long description in the post, all written by myself,
my own long and detailed image description (60,553 characters).
Here is the comment with the AI description (1,120 characters; I've asked for a detailed description).
Here is the immediate follow-up comment with my review of the AI description.
#Long #LongPost #CWLong #CWLongPost #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLaVA #AIVsHuman #HumanVsAI