Things that AI can't do when describing an image, but I can; CW: long (almost 3,300 characters), alt-text meta, image description meta
So you think AI is always better at describing and even explaining any image out there than any human, including insiders and experts on the very topic shown in the image? 100% accurately? And at a higher level of detail than said humans?
Well, here's an image.
I'd like to see AI identify the place shown in the image as the central crossing at BlackWhite Castle and identify BlackWhite Castle as a standard-region-sized sim on Pangea Grid, a virtual world or so-called "grid" based on OpenSimulator.
I'd like to see AI explain the above, all the way down to a level that can easily be understood by someone who has only got a rough idea about what virtual worlds are.
I'd like to see AI correctly mention the pop-cultural links from this sim to German Edgar Wallace films and Frühstyxradio that are obvious to me.
I'd like to see AI correctly identify the avatar in the middle. By name. And know that identifying the avatar is appropriate in this context.
I'd like to see AI know and tell the real reason why the avatar is only shown from behind.
I'd like to see AI recognise that the image was not converted to monochrome afterwards, but that the avatar and the entire sim, with everything on and around it, are actually monochrome.
I'd like to see AI transcribe text that's unreadable in the image. 100% accurately, verbatim, letter by letter.
I'd like to see AI identify the object to the right and explain its origin, its purpose and its functionality in detail.
I'd like to see AI discover and mention the castle in the background.
I'd like to see AI accurately figure out whether it's necessary to explain any of the above to the expected audience and, if correctly deemed necessary, do so. And explain the explanations if correctly deemed necessary.
I'd like to see AI know and correctly mention which direction the camera is facing.
Finally, I'd like to see AI automatically generate two image descriptions: a full and detailed one with all explanations, and a shorter one that fits into 1,500 characters minus the number of characters necessary to mention the full description and explain its location.
When I posted the same image, I did all of the above. And more.
In fact, if AI is supposed to be better than me, I expect it to identify all trees in the image, not only the mountain pines, to give an even more detailed description of the motel advertisement, and to give a much more detailed description of the map, including verbatim transcripts of all text on it and accurate information on what place is shown on the map in the first place.
If AI is supposed to be better than me, I expect it to
- describe, explain and transcribe everything that I describe, explain and transcribe
- describe, explain and transcribe even more on top of that
- even more accurately than I do
- more whimsically
- and in far fewer characters.
All of this, by the way, fully automatically, with no human intervention except perhaps a simple prompt to describe the image for a certain Fediverse project.
#Long #LongPost #CWLong #CWLongPost #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #A11y #Accessibility #AI #AIVsHuman #HumanVsAI