How an image description by LLaVA AI compares to an image description hand-written by myself; CW: long (almost 29,000 characters, including one long image description of over 25,000 characters), Fediverse meta, image description meta, image of monochrome motif
Okay, I've done it.
I've pitted an image-describing AI against myself and one of my own image descriptions.
As the test object, I've used my latest in-world picture, which was first published here, including the exact same full-length image description and explanation from that post. The short description in the alt-text is the same, too, all shortcomings included, but I had to modify the alt-text itself for this specific use-case.
Users of Pleroma and its forks, Friendica, Hubzilla and (streams) can see the image here:
Users of Mastodon, Misskey and their respective forks can see it as a file attachment below this post.
My AI of choice was the Large Language and Vision Assistant (LLaVA). I fed it the image at the resolution at which I also posted it, namely 800x533 pixels.
The AI took a few seconds to produce this 558-character image description:
The image appears to be a 3D rendering or a screenshot from a video game or a virtual environment. It shows a character standing on a paved path with a brick-like texture. The character is facing away from the viewer, looking towards a sign or information board on the right side of the image. The environment is forested with tall trees and a dense canopy, suggesting a natural, possibly park-like setting. The lighting is subdued, with shadows cast by the trees, indicating either early morning or late afternoon. The overall atmosphere is calm and serene.
I'll talk about this description in more detail in a follow-up.
Now, here comes my description. I didn't write it while looking at the scaled-down image. I didn't write it while looking at the image at its original resolution of 2100x1400 pixels either.
I wrote it while I was still in-world. I could walk around and zoom around and see everything from different angles and at much, much higher resolutions. For example, one of the pictures on the advertising board to the right shows a building that's nearby but outside the borders of the image. I could describe that picture by walking to that very building.
All in all, I took eight hours to produce this 25,271-character image description:
Image description
The picture in this post is a digital rendering from inside a 3-D virtual world based on OpenSimulator, generated in a regular client for this kind of virtual world, also known as a viewer, using shaders and generated shadows, but without ray-tracing. It shows my avatar on a paved path surrounded by conifer trees and with cliffs in the background. Everything in the picture is in monochrome like an old black-and-white film. However, the image is unaltered and shows both my avatar and the scene as they are in-world.
What OpenSimulator is
OpenSimulator is a free, open-source, cross-platform server-side re-implementation of the technology of Second Life. The latter is a commercial 3-D virtual world created by Philip Rosedale, also known as Philip Linden, of Linden Lab and launched in 2003. It is a so-called "pancake" virtual world which is accessed through desktop or laptop computers using standard 2-D screens rather than virtual reality headsets. Second Life had its heyday in 2007 and 2008. It is often believed to have shut down in late 2008 or early 2009 when the constant stream of news about it broke off, but in fact, it celebrated its 20th birthday in 2023, and it is still evolving.
OpenSimulator, OpenSim in short, was first published in January, 2007. Unlike Second Life, it is not one monolithic, centralised world. It is rather a server application for worlds or "grids" like Second Life which anyone could run on either rented Web space or at home, given a sufficiently powerful computer and a sufficiently fast and reliable land-line Internet connection. This makes OpenSim as decentralised as the Fediverse. The introduction of the Hypergrid in 2008 made it possible for avatars registered on one OpenSim grid to travel to most other OpenSim grids.
Second Life and the OpenSim-based worlds are called "grids" because they are flat worlds divided into square areas of 256 by 256 metres each which is roughly 280 by 280 yards. These areas are called "regions".
Where the picture was made
The picture displays a part of Black White Castle, a fairly recent sim in Pangea Grid. "Sim" is short for "simulator" which refers to what is running in a region so that something can be built in it, and avatars can enter it. In Second Life, a sim is always one region. In OpenSim, so-called varsims can span multiple regions, always in a square arrangement with the same number of regions in both directions. Up to 32x32 regions in one sim are possible. Black White Castle only covers one region.
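To put the numbers above into perspective, here is a minimal Python sketch of how large a sim can get. It only uses the figures from this description (256-metre regions, square arrangements, at most 32 regions per side); the function names are made up for this illustration and are not part of OpenSimulator.

```python
REGION_SIDE_M = 256        # side length of one region in metres (from the text)
MAX_REGIONS_PER_SIDE = 32  # largest varsim edge in regions (from the text)


def varsim_side_m(regions_per_side: int) -> int:
    """Return the edge length in metres of a square sim."""
    if not 1 <= regions_per_side <= MAX_REGIONS_PER_SIDE:
        raise ValueError("a sim spans between 1 and 32 regions per side")
    return regions_per_side * REGION_SIDE_M


def varsim_region_count(regions_per_side: int) -> int:
    """Return how many regions a square sim covers."""
    if not 1 <= regions_per_side <= MAX_REGIONS_PER_SIDE:
        raise ValueError("a sim spans between 1 and 32 regions per side")
    return regions_per_side ** 2


if __name__ == "__main__":
    for n in (1, 2, 32):
        print(n, "per side:", varsim_region_count(n), "regions,",
              varsim_side_m(n), "m edge length")
```

A one-region sim like Black White Castle corresponds to `regions_per_side = 1`.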
Pangea Grid is a German OpenSim grid with a special focus on arts, architecture and landscaping.
The name "Black White Castle" is most likely borrowed from a section of the innuendo-saturated German comedy film Neues vom Wixxer from 2007, the sequel to Der Wixxer from 2004. Both films are parodies of the German black-and-white mystery thrillers Der Hexer from 1964 and its sequel Neues vom Hexer from 1965. These films, in turn, are part of a series commonly referred to as "Edgar Wallace films" as they're based on crime novels written by the British author Richard Horatio Edgar Wallace. These two films are based on the novel The Ringer from 1926, a revised version of a 1925 novel known by the titles The Gaunt Stranger and Police Work.
Neues vom Wixxer, while generally in colour, picks up black and white, which has grown into a style element of the classic Edgar Wallace films, in a place named "Black-White Castle". As the name indicates, it is entirely black and white for reasons of tradition, everyone and everything inside it included.
The eponymous sim was built in small parts by making entirely new assets with monochrome textures, but mostly by taking existing objects, extracting their textures, exporting them from OpenSim, using an external image editor to reduce their saturation to zero, re-uploading them to OpenSim and replacing the original textures on the objects with their new monochrome versions. The basic ground texture was altered in the same way, and even the sky and the sunlight are devoid of colour. Likewise, it's common for visitors like me to try and make their own avatars entirely black and white.
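What "reducing the saturation to zero" does to a single pixel can be sketched in a few lines of Python. This is only an illustration using the standard library's `colorsys` module and the HLS colour model; I don't know which image editor or which exact desaturation formula the builder used, and other editors may weight the colour channels differently. The function name `desaturate` is made up for this example.

```python
import colorsys


def desaturate(rgb: tuple[int, int, int]) -> tuple[int, int, int]:
    """Set the HLS saturation of one 8-bit RGB pixel to zero."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    # With zero saturation, all three channels end up at the lightness value.
    grey = colorsys.hls_to_rgb(hue, lightness, 0.0)
    return tuple(round(channel * 255) for channel in grey)


if __name__ == "__main__":
    for pixel in [(255, 0, 0), (34, 139, 34), (100, 100, 100)]:
        print(pixel, "->", desaturate(pixel))
```

A pixel that is already grey comes out unchanged, while a coloured pixel ends up with three identical channel values.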
The sim was built by Bink Draconia who had previously built a sim with the TV series The Good Place, started in 2016, as its theme.
My avatar
My avatar is standing in the middle of the image, the head right of centre by about two or three percent of the image's width due to most of the weight resting on the right foot, the feet a few percent above the bottom edge of the image, roughly centred on average and slightly apart. His back is turned towards the camera, and he is facing away from the camera, so his face is entirely invisible.
He is a male human with fair skin that was altered to light grey and short black hair. He is wearing a dark grey tweed suit with a very large herringbone pattern on the jacket and an even larger herringbone pattern on the trousers. Underneath the jacket, he is wearing a white button-down shirt, of which only a part of the collar above the collar of the jacket and the cuffs below the sleeves of the jacket are visible. In addition, he is wearing a black bowler hat and a pair of dark grey, slightly shiny formal dress shoes.
The ground
Beneath the avatar, there is a straight path with irregular edges that is about five metres or 17 feet wide and leads about 40 metres or 140 feet forward, ahead of and away from the avatar. Its texture shows pavement made of medium grey, rectangular concrete pavers, placed in alternating orientations in a 90-degree herringbone pattern, but rotated against the region's coordinate axes by 45 degrees and against the rough direction of the path by about 20 degrees to the right. The pavers are about twice as large as they would be in real life.
On both sides of the avatar, the pathway widens into a crossing, but the other three paths are beyond the edges of the image.
The ground on the sides of the paved path has a blurry light grey texture with a coarse resolution that is either a desaturated, very light grass texture or thin, dirty snow.
The scenery to the left
To the very left, there is a wooden arrow sign that is approximately rectangular except for the rough shape of the wood, including four notches on the left-hand side, and the extra corner protruding from the right-hand edge that points into the distance along the path. The bottom edge of the sign is at roughly the same height as the middle of my avatar's thighs, and the top edge is a little bit more than twice as high.
The sign has "BlackWhite Castle" written on it in a Fraktur blackletter typeface, reminiscent of bright, shiny embossed metal with some dark shading surrounding it, but with a hard-to-identify texture on it. "BlackWhite" is written as one word, but in Pascal Case with the first letters of both "Black" and "White" as capitals. The writing is a bit less than a third of the height of the sign and about as long as its top and bottom edges. The medium-grey paint has partly come off again, especially near the top, but the writing is still intact. The sign shows the way to the building after which the whole sim is named. The sign is placed on top of a lighter piece of wood with a rectangular cross-section that is a bit thinner than the sign itself and serves as its sign pole.
The sign is surrounded by three identical groups of eight bushels of high grass each, one to its left, one behind it, one to its right and partly in front of it. Most of the grass is less tall than the sign, but some of it, especially in the bushels behind the sign which have been enlarged, is taller. Also, in front of the sign, there is a group of six stone mushrooms at six different sizes which, given the colour-less setting, appear like actual rock. The two biggest ones have a diameter larger than that of my avatar's bowler hat.
There are three mountain pines to the left of the path which are identical, save for their size. The one the farthest away is about 12 metres or 40 feet tall. It is mostly obscured by another pine which is standing a little further to the left in the image and closer to the on-looker, and which is roughly 14 metres or 47 feet tall. Just right of the arrow sign and behind the right-hand grass bushel, there is a pine of about 20 metres or 70 feet, tall enough for its treetop to be beyond the borders of the image. All three cast a shadow on the ground around them and the pathway, as does a fourth 12-metre pine way to the left whose trunk is entirely outside the borders of the image, but whose shadow ends at my avatar's feet.
Between the second and the third pine, closer to the edge of the pathway than any of the pines, there are two rocks lying on the ground. Both take up the same ground area, but the one to the right is about knee-high, and the one to the left is roughly 60 percent higher. There is another group of eight grass bushels, four of which are in front of these two rocks while the other four seem to have fused with the rocks. More grass bushels surround the first pine.
The scenery to the right
To the right of the end of the path, there is another set of six stone mushrooms.
Another mountain pine, just a little shorter than the second one, is standing opposite the second one. Further to the right and further up-front, there are several more conifers of various heights, some only nine metres or 30 feet, others twice as high. The closest of these conifers, also one of the smallest, is at about a quarter of the width of the image away from the right-hand border, and it is the closest to the edge of the pathway.
All trees are made the traditional Second Life and OpenSim way: The trunk is a textured 3-D model. Everything else consists of the same partly transparent texture with branches, twigs and needles on flat surfaces that pass through the trunk and have the texture on both sides. The mountain pines have three such surfaces at angles of 60 degrees from one another, the other conifers have four which are 45 degrees apart. Within the context of the scenery, however, this is hardly noticeable, and it puts less strain on the graphics hardware.
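The angles mentioned above follow from spreading the flat surfaces evenly around the trunk: since each surface passes through the trunk and is textured on both sides, it only needs to cover half a turn, so the spacing is 180 degrees divided by the number of surfaces. A small Python sketch, with a made-up function name and assuming an even spread:

```python
def plane_angles(plane_count: int) -> list[float]:
    """Return the rotation in degrees of each flat surface around the trunk.

    Assumes the surfaces are spread evenly over half a turn (180 degrees),
    because every surface extends to both sides of the trunk.
    """
    if plane_count < 1:
        raise ValueError("a tree needs at least one surface")
    step = 180 / plane_count
    return [index * step for index in range(plane_count)]


if __name__ == "__main__":
    print("mountain pine:", plane_angles(3))
    print("other conifer:", plane_angles(4))
```

Three surfaces give a spacing of 60 degrees and four give 45 degrees, which matches the trees described here.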
To the right of the path, the ground is covered by a lot more grass, only that most of it is simpler, using one partly transparent 2-D surface for each bushel, and only a bit higher than knee-high at its maximum.
A mostly wooden outdoor info board is protruding to the left from behind the closest of the conifers. Two vertical wooden beams have between them, from top to bottom, a longer but smaller horizontal beam, six rows of two slightly darker horizontal planks and another two horizontal beams, nine much smaller vertical bars standing between these two and connecting them. On top of each of the big vertical beams, two short beams mounted in a 90-degree arrangement carry a roof with a texture that seems to suggest slate shingles. The rooftop is a bit more than 3.60 metres or 12 feet above the ground. On the second row of planks, "Info Board" is written in a lighter tone of grey than the planks themselves. The last two letters are behind the trunk of the conifer in front of the sign.
Below the writing, at eye height, there are three square info panels on the board, each with a wooden frame around it. Only the ones on the left and in the middle are visible; the one to the right is fully obscured by the conifer again.
The panel on the left carries a worn-out advertising poster for BlackWhite Motel which is on the sim as well, in the opposite direction of where my avatar is looking. It is a two-storey building which is shaped like the letter L laid on the ground. On the short side in the left of the picture, it has room 101 and the office on the ground floor and rooms 201 and 202 upstairs. On the long side, it has eight more rooms, numbers 103 through 106 and 203 through 206, only six of which are in the picture; if you visit the motel itself, it becomes clearer why.
There is a parking-lot in front of the building with spaces for eleven cars, separated by white lines. One of the spaces in front of room 104, in the middle of the poster, is occupied by a two-tone white-and-white 1957 Chevrolet Impala four-door sedan which not only lacks hubcaps on its steel wheels, but also has opaque windows in a tone of grey just slightly lighter than the asphalt. Three spaces further to the right, at the right-hand edge of the poster, there is an almost identical car which is only darker all over, from the car body to the chrome trim to even the white walls on the tyres.
The ground floor of the building has a concrete walkway in front of itself which is a bit higher than the parking-lot. The upper floor can be accessed via 180-degree angled stairs in the corner of the building and an open gallery on the parking-lot side. Both the gallery and the actual front of the building are supported by vertical columns made of dark grey concrete with a square cross-section, save for one square cut-out in each corner.
All rooms, the office included, have dark grey doors which face the parking-lot, as do their windows which have very dark grey wooden frames and always come in pairs. Rooms 101 and 201 have two pairs of windows, one on each side of the door. The office and room 202 are window-less. The other rooms have one pair of windows to the right of the door. The doors are framed by two columns with a dark grey concrete panel above them. The wall sections with windows are otherwise filled with very light grey brick walls with unusually long bricks. The same bricks are used for the wall sections to the left of rooms 103 through 106 and 203 through 206 which also feature shiny black wall lamps with energy-saving bulbs.
The low walls that surround the gallery between the columns are made of eight long, very bright grey horizontal panels of probably some kind of metal each, topped with dark grey wooden handrails. The one in front of room 201 carries a flashing neon sign which reads "Vacancy" in all-caps with a brighter rectangular frame around it. Also, on top of the roof, near its front edge, in front of rooms 204 and 205, there's a "motel" sign with no caps and a rather unreliable illumination. Both signs are glowing on the poster.
All windows have blinds on the inside which are mostly closed. Only the right-hand blinds of rooms 101 and 201 and the blinds of rooms 103 and 204 are open. For those who want to know, even though it's outside the advertising poster: The blinds of room 106 are pulled up.
On the gallery-supporting column in front of the door to room 104, a medium-grey surveillance camera facing the parking-lot is moving into various positions. Next to the door of room 105, there is a refrigerated container for packaged ice with two side-hinged but actually unmoving bulb plate hatches on it.
On the ground in front of rooms 101 and 104, there are arrows consisting of seven chevrons each which point towards the office door. In order to enhance their effect, a gradient texture scrolls along each of them.
In the background behind the motel, a mountain pine rises above the roof in the middle to the left of the "motel" sign. Behind the sign and all the way to the left, there are three more conifers. These trees are basically identical to the ones in this image. Also, left of centre, a snow-covered mountain top rises further in the background.
The left half of the poster is covered by a dark overlay. Near its top, there is a very light grey rectangle aligned with the right-hand edge of the overlay. It has "Best Price" written on it in dark grey letters. Below that, there are five slightly lighter grey dingbats, either teardrop-spoked asterisks (Unicode U+273B) or sparkles (Unicode U+2747), which imply a five-star rating. An almost identical rectangle is just as close to the bottom and aligned with the left-hand edge of the poster, only that it has "Book Now" written on it. Between them, in the middle of the darkened half, "The BlackWhite Motel" is written in three lines with "BlackWhite" joined to one word again. All writing is done in the same narrow slab-serif typeface, and all characters including the dingbats have lighter lines around them that make them appear embossed.
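For those who want to look up the two candidate dingbats themselves, this short Python sketch prints each code point together with its character and its official Unicode name, using only the standard library's `unicodedata` module. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX <character> <official Unicode name>' for a code point."""
    character = chr(code_point)
    return f"U+{code_point:04X} {character} {unicodedata.name(character)}"


if __name__ == "__main__":
    # The two candidates named in the description above.
    for code_point in (0x273B, 0x2747):
        print(describe(code_point))
```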
Back to the panels on the info board: The one in the middle is mounted a bit lower than the one on the left. It shows what appears to be a late medieval sea map of a place which I couldn't identify. Due to the limitation of in-world texture sizes to a maximum of 1024x1024 pixels, the rather small writing on the map is indecipherable. Most of it is ocean with some land in the upper half. On the land and in the bottom right corner, there are typical illustrations for maps from those days. The map shows its age with its darker tint and its jagged edges, and its shading makes it appear like it had been folded to a sixteenth of its original size before.
In front of the info board from the on-looker's point of view and actually between it and the conifer nearby, there is another group of eight grass bushels.
In front and partly to the right of the conifer, there is an object which doesn't exist in real life, but which is typical for OpenSimulator: an official OpenSimWorld beacon of the latest generation, but modified to fit the style of the sim.
This particular device has a shiny black foot with a long rectangular footprint which is about 80 percent as high as it is deep and tapered upward, and which has rounded edges. It carries the less shiny main body of the device. It starts narrower than the top surface of the foot in all directions. From bottom to top, it first protrudes forward, immediately increases slightly in depth, curves backward and continues in a straight slope which still goes more upward than backward. Eventually, it curves upward and ends in a slim, rounded top. Transversally, it keeps the same width all the way. Both sides are carved out and illuminated, normally in cyan, here in almost white. Otherwise, it comes in its standard dark grey. However, it's actually a brownish anthracite grey, and the very top shows some light blue, so while it clearly hasn't received the monochrome treatment all over, a closer look also reveals that it should have. The same goes for the foot which is slightly bluish.
The straight section of the main body carries a shiny black frame with the central element of each OpenSimWorld beacon: the touch display with a ratio of 4:3. When not in use, this specimen shows the standard idle screen, only that it was modified to monochrome. Slightly above the middle, there is the official OpenSimWorld logo, namely the word "OpenSimWorld" itself with no actual caps. However, the "O" at the beginning is replaced with a circle matching the rounded sans-serif typeface which contains a stylised globe tilted to the left by an angle similar to Earth's inclination and showing three parallels and two meridians, but no land underneath. The last five letters, "world", are darker than the rest. Below it, in the same typeface, but in an even lighter grey, and without caps again, but a bit smaller, "teleporter" is written. Both lines also have shaded outlines that make them appear imprinted.
Further below, "Click for destinations" is written, still in the same typeface and in about the same shade of grey as "OpenSim" above, but small enough to appear shorter than "teleporter" above. The background of the screen is a very light grey on the top 35 percent, medium grey on the bottom 35 percent and a gradient between the two. Clicking the screen breaks the monochrome theme, though, because the user interface which then appears has not been modified.
Lastly, there's a light grey panel on the front side of the foot which is scripted, too. It has "Like or comment this region" written on it in two lines in the same typeface as the writing on the touch screen, but with medium grey outlines. On the left, there is a medium grey thumb-up symbol, and on the right, there is a speech bubble with three dots in it in two shades of medium grey.
An OpenSimWorld beacon serves several purposes. For one, it transmits information about the sim to the website OpenSimWorld. This information includes not only the name of the sim and whether it's currently online, but also how many avatars are currently on the sim. The identities of these avatars are not transmitted, only how many they are. This makes finding sims with activity on them easier for users who want to go to parties or otherwise get in contact with others, for OpenSim's general population density is much, much lower than Second Life's. This feature also helps generate rather controversial statistics about how popular any given sim is.
OpenSimWorld itself can be seen as the third-party centre of the decentralised Hypergrid. It started out about a decade ago as a sim catalogue, making navigating the Hypergrid and finding places much easier and more convenient than previous solutions like teleport stations or simply exchanging landmarks. Sims must be listed manually by registered users, and they need one OpenSimWorld beacon in-world. For example, this is the entry for Black White Castle.
In addition, OpenSimWorld offers discussion forums, user-created information and discussion groups for various topics, announcements of in-world events, information about free or paid land rentals other than whole sim rentals by grids, a catalogue for in-world scripts etc.
The other purpose of an OpenSimWorld beacon is as a teleporter which gives you access to currently about 1,700 sims all over the Hypergrid by means of a crowd-sourced sim list, namely that on OpenSimWorld itself. If you click the touch screen, it shows a list with the ten sims known to OpenSimWorld with the most avatars on them. Each sim is listed with its activity ranking, its name, the letter "A" in square brackets if it is Adult-rated and the number of avatars on it. The list can be navigated page by page with always ten sims on them. However, while it gets the information it shows directly from OpenSimWorld, it doesn't show any further information, not about the sim and not about whatever event may be on-going on any given sim. Clicking on a listed sim will immediately teleport you there, but it won't tell you what the place is where the beacon is taking you.
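The list behaviour described above (sims ranked by avatar count, ten per page, "[A]" for Adult-rated sims) can be sketched roughly like this in Python. This is purely an illustration: the real beacon is an in-world script whose code I haven't seen, and the `Sim` class, the `format_page` function, the exact line layout and the sample sims are all made up.

```python
from dataclasses import dataclass

PAGE_SIZE = 10  # the beacon always shows ten sims per page (from the text)


@dataclass
class Sim:
    """Hypothetical record of what the list shows about one sim."""
    name: str
    avatars: int
    adult: bool = False


def format_page(sims: list[Sim], page: int) -> list[str]:
    """Return the lines of one page, busiest sims first; pages count from 0."""
    ranked = sorted(sims, key=lambda sim: sim.avatars, reverse=True)
    start = page * PAGE_SIZE
    lines = []
    for rank, sim in enumerate(ranked[start:start + PAGE_SIZE], start=start + 1):
        adult_tag = " [A]" if sim.adult else ""
        lines.append(f"{rank}. {sim.name}{adult_tag} ({sim.avatars})")
    return lines


if __name__ == "__main__":
    # Invented sample data, not real OpenSimWorld entries.
    sample = [Sim(f"Example Sim {number}", number) for number in range(12)]
    sample.append(Sim("Example Club", 40, adult=True))
    for line in format_page(sample, 0):
        print(line)
```

Note that a line carries nothing but rank, name, rating tag and avatar count, which is exactly why the beacon can't tell you what kind of place you're about to teleport to.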
After a while of inactivity, the touch screen switches back into its idle mode.
Clicking the panel on the foot leaves a like on the entry of the sim.
The shadow of the tallest mountain pine on the left-hand side of the pathway is cast on the OpenSimWorld beacon.
All the way to the right, two leaves of an otherwise out-of-frame fern reach into the picture. Further above and in the background, the lower one of another pair of rocks appears with the higher one being to the right of it and hidden behind the trunk of a conifer.
The background
Just right of the avatar's head in the picture, the paved path ends at the foot of a rock cliff which spans the whole width of the image. The cliff is about nine metres or thirty feet high. A narrower, rocky path leads to the right and upward from just right of the middle of the paved path to about 40 percent of the height of the cliff. Then, hidden behind the mountain pine tree to the right of the end of the paved path, it takes a sharp turn of roughly 180 degrees to the left and ascends to about 80 percent of the height of the cliff. Right above where the paved path ends, almost right above my avatar's head, the cliff path takes another sharp U-turn to the right and ascends in a fairly gentle slope until it reaches the top of the cliff about as far right as the first turn.
The cliff extends to both sides at a constant height, save for its jagged upper edge. On the left and within the borders of the image, it does so roughly parallel to the paved path. It surrounds a largely snow-covered plateau with more mountain pines and other conifers on it and Black White Castle itself.
To the left of the top of the second mountain pine from the centre, upward from the pair of rocks left of the paved path, the top right corner of Black White Castle's dark grey roof of unidentified material peeks through a gap between the trees. The whole rest of the building is either hidden behind the forest, outside the image borders or both.
Further in the background, snow-covered mountains rise high above the treetops. These are actually already outside the sim, reaching into regions with no sim running on them. A little bit of sky appears to the right of the mountains. It is clear, but true to the visual theme of the whole sim, it is deep grey.
The final details
The camera is roughly at realistic eye height and oriented south-by-southwest-ward. The position of the Sun as the only directed light source in the picture is unusual for OpenSim, namely in the southeast. It is permanently fixed in this place because making one single setting for the sky is a great deal easier than making settings for a whole day. But if it were moving, it would not do what it almost always does in OpenSim and pass through the zenith. Still, judging by the length of the tree shadows since the Sun is absent as an actual celestial body in the sky, it is too high up for winter.
Now, if you've made it all the way down here, I ask you: Which description is more accurate? Which description is more detailed? Which description is more informative? Which description actually helps you understand the image?
#Long #LongPost #CWLong #CWLongPost #BlackAndWhite #Monochrome #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #AltText #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLaVA