I've just discovered the Large Language and Vision Assistant, or LLaVA, which can generate image descriptions without requiring an account, and probably also without feeding your data to greedy gigacorporations.
I'm tempted to try it on one of my own images and then compare the result with my own description of the same image in terms of detail, informativeness and accuracy.
