How do you turn an image into a text prompt?

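Sometimes you want to know what text would describe a given image. The image might come from Midjourney or Stable Diffusion, be painted in Photoshop, or be a real photograph taken with a camera. Images generated with Stable Diffusion can carry their prompts in the embedded infotext, but that infotext may be lost once the image has been re-saved or compressed. What can we use to recover the prompts in that case?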
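
Before reaching for other tools, it is worth checking whether the infotext is still in the file. The following is a minimal sketch using only the Python standard library. It assumes the image is a PNG and that the prompts were saved as a text chunk under the key `parameters`, which is the convention I know from the AUTOMATIC1111 web UI; other tools may use a different key or store nothing at all. The function names and the file name are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return the tEXt/iTXt key-value pairs found in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            compressed, _method = rest[0], rest[1]
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            texts[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
        pos += 12 + length  # skip length, type, body and CRC
    return texts


def read_infotext(path, key="parameters"):
    """Return the infotext stored under `key` (an assumed key), or None."""
    with open(path, "rb") as handle:
        return read_png_text_chunks(handle.read()).get(key)
```

Calling `read_infotext("image.png")` returns `None` when the chunk is missing, which is the situation the rest of this article deals with. A JPEG or a re-encoded image will either raise `ValueError` or simply have no such chunk.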

Below I compare the original image and its prompts with the prompts obtained from CLIP Interrogator and ChatGPT.


Original image and prompts

First, I took an image from Midjourney. Because a Midjourney image comes with its prompts, I use them as the baseline for comparison.

Image from Midjourney

Midjourney's prompts

A real photo by fashion photographer Tim Walker, full body photo, commercial work, showcasing a Chinese little girl wearing a super gorgeous dress, standing in a primitive forest, playing with small animals. The dress emits a faint orange glow, and looks like a group of fireflies from afar. It is a surreal future background in commercial photography. --ar 2:3 --stylize 750 --v 6
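
The trailing `--ar 2:3 --stylize 750 --v 6` part is Midjourney's own parameter syntax, and another model will most likely read it as ordinary prompt text. Here is a small sketch of how the descriptive text could be separated from those flags before reusing the prompt elsewhere. The function name `split_midjourney_prompt` is invented for this example, and the parsing is deliberately simple: it assumes every flag starts with `--` and takes at most one value.

```python
import re


def split_midjourney_prompt(prompt):
    """Split a prompt into (text, params) at the first ' --flag'."""
    match = re.search(r"\s--[A-Za-z]", prompt)
    if match is None:
        return prompt.strip(), {}
    text = prompt[:match.start()].strip()
    tail = prompt[match.start():]
    params = {}
    # Each flag is '--name' optionally followed by a single value.
    for name, value in re.findall(r"--([A-Za-z]+)(?:\s+([^-\s]\S*))?", tail):
        params[name] = value
    return text, params


text, params = split_midjourney_prompt(
    "standing in a primitive forest. --ar 2:3 --stylize 750 --v 6"
)
print(text)    # standing in a primitive forest.
print(params)  # {'ar': '2:3', 'stylize': '750', 'v': '6'}
```

Only `text` would then be passed on to a different generator; the aspect ratio in `params` could be translated by hand into a width and height if the target tool supports it.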


Getting text prompts

First I use CLIP Interrogator 2 to get text prompts. Open the link below.

CLIP Interrogator 2 - a Hugging Face Space by fffiloni

Upload the image to the page to get the prompts:

a woman in an orange dress standing in a forest, cgsociety contest winner, digital art, fireflies and sparkling wisps, full body close-up shot, portrait of magical young girl, magical sparks
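
To get a rough feel for how much of the original wording the interrogator recovered, the two prompts can be compared word by word. This is only an illustration and not part of the test in this article: it computes a Jaccard overlap on lowercase words, ignores word order and synonyms, and the helper names `words` and `jaccard` are invented here.

```python
import re


def words(prompt):
    """Return the set of lowercase alphabetic words in a prompt."""
    return set(re.findall(r"[a-z]+", prompt.lower()))


def jaccard(prompt_a, prompt_b):
    """Share of distinct words that both prompts have in common (0.0 to 1.0)."""
    set_a, set_b = words(prompt_a), words(prompt_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


original = "a Chinese little girl wearing a super gorgeous dress, standing in a primitive forest"
recovered = "a woman in an orange dress standing in a forest"
print(sorted(words(original) & words(recovered)))
print(round(jaccard(original, recovered), 2))
```

A low score does not mean the recovered prompt is bad. It mostly shows that the interrogator describes what is visible in the picture rather than reproducing the author's phrasing.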


Improving the prompts with ChatGPT

I pasted the CLIP Interrogator 2 prompts into ChatGPT and asked it to optimize them for Stable Diffusion, which gave the following prompts:

Place the woman in the orange dress in the center of the image, standing amidst a lush forest backdrop. Ensure that the composition is balanced and visually appealing. Emphasize the orange dress to make it stand out against the greenery of the forest. Use warm tones for the overall scene to create a magical and inviting atmosphere. Incorporate fireflies and sparkling wisps to add magical illumination to the scene. Scatter them around the forest, especially around the woman, to enhance the enchanting ambiance. Capture the woman in a full-body close-up shot, showcasing her figure and the details of her dress. Ensure that her pose exudes confidence and grace, fitting for a contest-winning portrait. Depict the woman with a serene and mystical expression, conveying her connection to the magical elements surrounding her. Pay attention to details such as the texture of the dress, the foliage in the forest, and the subtle movements of the fireflies and wisps. These details will enhance the overall realism and appeal of the artwork. Integrate magical sparks into the scene, emanating from the woman's hands or surrounding her in a mesmerizing display of magic.

While I was at it, I optimized the Midjourney prompts the same way:

Study Tim Walker's style and previous works to understand his aesthetic and composition choices. Ensure the full body of the Chinese little girl is prominently featured in the center of the image. The primitive forest should serve as the backdrop, creating a surreal and enchanting atmosphere. Design a super gorgeous dress for the girl, inspired by Tim Walker's extravagant fashion photography. The dress should emit a faint orange glow, resembling a group of fireflies when viewed from afar. Pay attention to intricate details and textures. Create a lush and primitive forest environment, with towering trees, dense foliage, and a natural, untouched feel. Incorporate small animals like rabbits, squirrels, or birds for the girl to interact with, adding charm to the scene. Use soft, natural lighting to illuminate the scene, enhancing the magical aura of the forest and the glow of the dress. The glow from the dress should be subtle but noticeable, adding to the surreal nature of the image. Direct the girl to exude innocence, curiosity, and joy as she plays with the small animals. Her pose should be natural and dynamic, capturing a moment of wonder and delight. Integrate futuristic elements subtly into the background, such as unusual plant formations or hints of advanced technology peeking through the forest foliage. These elements should complement the surreal atmosphere without overpowering the natural beauty of the scene. Pay attention to post-processing details such as color grading, contrast adjustments, and adding subtle effects to enhance the overall mood and impact of the image.


I then used the four sets of prompts above to generate 1024x1024 images in Stable Cascade for comparison!

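The result is obvious: Stable Cascade's friendliness toward the Midjourney prompts is zero, and the ChatGPT-optimized prompts did not do as well as the prompts generated by CLIP Interrogator 2 either. Can AI tools from different companies really not work together? Or is Stable Cascade, which I used for this test, just not mature enough yet?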