Background on the idea

Admittedly, I am late to exploring image generation with DALL-E. But as they say, better late than never! I wanted to quickly post on some recent exploration using OpenAI’s DALL-E. I won’t go into how DALL-E works, but OpenAI describes it as an AI system that can create realistic images and art from a description in natural language. You can learn more about it here.

I had a section of my website for data notes, to capture useful code snippets, new packages, or just tools and techniques I haven’t used much. I saw a similar section on someone else’s blog and thought it was a great idea. That said, I have yet to populate it with anything.

So for the short term, I figured why not create an image of a bear saying something like “Please BEAR with us” to announce that the site was under construction. A very normal and commonsense idea in my eyes.

Using OpenAI for image generation

As I expected, it was incredibly easy to use the OpenAI API with just a few lines of code. Each call cost only a few pennies (I saw 3-4 cents per image) and took 5-10 seconds to return a response. So overall, it was inexpensive and quick to iterate on new images.

I started with a vague prompt about a bear saying “please bear with me”, and slowly refined it.


from openai import OpenAI
# Local module that stores the API key
from secret_keys import OPENAI_KEY

client = OpenAI(api_key=OPENAI_KEY)

response = client.images.generate(
  model="dall-e-3",
  prompt="I want a picture of a bear in construction gear sitting at a computer typing code.  I want a message at the top of the image that says 'Please BEAR with us, this site is under construction. The idea is for this to be cartoonish and funny!  Please add this exact phrase at the bottom 'Created with DALL-E'",
  size="1024x1024",
  quality="standard",
  n=1,
)

image_url = response.data[0].url
image_url

'https://oaidalleapiprodscus.blob.core.windows.net/private/org-QwvrbKaSi0VeflwsDx1dYz8t/user-xeCmfMW6XAJBDsWzkXkIRsNp/img-2xf5B8J4ZbuylSvNHcSRPInx.png?st=2024-02-17T18%3A19%3A44Z&se=2024-02-17T20%3A19%3A44Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-02-17T18%3A56%3A39Z&ske=2024-02-18T18%3A56%3A39Z&sks=b&skv=2021-08-06&sig=ldgus5CfsRP6VGOW75XUf%2B2IiRF7C8e7wCXEYp3hFUY%3D'

And just like that, you have a URL where you can view the generated image and download it locally. A few things I found interesting:

As of February 2024, the image URL stays live for one hour, per the DALL-E documentation.
You’ll notice that while the image outputs are impressive, the text rendered inside the images was surprisingly bad. Even with direct prompts to include a specific saying, it often got the phrase wrong, misspelled words, or even made up words. Some quick research confirmed it’s a common occurrence. I read a few explanations, but I did not dig deep enough to confirm the exact cause. I am interested to read more on that.
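Since the link expires, it makes sense to save the image right away. Below is a minimal sketch of one way to do that using only the Python standard library. The function name save_image and the output file name are placeholders made up for illustration; they are not part of the OpenAI client.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, dest: str) -> Path:
    """Download the file at `url` and write it to `dest`."""
    dest_path = Path(dest)
    # Create the target folder if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body straight to disk.
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


# Usage (hypothetical file name), with image_url from the call above:
# save_image(image_url, "images/bear_under_construction.png")
```

Running this right after each call keeps a local copy of every iteration, so the expiring link stops being a concern.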

The finished outputs

You can find the page I set up here until I add actual notes. In case that happens, you can see a rough copy of the page below. Note that I used the lightbox feature in Quarto, so you can click on the first image to cycle through all seven.