For the longest time, I thought I’d have to spend thousands on a beefy graphics card to generate images using state-of-the-art diffusion models at home. As far as I knew, it’d be a lot easier to just pay an online platform (like OpenAI, AWS Bedrock, or Leonardo) that had already figured out the hardware and avoid a four-figure PC build. But after spending some time looking into it, it’s actually surprisingly doable.

Whether you’re experimenting with local image generation, interested in playing with all the knobs and switches hosted providers don’t expose, or just want to avoid another surprisingly high invoice at the end of the month, you too can use your fancy MacBook with Apple Silicon to generate images with Stable Diffusion 3. Here’s how I did it.

Set Up Your Workspace

We’ll start with a clean working directory.

mkdir ~/localsd3
cd ~/localsd3

I’m using venv to manage my Python environment. Create a new virtual environment by calling the Python module directly:

python3 -m venv .venv

I use Poetry to manage my dependencies. You can use another tool if you prefer, but these are the commands I ran. If you don’t have Poetry installed already, see the official Poetry documentation for installation instructions.

Let’s initialize a new Poetry project in the working directory:

poetry init

For Package name, Version, Description, Author, License, and Compatible Python versions, I accept the defaults. When you turn this code into a real project, you can take more care when choosing these values.

Decline defining your main and development dependencies interactively; we’ll handle those separately. Confirm generation, and you’ll find your new pyproject.toml file ready to be populated.
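For reference, if you accept the defaults, the generated file should look roughly like this (exact fields vary a little between Poetry versions, and the package name and author metadata will be your own):

[tool.poetry]
name = "localsd3"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.12"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"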

Install Dependencies

We’ll need to install several Python dependencies to run our generation code.

poetry add torch diffusers \
    transformers accelerate \
    sentencepiece protobuf

Once these are installed, confirm that the [tool.poetry.dependencies] section of your pyproject.toml file looks something like this:

[tool.poetry.dependencies]
python = "^3.12"
torch = "^2.3.1"
diffusers = "^0.29.2"
transformers = "^4.42.4"
accelerate = "^0.32.1"
sentencepiece = "^0.2.0"
protobuf = "^5.27.2"

Create Your Generation Script

Next, we’ll create a Python script that uses the dependencies we’ve installed to orchestrate image generation on your laptop. Create main.py and start importing dependencies: sys to parse input arguments, and torch for configuration values like the tensor dtype.

import sys
import torch

HuggingFace’s diffusers library conveniently provides a diffusion pipeline specifically for Stable Diffusion 3. Let’s import it:

from diffusers import StableDiffusion3Pipeline

We’ll accept a string input that we’ll treat as a generation prompt. If a second string input is provided, we’ll treat that as the negative prompt.

prompt = sys.argv[1]

negative_prompt = ""
if len(sys.argv) > 2:
    negative_prompt = sys.argv[2]

The Stable Diffusion 3 Medium model is publicly available on HuggingFace. If the model data isn’t already present on your device, the from_pretrained call below will download the required assets from HuggingFace. Warning: they’re big, several gigabytes in total. Once they’re cached, you won’t have to download them again.

To be granted access to these assets, you’ll need to accept the Stable Diffusion 3 license and terms of use on HuggingFace, and authenticate this machine against your HuggingFace account, typically with a user access token (similar in spirit to how GitHub credentials gate access to private repos).
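If this machine isn’t authenticated yet, one straightforward option is a user access token from your HuggingFace account settings. As a minimal sketch, assuming you’ve created a read-scoped token (the value below is a placeholder for your own), you can either run huggingface-cli login once in a terminal or log in from Python via the huggingface_hub library that ships alongside these dependencies:

from huggingface_hub import login

# Paste your own access token from your HuggingFace account settings.
# This caches the credential locally, so later downloads just work.
login(token="hf_your_token_here")

With access sorted out, we can load the pipeline: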

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16
)

The next line is vital: here, we configure the appropriate inference device. MPS stands for Metal Performance Shaders, which uses Apple’s Metal framework to run inference on the GPU in Apple Silicon devices.

pipe = pipe.to("mps")
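If you want to confirm that your PyTorch build can actually see the Metal backend before kicking off a long generation, plain PyTorch exposes a couple of checks (nothing specific to this script):

import torch

# True if this build of PyTorch was compiled with MPS support
print(torch.backends.mps.is_built())
# True if an MPS device is actually available on this machine right now
print(torch.backends.mps.is_available())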

Optional: if your device has less than 64 GB of RAM, enabling attention slicing is recommended to keep memory usage in check.

pipe.enable_attention_slicing()

Finally, we can invoke our pipeline. We’ll pull the first value from the pipeline result’s .images attribute; with this configuration, the pipeline only ever returns one image anyway.

image = pipe(
        prompt,
        negative_prompt=negative_prompt,
).images[0]

We can save our image to the filesystem by creating image.png. The image object is a Pillow Image; because we’re handing it an open file object rather than a filename, we pass "PNG" explicitly so there’s no ambiguity about the output format.

with open("image.png", "wb") as f:
    image.save(f, "PNG")
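As an aside, if you don’t need the explicit file handle, Pillow can infer the format from the filename extension, so a one-line equivalent would be:

image.save("image.png")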

Putting it all together, here’s the entire main.py script:

import sys
import torch
from diffusers import StableDiffusion3Pipeline

prompt = sys.argv[1]

negative_prompt = ""
if len(sys.argv) > 2:
    negative_prompt = sys.argv[2]

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16
)

pipe = pipe.to("mps")
pipe.enable_attention_slicing() # Optional, recommended if <64GB RAM

image = pipe(
        prompt,
        negative_prompt=negative_prompt,
).images[0]

with open("image.png", "wb") as f:
    image.save(f, "PNG")

Generate Images

If we run the script with the system Python directly, we won’t be inside our virtual environment and won’t have access to the dependencies we’ve installed. Instead, we’ll use Poetry to run the script from within the virtual environment.

poetry run python main.py \
    "a bird sits on a tree branch on a sunny day" \
    "blurry, disfigured"

View the image using Preview:

open image.png

And hey, look at that! A lifelike bird perched on a tree branch, just like we asked for. Except this time, my computer invented it.

Next Steps

You can tailor this script to your generations by setting the many configuration options exposed by StableDiffusion3Pipeline. A common customization is guidance_scale, which controls how closely the model adheres to the prompt. Lower values allow for more “creativity” in the output, and common values range from around 7 to 12.

You can also configure num_inference_steps, which controls how many denoising iterations are used when generating an image. A higher value can lead to better fine detail and overall higher-quality generations, but generation time grows roughly linearly with it. The default value is 28, and you’ll likely see diminishing returns in detail as the value approaches 50.

Providing a fixed seed value gives Stable Diffusion a repeatable starting point for image generation. For the same seed and configuration values, Stable Diffusion will generate the same image each time it’s run. This is very helpful for iterating quickly on configuration values to find an ideal balance for your generation subject.

Integrating these configurations with the example above looks like this:

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16
)

pipe = pipe.to("mps")
pipe.enable_attention_slicing() # Optional
generator = [torch.Generator(device="mps").manual_seed(240715)]

image = pipe(
        prompt,
        negative_prompt=negative_prompt,
        guidance_scale=8,
        num_inference_steps=35,
        generator=generator,
).images[0]

Now go generate some images!