Stable Diffusion Option Notes

This is a continuation of the previous article.
This time I'd like to write up my impressions of the options you can use when running the scripts.

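To go off on a tangent right away: customized versions of Stable Diffusion keep multiplying, and it seems there is finally one that runs on the CPU instead of a graphics card (links: github.com, zenn.dev). It's great to see yet another hardware hurdle lowered.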

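Back to the options.
Stable Diffusion's txt2img and img2img, as well as the low-load versions optimized_txt2img and optimized_img2img, accept options such as --prompt and --seed at run time.
But you can't find out what each one means and what effect it has without trawling through various explainer sites, so I wanted to put together a rough list.
(It includes some I don't really understand.)
The available options are taken from the list printed when a script is run with the --help option, e.g. txt2img.py --help.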

txt2img (original version: script that generates images from text)
- Option list
  - -h, --help
  - --prompt [PROMPT]
  - --outdir [OUTDIR]
  - --skip_grid
  - --skip_save
  - --ddim_steps
  - --plms
  - --laion400m
  - --fixed_code
  - --ddim_eta
  - --n_samples
  - --n_iter
  - --H
  - --W
  - --C
  - --f
  - --n_rows
  - --scale
  - --from-file
  - --config
  - --ckpt
  - --seed
  - --precision {full,autocast}
img2img (original version: script that generates images from images)
- Option list
  - --init-img
  - --strength
optimized_txt2img (low-load version: text to image)
- Option list
  - --outdir [OUTDIR]
  - --H --W
  - --device
  - --unet_bs
  - --turbo
  - --format
  - --sampler
optimized_img2img.py (low-load version: image to image)
- Option list
Closing
Unrelated to options
- NSFW
- GUI

txt2img (original version: script that generates images from text)

Option list

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision

-h, --help

Displays the option list above.

--prompt [PROMPT]

The prompt (the so-called "spell") is passed as this option's argument.
You can't put line breaks in it, so it gets very hard to read as the prompt grows.
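
One way around the readability problem, as a minimal sketch: keep the prompt fragments in a Python list and join them into a single --prompt value. The helper name `build_command` and the sample fragments are made up for illustration, and `shlex.join` quotes in POSIX-shell style, so the printed line may need adjusting on other shells.

```python
import shlex

def build_command(script, fragments, extra_args=()):
    """Join prompt fragments with ", " and return a one-line command string."""
    prompt = ", ".join(fragments)
    return shlex.join(["python", script, "--prompt", prompt, *extra_args])

# Hypothetical prompt, split into one fragment per line for readability.
fragments = [
    "a watercolor painting of a lighthouse",
    "stormy sea",
    "dramatic lighting",
]

print(build_command("txt2img.py", fragments, ["--plms", "--seed", "42"]))
```

The fragments stay easy to edit one per line, while the script still receives a single prompt string.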

--outdir [OUTDIR]

Specifies the folder that generated images are written to.
--outdir "./outputs/sample"

--skip_grid

By default, a combined image with the generated images tiled in a grid is also produced; this option turns that off.
It takes no argument; just include the option.

--skip_save

do not save individual samples. For speed measurements.

As that says, it seems to be an option that skips saving the generated images (the images are the whole point for me, so I've never used it).

--ddim_steps

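An option that affects image quality.
For some subjects around 50 is enough, but for anime-style pictures you want 100 or more.
The higher the value, the longer each image takes to generate, and in my experience VRAM usage goes up as well.
In my environment, anything above 250 crashes with an out-of-memory error.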

--plms

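I don't really understand this one, but I use it.
The help text says "use plms sampling" and the words "PLMS sampler" scroll past in the log while the command runs, so it presumably selects the sampling method used during generation.
For now it feels like one to just leave on.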

--laion400m

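I don't really understand this one, so I don't use it.
Googling turns up the "LAION-400-MILLION OPEN DATASET", and the help text says "uses the LAION400M model", so it presumably switches to a model trained on that 400-million-image dataset.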

--fixed_code

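I don't really understand this one, so I don't use it.
It uses the same starting code for every sample? Is that something other than using the same seed value?
If so, it could be useful for comparing things like the variation at the same scale.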

--ddim_eta

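I don't really understand this one, so I don't use it.
When I hear "ETA" I think of Estimated Time of Arrival, which would make this an option for setting when processing should finish,
but the GUI version's range is 0.00 to 1.00 and the help text says eta=0.0 corresponds to deterministic sampling, so it is evidently a sampler parameter (the Greek letter eta) rather than a time.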

--n_samples

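Specifies how many images to generate in a single generation pass.
With --n_samples 3, three images are generated per pass.
That said, more images take correspondingly more time, and raising this value increases memory consumption.
With the original version, it's safest not to get greedy and to specify 1.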

--n_iter

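Specifies how many times to repeat the image generation pass.
With --n_iter 10, it is repeated 10 times.
The total number of images is determined in combination with the n_samples option,
so --n_samples 3 --n_iter 10 generates 30 images.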
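
The arithmetic above as a quick sanity check. `total_images` is a name made up here for illustration, not something from the scripts.

```python
def total_images(n_samples: int, n_iter: int) -> int:
    """Images per run: batch size (n_samples) times repetitions (n_iter)."""
    return n_samples * n_iter

print(total_images(n_samples=3, n_iter=10))  # 30
print(total_images(n_samples=1, n_iter=10))  # 10
```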

--H

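Image height.
The value must be a multiple of 64, or you get an error.
The original version doesn't leave much headroom, so I tend to use something like 256.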
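
To avoid that error, a desired size can be snapped to a multiple of 64 before it is passed to --H or --W. This is only a sketch: `snap_to_64` is a hypothetical helper, and rounding down (rather than up) is an arbitrary choice made to keep memory use on the low side.

```python
def snap_to_64(pixels: int) -> int:
    """Round down to the nearest multiple of 64, never going below 64."""
    return max(64, (pixels // 64) * 64)

print(snap_to_64(500))  # 448
print(snap_to_64(512))  # 512
print(snap_to_64(30))   # 64
```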

--W

Image width.
Otherwise the same as the height.

--C

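I don't really understand this one, so I don't use it.
"latent channels" refers to the channels of the latent representation,
and Stable Diffusion apparently uses what is called a latent diffusion model,
so I suspect it's a fairly important option, but...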

--f

I don't really understand this one, so I don't use it.
I assume it's machine-learning terminology, but I get by without it, so I haven't looked into it...
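
Going by the help text ("latent channels", "downsampling factor"), --C and --f together describe the size of the internal latent image that the model works on. A rough sketch of that relationship follows; the function name is made up, and the default values C=4 and f=8 are assumptions here rather than something stated in the help output above.

```python
def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8):
    """Shape of the latent image: C channels, with H and W each divided by f."""
    return (channels, height // factor, width // factor)

print(latent_shape(512, 512))  # (4, 64, 64)
print(latent_shape(256, 256))  # (4, 32, 32)
```

Under those assumptions the model works on a grid far smaller than the pixel image, which is presumably why these two options rarely need touching.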

--n_rows

Sets how many rows to use when combining the generated images into a grid.
I don't need the grid in the first place, so I don't use it.

--scale

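Apparently this controls how closely the generated image follows the prompt.
It can be set in the range 0 to 50, but setting it to 50 doesn't necessarily give you exactly what the prompt says.
On the contrary, it can produce monsters, so somewhere around 7.5 to 20 may be best.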
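
The help text gives the formula behind this option: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)). Below is a toy sketch of just that formula on plain lists of numbers; the input values are invented, and in the real scripts these would be model outputs rather than hand-written lists.

```python
def guided_eps(eps_empty, eps_cond, scale):
    """Apply the help-text formula element by element."""
    return [e + scale * (c - e) for e, c in zip(eps_empty, eps_cond)]

eps_empty = [0.0, 1.0]  # made-up prediction for an empty prompt
eps_cond = [1.0, 3.0]   # made-up prediction for the actual prompt

print(guided_eps(eps_empty, eps_cond, 0.0))  # [0.0, 1.0]  -> prompt ignored
print(guided_eps(eps_empty, eps_cond, 1.0))  # [1.0, 3.0]  -> prompt prediction as-is
print(guided_eps(eps_empty, eps_cond, 7.5))  # [7.5, 16.0] -> difference exaggerated
```

The larger the scale, the further the result is pushed past the prompt-conditioned prediction, which may be why very high values produce distorted images.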

--from-file

Loads prompts from a file.
It looks like a handy option, but I haven't used it much.
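
A minimal sketch of preparing such a file, assuming the script expects one prompt per line (the help text does not spell out the format, so treat that as an assumption). `write_prompt_file`, the file name, and the sample prompts are all hypothetical.

```python
import tempfile
from pathlib import Path

def write_prompt_file(path, prompts):
    """Write one prompt per line (assumed format) and return the path."""
    path = Path(path)
    path.write_text("\n".join(prompts) + "\n", encoding="utf-8")
    return path

prompts = [
    "a lighthouse at dusk, oil painting",
    "a lighthouse at dawn, watercolor",
]
prompt_file = write_prompt_file(Path(tempfile.gettempdir()) / "prompts.txt", prompts)

# The resulting path is what would be passed to --from-file.
print(f'--from-file "{prompt_file}"')
```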

--config

I don't really understand this one, so I don't use it.
It's described as the "path to config which constructs model", so I think it's best not to mess with it carelessly.

--ckpt

Seems to specify the path to a .ckpt file.
Likely to come in handy when you also want to use Stable Diffusion derivatives and copying large .ckpt files around would be a hassle.

--seed

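The seed value; per the help text, it is there "for reproducible sampling".
Looking at the generated images, it feels as if there is some base image with various elements attached to it, so I think of the seed as picking that base.
Fixing the seed is handy when you want to change only scale, say, and compare the results.
Note, though, that changing the image size or the beginning of the prompt changes the result drastically even with the same seed.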
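
The reproducibility part can be illustrated with Python's standard random module. This is only an analogy: `fake_noise` is a made-up stand-in for whatever random starting data the real scripts draw, not their actual code.

```python
import random

def fake_noise(seed: int, n: int = 4):
    """Draw n Gaussian values from a generator initialised with the given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

print(fake_noise(42) == fake_noise(42))  # True: same seed, same starting noise
print(fake_noise(42) == fake_noise(43))  # False: different seed, different noise
```

With the starting noise pinned down like this, any difference between two runs comes from the settings that were changed, which is what makes seed-fixed comparisons meaningful.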

--precision {full,autocast}

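I don't really understand this one, so I don't use it.
On GTX 16XX graphics cards there is apparently a problem where images come out as a single solid color, and I read an article saying that specifying this option fixes it.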

img2img (original version: script that generates images from images)

Option list

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --init-img [INIT_IMG]
                        path to the input image
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save indiviual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --fixed_code          if enabled, uses the same starting code across all samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --C C                 latent channels
  --f F                 downsampling factor, most often 8 or 16
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --strength STRENGTH   strength for noising/unnoising. 1.0 corresponds to full destruction of information in init
                        image
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision

Options that overlap with txt2img are omitted.

--init-img

Specifies the source image.
--init-img "./outputs/sample/001.png"

--strength

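Specifies how much to alter the image given by --init-img.
The range is 0 to 1: at 0 nothing changes, and at 1 no trace of the original image remains.
Use a low value to touch up an image generated with txt2img or similar,
and a high value when starting from something like a rough hand-drawn sketch.
Incidentally, values closer to 0 process faster.
Addendum: watching the low-load version's log, I noticed that the number of steps actually processed matches the ddim_steps argument × strength.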
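
That observation as a small sketch. `effective_steps` is a hypothetical name, and truncating to an integer with int() is an assumption; the log only shows that the two values line up.

```python
def effective_steps(ddim_steps: int, strength: float) -> int:
    """Steps actually run: the requested steps scaled by strength (truncated)."""
    return int(strength * ddim_steps)

print(effective_steps(50, 0.75))  # 37
print(effective_steps(50, 0.5))   # 25
print(effective_steps(100, 1.0))  # 100
```

This also explains why lower strength values finish faster: fewer steps are actually run.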

optimized_txt2img (low-load version: text to image)

Option list

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --device DEVICE       specify GPU (cuda/cuda:0/cuda:1/...)
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --seed SEED           the seed (for reproducible sampling)
  --unet_bs UNET_BS     Slightly reduces inference time at the expense of high VRAM (value > 1 not recommended )
  --turbo               Reduces inference time on the expense of 1GB VRAM
  --precision {full,autocast}
                        evaluate at this precision
  --format {jpg,png}    output image format
  --sampler {ddim,plms}
                        sampler

Options that are the same as in txt2img are omitted.

--outdir [OUTDIR]

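The option name is unchanged, but the low-load version creates a folder named after the prompt:
a per-prompt folder is created inside the folder given by this option, and the images are generated beneath it.
As a result, symbols that can't be used in folder names near the beginning of the prompt cause an error, so take care.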
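
A prompt can be checked for troublesome characters before running the script. The sketch below assumes Windows folder-name rules and assumes that only the first 50 characters matter; both are guesses, and `unsafe_chars` is a hypothetical helper rather than part of the low-load version.

```python
# Characters Windows does not allow in file or folder names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def unsafe_chars(prompt: str, head: int = 50):
    """Return the forbidden characters found in the first `head` characters."""
    return sorted({ch for ch in prompt[:head] if ch in WINDOWS_FORBIDDEN})

print(unsafe_chars('a cat? sitting on a "mat"'))  # ['"', '?']
print(unsafe_chars("a cat sitting on a mat"))     # []
```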

--H --W

Functionally identical, but thanks to the lower load, larger sizes now work.
Image size changes the generated image a lot, so this makes it possible to experiment.

--device

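I don't really understand this one, so I don't use it.
Or rather, judging from the description, it selects which graphics card to use when you have more than one,
so it seems irrelevant to most people.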

--unet_bs

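I don't really understand this one, so I don't use it.
The description says it slightly reduces inference time at the expense of high VRAM usage (and that values above 1 are not recommended),
so there seems to be no particular need to touch it.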

--turbo

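Trades a bit more load for a shorter run time.
Turn it on when the original version was too heavy but this one is so light that you have headroom to spare.
With my 8 GB of VRAM, I keep it on.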

--format

As described, lets you choose between jpg and png for the output images.

--sampler

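I don't really understand this one, but I use it.
It is probably the equivalent of --plms in the original version.
Reading the README and such, though, the default seems to be plms even if you set nothing.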

optimized_img2img.py (low-load version: image to image)

Option list

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --init-img [INIT_IMG]
                        path to the input image
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --strength STRENGTH   strength for noising/unnoising. 1.0 corresponds to full destruction of information in init
                        image
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --seed SEED           the seed (for reproducible sampling)
  --device DEVICE       CPU or GPU (cuda/cuda:0/cuda:1/...)
  --unet_bs UNET_BS     Slightly reduces inference time at the expense of high VRAM (value > 1 not recommended )
  --turbo               Reduces inference time on the expense of 1GB VRAM
  --precision {full,autocast}
                        evaluate at this precision
  --format {jpg,png}    output image format
  --sampler {ddim}      sampler

These overlap completely with the three scripts above, so they are omitted.

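Closing

That's roughly it.
There were a lot of options in the original version whose purpose I didn't understand,
so I'd like to study them and update this sometime,
but in practice I can use the scripts fine without understanding them, so I'm not all that motivated.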

Unrelated to options

NSFW

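The original version runs a check that rejects NSFW images, so such images can't be produced, but the low-load version apparently modifies the image generation process quite heavily,
and in the course of that the NSFW filtering was dropped, so everything comes through unfiltered.
For the original version too, there is apparently a pull request that adds an option to skip the check,
on the grounds that the NSFW check is costly (a pretext?).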

GUI

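Some people have added a GUI on top of the low-load version, and some of these are sold on booth and the like,
but if you want a GUI yet don't want to pay for one or install a bunch of extra stuff,
run either of these commands:
python optimizedSD/txt2img_gradio.py
python optimizedSD/img2img_gradio.py
That starts a browser-based GUI, which you can use by opening http://localhost:7860 in your browser.
Also, I don't know about the original, but the low-load version now keeps a log of commands,
so even if your settings are lost because you closed the browser, you can re-enter them from the log.
It might be fun to run a Git fetch from time to time or peek at the remote repository.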
