# instant-id-unlocked.py (forked from eniora/InstantID-Unlocked)
# 2148 lines (1995 loc), 104 KB
import sys
sys.path.append("./")
from typing import Tuple
import os
import cv2
import math
import torch
import random
import numpy as np
import gc
import warnings
import subprocess
import PIL.PngImagePlugin
import time
warning_messages = [
    ".*timm.models.layers.*",
    ".*timm.models.registry.*",
    ".*Overwriting tiny_vit_.* in registry.*",
    ".*peft_config.*multiple adapters.*",
    ".*rcond.*will change to the default.*",
    ".*MultiControlNetModel.*is deprecated.*",
    ".*`resume_download` is deprecated.*",
    ".*Should have .*<=t1 but got .*",
    ".*unable to parse version details from package URL.*",
    ".*cache-system uses symlinks by default.*",
]
for msg in warning_messages:
    warnings.filterwarnings("ignore", message=msg)
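# Sketch (hypothetical helper, not used by the app): warnings.filterwarnings()
# treats `message` as a regular expression that must match the *start* of the
# warning text, case-insensitively, which is why the patterns above begin with
# ".*". This helper counts how many messages a pattern list would hide. It runs
# inside catch_warnings() so the global filter state is restored afterwards.
import warnings
def _sketch_count_suppressed(patterns, messages):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything not explicitly ignored
        for pattern in patterns:
            # filterwarnings() inserts at the front, so these win over "always".
            warnings.filterwarnings("ignore", message=pattern)
        for text in messages:
            warnings.warn(text)
    return len(messages) - len(caught)
# e.g. _sketch_count_suppressed([".*is deprecated.*"], ["X is deprecated", "hello"]) -> 1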
import logging
logger = logging.getLogger("transformers.tokenization_utils_base")
logger.addFilter(lambda record: "Token indices sequence length is longer" not in record.getMessage())
os.environ["NO_ALBUMENTATIONS_UPDATE"] = "1"
# os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_CACHE"] = "models"
os.environ["HF_HUB_CACHE_OFFLINE"] = "true"
os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
os.environ["GRADIO_DISABLE_TELEMETRY"] = "1"
vram_bytes = torch.cuda.get_device_properties(0).total_memory
vram_gb = vram_bytes / (1024**3)
default_vae_tiling = vram_gb >= 15
def open_output_folder():
    path = os.path.abspath("output")
    if sys.platform == "win32":
        os.system(f'start "" "{path}"')
    elif sys.platform == "darwin":
        subprocess.Popen(["open", path])
    else:
        subprocess.Popen(["xdg-open", path])
import PIL
from PIL import Image
DEFAULT_FILE_PREFIX = "InstantID_"
FILENAME_SAFE_TRANS = str.maketrans('', '', '\\/:*?"<>|')
def save_images(images, output_dir="output", generation_info=None, prefix=DEFAULT_FILE_PREFIX):
    os.makedirs(output_dir, exist_ok=True)
    # Continue numbering after the highest existing <prefix><N>.png in the folder.
    existing = [f for f in os.listdir(output_dir) if f.startswith(prefix) and f.endswith(".png")]
    used_numbers = [int(f[len(prefix):].split(".")[0]) for f in existing if f[len(prefix):].split(".")[0].isdigit()]
    start_index = max(used_numbers, default=-1) + 1
    paths = []
    for i, img in enumerate(images):
        filename = f"{prefix}{start_index + i}.png"
        path = os.path.join(output_dir, filename)
        img.save(path, pnginfo=generation_info[i] if generation_info else None)
        paths.append(path)
    return paths
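# Sketch (hypothetical helper, not used by the app): the numbering rule used by
# save_images(), isolated as a pure function. Files named <prefix><digits>.png
# count towards the next index; other prefixes and non-numeric suffixes are
# ignored, and an empty folder starts at 0.
def _sketch_next_image_index(filenames, prefix="InstantID_"):
    used = []
    for name in filenames:
        if name.startswith(prefix) and name.endswith(".png"):
            stem = name[len(prefix):].split(".")[0]
            if stem.isdigit():
                used.append(int(stem))
    return max(used, default=-1) + 1
# e.g. _sketch_next_image_index(["InstantID_0.png", "InstantID_7.png", "InstantID_x.png"]) -> 8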
cached_controlnet_models = {}
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel
from diffusers.pipelines.controlnet.multicontrolnet import MultiControlNetModel
from huggingface_hub import hf_hub_download
from insightface.app import FaceAnalysis
from style_template import styles
from pipeline_stable_diffusion_xl_instantid_full import StableDiffusionXLInstantIDPipeline
from pipeline_stable_diffusion_xl_instantid_img2img import StableDiffusionXLInstantIDImg2ImgPipeline
from model_util import load_models_xl, get_torch_device, torch_gc
from controlnet_aux import OpenposeDetector
from transformers import DPTImageProcessor, DPTForDepthEstimation
device = get_torch_device()
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
def get_depth_map(image):
    # Move inputs to the same device as depth_estimator instead of a hard-coded "cuda".
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad(), torch.autocast("cuda"):
        depth_map = depth_estimator(image).predicted_depth
    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    # Per-image min-max normalization to [0, 1].
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    # Replicate to 3 channels and convert to an 8-bit RGB PIL image.
    image = torch.cat([depth_map] * 3, dim=1)
    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image
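# Sketch (hypothetical helper, not used by the app): the per-image min-max
# normalization in get_depth_map(), written for a flat list of floats. Values
# are mapped to [0, 1], scaled to [0, 255], clipped, and truncated to integers
# (mirroring astype(np.uint8)). The flat-input guard is an addition: the
# original divides by zero when every depth value is equal.
def _sketch_depth_to_uint8(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    out = []
    for v in values:
        scaled = (v - lo) / (hi - lo) * 255.0
        out.append(int(min(max(scaled, 0.0), 255.0)))
    return out
# e.g. _sketch_depth_to_uint8([2.0, 4.0, 6.0]) -> [0, 127, 255]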
def get_canny_image(image, t1=100, t2=200):
    image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    edges = cv2.Canny(image, t1, t2)
    return Image.fromarray(edges).convert("L")
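# Sketch (hypothetical helper, simplified): the hysteresis step that gives
# cv2.Canny() its two thresholds. Pixels with gradient magnitude above t2 seed
# strong edges; pixels above t1 survive only if they are 8-connected to a
# strong edge. Real Canny also computes gradients and applies non-maximum
# suppression first, and OpenCV's exact boundary comparisons may differ.
from collections import deque
def _sketch_hysteresis(magnitude, t1=100, t2=200):
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] > t2:
                edges[r][c] = 255
                queue.append((r, c))
    # Grow strong edges into connected weak pixels (breadth-first).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not edges[nr][nc] and magnitude[nr][nc] > t1:
                    edges[nr][nc] = 255
                    queue.append((nr, nc))
    return edges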
import gradio as gr
MAX_SEED = 2**53 - 1
MAX_SEED_RAND = np.iinfo(np.uint32).max - 1
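# Sketch (hypothetical helper, not used by the app): the seed policy of
# randomize_seed_fn() with an injectable random.Random so it can be tested.
# When randomizing, seeds are drawn from [0, 2**32 - 2], matching
# MAX_SEED_RAND = np.iinfo(np.uint32).max - 1; otherwise the user's seed is
# kept (cast to int here, which is an addition).
import random
def _sketch_pick_seed(seed, randomize, rng=None):
    if not randomize:
        return int(seed)
    rng = rng or random.Random()
    return rng.randint(0, 2**32 - 2)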
dtype = torch.float16 if "cuda" in str(device) else torch.float32
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"
def get_random_style_prompt(prompt_substitute="person"):
    available_styles = [s for s in STYLE_NAMES if s != DEFAULT_STYLE_NAME]
    if not available_styles:
        return "", DEFAULT_NEGATIVE_PROFILE, DEFAULT_STYLE_NAME
    selected_style = random.choice(available_styles)
    print(f"Inserted random style: {selected_style}")
    style_prompt, style_neg_prompt = styles[selected_style]
    replacement = " " if prompt_substitute == "Empty (none)" else prompt_substitute
    random_prompt = style_prompt.replace("{prompt}", replacement).strip()
    return random_prompt, style_neg_prompt, DEFAULT_STYLE_NAME
def apply_selected_style(style_name, prompt_substitute="person"):
    if style_name == "(No style)":
        return gr.update(), gr.update(), gr.update()
    print(f"Inserted selected style: {style_name}")
    style_prompt, style_neg_prompt = styles[style_name]
    replacement = " " if prompt_substitute == "Empty (none)" else prompt_substitute
    return (
        style_prompt.replace("{prompt}", replacement).strip(),
        style_neg_prompt,
        "(No style)"
    )
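# Sketch (hypothetical helper, not used by the app): how both style helpers
# above fill a style template. "{prompt}" is replaced by the chosen subject;
# the "Empty (none)" choice substitutes a single space, and strip() then trims
# the result. The templates in the examples are invented, not entries from
# style_template.styles.
def _sketch_fill_style(template, subject="person"):
    replacement = " " if subject == "Empty (none)" else subject
    return template.replace("{prompt}", replacement).strip()
# e.g. _sketch_fill_style("{prompt}, oil painting", "Empty (none)") -> ", oil painting"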
NEGATIVE_PROMPT_PRESETS = {
    "Default Negative Profile": "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, (frame:1.2), deformed, ugly, blurry, deformed cat, deformed photo",
    "Aggressive Negative Profile (InstantID default)": "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, (frame:1.2), deformed, ugly, deformed eyes, blur, out of focus, blurry, deformed cat, deformed, photo, anthropomorphic cat, monochrome, photo, pet collar, gun, weapon, blue, 3d, drones, drone, buildings in background, green",
    "Negative Profile 1 (General use)": "low quality, worst quality, text, watermark, deformed, ugly",
    "Negative Profile 2 (Minimalist)": "(worst quality, low quality:1.2), deformed, blurry, mutated, extra limbs",
    "Negative Profile 3 (Portraits)": "(worst quality:1.3), (low quality:1.2), bad anatomy, deformed, disfigured, fused fingers, missing fingers, extra limbs, poorly drawn face, poorly drawn hands, blurry",
    "Negative Profile 4 (SDXL default)": "(worst quality, low quality:1.3), watermark, signature, text, frame, jpeg artifacts, blurry, deformed, extra limbs, bad hands, fused fingers, poorly drawn face",
    "Negative Profile 5 (Realism)": "(worst quality, low quality:1.3), anime, cartoon, illustration, cgi, 3d render, painting, drawing, deformed, extra fingers, fused fingers, blurry, unrealistic",
    "Negative Profile 6 (Stylized / Illustration)": "(worst quality:1.3), bad anatomy, deformed eyes, bad hands, long neck, lowres, jpeg artifacts, text, watermark, extra fingers",
    "Negative Profile 7 (Digital Illustration)": "(worst quality, low quality:1.3), bad anatomy, blurry, duplicate, signature, watermark, jpeg artifacts",
    "Negative Profile 8 (Anime)": "(worst quality:1.2), photorealistic, real life, realistic skin, 3d render, painting, extra limbs, fused fingers, bad anatomy, blurry, text, watermark",
    "Negative Profile 9 (Ultra Minimal)": "low quality, deformed",
    "Negative Profile 10 (3D Render)": "photo, photorealistic, realistic, painting, sketch, drawing, anime, cartoon, 2d, flat color, low detail, text, watermark, blurry",
    "Negative Profile 11 (Plastic Toy Render)": "photo, illustration, sketch, painting, anime, blurry, lowres, noisy, realistic skin, lifelike eyes, textureless",
    "Negative Profile 12 (Game Character (Stylized 3D))": "photo, painting, sketch, drawing, anime, real skin texture, flat shading, realistic proportions, soft shadows, photorealistic",
    "Negative Profile 13 (Sculpted Statue Render)": "cartoon, photo, realism, painterly, anime, soft brush, flat colors, 2d, smooth shading",
    "Negative Profile 14 (Low Poly Stylized)": "realism, photo, anime, high detail, highres, 2d, blurry, smooth shading, overrendered, soft shadows",
    "Negative Profile 15 (Fooocus Enhance)": "(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)",
    "Negative Profile 16 (Fooocus Negative)": "deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers, drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch",
}
DEFAULT_NEGATIVE_PROFILE = NEGATIVE_PROMPT_PRESETS["Default Negative Profile"]
def on_style_change(style_name):
    if style_name == "(No style)":
        return gr.update(), gr.update()
    else:
        print(f"Manual style selection: {style_name}")
        return gr.update(value=""), gr.update(value="")
EXCLUDED_MODELS = {
    "diffusers/controlnet-canny-sdxl-1.0",
    "diffusers/controlnet-depth-sdxl-1.0-small",
    "Intel/dpt-hybrid-midas",
    "lllyasviel/Annotators",
    "lllyasviel/ControlNet",
    "xinsir/controlnet-openpose-sdxl-1.0"
}
EXCLUDED_MODELS_LOWER = {m.lower() for m in EXCLUDED_MODELS}
def get_available_models():
    models_dir = "models"
    model_folders = []
    if os.path.exists(models_dir):
        for folder in os.listdir(models_dir):
            # Hub cache folders are named "models--<org>--<name>".
            if folder.startswith("models--"):
                model_name = folder.replace("models--", "").replace("--", "/")
                if model_name.lower() in EXCLUDED_MODELS_LOWER:
                    continue
                model_folders.append(model_name)
    return model_folders
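# Sketch (hypothetical helper, not used by the app): the hub cache stores each
# repo as "models--<org>--<name>"; get_available_models() turns that back into
# "<org>/<name>" and drops the helper models listed in EXCLUDED_MODELS,
# compared case-insensitively. This version strips only the leading
# "models--" prefix rather than calling replace() on the whole folder name.
def _sketch_repo_ids_from_cache(folder_names, excluded=()):
    excluded_lower = {name.lower() for name in excluded}
    repo_ids = []
    for folder in folder_names:
        if not folder.startswith("models--"):
            continue
        repo_id = folder[len("models--"):].replace("--", "/")
        if repo_id.lower() not in excluded_lower:
            repo_ids.append(repo_id)
    return repo_ids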
AVAILABLE_MODELS = get_available_models()
DEFAULT_MODEL = "eniora/RealVisXL_V5.0"
DET_SIZE_OPTIONS = {
    "160x160 (for very lowres portrait photos)": (160, 160),
    "320x320": (320, 320),
    "640x640 (default)": (640, 640),
    "800x800": (800, 800),
    "1024x1024": (1024, 1024),
    "1280x1280": (1280, 1280),
    "2560x2560 (Input/Reference image size should be larger than 2560x2560)": (2560, 2560)
}
current_det_size = (640, 640)
app = FaceAnalysis(
    name="antelopev2",
    root="./",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
app.prepare(ctx_id=0, det_size=current_det_size)
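# Sketch (hypothetical helper, not used by the app): app.get() can return
# several faces, and generate_image() keeps the one with the largest
# bounding-box area, where bbox is (x1, y1, x2, y2). Plain dicts stand in for
# insightface's face objects. With equal areas, max() keeps the first face,
# whereas the sorted(...)[-1] used in generate_image() keeps the last one.
def _sketch_largest_face(faces):
    def area(face):
        x1, y1, x2, y2 = face["bbox"]
        return (x2 - x1) * (y2 - y1)
    return max(faces, key=area)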
def read_png_metadata(filepath):
    if filepath is None:
        return "No image selected"
    try:
        with Image.open(filepath) as img:
            metadata = img.info
            if "Generation Parameters" in metadata:
                return metadata["Generation Parameters"]
            return "No generation metadata found in this PNG file."
    except Exception as e:
        return f"Error reading metadata: {str(e)}"
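# Sketch (hypothetical helpers, not used by the app): read_png_metadata()
# relies on Pillow exposing PNG text chunks through img.info. This stdlib-only
# version walks the raw chunk list to show where "Generation Parameters"
# lives: each chunk is a 4-byte big-endian length, a 4-byte type, the data and
# a CRC-32, and a tEXt chunk holds "keyword\0text" in Latin-1. Compressed zTXt
# and international iTXt chunks are ignored here.
import struct
import zlib
_SKETCH_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
def _sketch_png_chunk(chunk_type, data):
    # The CRC covers the chunk type and data, not the length field.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)
def _sketch_read_png_text(data):
    if not data.startswith(_SKETCH_PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(_SKETCH_PNG_SIGNATURE)
    text = {}
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, value = body.partition(b"\x00")
            text[keyword.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # length field + type + data + CRC
    return text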
face_adapter = "./checkpoints/ip-adapter.bin"
controlnet_path = "./checkpoints/ControlNetModel"
controlnet_identitynet = ControlNetModel.from_pretrained(
    controlnet_path, torch_dtype=dtype
)
controlnet_pose_model = "xinsir/controlnet-openpose-sdxl-1.0"
controlnet_canny_model = "diffusers/controlnet-canny-sdxl-1.0"
controlnet_depth_model = "diffusers/controlnet-depth-sdxl-1.0-small"
controlnet_model_paths = {
    "pose": controlnet_pose_model,
    "canny": controlnet_canny_model,
    "depth": controlnet_depth_model,
}
controlnet_map_fn = {
    "pose": openpose,
    "canny": get_canny_image,
    "depth": get_depth_map,
}
def get_available_loras():
    loras_dir = "./models/Loras"
    if not os.path.exists(loras_dir):
        return []
    lora_files = []
    for file in os.listdir(loras_dir):
        if file.endswith(('.safetensors', '.ckpt', '.pt')):
            lora_files.append(file)
    return lora_files
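# Sketch (hypothetical helpers, not used by the app): the LoRA file filter from
# get_available_loras() plus the adapter naming used later in generate_image(),
# where ".safetensors" is dropped and remaining dots become underscores, and
# the slot index keeps names distinct.
_SKETCH_LORA_EXTENSIONS = (".safetensors", ".ckpt", ".pt")
def _sketch_filter_lora_files(filenames):
    return [name for name in filenames if name.endswith(_SKETCH_LORA_EXTENSIONS)]
def _sketch_adapter_name(index, filename):
    stem = filename.replace(".safetensors", "").replace(".", "_")
    return f"lora_{index}_{stem}"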
def restart_server(open_browser):
    python = sys.executable
    script = os.path.abspath(sys.argv[0])
    args = sys.argv[1:]
    os.environ["IN_BROWSER"] = "1" if open_browser else "0"
    torch.cuda.empty_cache()
    # Relaunch the script as a detached process, then exit this one.
    if sys.platform == "win32":
        subprocess.Popen([python, script] + args, creationflags=subprocess.DETACHED_PROCESS | subprocess.CREATE_NEW_PROCESS_GROUP)
    else:
        subprocess.Popen([python, script] + args, preexec_fn=os.setsid)
    os._exit(0)
def update_det_size(det_size_name):
    global app, current_det_size
    new_size = DET_SIZE_OPTIONS[det_size_name]
    # Rebuild the face detector only when the requested size actually changes.
    if new_size != current_det_size:
        current_det_size = new_size
        app = FaceAnalysis(
            name="antelopev2",
            root="./",
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        app.prepare(ctx_id=0, det_size=current_det_size)
    return f"Detection size set to {current_det_size}"
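# Sketch (hypothetical helper, not used by the app): the rebuild-on-change
# pattern behind update_det_size(). The factory is called only when the
# requested detection size differs from the cached one; `factory` stands in
# for constructing and preparing a FaceAnalysis instance.
class _SketchDetectorCache:
    def __init__(self, factory):
        self._factory = factory
        self._size = None
        self._detector = None
    def get(self, size):
        if size != self._size:
            self._detector = self._factory(size)
            self._size = size
        return self._detector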
def main(pretrained_model_name_or_path="eniora/RealVisXL_V5.0"):
    if vram_gb >= 15:
        # Start without a pipeline; generate_image() loads one on demand.
        pipe = None
    elif pretrained_model_name_or_path.endswith((".ckpt", ".safetensors")):
        # Single-file checkpoint: load components, borrow a scheduler config from the hub.
        scheduler_kwargs = hf_hub_download(
            repo_id="eniora/RealVisXL_V5.0",
            subfolder="scheduler",
            filename="scheduler_config.json",
        )
        (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            scheduler_name=None,
            weight_dtype=dtype,
        )
        scheduler = diffusers.DPMSolverMultistepScheduler.from_config(scheduler_kwargs)
        pipe = StableDiffusionXLInstantIDPipeline(
            vae=vae,
            text_encoder=text_encoders[0],
            text_encoder_2=text_encoders[1],
            tokenizer=tokenizers[0],
            tokenizer_2=tokenizers[1],
            unet=unet,
            scheduler=scheduler,
            controlnet=[controlnet_identitynet],
        ).to(device)
    else:
        pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
            pretrained_model_name_or_path,
            controlnet=[controlnet_identitynet],
            torch_dtype=dtype,
            feature_extractor=None,
        ).to(device)
        pipe.scheduler = diffusers.DPMSolverMultistepScheduler.from_config(
            pipe.scheduler.config
        )
    print(f"Detected GPU VRAM: {vram_gb:.2f} GB → "
          f"VAE Tiling: {'Enabled' if default_vae_tiling else 'Disabled'} (you can change it manually in the UI)")
    def load_and_cache_controlnet_model(controlnet_type):
        # Load each auxiliary ControlNet once and reuse it across generations.
        if controlnet_type not in cached_controlnet_models:
            print(f"Loading ControlNet model: {controlnet_type}")
            model = ControlNetModel.from_pretrained(controlnet_model_paths[controlnet_type], torch_dtype=dtype).to(device)
            cached_controlnet_models[controlnet_type] = model
        return cached_controlnet_models[controlnet_type]
    def toggle_lora_ui(enable_lora_checkbox):
        return [gr.update(visible=enable_lora_checkbox)] * len(LORA_OUTPUTS)
    def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
        if randomize_seed:
            seed = random.randint(0, MAX_SEED_RAND)
        return seed
    def convert_from_cv2_to_image(img: np.ndarray) -> Image.Image:
        return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    def convert_from_image_to_cv2(img: Image.Image) -> np.ndarray:
        return cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    def draw_kps(
        image_pil,
        kps,
        color_list=[
            (255, 0, 0),
            (0, 255, 0),
            (0, 0, 255),
            (255, 255, 0),
            (255, 0, 255),
        ],
    ):
        # Render the 5 facial keypoints as colored limbs and dots on a black canvas.
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)
        w, h = image_pil.size
        out_img = np.zeros([h, w, 3])
        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]
            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly(
                (int(np.mean(x)), int(np.mean(y))),
                (int(length / 2), stickwidth),
                int(angle),
                0,
                360,
                1,
            )
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)
        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)
        out_img_pil = Image.fromarray(out_img.astype(np.uint8))
        return out_img_pil
    def resize_img(
        input_image,
        max_side=4096,
        min_side=512,
        size=None,
        pad_to_max_side=False,
        mode=PIL.Image.LANCZOS,
        base_pixel_number=32,
        exact_ratio=True
    ):
        w, h = input_image.size
        if exact_ratio:
            if size is not None:
                w_resize_new, h_resize_new = size
            else:
                # Scale the long side to max_side, snap to multiples of base_pixel_number,
                # then re-derive the other side from the original aspect ratio.
                ratio = max_side / max(w, h)
                w_resize = round(w * ratio)
                h_resize = round(h * ratio)
                w_resize_new = (w_resize // base_pixel_number) * base_pixel_number
                h_resize_new = (h_resize // base_pixel_number) * base_pixel_number
                if w_resize_new > h_resize_new:
                    aspect_ratio = h / w
                    h_resize_new = int(round(w_resize_new * aspect_ratio / base_pixel_number) * base_pixel_number)
                else:
                    aspect_ratio = w / h
                    w_resize_new = int(round(h_resize_new * aspect_ratio / base_pixel_number) * base_pixel_number)
        else:
            # Original InstantID behavior: multiples of 64, short side first, then long side.
            base_pixel_number = 64
            if size is not None:
                w_resize_new, h_resize_new = size
            else:
                ratio = min_side / min(h, w)
                w, h = round(ratio * w), round(ratio * h)
                ratio = max_side / max(h, w)
                input_image = input_image.resize([round(ratio * w), round(ratio * h)], mode)
                w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
                h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
        input_image = input_image.resize([w_resize_new, h_resize_new], mode)
        if pad_to_max_side and size is None:
            # Center the image on a white max_side x max_side canvas.
            res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
            offset_x = (max_side - w_resize_new) // 2
            offset_y = (max_side - h_resize_new) // 2
            res[
                offset_y : offset_y + h_resize_new, offset_x : offset_x + w_resize_new
            ] = np.array(input_image)
            input_image = Image.fromarray(res)
        return input_image
    def apply_style(
        style_name: str, positive: str, negative: str = ""
    ) -> Tuple[str, str]:
        p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
        if style_name != DEFAULT_STYLE_NAME and negative:
            return p.replace("{prompt}", positive), n + ", " + negative
        else:
            return p.replace("{prompt}", positive), n + negative
    def load_model_and_update_pipe(model_name, enable_img2img):
        nonlocal pipe
        if vram_gb >= 15 and pipe is not None:
            del pipe
            torch.cuda.empty_cache()
            gc.collect()
        # Choose the img2img or text-to-image InstantID pipeline class.
        PipeClass = StableDiffusionXLInstantIDImg2ImgPipeline if enable_img2img else StableDiffusionXLInstantIDPipeline
        if model_name.endswith((".ckpt", ".safetensors")):
            scheduler_kwargs = hf_hub_download(
                repo_id="eniora/RealVisXL_V5.0",
                subfolder="scheduler",
                filename="scheduler_config.json",
            )
            tokenizers, text_encoders, unet, _, vae = load_models_xl(
                pretrained_model_name_or_path=model_name,
                scheduler_name=None,
                weight_dtype=dtype,
            )
            scheduler = diffusers.DPMSolverMultistepScheduler.from_config(scheduler_kwargs)
            pipe = PipeClass(
                vae=vae,
                text_encoder=text_encoders[0],
                text_encoder_2=text_encoders[1],
                tokenizer=tokenizers[0],
                tokenizer_2=tokenizers[1],
                unet=unet,
                scheduler=scheduler,
                controlnet=[controlnet_identitynet],
            ).to(device)
        else:
            pipe = PipeClass.from_pretrained(
                model_name,
                controlnet=[controlnet_identitynet],
                torch_dtype=dtype,
                feature_extractor=None,
            ).to(device)
            pipe.scheduler = diffusers.DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
        pipe.load_ip_adapter_instantid(face_adapter)
        if vram_gb >= 15:
            pipe._current_model = model_name
        return pipe
    def generate_image(
        resize_max_side,
        face_image_path,
        pose_image_path,
        prompt,
        negative_prompt,
        style_name,
        prompt_replacement_value,
        num_steps,
        identitynet_strength_ratio,
        adapter_strength_ratio,
        pose_strength,
        canny_strength,
        depth_strength,
        controlnet_selection,
        guidance_scale,
        seed,
        scheduler,
        enable_lora,
        disable_lora_1,
        lora_scale,
        lora_selection,
        disable_lora_2,
        lora_scale_2,
        lora_selection_2,
        disable_lora_3,
        lora_scale_3,
        lora_selection_3,
        disable_lora_4,
        lora_scale_4,
        lora_selection_4,
        disable_lora_5,
        lora_scale_5,
        lora_selection_5,
        disable_lora_6,
        lora_scale_6,
        lora_selection_6,
        disable_lora_7,
        lora_scale_7,
        lora_selection_7,
        disable_lora_8,
        lora_scale_8,
        lora_selection_8,
        enhance_face_region,
        enhance_strength,
        custom_enhance_padding,
        num_outputs,
        model_name,
        det_size_name,
        file_prefix,
        enable_vae_tiling,
        resize_mode,
        pad_to_max_side,
        enable_custom_resize,
        custom_resize_width,
        custom_resize_height,
        enable_img2img,
        strength,
        exact_ratio,
        progress=gr.Progress(),
    ):
        nonlocal pipe
        # Sanitize the filename prefix and make sure it ends with "_".
        file_prefix = file_prefix.strip().translate(FILENAME_SAFE_TRANS)
        if not file_prefix:
            file_prefix = DEFAULT_FILE_PREFIX
        elif not file_prefix.endswith("_"):
            file_prefix = f"{file_prefix}_"
        overall_start_time = time.time()
        update_det_size(det_size_name)
        # Reload the pipeline if the model or the img2img mode has changed.
        is_img2img_pipe = isinstance(pipe, StableDiffusionXLInstantIDImg2ImgPipeline)
        if (
            pipe is None
            or model_name != getattr(pipe, "_current_model", None)
            or (enable_img2img and not is_img2img_pipe)
            or (not enable_img2img and is_img2img_pipe)
        ):
            pipe = load_model_and_update_pipe(model_name, enable_img2img)
            pipe._current_model = model_name
        if enable_vae_tiling:
            pipe.enable_vae_tiling()
        else:
            pipe.disable_vae_tiling()
        if enable_lora:
            pipe.unload_lora_weights()
            loras_to_load = []
            lora_slots = [
                (lora_selection, disable_lora_1, lora_scale, 1),
                (lora_selection_2, disable_lora_2, lora_scale_2, 2),
                (lora_selection_3, disable_lora_3, lora_scale_3, 3),
                (lora_selection_4, disable_lora_4, lora_scale_4, 4),
                (lora_selection_5, disable_lora_5, lora_scale_5, 5),
                (lora_selection_6, disable_lora_6, lora_scale_6, 6),
                (lora_selection_7, disable_lora_7, lora_scale_7, 7),
                (lora_selection_8, disable_lora_8, lora_scale_8, 8),
            ]
            for selection, disabled, scale, idx in lora_slots:
                if selection and not disabled:
                    lora_path = os.path.join("./models/Loras", selection)
                    if os.path.exists(lora_path):
                        loras_to_load.append({"name": selection, "scale": scale})
                        print(f"LoRA {idx} selected: {selection} with scale {scale}")
                    else:
                        print(f"LoRA {idx} not found at {lora_path}, skipping load.")
                        gr.Warning(f"LoRA {idx} not found at {lora_path}. Skipping LoRA {idx}.")
            if loras_to_load:
                adapter_names = []
                for i, lora_item in enumerate(loras_to_load):
                    # Adapter name: slot index plus the file name without ".safetensors" and dots.
                    sanitized_lora_name = lora_item["name"].replace(".safetensors", "").replace(".", "_")
                    adapter_name = f"lora_{i}_{sanitized_lora_name}"
                    pipe.load_lora_weights("./models/Loras", weight_name=lora_item["name"], adapter_name=adapter_name)
                    adapter_names.append(adapter_name)
                adapter_weights = [lora_item["scale"] for lora_item in loras_to_load]
                pipe.set_adapters(adapter_names, adapter_weights=adapter_weights)
                pipe.fuse_lora()
                print(f"Successfully loaded and fused {len(loras_to_load)} LoRAs.")
            else:
                pipe.disable_lora()
                print("No LoRAs selected or found, LoRA disabled.")
        else:
            pipe.disable_lora()
        face_image_filename = os.path.basename(face_image_path) if face_image_path else "None"
        pose_image_filename = os.path.basename(pose_image_path) if pose_image_path else "None"
        with torch.no_grad():
            scheduler_config = dict(pipe.scheduler.config.items())
            if not controlnet_selection:
                torch.cuda.empty_cache()
            # The diffusers class name is the part before the first "-";
            # "Karras" / "SDE" anywhere in the name switch on the extra options.
            use_karras = "Karras" in scheduler
            use_sde = "SDE" in scheduler
            scheduler_split = scheduler.split("-")[0]
            scheduler_class = getattr(diffusers, scheduler_split)
            if "DPMSolver" in scheduler_split:
                pipe.scheduler = scheduler_class.from_config(
                    scheduler_config,
                    use_karras_sigmas=use_karras,
                    algorithm_type="sde-dpmsolver++" if use_sde else "dpmsolver++"
                )
            elif scheduler_split in ["KDPM2AncestralDiscreteScheduler", "KDPM2DiscreteScheduler"]:
                pipe.scheduler = scheduler_class.from_config(
                    scheduler_config,
                    use_karras_sigmas=use_karras
                )
            else:
                pipe.scheduler = scheduler_class.from_config(scheduler_config)
            if face_image_path is None:
                if enable_lora:
                    pipe.unfuse_lora()
                    pipe.unload_lora_weights()
                raise gr.Error(
                    "Cannot find any input face image! Please upload the face image"
                )
            if not prompt:
                prompt = " " if prompt_replacement_value == "Empty (none)" else prompt_replacement_value
            prompt, negative_prompt = apply_style(style_name, prompt, negative_prompt)
            face_image = load_image(face_image_path)
            custom_size = None
            if enable_custom_resize:
                custom_size = (int(custom_resize_width), int(custom_resize_height))
            resize_mode_enum = getattr(PIL.Image, resize_mode)
            face_image = resize_img(face_image, size=custom_size, max_side=resize_max_side, mode=resize_mode_enum, pad_to_max_side=pad_to_max_side, exact_ratio=exact_ratio)
            face_image_cv2 = convert_from_image_to_cv2(face_image)
            height, width, _ = face_image_cv2.shape
            face_info = app.get(face_image_cv2)
            if len(face_info) == 0:
                if enable_lora:
                    pipe.unfuse_lora()
                    pipe.unload_lora_weights()
                raise gr.Error(
                    "Unable to detect a face in the image. Please upload a different photo with a clear face."
                )
            # Keep the face with the largest bounding-box area.
            face_info = sorted(face_info, key=lambda x: (x["bbox"][2] - x["bbox"][0]) * (x["bbox"][3] - x["bbox"][1]))[-1]
            face_emb = face_info["embedding"]
            face_kps = draw_kps(convert_from_cv2_to_image(face_image_cv2), face_info["kps"])
            img_controlnet = face_image
            if pose_image_path is not None:
                # A reference image supplies the keypoints (pose) and the ControlNet input.
                pose_image = load_image(pose_image_path)
                pose_image = resize_img(pose_image, size=custom_size, max_side=resize_max_side, mode=resize_mode_enum, pad_to_max_side=pad_to_max_side, exact_ratio=exact_ratio)
                img_controlnet = pose_image
                pose_image_cv2 = convert_from_image_to_cv2(pose_image)
                face_info = app.get(pose_image_cv2)
                if len(face_info) == 0:
                    if enable_lora:
                        pipe.unfuse_lora()
                        pipe.unload_lora_weights()
                    raise gr.Error(
                        "Cannot find any face in the reference image! Please upload another person image"
                    )
                face_info = face_info[-1]
                face_kps = draw_kps(pose_image, face_info["kps"])
                width, height = face_kps.size
            if enhance_face_region:
                # White mask over the face bbox, optionally padded by a fraction of its size.
                control_mask = np.zeros([height, width, 3], dtype=np.uint8)
                x1, y1, x2, y2 = face_info["bbox"]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                if enhance_strength == "Balanced":
                    padding_ratio = 0.15
                elif enhance_strength == "High":
                    padding_ratio = 0.3
                elif enhance_strength == "Custom":
                    padding_ratio = custom_enhance_padding
                else:
                    padding_ratio = 0.0
                padding_x = int((x2 - x1) * padding_ratio)
                padding_y = int((y2 - y1) * padding_ratio)
                x1 = max(0, x1 - padding_x)
                y1 = max(0, y1 - padding_y)
                x2 = min(width, x2 + padding_x)
                y2 = min(height, y2 + padding_y)
                control_mask[y1:y2, x1:x2] = 255
                control_mask = Image.fromarray(control_mask)
            else:
                control_mask = None
            if len(controlnet_selection) > 0:
                # Free cached ControlNets that are no longer selected before loading new ones.
                for k in list(cached_controlnet_models.keys()):
                    if k not in controlnet_selection:
                        del cached_controlnet_models[k]
                torch.cuda.empty_cache()
                gc.collect()
                controlnet_scales = {
                    "pose": pose_strength,
                    "canny": canny_strength,
                    "depth": depth_strength,
                }
                controlnet_models_to_use = []
                controlnet_images = []
                for s in controlnet_selection:
                    model = load_and_cache_controlnet_model(s)
                    controlnet_models_to_use.append(model)
                    controlnet_images.append(controlnet_map_fn[s](img_controlnet).resize((width, height)))
                # IdentityNet always comes first, followed by the selected ControlNets.
                pipe.controlnet = MultiControlNetModel([controlnet_identitynet] + controlnet_models_to_use)
                control_scales = [float(identitynet_strength_ratio)] + [controlnet_scales[s] for s in controlnet_selection]
                control_images = [face_kps] + controlnet_images
            else:
                if cached_controlnet_models:
                    for key in list(cached_controlnet_models.keys()):
                        del cached_controlnet_models[key]
                    torch.cuda.empty_cache()
                    gc.collect()
                pipe.controlnet = controlnet_identitynet
                control_scales = float(identitynet_strength_ratio)
                control_images = face_kps
            # manual_seed() requires an int; UI number fields may deliver floats.
            generator = torch.Generator(device=device).manual_seed(int(seed))
            print("Starting image generation...")
            print(f"Prompt: {prompt}\nNegative Prompt: {negative_prompt}")
            print(f"Detection size: {current_det_size}")
            print(f"Input face image: {face_image_filename}")
            print(f"Reference pose image: {pose_image_filename}")
            print(f"Steps: {num_steps}")
            print(f"img2img Mode: {'Enabled' if enable_img2img else 'Disabled'}")
            if enable_img2img:
                print(f"img2img Denoising Strength: {strength}")
            print(f"Enhance non-face region: {'True' if enhance_face_region else 'False'} ({enhance_strength}{f' | Padding: {custom_enhance_padding:.2f}' if enhance_strength == 'Custom' else ''})")
            print(f"Guidance scale: {guidance_scale}")
            print(f"Model: {model_name}")
            print(f"Resize mode: {resize_mode}")
            print(f"Pad to max side: {pad_to_max_side}")
            print(f"Use custom resize: {enable_custom_resize}")
            if enable_custom_resize:
                print(f"Custom resize size: {custom_resize_width}x{custom_resize_height}")
            print(f"ControlNet selection: {controlnet_selection} | Strengths - Pose: {pose_strength}, Canny: {canny_strength}, Depth: {depth_strength}")
            print(f"IdentityNet strength: {identitynet_strength_ratio}")
            print(f"Adapter strength: {adapter_strength_ratio}")
            lora_info_str = "Disabled"
            if enable_lora:
                # Reuse the slot list built when the LoRAs were loaded above.
                lora_details = []
                for selection, disabled, scale, idx in lora_slots:
                    if selection:
                        path = os.path.join("./models/Loras", selection)
                        if not disabled and os.path.exists(path):
                            lora_details.append(f"LoRA {idx}: {selection} (Scale: {scale})")
                        elif disabled:
                            lora_details.append(f"LoRA {idx}: Manually disabled")
                        else:
                            lora_details.append(f"LoRA {idx}: {selection} (Not found)")
                if lora_details:
                    lora_info_str = "; ".join(lora_details)
            print(f"LoRA(s): {lora_info_str}")
            print(f"Scheduler: {scheduler}")
            print(f"Exact aspect ratio: {'Enabled' if exact_ratio else 'Disabled'}")
            print(f"Max resize side: {resize_max_side}")
            print(f"Image size: {width}x{height}\n")
            pipe.set_ip_adapter_scale(adapter_strength_ratio)
            images = []
            generation_infos = []
            for i in range(int(num_outputs)):
print(f"Generating image {i + 1} of {num_outputs}...\n")
# img2img skips the start of the schedule, so only about num_steps * strength steps actually run.
steps = max(1, int(num_steps * strength)) if enable_img2img else num_steps
step_tracker = {"last": -1, "total": 0}
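# Second-order schedulers (Heun, KDPM2, DPMSolverSDE) iterate over roughly twice as many
# timesteps as num_inference_steps, so the step-end callback fires about twice per displayed step.
is_slow_scheduler = any(x in scheduler for x in ["DPMSolverSDE", "KDPM2", "Heun"])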
if is_slow_scheduler:
def gradio_callback_lambda(pipe_obj, step, timestep, callback_kwargs):
if step != step_tracker["last"]:
step_tracker["last"] = step
step_tracker["total"] += 1
est_total = steps * 2
progress(
((i / num_outputs) + (step_tracker["total"] / est_total) / num_outputs),
desc=f"Generating image {i + 1} of {num_outputs} "
f"(Step {min(step_tracker['total'] // 2, steps)}/{steps})"
)
return callback_kwargs
else:
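# The tuple evaluates progress() for its side effect; [1] returns callback_kwargs,
# which callback_on_step_end is required to return.
gradio_callback_lambda = lambda pipe_obj, step, timestep, callback_kwargs: (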
progress(
((i / num_outputs) + (((step + 1) / steps) / num_outputs)),
desc=f"Generating image {i + 1} of {num_outputs} (Step {step + 1}/{steps})"
),
callback_kwargs
)[1]
print(f"Seed: {seed + i}\n")
generator = torch.Generator(device=device).manual_seed(seed + i)
common_kwargs = dict(
prompt=prompt,
negative_prompt=negative_prompt,
image_embeds=face_emb,
controlnet_conditioning_scale=control_scales,
num_inference_steps=num_steps,
guidance_scale=guidance_scale,
height=height,
width=width,
generator=generator,
callback_on_step_end=gradio_callback_lambda,
)
if enable_img2img:
result = pipe(
**common_kwargs,
image=face_image,
control_image=control_images,
strength=strength,
)
else:
result = pipe(
**common_kwargs,
image=control_images,
control_mask=control_mask,
)
image = result.images[0]
images.append(image)
info_text = f"""Prompt: {prompt}
Negative Prompt: {negative_prompt}
Input Face Image: {face_image_filename}
Reference Pose Image: {pose_image_filename}
Detection size: {current_det_size}
Steps: {num_steps}
Guidance scale: {guidance_scale}
Seed: {seed + i}
Model: {model_name}
ControlNet selection: {controlnet_selection}
Max resize side: {resize_max_side}
Image size: {width}x{height}
Exact aspect ratio: {exact_ratio}
Enhance non-face region: {enhance_face_region}
Enhance region profile: {enhance_strength}
Enhance padding ratio: {custom_enhance_padding}
Resize mode: {resize_mode}
Pad to max side: {pad_to_max_side}
Use custom resize: {enable_custom_resize}
Custom resize size: {custom_resize_width}x{custom_resize_height}
img2img Strength: {strength}
img2img Mode Enabled: {enable_img2img}
IdentityNet strength: {identitynet_strength_ratio}
Adapter strength: {adapter_strength_ratio}
Pose strength: {pose_strength}
Canny strength: {canny_strength}
Depth strength: {depth_strength}
LoRA Enabled: {enable_lora}
LoRA 1 selection: {'None' if disable_lora_1 or not (enable_lora and lora_selection and os.path.exists(os.path.join('./models/Loras', lora_selection))) else lora_selection}
LoRA 1 scale: {'Disabled' if disable_lora_1 or not (enable_lora and lora_selection and os.path.exists(os.path.join('./models/Loras', lora_selection))) else lora_scale}
LoRA 2 selection: {'None' if disable_lora_2 or not (enable_lora and lora_selection_2 and os.path.exists(os.path.join('./models/Loras', lora_selection_2))) else lora_selection_2}
LoRA 2 scale: {'Disabled' if disable_lora_2 or not (enable_lora and lora_selection_2 and os.path.exists(os.path.join('./models/Loras', lora_selection_2))) else lora_scale_2}
LoRA 3 selection: {'None' if disable_lora_3 or not (enable_lora and lora_selection_3 and os.path.exists(os.path.join('./models/Loras', lora_selection_3))) else lora_selection_3}
LoRA 3 scale: {'Disabled' if disable_lora_3 or not (enable_lora and lora_selection_3 and os.path.exists(os.path.join('./models/Loras', lora_selection_3))) else lora_scale_3}
LoRA 4 selection: {'None' if disable_lora_4 or not (enable_lora and lora_selection_4 and os.path.exists(os.path.join('./models/Loras', lora_selection_4))) else lora_selection_4}
LoRA 4 scale: {'Disabled' if disable_lora_4 or not (enable_lora and lora_selection_4 and os.path.exists(os.path.join('./models/Loras', lora_selection_4))) else lora_scale_4}
LoRA 5 selection: {'None' if disable_lora_5 or not (enable_lora and lora_selection_5 and os.path.exists(os.path.join('./models/Loras', lora_selection_5))) else lora_selection_5}
LoRA 5 scale: {'Disabled' if disable_lora_5 or not (enable_lora and lora_selection_5 and os.path.exists(os.path.join('./models/Loras', lora_selection_5))) else lora_scale_5}
LoRA 6 selection: {'None' if disable_lora_6 or not (enable_lora and lora_selection_6 and os.path.exists(os.path.join('./models/Loras', lora_selection_6))) else lora_selection_6}
LoRA 6 scale: {'Disabled' if disable_lora_6 or not (enable_lora and lora_selection_6 and os.path.exists(os.path.join('./models/Loras', lora_selection_6))) else lora_scale_6}
LoRA 7 selection: {'None' if disable_lora_7 or not (enable_lora and lora_selection_7 and os.path.exists(os.path.join('./models/Loras', lora_selection_7))) else lora_selection_7}
LoRA 7 scale: {'Disabled' if disable_lora_7 or not (enable_lora and lora_selection_7 and os.path.exists(os.path.join('./models/Loras', lora_selection_7))) else lora_scale_7}
LoRA 8 selection: {'None' if disable_lora_8 or not (enable_lora and lora_selection_8 and os.path.exists(os.path.join('./models/Loras', lora_selection_8))) else lora_selection_8}
LoRA 8 scale: {'Disabled' if disable_lora_8 or not (enable_lora and lora_selection_8 and os.path.exists(os.path.join('./models/Loras', lora_selection_8))) else lora_scale_8}
Scheduler: {scheduler}"""
png_info = PIL.PngImagePlugin.PngInfo()
png_info.add_text("Generation Parameters", info_text)
generation_infos.append(png_info)
save_images([image], generation_info=[png_info], prefix=file_prefix)
print(f"(√) Finished generating image {i + 1} of {num_outputs}\n")
torch.cuda.empty_cache()
if enable_lora:
pipe.unfuse_lora()
pipe.unload_lora_weights()
gc.collect()
overall_elapsed_time = time.time() - overall_start_time
print(f"Total generation time: {overall_elapsed_time:.2f} seconds\n")
return images
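A minimal, hypothetical sketch of the padding step near the top of this section: the face bounding box grows by a fraction of its own width and height on each side, is clamped to the image, and the resulting rectangle is filled with 255 in the mask. The helper names `pad_face_box` and `box_mask` are illustrative only and not part of this script; the sketch assumes the box is `(x1, y1, x2, y2)` in pixels and uses a plain list-of-rows mask instead of the NumPy array the script presumably builds.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels (assumed format)


def pad_face_box(bbox: Box, width: int, height: int, padding_ratio: float = 0.0) -> Box:
    """Grow bbox by padding_ratio of its size on each side, clamped to the image bounds."""
    x1, y1, x2, y2 = (int(v) for v in bbox)
    pad_x = int((x2 - x1) * padding_ratio)  # horizontal padding in pixels
    pad_y = int((y2 - y1) * padding_ratio)  # vertical padding in pixels
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(width, x2 + pad_x),
        min(height, y2 + pad_y),
    )


def box_mask(box: Box, width: int, height: int) -> List[List[int]]:
    """Row-major mask: 255 inside the half-open box [x1, x2) x [y1, y2), 0 elsewhere."""
    x1, y1, x2, y2 = box
    return [
        [255 if (x1 <= x < x2 and y1 <= y < y2) else 0 for x in range(width)]
        for y in range(height)
    ]
```

The half-open ranges match NumPy slicing such as `mask[y1:y2, x1:x2] = 255`, so a padding ratio of 0.0 leaves the detected box unchanged.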
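The ControlNet setup in the function above always puts IdentityNet first and then appends one scale and one conditioning image per selected model, in selection order; with no extra models it falls back to a scalar scale and a single image. The following is a hypothetical sketch of that ordering logic only: `assemble_control_inputs` is an illustrative name, and plain strings stand in for the real models and images.

```python
from typing import Dict, List, Sequence, Tuple, Union


def assemble_control_inputs(
    selection: Sequence[str],
    identity_scale: float,
    extra_scales: Dict[str, float],
    kps_image: object,
    extra_images: Dict[str, object],
) -> Tuple[Union[float, List[float]], Union[object, List[object]]]:
    """Return (scales, images) aligned with [IdentityNet] + selected ControlNets."""
    if not selection:
        # IdentityNet alone: a scalar scale and a single keypoint image.
        return float(identity_scale), kps_image
    # Index 0 is always IdentityNet; the rest follow the user's selection order.
    scales = [float(identity_scale)] + [extra_scales[name] for name in selection]
    images = [kps_image] + [extra_images[name] for name in selection]
    return scales, images
```

Keeping both lists built from the same `selection` sequence is what keeps each scale paired with the right model and conditioning image.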
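The LoRA summary printed before generation (and repeated slot by slot in the PNG metadata) follows one rule per slot: skip empty slots, report manually disabled ones, and otherwise report either the scale or "Not found". A hypothetical helper like the one below could express that rule once; `lora_summary` is not part of this script, and the `exists` parameter is an assumption added so the file check can be replaced in tests.

```python
import os
from typing import Callable, Iterable, Optional, Tuple

Slot = Tuple[Optional[str], bool, float]  # (selection, disabled, scale)


def lora_summary(
    slots: Iterable[Slot],
    enabled: bool = True,
    lora_dir: str = "./models/Loras",
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Build a '; '-joined LoRA description, or 'Disabled' if nothing applies."""
    if not enabled:
        return "Disabled"
    details = []
    for idx, (selection, disabled, scale) in enumerate(slots, start=1):
        if not selection:
            continue  # empty slot: nothing to report
        if disabled:
            details.append(f"LoRA {idx}: Manually disabled")
        elif exists(os.path.join(lora_dir, selection)):
            details.append(f"LoRA {idx}: {selection} (Scale: {scale})")
        else:
            details.append(f"LoRA {idx}: {selection} (Not found)")
    return "; ".join(details) if details else "Disabled"
```

Slot numbers come from each entry's position, so empty slots still keep the later LoRAs numbered as in the UI.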
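The progress callbacks in the generation loop estimate overall progress across all requested images. For second-order schedulers the callback fires roughly twice per displayed step, so progress is measured against twice the step count and the shown step is halved. Below is a hypothetical, side-effect-free sketch of that arithmetic; the function names are illustrative, and the `min(..., 1.0)` clamp on the fast path is a small safeguard the original lambda does not have.

```python
def effective_steps(num_steps: int, strength: float, img2img: bool) -> int:
    """Steps actually run: img2img skips the start of the schedule."""
    return max(1, int(num_steps * strength)) if img2img else num_steps


def overall_progress(image_index: int, num_outputs: int, callbacks_seen: int,
                     steps: int, second_order: bool) -> float:
    """Fraction in [0, 1] across all images; image_index is 0-based."""
    expected = steps * 2 if second_order else steps  # callbacks expected per image
    within_image = min(callbacks_seen / expected, 1.0)
    return (image_index + within_image) / num_outputs


def displayed_step(callbacks_seen: int, steps: int, second_order: bool) -> int:
    """Step number shown to the user, never exceeding the step count."""
    return min(callbacks_seen // 2 if second_order else callbacks_seen, steps)
```

For example, with two images and a second-order scheduler running 15 steps, 30 callbacks on the first image put the overall bar at 50%.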
article = r"""
- Upload an image with a face. If the image contains multiple faces, only the largest one is used. Make sure the face is not too small and is clearly visible, without significant obstructions or blur.
- (Optional) Upload another image as a reference for the face pose. If you don't, the facial landmarks are extracted from the face in the main image. If your main photo is a cropped face, uploading a reference photo is recommended to define a new face pose.
- (Optional) Select additional ControlNet models (pose skeleton, Canny edges, and depth) to further guide generation. IdentityNet is always applied and, by default, is the only ControlNet used. The strength of each selected ControlNet model can be adjusted individually.
- Enter a text prompt, as you would for any text-to-image model.
- Click the Generate button to begin image generation.
- img2img mode uses the "pipeline_stable_diffusion_xl_instantid_img2img" pipeline. It is worth experimenting with, and I got quite good results with it, but it uses a lot of VRAM (~20GB). "Enhance non-face region" (control_mask) has no effect in this mode; this is by design.
- Select the model used for generation from the dropdown in the upper-left corner. Use only SDXL and Pony models; Illustrious models can be loaded but are not well supported.
- Select a scheduler from the dropdown in the upper-right corner. DPMSolver, KDPM2, and Euler usually give the best results.
Other InstantID usage tips:
- If you're not satisfied with the similarity, try increasing the weight of "IdentityNet Strength" and "Image adapter strength".
- If you feel that the saturation/contrast is too high, first decrease the "Image adapter strength". If it remains too high, decrease the "IdentityNet Strength".
- If the output does not follow the text prompt closely enough, decrease "Image adapter strength".
- If the style or overall quality of the generated images is not good enough, try another model.
- If you're having trouble detecting faces, try changing the "Face Detection Size" setting or try another input photo.
"""
with gr.Blocks() as gui:
with gr.Row():
with gr.Column(scale=1):
with gr.Row():
model_name = gr.Dropdown(
choices=AVAILABLE_MODELS,
value=DEFAULT_MODEL,
show_label=False,
container=False,
allow_custom_value=True,
scale=5