Natural Landscape ComfyUI Advanced Upscaler Workflow (SDXL 0.9 compatible)

v5.1B

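The original author of this model is "@bericbone". We are committed to promoting the sharing and exchange of original models, so we have reposted this model for non-commercial exchange and study. Users of this model are asked to comply with the usage license declared by the original author.

If you are the original author of this model, please contact us. We look forward to you joining the platform and will transfer the model to your account as soon as possible. If you do not want this model to be shared, we will likewise respect your wishes and take it down as soon as possible.

We respect every original model author and look forward to growing together with every one of them!

Statement: If this reposted model gives rise to an intellectual property dispute or other infringement, we will take the model down immediately and will not hold the original author liable.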

IMPORTANT UPDATE:

I will be discontinuing work on this upscaler for now as a hires fix is not feasible for SDXL at this point in time. Let me try to explain why I think that is from my limited understanding.

In the two-model setup that SDXL uses, the base model is good at generating original images from 100% noise, and the refiner is good at adding detail when roughly 0.35 of the noise is left in the image generation. However, the refiner is only good at refining noise that is still left over from the creation of an original image, and it will give you a blurry result if you try to add random noise to an existing image for it to interpret. This makes it pretty useless as a hires fix.
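To make the handoff concrete, here is a minimal sketch of how a sampling run could be divided between the two models. It assumes the 0.35 figure is the fraction of the schedule left for the refiner; the function name `split_steps` and the 20-step example are hypothetical and are not taken from the workflow itself.

```python
def split_steps(total_steps: int, refiner_fraction: float = 0.35):
    """Return (base_steps, refiner_steps) as two ranges of step indices.

    refiner_fraction is an assumed reading of the "0.35 noise left" figure:
    the last 35% of the schedule goes to the refiner.
    """
    if not 0.0 <= refiner_fraction <= 1.0:
        raise ValueError("refiner_fraction must be between 0 and 1")
    handoff = round(total_steps * (1.0 - refiner_fraction))
    return range(0, handoff), range(handoff, total_steps)


base_steps, refiner_steps = split_steps(20)
print(f"base model: steps {base_steps.start}-{base_steps.stop - 1}")
print(f"refiner:    steps {refiner_steps.start}-{refiner_steps.stop - 1}")
```

The point of the sketch is that the refiner only ever sees a latent that the base model has already partly denoised, which is different from adding fresh noise to a finished image.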

The base model actually works better at hires upscaling, but it can't add much detail, so you will always be left with fewer fine details than in the original image. Some of the 1.5 checkpoints nowadays are extremely good at fine details without needing a secondary model, which is why my 3.0 setup works much better with SD1.5 checkpoints. They still aren't as good as SDXL at finer details around 1024 pixels, of course.

Some possible alternatives:

Upscaling a 1024x1024 image with ESRGAN works extremely well, but will still leave you with blurry images for non-close-up photos.
Since the base model + refiner works extremely well at generating images from 100% noise, it might be extremely good at in-painting faces. You can use a face mask detector to do so automatically. This solves most of the reason why you would use a hires fix in the first place, which is blurry faces in full-body shots. However, I'm not sure whether SDXL 0.9 can currently be used for in-painting.

The base model could potentially be trained to add in detail without the use of the refiner model, but I'm not sure how great of an idea that is.

SDXL has yet to be released, and someone might come up with solutions to these issues that I'm not able to find right now. But I want to make this clear so that nobody ends up wasting tons of effort trying to get it to work as well as SD1.5, to no avail.

I will be leaving the workflow up for research purposes and because it still works better than anything for SD1.5 models.

___________________________

Found my upscaler useful? Support me on ko-fi https://ko-fi.com/bericbone

You need to load these encoders into dual CLIP in the right order. Put them in your clip folder and reload ComfyUI.

https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/tree/main/text_encoder

https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9/tree/main/text_encoder_2

ComfyUI custom nodes needed to use this:

I recommend installing them with: https://civitai.com/models/71980?modelVersionId=106402

https://civitai.com/models/20793/was-node-suite-comfyui
https://civitai.com/models/33192/comfyui-impact-pack
https://civitai.com/models/32342/efficiency-nodes-for-comfyui
https://civitai.com/models/21558/comfyui-derfuu-math-and-modded-nodes

https://civitai.com/models/87609

An upscaling method I've designed that upscales in smaller chunks until the full resolution is reached, with an option to use different prompts for the initial image generation and the upscaler.

The idea is to gradually reinterpret the data as the original image gets upscaled, making for better hand/finger structure and facial clarity for even full-body compositions, as well as extremely detailed skin.

This version is optimized for 8 GB of VRAM. If the image will not fully render with 8 GB of VRAM, try bypassing a few of the last upscalers. If you have a lot of VRAM to work with, try adding another 0.5 upscaler as the first upscaler. Different models can require very different denoise strengths, so be sure to adjust those as well. There are preview images from each upscaling step, so you can see where the denoising needs adjustment. If you want to generate images faster, make sure to unplug the latent cables from the VAE decoders before they go into the image previewers.

For those with lower VRAM, try enabling the tiled VAE and replacing the last VAE decoder with a Tiled VAE decoder. This can also allow you to do even higher resolutions, but in my experience it comes at a loss of color accuracy. The more tiled VAE decoders, the greater the loss in color accuracy. There's still some color accuracy loss even with regular VAE decoding, so whether you use as many decoders as I do is up to your preferences and the checkpoints you work with. You can reduce this loss by using fewer upscalers.
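For readers unfamiliar with tiling, the sketch below shows the general idea of covering an image with overlapping tiles so that only one tile has to be processed at a time. It is an illustration only: the function `tile_boxes`, the 512-pixel tile size, and the 64-pixel overlap are assumptions, and this is not how ComfyUI's tiled VAE decoder is implemented.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighbouring tiles share `overlap` pixels so their seams can be blended.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


boxes = tile_boxes(1024, 1024)
print(len(boxes), "tiles, largest tile is",
      max((r - l) * (b - t) for l, t, r, b in boxes), "pixels")
```

Peak memory then scales with the tile area rather than the full image area, which is why tiling helps on low-VRAM cards.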

The detail refinement step needs a very low denoise strength. Try not to go above 0.2; you might need as low as 0.03 or lower. This step adds a layer of noise that makes skin look less plastic, and adds clarity.

There's some logic behind why the scaling factors gradually decrease, which I won't go into too much. Basically, the lower the scale factor, the more the smaller details are worked on in relation to the denoise strength.
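As a rough illustration of upscaling in several decreasing steps, the sketch below computes the resolution after each stage. The scale factors (1.5, 1.25, 1.125) and the 512x768 starting size are made-up example values, not the ones used in the workflow, and the function name `upscale_schedule` is hypothetical. Sizes are rounded down to a multiple of 8, since Stable Diffusion latents are 8 times smaller than the image.

```python
def upscale_schedule(width, height, factors, multiple=8):
    """Return the (width, height) reached after each upscaling stage."""
    sizes = []
    w, h = width, height
    for factor in factors:
        # Scale, then snap down to the nearest multiple of `multiple`.
        w = int(w * factor) // multiple * multiple
        h = int(h * factor) // multiple * multiple
        sizes.append((w, h))
    return sizes


# Hypothetical decreasing factors: big jump first, small refinements last.
for stage, size in enumerate(upscale_schedule(512, 768, [1.5, 1.25, 1.125]), 1):
    print(f"stage {stage}: {size[0]}x{size[1]}")
```

Each stage would get its own denoise pass, so later stages with smaller factors change the image less in terms of size while still reworking fine detail.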

If you want to switch the schedulers from karras back to normal, please be aware that you need to reduce the denoise strength drastically. Different schedulers apply noise differently.

If you get a red "missing nodes" error:
Navigate to your was-node-suite-comfyui folder
Run with PowerShell: path/to/ComfyUI/python_embeded/python.exe -m pip install -r requirements.txt

(replace the python.exe path with your own ComfyUI path)

ESRGAN (HIGHLY RECOMMENDED! Others might give artifacts!):
https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesr-general-wdn-x4v3.pth

VAE: https://huggingface.co/stabilityai/sd-vae-ft-mse-original/tree/main

I did some tests, and it seems like the upscaling preserves more character and detail from the original generation when the CLIP encoder width/height stays the same, which somewhat makes sense. You can still use the multipliers, but they will multiply only the first resolution and not the upscaled resolutions. I generally seem to like it at 1, even though I see 2x/4x quickly becoming the norm. 1x works better for upscaling, in my opinion.
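The behaviour described above can be sketched as follows: the conditioning size is derived once from the first resolution (times the multiplier) and then reused unchanged for every upscaled stage. The function name `conditioning_sizes` and the example resolutions are hypothetical and only illustrate the arithmetic.

```python
def conditioning_sizes(first_size, stage_sizes, multiplier=1):
    """Pair every image size with one fixed text-conditioning size."""
    width, height = first_size
    # The multiplier applies to the first resolution only.
    cond = (width * multiplier, height * multiplier)
    return [(image_size, cond) for image_size in [first_size, *stage_sizes]]


pairs = conditioning_sizes((512, 768), [(768, 1152), (960, 1440)], multiplier=1)
for image_size, cond_size in pairs:
    print("image", image_size, "-> conditioning", cond_size)
```

With a multiplier of 1, every stage is conditioned on the original 512x768 size, no matter how large the image has grown.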