1.) Generic NCNN upscaler
`ncnn` has been added as a package extra. When `ncnn` is installed, the new image processor `upscaler-ncnn` is available for generic upscaling using NCNN, and it should work with models converted from ONNX format. The `ncnn` extra is included by default in the Windows installer / portable install environment that is attached to each release.
This upscaler supports tiling just as the normal `upscaler` image processor does, and accepts essentially the same tiling options, with slightly different defaults.
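Tiling splits a large input into overlapping tiles that are upscaled separately and then stitched back together, which keeps peak memory use bounded by the tile size. The sketch below only computes the tile boxes and assumes square tiles with a fixed pixel overlap. `tile_boxes` is a hypothetical helper, not dgenerate's implementation, and it leaves out the blending of overlapping regions.

```python
from typing import List, Tuple

# A tile box as (left, top, right, bottom), the box convention used by PIL.
Box = Tuple[int, int, int, int]


def _tile_starts(length: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis; the last tile is flush with the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Cover a width x height image with tiles overlapping by >= overlap px."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap
    return [
        (left, top, min(left + tile, width), min(top + tile, height))
        for top in _tile_starts(height, tile, step)
        for left in _tile_starts(width, tile, step)
    ]
```

For example, `tile_boxes(1000, 600, 512, 32)` yields six boxes. The final tile on each axis is shifted back so it ends exactly at the image edge and never runs past it.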
It does not use the `device` argument. Instead, a combination of `use-gpu=True` and `gpu-index=N` enables Vulkan-accelerated execution on a specific GPU.
By default this processor runs on the CPU.
This is because the Vulkan allocator conflicts heavily with the torch CUDA allocator used for diffusion and other image processors when both are placed on the same GPU, and having both allocators on the same GPU can cause hard system lockups.
You can safely use this upscaler at the same time as torch-based models by running it on another GPU that torch is not going to be using.
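Suppose torch is using GPU 0 for diffusion. In that case the NCNN upscaler can be placed on GPU 1. The helper below only assembles a processor URI string in the `name;arg=value` form used by dgenerate's image processor URIs. `build_processor_uri` is hypothetical, and it deliberately leaves out the model arguments the processor actually needs. See `dgenerate --image-processor-help upscaler-ncnn` for the real argument list.

```python
def build_processor_uri(name: str, **args: object) -> str:
    """Join a processor name and its arguments into a ';'-separated URI.

    Hypothetical helper: underscores in keyword names become dashes,
    so use_gpu=True is rendered as use-gpu=True.
    """
    parts = [name]
    for key, value in args.items():
        parts.append(f"{key.replace('_', '-')}={value}")
    return ";".join(parts)


# Vulkan on the second GPU, leaving GPU 0 to torch (model args omitted).
uri = build_processor_uri("upscaler-ncnn", use_gpu=True, gpu_index=1)
print(uri)  # upscaler-ncnn;use-gpu=True;gpu-index=1
```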
Once you have used this processor, be aware that the process will always exit with a non-zero return code. This is because the GPU context and certain `ncnn` objects cannot be cleaned up properly through the `ncnn` Python bindings before the process shuts down, which technically creates an access violation / segfault inside `ncnn`. I am not sure what bad behavior this will cause on Linux, but on Windows the process exits with no side effects or hangs other than the non-zero return code.
See: `dgenerate --image-processor-help upscaler-ncnn`
And also: [Upscaling With NCNN Upscaler Models](https://github.com/Teriks/dgenerate/tree/master?tab=readme-ov-file#upscaling-with-ncnn-upscaler-models) in the readme.
2.) Memory Management
Image processors now have size estimates, which are used as a heuristic for clearing out CPU-side memory belonging to the diffusion model cache before the processor's model is loaded into memory. This should help prevent avoidable out-of-memory conditions caused by loading an image processor model while the diffusion model cache is using most of the system's memory.
This size estimate is also used as a heuristic for freeing up VRAM, particularly VRAM held by the last called diffusion pipeline if it is still in VRAM.
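The size-estimate heuristic works like making room in a size-budgeted cache before loading something new. The sketch below assumes a single budget and least-recently-used eviction. `SizedCache` and `make_room` are hypothetical names, not dgenerate's actual model cache.

```python
from collections import OrderedDict
from typing import Any, Hashable, List


class SizedCache:
    """Cache whose entries carry an estimated size; evicts least recently used.

    Hypothetical illustration, not dgenerate's actual cache.
    """

    def __init__(self, budget: int) -> None:
        self.budget = budget
        # key -> (value, estimated_size); dict order tracks recency of use
        self.entries: OrderedDict = OrderedDict()

    def used(self) -> int:
        return sum(size for _, size in self.entries.values())

    def make_room(self, needed: int) -> List[Hashable]:
        """Evict entries until `needed` more units fit; return evicted keys."""
        evicted = []
        while self.entries and self.used() + needed > self.budget:
            key, _ = self.entries.popitem(last=False)  # oldest use first
            evicted.append(key)
        return evicted

    def put(self, key: Hashable, value: Any, size: int) -> None:
        self.make_room(size)
        self.entries[key] = (value, size)
        self.entries.move_to_end(key)

    def get(self, key: Hashable) -> Any:
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key][0]
```

Here an image processor's size estimate plays the role of the `needed` argument. Calling `make_room(estimate)` before loading the processor evicts cached diffusion models only as far as necessary.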
If an image processor still runs out of memory because its actual execution allocates large amounts of VRAM, it will attempt to free memory and then try again. If an OOM occurs on the second try, the OOM is raised.
Diffusion invocations will now also attempt to clear memory and try again in the same fashion for CUDA out-of-memory errors. They do not retry on CPU-side out-of-memory errors, which are already more easily prevented by the heuristics in place.
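The retry-once policy can be sketched generically. `run_with_oom_retry` is a hypothetical helper. In a torch setting, `is_oom` might test `isinstance(e, torch.cuda.OutOfMemoryError)` and `free_memory` might call `gc.collect()` followed by `torch.cuda.empty_cache()`, but that wiring is an assumption, not dgenerate's code.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_oom_retry(
    fn: Callable[[], T],
    free_memory: Callable[[], None],
    is_oom: Callable[[BaseException], bool],
) -> T:
    """Run fn; on an out-of-memory error, free memory and retry exactly once.

    Errors that are not OOMs are re-raised immediately. An OOM on the
    second attempt propagates to the caller.
    """
    try:
        return fn()
    except Exception as e:
        if not is_oom(e):
            raise
    free_memory()
    return fn()
```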
The main remaining obstacle to running this application for long periods of time is VRAM fragmentation, which cannot be avoided with the default CUDA allocator.
The example runner script in the examples folder has been rewritten to isolate each top level folder in the examples directory to a subprocess when not running with the `--subprocess-only` flag.
The only way to clear out the memory fragmentation that builds up after running many models of different sizes is to end the process. Each directory is therefore isolated to a subprocess. This keeps dgenerate's caching behavior within the directory while avoiding excessive memory fragmentation, because each process only handles a medium-sized chunk of examples.
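The per-directory isolation can be sketched with the standard library's `subprocess` module. This is a hypothetical illustration of the idea, not the actual `run.py` code. `run_each_directory_isolated` and `python_child` are made-up names. In practice `make_command` would launch whatever runs the examples in one directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Dict, List

CommandFactory = Callable[[Path], List[str]]


def run_each_directory_isolated(root: Path, make_command: CommandFactory) -> Dict[str, int]:
    """Run one child process per top-level directory of `root`.

    Everything a child allocated, including fragmented CUDA memory, is
    released when that child exits, so fragmentation cannot accumulate
    across directories. Returns {directory name: return code}.
    """
    results: Dict[str, int] = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        completed = subprocess.run(make_command(directory), check=False)
        results[directory.name] = completed.returncode
    return results


def python_child(code: str) -> CommandFactory:
    """Hypothetical factory: run `code` in a fresh interpreter, dir as argv[1]."""
    return lambda directory: [sys.executable, "-c", code, str(directory)]
```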
There is also now a `--torch-debug` option in the `run.py` script. When enabled, it tries to dump information about objects stuck in VRAM after an OOM condition and generates a Graphviz graph of possible reference cycles. Currently I cannot find any evidence of anything sticking around after dgenerate tries to clean up VRAM.
dgenerate now sets the `max_split_size_mb` option of `PYTORCH_CUDA_ALLOC_CONF` to `512` before importing torch.
It also sets `PYTORCH_CUDA_LAUNCH_BLOCKING` to `0` by default.
These can be overridden in your environment.
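The "default unless overridden" behavior can be sketched with `os.environ.setdefault`, which only writes a variable that is not already set. This mirrors the behavior described above but is not dgenerate's startup code. PyTorch's documented syntax for this variable is `option:value`, so the setting above is written `max_split_size_mb:512`.

```python
import os

# Only fill in values the user has not already provided, so anything set in
# the environment before launch takes precedence. Do this before torch is
# imported: the CUDA caching allocator reads PYTORCH_CUDA_ALLOC_CONF once,
# when it is initialized.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")
os.environ.setdefault("PYTORCH_CUDA_LAUNCH_BLOCKING", "0")

# import torch  # only after the environment has been configured
```

To override the default, set the variable before launching dgenerate, for example `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256`.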
3.) Fetch CivitAI model links with `--sub-command civitai-links`
CivitAI has made a change to their website UI (or possibly had some sort of outage) that makes it no longer possible to copy direct API links to models by right-clicking.
I have written a dgenerate sub-command that can fetch API hard links to the CivitAI models on a model page and display them next to their model titles.
The links that this command generates can be given directly to dgenerate, or used with the `\download` directive in order to download the model from CivitAI.
You can use `dgenerate --sub-command civitai-links https://civitai.com/models/4384/dreamshaper` for example to list all available model links for that model using the CivitAI API.
You can use the `--token` argument of the sub-command to append an API token to the generated links, which is sometimes needed for downloading specific models.
You can also use this sub-command as the directive `\civitai_links` in a config / shell mode or the Console UI.
See: `dgenerate --sub-command civitai-links --help`, or `\civitai_links --help` from a config / shell mode or the Console UI.
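The two core steps behind such a sub-command can be sketched in plain Python. The first is finding the model ID in a page URL. The second is attaching a token to a download link. The sketch assumes CivitAI's public REST endpoint `https://civitai.com/api/v1/models/<id>` and CivitAI's convention of passing API tokens as a `token` query parameter. The helper names are hypothetical, this is not the sub-command's implementation, and fetching and parsing the JSON response is omitted.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def model_id_from_page_url(page_url: str) -> int:
    """Extract the numeric model id from a civitai.com/models/<id>/... URL."""
    match = re.search(r"/models/(\d+)", urlsplit(page_url).path)
    if match is None:
        raise ValueError(f"not a CivitAI model page URL: {page_url}")
    return int(match.group(1))


def with_token(download_url: str, token: str) -> str:
    """Append (or replace) a `token` query parameter on a download link."""
    parts = urlsplit(download_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "token"]
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


model_id = model_id_from_page_url("https://civitai.com/models/4384/dreamshaper")
api_url = f"https://civitai.com/api/v1/models/{model_id}"  # model metadata JSON
```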
4.) Config / Shell - Environment Variable Manipulation
You can now use the directives `\env` and `\unset_env` to manipulate environment variables.
```text
# using with no args prints the entire environment
\env

# you can set multiple environment variables at once
\env MY_ENV_VAR=1 MY_ENV_VAR2=2

# undefine them in the same manner
\unset_env MY_ENV_VAR MY_ENV_VAR2
```
See: `dgenerate --directives-help env unset_env`
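From a Python process's point of view, these two directives amount to reads and writes on `os.environ`. The functions below are a hypothetical analogy, not the directives' implementation. The source does not specify how dgenerate treats values containing `=` or attempts to unset an undefined name. This sketch simply allows the former and ignores the latter.

```python
import os


def env_directive(*assignments: str) -> None:
    r"""Rough analogy of `\env`.

    With no arguments, print the whole environment; otherwise apply each
    NAME=VALUE assignment (in this sketch the value may contain '=').
    """
    if not assignments:
        for name, value in sorted(os.environ.items()):
            print(f"{name}={value}")
        return
    for item in assignments:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        os.environ[name] = value


def unset_env_directive(*names: str) -> None:
    r"""Rough analogy of `\unset_env`; this sketch ignores undefined names."""
    for name in names:
        os.environ.pop(name, None)
```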
5.) Config / Shell - Indirect Assignment
The config / shell language that is built into dgenerate now supports indirect assignment.
You can use a basic template expansion or environmental variable expansion to select the name of a template variable.
This now works for `\set`, `\sete`, `\setp`, and `\env`, as well as for `\unset` and `\unset_env`.
All other directives which accepted a variable name already supported this.
```text
\set var_name foo
\set {{ var_name }} bar

# prints bar
\print {{ foo }}

\env VAR_NAME=BAZ
\env $VAR_NAME=qux

# prints qux
\print $BAZ
```
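Indirect assignment uses one variable's value as the name of another. The following plain-Python analogy mirrors the example line by line, with a dict standing in for the template variable table and `os.environ` for the environment. It only illustrates the semantics and is not how dgenerate implements them.

```python
import os

template_vars = {}

# \set var_name foo
template_vars["var_name"] = "foo"

# \set {{ var_name }} bar
# The *name* is itself the result of an expansion, so "bar" is stored
# under "foo", not under "var_name".
template_vars[template_vars["var_name"]] = "bar"

# \env VAR_NAME=BAZ
os.environ["VAR_NAME"] = "BAZ"

# \env $VAR_NAME=qux  (expands to BAZ=qux)
os.environ[os.environ["VAR_NAME"]] = "qux"

print(template_vars["foo"], os.environ["BAZ"])  # bar qux
```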
6.) Config / Shell - Feature Flags and Platform Detection
The config template functions `have_feature(feature_name)` and `platform()` have been added.
```text
# have_feature returns bool

# Do we have Flax/Jax?
\print {{ have_feature('flax') }}

# Do we have NCNN?
\print {{ have_feature('ncnn') }}

# platform() returns the platform.system() string from Python's platform module
# prints: Windows, Linux, Darwin, etc.
\print {{ platform() }}
```
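Outside the config language, roughly equivalent checks can be written in Python with `importlib.util.find_spec` and `platform.system()`. `have_module` is a hypothetical stand-in. dgenerate's `have_feature` takes feature names such as `'flax'` and `'ncnn'`, and the source does not describe how it maps those names to installed packages.

```python
import importlib.util
import platform


def have_module(module_name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(have_module("ncnn"))  # True only if the ncnn Python bindings are installed
print(platform.system())    # e.g. Windows, Linux, or Darwin
```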
7.) Exception handling fixes in `dgenerate.invoker`
The methods in this library module could only throw `dgenerate.DgenerateUsageError`, when they should have been throwing more fine-grained error types when requested to do so with `throw=True`.
8.) Config / Shell - Parsing fixes
Streaming heredoc templates discarded newlines from the end of Jinja stream chunks. This resulted in hard-to-notice issues with Jinja control structures used as top-level templates, mostly when the result of the heredoc template was being interpreted by the shell.
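Here is a simplified, self-contained model of why the dropped newlines mattered. It assumes two rendered chunks that each end a shell line, and shows only the symptom, not dgenerate's template streaming code.

```python
# Two rendered chunks from a hypothetical Jinja stream, each ending a line.
chunks = ["\\print 0\n", "\\print 1\n"]

# Old behavior (modeled): dropping each chunk's trailing newline glues the
# two directives into a single line before the shell interprets them.
buggy = "".join(chunk.rstrip("\n") for chunk in chunks)

# Fixed behavior (modeled): chunks are concatenated as-is.
fixed = "".join(chunks)

print(repr(buggy))         # '\\print 0\\print 1'
print(fixed.splitlines())  # ['\\print 0', '\\print 1']
```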
9.) Image processor library API improvements
Image processors will now throw when you pass a PIL image whose mode the processor cannot understand.
Currently, all image processors only understand `RGB` images.
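Callers of the library API can normalize images before handing them to a processor. This is a minimal sketch assuming Pillow images. `ensure_rgb` is a hypothetical helper that only relies on the `.mode` attribute and the `.convert(mode)` method of `PIL.Image.Image`.

```python
def ensure_rgb(image):
    """Return `image` in RGB mode, converting only when necessary.

    Works with PIL.Image.Image, or anything exposing `.mode` and
    `.convert(mode)`. Converting an RGBA image this way drops the alpha
    channel rather than compositing it onto a background.
    """
    if image.mode == "RGB":
        return image
    return image.convert("RGB")


# Typical use with Pillow:
#   from PIL import Image
#   rgb_image = ensure_rgb(Image.open("input.png"))
```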
10.) Console UI updates
Removed antiquated recipes related to image upscaling in favor of the `Generic Image Process` and `Generic Image Process (to directory)` recipes.
From the generic image process recipes you can simply select the `upscaler` or `upscaler-ncnn` processor from a drop-down and fill out its parameters to perform upscaling.
---
All image processors now expose parameters provided by their base class in the UI, such as `device`, `output-file`, `output-overwrite`, and `model-offload`.
This lets you select a debug image output location with a file select dialog. This is useful if you are using an image processor as a pre-processor for diffusion and need to see the image that is being passed to diffusion for debugging purposes.
The `device` argument is hidden in the UI where not applicable, such as the `Generic Image Process` recipes where the UI selects the device for the whole command instead of via an image processor URI argument.
The `device` URI argument for image processors is available when selecting pre / post processors for AI image generation from the UI as well as when using the `Insert Image Processor URI` edit feature.
---
You can now specify the `frame-start` and `frame-end` URI arguments for frame slicing when using the Image Seed URI builder UI.
---
Fixed minor syntax highlighting bugs related to indirect variable assignments.