
Fixing the "slider" LECO trainer - introducing LOBA


May 9, 2025



Conceptually, the model probably already knows how to do whatever you want it to do. It just doesn't necessarily have that knowledge attached to the captions.

Say, for example, BeatriXL is having problems: if I type 1girl, I often end up with NSFW elements. I can't simply train 1girl directly and expect it to show only SFW elements without lobotomizing the damn model. I also can't simply negative safe, questionable, explicit, etc., because those tags are required for fidelity and quality.

https://github.com/p1atdev/LECO

The LECO concept is inherently HIGHLY destructive - it was originally adopted to build the earlier SD1.5 "sliders", where one concept is pushed to an extreme in one direction or the other.
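For reference, the slider objective itself is simple. Here is a minimal numpy sketch of the negatively-guided target that LECO-style trainers optimize toward - function names are mine, and the real trainer works on U-Net noise predictions from frozen vs. LoRA-patched passes, not toy arrays:

```python
import numpy as np

def leco_target(eps_neutral, eps_positive, scale):
    """Negative-guidance target used in LECO-style erasure:
    push the prediction *away* from the positive concept,
    relative to the neutral prediction."""
    return eps_neutral - scale * (eps_positive - eps_neutral)

def slider_loss(eps_pred, eps_neutral, eps_positive, scale=1.0):
    # MSE between the LoRA-patched prediction for the target prompt
    # and the negatively-guided target from the frozen base model.
    target = leco_target(eps_neutral, eps_positive, scale)
    return float(np.mean((eps_pred - target) ** 2))
```

Flipping the sign of `scale` is what makes it a slider rather than pure erasure: the same machinery pushes toward or away from the concept.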

Well, as you can see from the repo, it has gone two years unpatched. I looked around and found almost no replacements - not any realistic ones, anyway. So I've decided to reimplement the slider for SDXL. Getting this notebook working involved a series of specific library selections, specific "monkey patch" additions (thanks to O3 being odd), library noise replacements, and more - all without patching the library's actual code or creating a module.

LOBA

https://github.com/AbstractEyes/mirel-tuner

I'm fond of the name "lobotomy", since we're inherently lobotomizing the AI. If you use LOBA you'll know exactly what you're doing to your AI model - introducing the mad-science experiment into it. D... don't merge it... or do, y'know, up to you. I'll put in merge functions. Wait until it's more advanced.

Anyway.

Let's make a WAY better LECO.

Conceptually this is a good idea, but the LECO implementation is sorely lacking for the current era. The early implementation does not support masked loss, proper v-pred loss for SDXL, bucketing, MIN_SNR, gamma noise additions, additional loss types (Huber, L1, or L2), Flux slider controllers, or any of the other modern options that seem crucial to keep a model from collapsing into itself.
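MIN_SNR, for example, is only a few lines once you have the scheduler's alphas_cumprod. A hedged numpy sketch of min-SNR-gamma loss weighting (this is the eps-prediction form; v-pred variants typically divide by snr + 1 instead):

```python
import numpy as np

def min_snr_weights(alphas_cumprod, timesteps, gamma=5.0):
    """Min-SNR-gamma weighting: clamp each timestep's SNR at
    gamma so easy (high-SNR, low-noise) steps stop dominating
    the gradient and destabilizing training."""
    a = alphas_cumprod[timesteps]
    snr = a / (1.0 - a)              # SNR(t) under the DDPM schedule
    return np.minimum(snr, gamma) / snr
```

The returned weights multiply the per-sample loss before the mean reduction.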

I'll follow the standard LORA and full-finetune formatting and options alongside the new system, producing a usable, cluster-deployable trainer in its later stages - thanks to huggingface_hub and diffusers in conjunction with additional offloading procedures specialized for larger training clusters.

Standard LORAs will remain trainable without destruction. I'm packaging powerful pro tools in an easy-to-organize way that everyone can deploy directly on Colab, RunPod, Windows machines, Linux machines, macOS, and more.

Programs like this don't exist for good reason.

Usually, programs like this aren't finished because the developers lose sight of the goal, get burned out, or hit some technical or hardware limitation inherent to their concepts.

On top of the technical aspects, there's a negative stigma around unlearning. I've learned enough in the last couple of years to build careful training methodologies directly into a trainer.

Well, we have all of those to conquer as stepping stones to the goal: parity.

We will bring LOBA to full training parity - standard LORA training shouldn't be completely obliterated in quality, whether you compare against sd-scripts used directly or against the LOBA trainer monkey-patched to support GitHub repos like it.

I can pull this one off for 5 core reasons:

  1. huggingface_hub supports everything major that's needed, without deep technical requirements.

    • This did not exist when sd-scripts was created, but it does now. Hard to say how long that will last.

  2. robust libraries support all the necessary back-end traits; we don't need anything too deeply technical or difficult here - just a patchwork integration.

    • Many of these libraries simply did not exist last year, or the year before. Some trainers support a few of them; this will support MANY.

  3. time limitations and the difficulties of logistics tend to shatter programs like this. Developer time is limited and hard to manage; however, with programs like GPT O3, Claude, and Gemini, I can pull this off on my own with minimal mental overhead and minimal headaches - as long as I keep the AI focused specifically on the target goals.

    • One person doing 20 people's jobs - that's the opposite of a 20x dev. A 20x dev introduces 20x more job requirements under them to produce what they need. I'm the opposite: speed and optimization only - one person doing 20 jobs with the assistance of high-grade AI systems.

  4. hardware constraints have always been an ongoing issue with many systems. I have multiple local devices that can be used for tests; the primary program will run on Colab, with the scaled-up form built out based on those tests.

  5. the wide-net policy. Generally speaking, if you rely only on what may or may not work, you'll get the outcome of what may or may not work. sd-scripts is notorious for training everything without discrimination, with open-ended ideas; the implementation is limited, but that doesn't make it any less useful. Instead of focusing on just one mule, I'll distribute focus by task and carefully delegate imports - then run comparative checks using multiple AIs to regenerate multiple sets of requirements per machine type. I'll then test using the AIs and the machines themselves; but there are many elements that cannot be accounted for, and many systems that will simply skirt the clauses, so I have to stay vigilant over time.

Colab LOBA trainer first, others later.

LOBA looks a lot like a LECO, except, y'know, not like it. We're adopting the LECO style of training while replacing a bunch of the machinery around it.

Technically a LECO is just a LORA with a hat on it, so we're just preparing LORA-style weights with a dash of ControlNet.
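That "LORA with a hat" claim is literal: at inference, a LORA is just a low-rank delta added onto a frozen weight. A tiny numpy sketch, using the usual alpha/rank scaling convention (names are illustrative):

```python
import numpy as np

def lora_patch(W, A, B, alpha):
    """Merge a LoRA into a frozen weight matrix:
    W' = W + (alpha / r) * B @ A, where r is the rank.
    A: (r, in_features), B: (out_features, r)."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)
```

This is why slider weights can be saved, merged, or stacked like any other LORA: only the low-rank delta differs.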

HEAVY use of huggingface_hub

The majority of the systems will be implemented around from_pretrained, and we'll unload in careful, methodical ways - generally considered safe and usable at larger scale.

Additionally, we'll implement accelerate with Ulysses-style ring-attention support - to enable full large-scale training with a Hugging Face diffusers pipeline, on top of the refactored and robust library that will result from this next-generation trainer.
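As a rough sketch of that load path - the model id, dtype, and which modules get prepared are all assumptions, not final design:

```python
# Sketch only: loading SDXL via diffusers' from_pretrained and
# preparing it with accelerate.
import torch
from diffusers import StableDiffusionXLPipeline
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
    torch_dtype=torch.float16,
)
# Only the UNet (plus LoRA params) needs preparing for training;
# the VAE and text encoders can stay frozen and be offloaded.
unet = accelerator.prepare(pipe.unet)
```

Keeping everything behind from_pretrained is what makes the same notebook portable across Colab, RunPod, and local machines.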

Additional Features

Auto-Installer

  1. updated deps matched to Colab's CUDA 12.2, so the entire thing doesn't take hours of monkey-patching to reach a usable state.

  2. baseline imports with a Python 3.9-3.12 pipeline for modern integration - if the Python version is supported.

  3. automatic wheel-building for the CUDA version on the system, so the setup can be streamlined and easily adjusted as times change.
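Item 3 mostly comes down to picking the right wheel index per CUDA version. A hypothetical helper (the tag mapping is an assumption - always check which tags the index actually publishes):

```python
def cuda_wheel_tag(version: str) -> str:
    """Map a CUDA version string like '12.2' to a PyTorch-style
    wheel tag like 'cu122' (assumed mapping; verify per release)."""
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"

def torch_index_url(version: str) -> str:
    # PyTorch hosts per-CUDA wheel indexes under this URL pattern.
    return f"https://download.pytorch.org/whl/{cuda_wheel_tag(version)}"
```

The installer would then pass the URL to pip's `--index-url` for the torch stack.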

Targeting

  1. exact layer name targeting with scheduling per layer

  2. block targeting and general layer type targeting with scheduling

  3. full, linear, attention, mlp, and so on targeting.

  4. Will support conv layers, adding additional layers and locon support instead of just LORA.

  5. additional dtype controllers depending on the device

  6. updated quick setup system for colab, runpod, and windows

  7. deepspeed, block swap, and whatever is currently available in the other major trainers that colab will support

  8. ViT-L, ViT-H, and more interrogation models for testing.

  9. SigLIP integration with more powerful summarizers like Llama, plus multiple versions of T5.

  10. ONNX classification, segmentation, and identification with goal targets.

  11. similarity checks with configurable weights
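The layer-name targeting in items 1-3 can be as simple as wildcard matching over the model's named modules. A sketch with toy SDXL-ish module names (the names and patterns are illustrative only):

```python
from fnmatch import fnmatch

def select_targets(module_names, patterns):
    """Exact / wildcard layer-name targeting: return the module
    names matching any pattern, so each selection can carry its
    own learn rate or schedule."""
    return [n for n in module_names
            if any(fnmatch(n, p) for p in patterns)]

# Toy names mimicking SDXL UNet modules (illustrative only).
names = [
    "down_blocks.0.attentions.0.proj_in",
    "down_blocks.0.resnets.0.conv1",
    "mid_block.attentions.0.to_q",
]
```

In the real trainer the name list would come from `model.named_modules()`, and each matched group would map to its own scheduler entry.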

Additional Layers

  • additional attention heads and layers may be introduced into the LORA/LOCON, which could impact performance when inferenced without conversion.

  • LOCON support; which will enable training LOCON using the subsystems for LECO in each category.

  • Additional CFG options; from attention, to guidance, to unet, to individual TE if desired.

New training concepts

U-LOBA; Careful unlearning

  • By design, this will implement masked loss and interpolation methods with a teacher/student setup to preserve information that is associated with, but not directly linked to, the snipped neurons.

  • It will use layer by layer interpolation when necessary.

  • Work smarter, not harder.

  • This is the traditional LECO with a lot of additional power.
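A minimal numpy sketch of the masked teacher/student idea: a masked MSE between the tuned (student) prediction and the frozen (teacher) prediction, so masked-in regions are anchored to the teacher and preserved while the erasure loss handles the rest. Shapes and semantics here are assumptions:

```python
import numpy as np

def masked_distill_loss(student, teacher, mask):
    """Masked MSE against a frozen teacher: only masked-in
    elements are held to the teacher's original prediction,
    preserving associated-but-untargeted information."""
    mask = mask.astype(student.dtype)
    num = np.sum(mask * (student - teacher) ** 2)
    # Normalize by mask area, guarding against an empty mask.
    return float(num / np.clip(mask.sum(), 1.0, None))
```

In practice this term would be summed with the slider loss, weighted per layer when the layer-by-layer interpolation kicks in.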

RC-LOBA - region-controlled superimposition and repair

  • Specific area controls with binary mask, greyscale mask, and bounding box for more than JUST attention - but full learning and forgetting.

  • Includes timestep control and scheduling for timestep changes while training.

  • text_encoder support, enabling things like 3D photorealistic backgrounds with anime characters, or overlapping multiple styles on the same image via region control.

  • Superimposing concepts onto sections of the images, focusing the entire training regimen on those sections.

  • Masked loss for preventative measures on non-trained sections.
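The bounding-box control above reduces to building a binary loss mask. A hypothetical helper (box format and semantics are assumptions):

```python
import numpy as np

def bbox_to_mask(h, w, boxes):
    """Turn (x0, y0, x1, y1) bounding boxes into a binary loss
    mask: 1 inside any box (learn/forget here), 0 outside
    (leave alone). Coordinates are pixel indices, x1/y1 exclusive."""
    mask = np.zeros((h, w), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask
```

Greyscale masks would skip this step and be used directly as per-pixel loss weights.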

ST-LOBA - style shifting LOBA meant to fixate on shifting styles and color composition

  • these have specific imposing features

  • this will employ a kind of overlaid masking and normalization: it takes the entire output image, applies the difference as loss in a specific usable way, then normalizes it carefully and intelligently with masking.

  • This is akin to training a NovelAI-style "vibe" directly into a LORA form.

  • The results should be rapid.

Custom loss formulas

  • The formulas will be based entirely on the methodologies selected for training, rather than arbitrarily on a series of aged methodologies - many of which can simply be boiled down into math that the user never needs to see.

  • targeted masked loss using multiple AI detections at runtime and some baseline logic checking.

  • layer loss, interpolative loss, degradation tests, canary tests, similarity checks, and more to prepare a careful loss based on pragmatic outcome rather than random noise chance.

  • Surge loss: experimental interpolative degradation analysis across layers, using a distilled adapter for rapid interpolation. Accuracy may suffer a bit, but it should be exponentially faster to inference and train.
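Of the loss types mentioned earlier, Huber is the easy one to illustrate: quadratic (L2-like) near zero, linear (L1-like) in the tails, so outlier samples don't blow up the gradient. A numpy sketch:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: 0.5*d^2 for |d| <= delta,
    delta*(|d| - 0.5*delta) otherwise."""
    d = np.abs(pred - target)
    quad = 0.5 * d ** 2
    lin = delta * (d - 0.5 * delta)
    return float(np.mean(np.where(d <= delta, quad, lin)))
```

Setting delta very large recovers L2 behavior; very small, L1 behavior - which is why one knob can cover all three options.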

Technical

  1. multi-lora merge prior to training if desired, one slider out.

  2. text_encoder; will be trainable alongside - targeting the text encoders of choice

  3. layer targeting; training specific layers and their learn rates

  4. masked loss; greyscale or colored mask loss

  5. shift; the noise schedules of flow models require specific timestep-shift rules

  6. vpred; full support of v-prediction and eps-prediction style noise targets

  7. bucketing, MIN_SNR, MAX_SNR, min_gamma, noise, multires noise

  8. more supported schedulers and optimizers - including in-house ones that may or may not ruin your model - while the official will still exist.
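On item 6: v-pred just means training against a different target than raw noise. The standard v-prediction target from Salimans & Ho's progressive-distillation work, as a numpy sketch:

```python
import numpy as np

def v_target(x0, eps, alphas_cumprod, t):
    """v-prediction target:
    v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0,
    i.e. a schedule-dependent mix of the noise and the clean image."""
    a = alphas_cumprod[t]
    return np.sqrt(a) * eps - np.sqrt(1.0 - a) * x0
```

Supporting both modes is mostly a matter of swapping this target (and the matching SNR weighting) into the loss.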

Target Future Models

I plan to implement multiple new models for training. This is a critical piece of proper slider control.

  1. Flux1D; obviously we need Flux, defaults to Flux1D2 for training - merges lora stack to train.

  2. Flux1S; different from Flux1D

  3. Wan; will support wan training using musubi-tuner scripts, but not out of the gate.
