Phil 5.15.2023

Low-Rank Adaptation of Large Language Models (LoRA)

Low-Rank Adaptation of Large Language Models (LoRA) is a training method that accelerates the training of large models while consuming less memory. It adds pairs of rank-decomposition weight matrices (called update matrices) to existing weights and trains only those newly added weights (a minimal sketch of the idea follows the list below). This has several advantages:

  • The pretrained weights are kept frozen, so the model is not as prone to catastrophic forgetting.
  • Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
  • LoRA matrices are generally added to the attention layers of the original model. 🧨 Diffusers provides the load_attn_procs() method to load LoRA weights into a model’s attention layers, and a scale parameter controls how strongly the model is adapted toward the new training images (see the usage example after this list).
  • The greater memory efficiency lets you run fine-tuning on consumer GPUs like the Tesla T4, the RTX 3080, or even the RTX 2080 Ti! GPUs like the T4 are free and readily accessible in Kaggle or Google Colab notebooks.
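
To make the update-matrix idea concrete, here is a minimal PyTorch sketch (not the Diffusers implementation; the class name and hyperparameters are illustrative): a frozen linear layer is augmented with a trainable rank-r product B·A, scaled by alpha/r.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update:
        h = W x + (alpha / r) * B A x."""

        def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weights stay frozen

            # A: (r, in_features), B: (out_features, r). B starts at zero, so
            # the adapted model initially matches the pretrained one exactly.
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

With r much smaller than the layer width (say r=4 on a 768-wide layer), the update adds only r·(in_features + out_features) trainable parameters per layer, which is why LoRA checkpoints stay small and portable.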

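And here is a usage sketch for the scale parameter mentioned above: load LoRA weights into a Stable Diffusion UNet with load_attn_procs(), then blend them in at inference time via cross_attention_kwargs. The checkpoint path is hypothetical; the calls match the Diffusers API as of this writing.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Load LoRA weights into the UNet's attention layers.
    pipe.unet.load_attn_procs("path/to/lora_weights")  # hypothetical local path

    # scale=0.0 ignores the LoRA weights; scale=1.0 applies them fully.
    image = pipe(
        "a photo of a corgi",
        cross_attention_kwargs={"scale": 0.7},
    ).images[0]
    image.save("corgi_lora.png")
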
The EU’s amended AI Act could ban American companies such as OpenAI, Amazon, Google, and IBM from providing API access to generative AI models.  The amended act, voted out of committee on Thursday, would sanction American open-source developers and software distributors, such as GitHub, if unlicensed generative models became available in Europe.  While the act includes open source exceptions for traditional machine learning models, it expressly forbids safe-harbor provisions for open source generative systems.