Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs

Abstract

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering the broader research and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Approximation (LoRA) and Adapters, have been developed. Despite their potential, these methods often face limitations in compressibility. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which utilizes tensor train decomposition, has not yet achieved the level of compression necessary for fine-tuning very large-scale models with limited resources. This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, along with reduced inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining performance comparable to that of larger models, facilitating their deployment on resource-constrained platforms.
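
To make the core idea concrete, below is a minimal PyTorch sketch of a TT-parameterized weight update: the pretrained linear weight stays frozen, and the trainable update is stored as a chain of small tensor-train cores instead of LoRA's pair of low-rank matrices. This is not the paper's implementation; the class name TTLoRALinear, the factorization of the 768-dimensional features into (8, 12, 8), and the TT rank of 4 are illustrative assumptions.

import math
import torch
import torch.nn as nn


class TTLoRALinear(nn.Module):
    """Frozen linear layer plus a trainable tensor-train (TT) parameterized update.

    Illustrative sketch: the weight update delta-W is reshaped into a higher-order
    tensor whose modes are (out_factor_k * in_factor_k) and stored as TT cores.
    """

    def __init__(self, base: nn.Linear, in_factors, out_factors, tt_rank=4):
        super().__init__()
        assert math.prod(in_factors) == base.in_features
        assert math.prod(out_factors) == base.out_features
        self.base = base
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad_(False)

        # One small TT core per mode of the reshaped update tensor.
        modes = [o * i for o, i in zip(out_factors, in_factors)]
        ranks = [1] + [tt_rank] * (len(modes) - 1) + [1]
        self.cores = nn.ParameterList(
            nn.Parameter(1e-2 * torch.randn(ranks[k], modes[k], ranks[k + 1]))
            for k in range(len(modes))
        )
        self.in_factors, self.out_factors = tuple(in_factors), tuple(out_factors)

    def delta_weight(self) -> torch.Tensor:
        # Contract the TT cores left to right to materialize delta-W.
        t = self.cores[0]
        for core in self.cores[1:]:
            t = torch.einsum("...a,abc->...bc", t, core)
        # Split each mode back into its (out, in) factors, then group all
        # out-factors and all in-factors to recover a (d_out, d_in) matrix.
        interleaved = [d for pair in zip(self.out_factors, self.in_factors) for d in pair]
        t = t.reshape(interleaved)
        k = len(self.out_factors)
        perm = list(range(0, 2 * k, 2)) + list(range(1, 2 * k, 2))
        return t.permute(perm).reshape(
            math.prod(self.out_factors), math.prod(self.in_factors)
        )

    def forward(self, x):
        # Frozen base projection plus the low-parameter TT update.
        return self.base(x) + nn.functional.linear(x, self.delta_weight())


# Usage: wrap a 768x768 projection (BERT-base sized) and count trainable parameters.
layer = TTLoRALinear(nn.Linear(768, 768), in_factors=(8, 12, 8), out_factors=(8, 12, 8))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters in the TT update: {trainable}")

Under these illustrative choices, the TT update carries 2,816 trainable parameters for a 768x768 projection, versus 12,288 for a rank-8 LoRA update of the same layer, which is the kind of parameter reduction the abstract refers to.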

Publication
In 23rd IEEE International Conference on Machine Learning and Applications (ICMLA 2024), 2024

Keywords:

Tensor-train, Low-Rank Approximation, Large Language Model, BERT, Compression

Citation:

Anjum, A., Eren, M.E., Boureima, I., Alexandrov, B., and Bhattarai, M. Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs. In ICMLA ’24: 23rd IEEE International Conference on Machine Learning and Applications, Dec. 18-20, 2024, Miami, Florida, USA. 8 pages.

BibTeX:

@INPROCEEDINGS{anjumICMLA24,
  author={Anjum, Afia and Eren, Maksim E and Boureima, Ismael and Alexandrov, Boian and Bhattarai, Manish},
  booktitle={2024 23rd IEEE International Conference on Machine Learning and Applications (ICMLA)}, 
  title={Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs},
  year={2024},
  volume={},
  number={},
  doi={}}
Maksim E. Eren
Scientist

My research interests lie at the intersection of the machine learning and cybersecurity disciplines, with a concentration in tensor decomposition.