Dynamic Multi-Scale Quantization: A Quantization Technique for Efficient Large Language Model Compression

Authors

  • Ayush Bodade, Student, Department of Computer Science and Engineering, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Mumbai, India
  • Bhavesh Chaudhari, Student, Department of Computer Science and Engineering, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Mumbai, India
  • Akshat Biniwale, Student, Department of Computer Science and Engineering, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Mumbai, India
  • Sudhir Dhage, Dean (Administration & Quality Assurance), Department of Computer Science and Engineering, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Mumbai, India

Abstract

Quantization techniques are crucial for reducing the computational and memory requirements of large machine learning models, particularly large language models (LLMs). Existing quantization methods often trade off accuracy against efficiency and adapt poorly to diverse workloads and hardware environments. This paper introduces Dynamic Multi-Scale Quantization (DMSQ), a framework that combines adaptive precision scaling, per-layer calibration, and workload-aware optimization to address these challenges. We present the mathematical foundations of DMSQ, detail its implementation, and demonstrate its effectiveness through experimental evaluation. DMSQ achieves significant compression while maintaining high accuracy, making it suitable for deployment on resource-constrained devices.
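
The full paper details the DMSQ algorithm; as a rough point of reference for readers of the abstract, the sketch below only illustrates the general idea of per-layer calibration with adaptive bit-width selection. It is a minimal NumPy illustration under stated assumptions (symmetric uniform quantization, a hypothetical reconstruction-error budget, and invented layer names), not the authors' DMSQ implementation.

# Illustrative sketch only: per-layer calibrated quantization with adaptive
# bit-width selection. The error budget, candidate bit-widths, and layer
# names are assumptions for demonstration and do not come from the paper.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Symmetric uniform quantization: w is approximated by scale * q."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def choose_bits(w: np.ndarray, error_budget: float = 1e-4, candidates=(4, 8)):
    """Pick the smallest candidate bit-width whose reconstruction MSE meets the budget."""
    for bits in candidates:
        q, scale = quantize_symmetric(w, bits)
        if np.mean((w - q * scale) ** 2) <= error_budget:
            return bits
    return candidates[-1]

# Toy stand-in for calibrated LLM layers: name -> weight matrix.
rng = np.random.default_rng(0)
layers = {
    "attn.q_proj": rng.normal(0.0, 0.02, size=(256, 256)),
    "mlp.up_proj": rng.normal(0.0, 0.10, size=(256, 1024)),
}

for name, w in layers.items():
    bits = choose_bits(w)
    q, scale = quantize_symmetric(w, bits)
    mse = float(np.mean((w - q * scale) ** 2))
    print(f"{name}: {bits}-bit, scale={scale:.5f}, reconstruction MSE={mse:.2e}")

In this toy setup, layers whose weights tolerate coarser quantization stay at 4 bits while more error-sensitive layers fall back to 8 bits; DMSQ additionally incorporates workload-aware optimization, which this sketch does not model.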

Published

16-09-2025

Section

Articles

How to Cite

[1] A. Bodade, B. Chaudhari, A. Biniwale, and S. Dhage, “Dynamic Multi-Scale Quantization: A Quantization Technique for Efficient Large Language Model Compression,” IJRESM, vol. 8, no. 9, pp. 44–47, Sep. 2025. Accessed: Sep. 18, 2025. [Online]. Available: https://journal.ijresm.com/index.php/ijresm/article/view/3352