Dynamic Multi-Scale Quantization: A Quantization Technique for Efficient Large Language Model Compression
Abstract
Quantization techniques are crucial for reducing the computational and memory requirements of large machine learning models, particularly large language models (LLMs). Existing quantization methods often trade off accuracy against efficiency and offer limited adaptability to diverse workloads and hardware environments. This paper introduces Dynamic Multi-Scale Quantization (DMSQ), a framework that combines adaptive precision scaling, per-layer calibration, and workload-aware optimization to address these challenges. We present the mathematical foundations of DMSQ, detail its implementation, and demonstrate its effectiveness through experimental evaluation. DMSQ achieves significant compression while maintaining high accuracy, making it suitable for deployment on resource-constrained devices.
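
To make the abstract's notions of per-layer calibration and adaptive precision concrete, the following is a minimal sketch of a generic per-layer quantizer that picks a bit-width per layer from a reconstruction-error budget. The function names, candidate bit-widths, and MSE-based selection rule are illustrative assumptions, not the DMSQ algorithm itself.

```python
# Illustrative sketch only: generic per-layer symmetric quantization with
# adaptive bit-width selection. This is NOT the authors' DMSQ method; the
# candidate bit-widths and MSE budget below are assumptions for illustration.
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax     # per-tensor scale (calibration)
    if scale == 0:
        return weights.copy()
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale                           # dequantized values for evaluation

def choose_bit_width(weights: np.ndarray,
                     candidates=(4, 6, 8),
                     mse_budget: float = 1e-4) -> int:
    """Pick the smallest candidate bit-width whose reconstruction MSE fits the budget."""
    for bits in sorted(candidates):
        mse = np.mean((weights - quantize_symmetric(weights, bits)) ** 2)
        if mse <= mse_budget:
            return bits
    return max(candidates)

# Example: assign a bit-width to each layer of a toy model independently.
rng = np.random.default_rng(0)
layers = {"attn.q_proj": rng.normal(0, 0.02, (512, 512)),
          "mlp.up_proj": rng.normal(0, 0.10, (512, 2048))}
for name, w in layers.items():
    bits = choose_bit_width(w)
    err = np.mean((w - quantize_symmetric(w, bits)) ** 2)
    print(f"{name}: {bits}-bit, mse={err:.2e}")
```

In this sketch, layers with wider weight distributions tend to need more bits to stay within the error budget, which is the intuition behind assigning precision per layer rather than uniformly.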
License
Copyright (c) 2025 Ayush Bodade, Bhavesh Chaudhari, Akshat Biniwale, Sudhir Dhage

This work is licensed under a Creative Commons Attribution 4.0 International License.