Large Model Compression Techniques
Let's first introduce the background of the era of large model compression, starting from GPT3, the magnitude of the weight parameters of the model has gradually risen, and the requirements for hardware have become higher and higher. Taking GPT3 as an example, under FP16 precision, 325G of memory is also required, and if the specification of A100 80G is taken as an example, at least 5 sheets are needed. Therefore, for mobile embedded devices running on limited arithmetic, to ensure that the model performance is acceptable,