The SM4 block cipher algorithm used in IEEE 802.11i standard is released by the China National Cryptographic Authority and is one of the most important symmetric cryptographic algorithms in China. However, whether in the round encryption or key expansion phase of the SM4 algorithm, a large number of bit operations on the registers (e.g., circular shifting) are required. These operations are not effective to encryption in scenarios with large-scale data. In traditional implementations of SM4, different operands are assigned to different words and are processed serially, which can bring redundant operations in the process of encryption and decryption. Bit-slice technology places the same bit of multiple operands into one word, which facilitates bit-level operations in parallel. Bit-slice is actually a single instruction parallel processing technology for data, hence it can be accelerated by the CPU’s multimedia instructions. In this paper, we propose a fast implementation of the SM4 algorithm using bit-slice techniques. The experiment proves that the Bit-slice based SM4 is more efficient than the original version. It increases the encryption and decryption speed of the message by an average of 80%–120%, compared with the original approach.