In this paper, we propose a highly efficient VLSI architecture for the 2-D dual-mode Symmetric Mask-based Discrete Wavelet Transform (SMDWT), supporting both the 5/3 and 9/7 lifting-based filters. The architecture addresses the critical issue of the 2-D Lifting-based Discrete Wavelet Transform (LDWT) and yields low latency, reduced complexity, and low transpose memory. The SMDWT also offers regular signal coding, a short critical path, and independent subband coding. For an N×N input, the transpose memory requirement is 9N. The architecture uses parallel and folded processing to achieve a higher hardware utilization ratio and a smaller silicon area. It is suitable for Very Large Scale Integration (VLSI) implementation and can be applied to real-time computer vision applications.
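
As background for the 5/3 mode, the following is a minimal Python sketch of one level of the standard reversible 1-D 5/3 lifting transform (predict step, then update step) and its inverse. It is not the proposed SMDWT architecture, which replaces these sequential lifting steps with mask-based computation. The function names are hypothetical, and the even-length integer input and symmetric boundary extension are assumptions made for illustration.

```python
def lifting53_forward(x):
    """One level of 1-D reversible 5/3 lifting on an even-length integer list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length input"
    half = n // 2

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Assumed symmetric extension at the right edge: x[n] mirrors to x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = []
    for i in range(half):
        # Assumed symmetric extension at the left edge: d[-1] mirrors to d[0].
        prev = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def lifting53_inverse(s, d):
    """Invert lifting53_forward by undoing the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n

    # Undo update: recover the even samples.
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)

    # Undo predict: recover the odd samples from the restored even samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Because the update step depends on the output of the predict step, the two stages are inherently sequential in a lifting implementation. A 2-D transform applies such a 1-D pass along rows and then along columns, which is where the transpose memory arises.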