Augmented reality (AR) technology has been used successfully to improve traditional literacy. However, literacy education has undergone a paradigm shift from traditional literacy to multimodal literacy, and little research has explored how students establish effective multimodal meaning-making with AR technology. This study investigated how English as a Foreign Language (EFL) college students used different multimodal modes to communicate with others through AR technology. Participants were 52 EFL college students. The collected data included (a) pre- and post-administrations of a multimodal literacy survey, (b) students’ use of different modes to introduce tourist spots within a location-based AR app, and (c) students’ reflection essays. The results showed that the modes students used fell into visual and auditory forms. The visual mode comprised visual effects, images, and animations, whose functions were to focus viewers’ attention on what was important, provide concrete ideas, help process complex information, and promote engagement. The auditory mode consisted of background music and sound effects, which were used to evoke emotions and enhance immersive experiences. The results also revealed that creating content in a location-based AR app by combining different multimodal media significantly improved students’ multimodal literacy.