In this paper, we propose a meta-learning model that hierarchically integrates individual learning and social learning schemes. We incorporate this meta-learning model into an agent-based model to show that Herbert Scarf’s famous counterexample to Walrasian stability can become stable in some cases under a non-tâtonnement process when both learning schemes are involved, a result previously obtained by Herbert Gintis. However, we find that the stability of the competitive equilibrium depends on how individuals learn: whether they are innovators (individual learners) or imitators (social learners), and how frequently they switch between the two (their mobility). We show that this endogenous behavior, apart from the initial population of innovators, is mainly determined by the agents’ intensity of choice. This study grounds the Walrasian competitive equilibrium in the view of a balanced resource allocation between exploitation and exploration. This balance, achieved through the meta-learning model, is shown to be underpinned by a behavioral/psychological characteristic.
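
To illustrate how an intensity-of-choice parameter can govern switching between the two learning schemes, the sketch below uses a logit (softmax) choice rule. The abstract does not state the functional form, so the logit rule, the function name `prob_individual_learning`, and the payoff arguments are assumptions made for illustration only, not the specification used in the paper.

```python
import math


def prob_individual_learning(payoff_individual, payoff_social, intensity_of_choice):
    """Probability of choosing individual learning (innovating) rather than
    social learning (imitating) under an ASSUMED logit choice rule.

    Mathematically equal to
        exp(l * a) / (exp(l * a) + exp(l * b))
    with l = intensity_of_choice, a = payoff_individual, b = payoff_social.
    """
    # Work with the scaled payoff difference so that large values do not overflow.
    diff = intensity_of_choice * (payoff_individual - payoff_social)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical payoffs: individual learning did slightly better last period.
    for lam in (0.0, 1.0, 10.0):
        p = prob_individual_learning(1.2, 1.0, lam)
        print(f"intensity={lam:5.1f}  P(individual learning)={p:.3f}")
```

Under this assumed rule, an intensity of zero makes the agent choose between the two schemes at random regardless of payoffs, while a high intensity makes the agent switch almost deterministically to whichever scheme performed better.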