Adaptation is a crucial requirement for future social companion applications. Personalized interaction appears to be an important factor in users' long-term commitment to interacting with a social robot. We present a study evaluating the feasibility of a dueling bandit learning approach for preference learning (PL) in Human-Robot Interaction (HRI). Furthermore, we explore whether the embodiment of the PL agent influences the user's evaluation of the learner. We conducted a study (n=53) comparing a graphical user interface (GUI), a virtual robot, and a real robot. We found no difference in preference between the virtual and the real robot. We used the obtained study data to compare the PL approach against a strategy that randomly selects preference rankings. The results show that the dueling bandit PL approach can be used to learn a user's preference in HRI.
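To illustrate the general idea of dueling bandit preference learning, the following is a minimal, self-contained sketch. It is not the paper's exact algorithm; the win-count bookkeeping, the duel-selection rule, and the simulated noisy user are all illustrative assumptions. A dueling bandit learner never observes absolute rewards, only the outcome of pairwise comparisons (duels), from which it recovers a preference ranking; here the final ranking is read off via Copeland scores.

```python
import random


class DuelingBandit:
    """Minimal win-count dueling bandit (illustrative sketch, not the paper's algorithm)."""

    def __init__(self, n_arms):
        self.n = n_arms
        # wins[i][j] counts how often arm i beat arm j in a duel.
        self.wins = [[0] * n_arms for _ in range(n_arms)]

    def select_duel(self):
        # Simple exploration rule: duel the pair compared least often so far.
        pairs = [(i, j) for i in range(self.n) for j in range(i + 1, self.n)]
        counts = [self.wins[i][j] + self.wins[j][i] for i, j in pairs]
        fewest = min(counts)
        return random.choice([p for p, c in zip(pairs, counts) if c == fewest])

    def update(self, winner, loser):
        self.wins[winner][loser] += 1

    def ranking(self):
        # Copeland score: number of opponents an arm beats in a majority of duels.
        def beats(i, j):
            return self.wins[i][j] > self.wins[j][i]

        scores = [sum(beats(i, j) for j in range(self.n) if j != i)
                  for i in range(self.n)]
        return sorted(range(self.n), key=lambda i: -scores[i])


# Simulated user with latent preference order 0 > 1 > 2 > 3; duels are noisy,
# so the preferred option only wins 90% of the time.
def user_prefers(a, b, noise=0.1):
    better = a < b
    return better if random.random() > noise else not better


random.seed(0)
bandit = DuelingBandit(4)
for _ in range(300):
    a, b = bandit.select_duel()
    if user_prefers(a, b):
        bandit.update(a, b)
    else:
        bandit.update(b, a)
# With enough duels the recovered ranking approaches the latent order [0, 1, 2, 3].
```

A random baseline of the kind the study compares against would simply sample a ranking of the options uniformly, without consulting the duel outcomes at all.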