## Introduction

Prediction models play a vital role in establishing the relationship between the variables included in a model and the outcome of interest, and in forecasting that outcome. A prediction model can identify the variables that determine the outcome, quantify the strength of their association with the outcome and predict a future outcome from specific values of those variables. Prediction models have countless applications in diverse areas, including clinical settings, where a prediction model can help with detecting or screening high-risk subjects for asymptomatic diseases (to help prevent disease development through early intervention), predicting a future disease (to help facilitate patient–doctor communication based on more objective information), assisting in medical decision-making (to help both doctors and patients make an informed choice regarding treatment) and assisting healthcare services with planning and quality management.

Different methodologies can be applied to build a prediction model. These techniques can be classified broadly into two categories: mathematical/statistical modelling and computer-based modelling. Regardless of the modelling technique used, one needs to apply appropriate variable selection methods during the model building stage. Selecting appropriate variables for inclusion in a model is often considered the most important and difficult part of model building. In this paper, we discuss what is meant by variable selection, why it is important, the different methods for variable selection and their advantages and disadvantages. We also use examples of prediction models to demonstrate how these variable selection methods are applied in model building. The concept of variable selection is heavily statistical, and general readers may not be familiar with many of the concepts discussed in this paper. However, we have attempted to present a non-technical discussion in plain language that should be accessible to readers with a basic level of statistical understanding. This paper will be helpful for those who wish to be better informed about variable selection in prediction modelling, have more meaningful conversations with biostatisticians/data analysts about their project or select an appropriate method for variable selection in model building using the more advanced information provided in this paper. Our intention is to provide readers with a basic understanding of this extremely important topic to assist them when developing a prediction model.