Our EPSRC-funded research (EP/V00641X/1) develops new methodology to address the issue of missing data, a common problem in many application areas such as sample surveys and medical research. The presence of missing values complicates analyses and, if not dealt with appropriately, can lead to incorrect conclusions being drawn from the data. A particularly problematic scenario is when the missing data mechanism depends partly on unobserved quantities, such as the missing values themselves. This is known as a missing not at random (MNAR) mechanism.
If missing values arise from an MNAR mechanism, then conclusions drawn from the observed data will typically be biased. The project team (Professor Stefanie Biedermann of the OU, Dr Robin Mitra of Cardiff University, and a postdoctoral researcher starting later in 2021) will consider scenarios in which some of the missing values can be "recovered" through a follow-up sample. The main purpose of the follow-up is to learn about the missing data mechanism and, specifically, to test whether the MNAR assumption is valid. The recovered data will also help to correct for the effect of the missing data on the conclusions drawn.
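As a rough illustration of why MNAR missingness biases naive analyses, and how a follow-up sample can help, the Python sketch below simulates a hypothetical scenario. It is not the project's methodology: it uses the classical double-sampling idea of following up a random subsample of nonrespondents. The response rule, the sample size, the follow-up fraction, the function name `simulate_mnar_follow_up`, and the (unrealistic) assumption that every followed-up value is recovered are all illustrative choices.

```python
import random
import statistics


def simulate_mnar_follow_up(n=20000, follow_up_fraction=0.2, seed=1):
    """Hypothetical MNAR simulation: returns (full-data mean,
    complete-case mean, follow-up-corrected mean)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # MNAR (hypothetical rule): whether y is observed depends on y itself.
    # Negative values are observed with probability 0.8, others with 0.4.
    observed = [rng.random() < (0.8 if yi < 0 else 0.4) for yi in y]
    respondents = [yi for yi, o in zip(y, observed) if o]
    nonrespondents = [yi for yi, o in zip(y, observed) if not o]

    full_mean = statistics.fmean(y)
    # Complete-case analysis simply ignores the missing values.
    cc_mean = statistics.fmean(respondents)

    # Follow up a simple random subsample of nonrespondents and assume
    # (unrealistically) that every followed-up value is recovered.
    k = max(1, round(follow_up_fraction * len(nonrespondents)))
    recovered = rng.sample(nonrespondents, k)

    # Weight each group's mean by its share of the full sample.
    share_observed = len(respondents) / n
    corrected_mean = (share_observed * cc_mean
                      + (1 - share_observed) * statistics.fmean(recovered))
    return full_mean, cc_mean, corrected_mean


full, complete_case, corrected = simulate_mnar_follow_up()
print(f"full data: {full:.3f}, complete case: {complete_case:.3f}, "
      f"with follow-up: {corrected:.3f}")
```

Under this response rule the complete-case mean sits well below the full-data mean, while the estimate that uses the recovered values lands close to it. In practice not every followed-up value is recovered, and which values to follow up is itself a design decision.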
The research uses optimal experimental design techniques to decide which missing values to follow up. Some missing values may yield more information about the type of missing data mechanism than others, and some may be more likely than others to be recovered. Taking both factors into account when choosing the follow-up sample aims to extract as much information as possible from the recovered data. This will allow data analysts to determine whether MNAR missingness is likely and to take appropriate action.
We will collaborate with our project partners, the Office for National Statistics and NHS Blood and Transplant, in developing these methods.
The ultimate objective of this research is to develop a toolkit that allows practitioners to deal with the potential for MNAR missingness appropriately, efficiently, and in a principled fashion.