Traditional techniques for training machine learning models assume that the more data you have, the more accurate your predictions will be. This holds only if the dataset you are trying to model and produce predictions from has static characteristics, i.e. the characteristics from which the model is produced do not change over time, which in turn requires the training dataset to contain a complete representation of all the data the model is likely to encounter. The common assumption is that predictions will be better on any dataset given a greater number of examples; however, it is not the case that predictions cannot be made on a smaller dataset (Cui, 2021). Indeed, the use of big data generalises very well, but at the cost of specificity. This is of particular significance for patient data (Lau et al., 2019) and other classes of data where there are identified subsets within the data pool; cf. (Meroni et al., 2021), (Liu et al., 2020).

This project will develop a framework through which knowledge can be built on and experience gained. Taking a bottom-up approach, rather than the conventional top-down approach used with big data, the aim is to build a framework in which the model can continue to adjust as new contextual information becomes available, using techniques from machine learning, evolutionary computation and network theory. The goal is a model that can learn to categorise new pieces of information based on existing knowledge. This requires a framework that enables the machine learning model to identify what it already knows and how that relates to new information, if at all. The project will seek to develop a hierarchy of understanding which mediates the acceptance of new information, including the creation of new categories as required.
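The bottom-up categorisation described above might be sketched, in its simplest form, as an incremental learner that compares each new item against prototypes of the categories it already knows and opens a new category when nothing is similar enough. The class name, the cosine-similarity measure, and the threshold value are illustrative assumptions, not part of the project brief:

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors (0.0 if either is zero).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalCategoriser:
    """Hypothetical sketch: assign each new item to its closest existing
    category, or create a new category when no prototype is similar enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # minimum similarity to join an existing category
        self.prototypes = []        # list of (running-mean vector, item count)

    def observe(self, item):
        # Find the most similar existing category prototype.
        best_idx, best_sim = None, -1.0
        for idx, (proto, _) in enumerate(self.prototypes):
            sim = cosine(item, proto)
            if sim > best_sim:
                best_idx, best_sim = idx, sim

        if best_idx is not None and best_sim >= self.threshold:
            # Absorb the item: incremental mean update of the prototype.
            proto, n = self.prototypes[best_idx]
            new_proto = [p + (x - p) / (n + 1) for p, x in zip(proto, item)]
            self.prototypes[best_idx] = (new_proto, n + 1)
            return best_idx

        # Nothing known is close enough: found a new category.
        self.prototypes.append((list(item), 1))
        return len(self.prototypes) - 1
```

In this toy version the "hierarchy of understanding" is flat, and the threshold mediates acceptance of new information; the project would replace both with learned, multi-level structure.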

Skills required

  • A good understanding of computational and algorithmic techniques related to machine learning.
  • Strong coding skills, preferably including Java and Python.
  • A sound foundation in using statistics in code.


Background reading

Kevin P. Murphy, “Probabilistic Machine Learning: An Introduction”, MIT Press, 2022 (downloadable PDF draft available).

David Foster, “Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play”, O’Reilly, 2023.

Cui, Z. (2021) 'Machine Learning and Small Data', Educational Measurement: Issues and Practice, vol. 40, no. 4, pp. 8–12.

Lau, F., Bartle-Clar, J.A. & Bliss, G. (2019) Improving Usability, Safety and Patient Outcomes with Health Information Technology : From Research to Practice, Amsterdam, Netherlands: IOS Press.

Liu, V.R., McBride, M., Reichmanis, E., Meredith, J.C. & Grover, M.A. (2020) 'Small Data Machine Learning: Classification and Prediction of Poly(ethylene terephthalate) Stabilizers Using Molecular Descriptors', ACS Applied Polymer Materials, vol. 2, no. 12, pp. 5592–5601.

Meroni, W.F., Seguini, L., Kerdiles, H. & Rembold, F. (2021) 'Yield forecasting with machine learning and small data: What gains for grains?', Agricultural and Forest Meteorology, vol. 308–309, p. 108555.



Dr Ian Kenny and Dr Dhouha Kbaier
