Published: March 31, 2023
Location: Birmingham, United Kingdom
Funding: The position offered is for three and a half years of full-time study. The current (2022-23) value of the award is: stipend £17,668 pa; tuition fee £4,596 pa. Awards are usually incremented on 1 October each following year.
Eligibility: First or Upper Second Class Honours undergraduate degree and/or postgraduate degree with Distinction (or an international equivalent).
Machine learning, and more specifically deep learning, has made great progress in many areas, including computer vision, natural language processing, audio data processing, healthcare and science, on some tasks demonstrating performance better than that of human experts. Many of these deep learning-based techniques have been successfully applied in everyday life, e.g. face recognition, background replacement in online meetings, autonomous driving, virtual assistants, drug development and medical diagnosis.
However, this success relies heavily on ground-truth data manually annotated by human experts to train the deep models. Obtaining high-quality labelled data requires a huge amount of manpower and financial resources, and in many situations domain knowledge as well. Such annotation is difficult to scale, yet the success of deep neural networks is powered by large-scale datasets. These limitations also restrict a developed deep model to a particular application scenario and prevent its power from being generalised or transferred to other applications.
The ability to learn without large amounts of manual annotation is crucial for generalised representation learning and could be a key to general artificial intelligence; we humans rarely rely on many annotations. Self-supervised learning, in which the model learns purely from the data itself without reference to external human annotations, is a path towards this goal. It has already shown its effectiveness in image and video understanding, medical data analysis, etc.
Although we humans acquire most of our information from visual data, the world around us offers many other data sources that we perceive and that are essential to understanding it, for instance sound, text/language, touch and smell, to name a few. How to adequately leverage information from multiple data modalities is a key question in approaching a more natural learning framework as well as more general intelligence. Preliminary work has shown the possibility of learning self-supervised representations from multi-modal data, but the area is still under-explored, with many challenging problems to address.
This project aims to explore the potential of learning general, transferable representations from multi-modal data in a self-supervised manner. Multi-modal data here means data from multiple sensors. General transferable representations refer to the knowledge learned by the deep model that transfers well to downstream tasks. This is important as it can greatly reduce the cost of building AI models. Multi-modal data is also beneficial for self-supervised representation learning as it provides additional constraints and consistency among different modalities. The student is expected to start by working on public datasets available in the community, developing deep learning models that take multi-modal data as input and generate the high-quality representations described above. At a later stage, a new dataset will be constructed and novel algorithms developed on it to answer the challenging questions within this topic and beyond.