The principal investigators for this project are Nick Heard (Imperial College) and Patrick Rubin-Delanchy (University of Edinburgh).
Embeddings are low or moderate-dimensional vector representations of complex objects such as nodes, words, queries, documents, etc and are ubiquitous in modern statistics, data science, and AI (e.g. Large Language Models). In general, we might distinguish two types of use:
1. “Embeddings for humans”: data exploration, visualisation, hypothesis generation & testing, broader discovery science
2. “Embeddings for machines”: classification & regression, data fusion, retrieval, generative AI.
This distinction has implications for what we might consider a “good” embedding, e.g. how many dimensions, the balance of explainability versus performance, local versus global fidelity, etc.
In a vast array of problems related to e.g. the internet (e.g. cyber-security, misinformation), people & money (e.g. human-trafficking, corruption), the planet (e.g. reducing air travel), the brain (e.g. neuroscience), the data we encounter more often take the form of a dynamic network, hence the problem of dynamic network embedding.
At their simplest, dynamic network data consist of triples (i,j,t), each representing that two individuals (i and j) interacted at a certain point in time (t). Given these data, we would like to assign a vector to each node, or to each node at each point in time (i.e., a trajectory), which somehow represents their behaviour. For example, if two nodes behave similarly, their embeddings should be close; by extension, we should be able to extract groups of similar nodes using clustering techniques.
At present, we have two algorithms. The first (“Version 1”) is Unfolded Spectral Embedding and was published in NeurIPS 2021:
Ian Gallagher, Andrew Jones, Patrick Rubin-Delanchy. “Spectral embedding of dynamic networks with stability guarantees“. Advances in Neural Information Processing Systems 34
Unfolding is a construction related to tensor decomposition. Briefly, we compute the singular vectors and values of the matrix
\[\mathbf{A} := (\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(T)}),\]
where each $\mathbf{A}^{(t)}$ is an adjacency matrix. The singular components can be obtained efficiently in large real-world networks using sparse linear algebra.
In a collaboration with Microsoft Research, 2021-2022, we employed this algorithm to tackle corruption in public procurement data, e.g. raising potential “company reinventions” by comparing embedded positions. The second (“Version 2”), is Intensity Profile Projection (IPP) and was published in NeurIPS 2023:
Alexander Modell, Ian Gallagher, Emma Ceccherini, Nick Whiteley, Patrick Rubin-Delanchy. “Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks“. Advances in Neural Information Processing Systems 36
This is arguably the first significant leap in this project which was purely facilitated by NeST. The method gives much more flexibility in how we model data in the time domain, e.g. using Hawkes processes, Bayesian methods, or deep learning. At the moment it is only tested on simpler approaches, such as the histogram or kernel density estimators.
We wish to develop a toolkit to better exploit dynamic network embeddings, e.g. as obtained from IPP. This could include detecting when communities split, measuring network polarisation, uncovering changes in hierarchy, and more complex phenomena. The project will develop new statistical theory using advanced mathematical techniques such as operator theory, topology and matrix perturbation theory. As we better understand what we can and can’t do with IPP, we will start moving towards a “Version 3” which, in particular, could incorporate smarter point process models and nonlinear dimension reduction tools such as autoencoders.
We will propose several dynamic network models based on a simple AR(1) network framework. The setting depicts the dynamic changes for network edges e
Learn moreNetworks that arise in fields such as biology or energy present features that challenge established modelling setups since the target function may nat
Learn more