Statistical Inference on Networks
Developing robust measurement frameworks for online discourse
This project develops rigorous statistical frameworks for analyzing network structures and relationships in online social systems. We advance fundamental methods for link prediction, community detection, and structural inference while addressing key challenges of model selection, overfitting, and generalization in network analysis.
Key methodological contributions include:
- Ensemble link prediction combining multiple network models through stacking to achieve near-optimal performance
- Model selection frameworks for community detection that properly balance fit and complexity
- Overfitting and underfitting detection methods that identify when network models fail to generalize
- Cross-validation techniques adapted for network data with dependencies between observations
- Benchmark evaluation protocols establishing rigorous standards for comparing network inference methods
Our statistical methods have broad applications across computational social science, enabling more reliable inference about relationship formation, group structure, and information diffusion in complex online networks. This work provides the methodological foundation for empirical studies of platform dynamics and social behavior.
Featured Research
Stacking models for nearly optimal link prediction in complex networks Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M., & Clauset, A. (2020). Proceedings of the National Academy of Sciences, 117(38).
Link prediction is a fundamental problem in network analysis with applications ranging from recommender systems to identifying missing interactions in biological networks. This work develops a stacking ensemble approach that systematically combines diverse network models—including graph embeddings, similarity indices, and probabilistic models—to achieve near-optimal link prediction performance across a wide range of network types. Our framework demonstrates that no single method dominates across all networks, but principled ensemble methods can consistently approach optimal performance.