Statistical Inference on Networks

Developing robust measurement frameworks for online discourse

This project develops rigorous statistical frameworks for analyzing network structures and relationships in online social systems. We advance fundamental methods for link prediction, community detection, and structural inference while addressing key challenges of model selection, overfitting, and generalization in network analysis.

Our work establishes principled approaches to network inference that achieve near-optimal link prediction accuracy while avoiding the common pitfalls of overfitting and model misspecification.

Key methodological contributions include:

  • Ensemble link prediction combining multiple network models through stacking to achieve near-optimal performance
  • Model selection frameworks for community detection that properly balance fit and complexity
  • Overfitting and underfitting detection methods that identify when network models fail to generalize
  • Cross-validation techniques adapted for network data with dependencies between observations
  • Benchmark evaluation protocols establishing rigorous standards for comparing network inference methods
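To make the evaluation idea in the list above concrete, here is a minimal sketch of edge holdout for link prediction. It hides a random fraction of edges, scores node pairs on the remaining graph, and measures how well the hidden edges are ranked above true non-edges using AUC. The function names, the uniform edge-removal scheme, the common-neighbors score, and the toy graph are all illustrative assumptions, not the project's actual protocol.

```python
import random
from itertools import combinations

def holdout_split(edges, frac=0.2, seed=0):
    """Randomly split an undirected edge list into (observed, held_out)."""
    rng = random.Random(seed)
    shuffled = sorted(tuple(sorted(e)) for e in edges)
    rng.shuffle(shuffled)
    k = max(1, int(frac * len(shuffled)))
    return shuffled[k:], shuffled[:k]

def adjacency(nodes, edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbors(adj, u, v):
    """Simple similarity index: number of shared neighbors."""
    return len(adj[u] & adj[v])

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy graph (illustrative only): two 5-cliques joined by one bridge edge.
nodes = list(range(10))
edges = list(combinations(range(5), 2))
edges += list(combinations(range(5, 10), 2))
edges.append((4, 5))

# Scores are computed on the observed graph only, never on held-out edges.
observed, held_out = holdout_split(edges, frac=0.2, seed=1)
adj = adjacency(nodes, observed)
edge_set = set(edges)
non_edges = [p for p in combinations(nodes, 2) if p not in edge_set]

pos = [common_neighbors(adj, u, v) for u, v in held_out]
neg = [common_neighbors(adj, u, v) for u, v in non_edges]
print(f"held-out edges: {len(held_out)}, AUC: {auc(pos, neg):.3f}")
```

Because removing edges changes the very structure the scores are computed from, the result depends on the holdout fraction and the random seed. In practice one would likely average over several splits.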

Our statistical methods have broad applications across computational social science, enabling more reliable inference about relationship formation, group structure, and information diffusion in complex online networks. This work provides the methodological foundation for empirical studies of platform dynamics and social behavior.

Stacking models for nearly optimal link prediction in complex networks
Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M., & Clauset, A. (2020). Proceedings of the National Academy of Sciences, 117(38).

Link prediction is a fundamental problem in network analysis with applications ranging from recommender systems to identifying missing interactions in biological networks. This work develops a stacking ensemble approach that systematically combines diverse network models—including graph embeddings, similarity indices, and probabilistic models—to achieve near-optimal link prediction performance across a wide range of network types. Our framework demonstrates that no single method dominates across all networks, but principled ensemble methods can consistently approach optimal performance.
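The stacking idea can be sketched in a few dozen lines. The example below computes four classic topological predictors for each node pair, hides some edges to create training labels, and fits a meta-learner that learns how to weight the predictors. This is a rough illustration under stated assumptions: the predictor set is far smaller than the one described above, the logistic-regression meta-learner is a stand-in chosen to keep the code dependency-free, and all function names and the toy graph are hypothetical.

```python
import math
import random
from itertools import combinations

def adjacency(nodes, edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def pair_features(adj, u, v):
    """Four base predictors: common neighbors, Jaccard, Adamic-Adar, pref. attachment."""
    shared, union = adj[u] & adj[v], adj[u] | adj[v]
    jaccard = len(shared) / len(union) if union else 0.0
    adamic_adar = sum(1.0 / math.log(len(adj[w])) for w in shared if len(adj[w]) > 1)
    return [len(shared), jaccard, adamic_adar, len(adj[u]) * len(adj[v])]

def standardize(X, stats=None):
    """Scale columns to zero mean, unit variance; pass `stats` to reuse a fit."""
    if stats is None:
        cols = list(zip(*X))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((a - m) ** 2 for a in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
        stats = (means, stds)
    means, stds = stats
    return [[(a - m) / s for a, m, s in zip(row, means, stds)] for row in X], stats

def predict_proba(w, x):
    """Logistic model; the last entry of w is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Meta-learner stand-in: batch gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def stack_link_predictor(nodes, edges, holdout_frac=0.2, seed=0):
    """Train on held-out edges vs. non-edges, then return a pair-scoring function."""
    rng = random.Random(seed)
    edges = sorted(tuple(sorted(e)) for e in edges)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    k = max(1, int(holdout_frac * len(shuffled)))
    held_out, kept = shuffled[:k], shuffled[k:]
    train_adj = adjacency(nodes, kept)  # features never see the held-out edges
    edge_set = set(edges)
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in edge_set]
    pairs = held_out + non_edges
    y = [1] * len(held_out) + [0] * len(non_edges)
    X, stats = standardize([pair_features(train_adj, u, v) for u, v in pairs])
    w = fit_logistic(X, y)
    full_adj = adjacency(nodes, edges)

    def score(u, v):
        x, _ = standardize([pair_features(full_adj, u, v)], stats)
        return predict_proba(w, x[0])
    return score

# Toy graph (illustrative only): two 5-cliques joined by one bridge edge.
nodes = list(range(10))
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 10), 2)) + [(4, 5)]
score = stack_link_predictor(nodes, edges)
print(f"score(3, 5) = {score(3, 5):.3f}, score(0, 9) = {score(0, 9):.3f}")
```

The design choice worth noting is that the meta-learner is trained on features computed without the held-out edges, so it learns which predictors are informative about genuinely unseen links rather than memorizing the observed graph.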

References

2020

  1. Stacking models for nearly optimal link prediction in complex networks
    Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M Airoldi, and Aaron Clauset
    Proceedings of the National Academy of Sciences, 2020

2019

  1. Evaluating overfit and underfit in models of network community structure
    Amir Ghasemian, Homa Hosseinmardi, and Aaron Clauset
    IEEE Transactions on Knowledge and Data Engineering, 2019