Open-World Detection and Inference at Scale
Detection and prediction methods for large-scale real-world data under minimal supervision
This project develops machine learning methods for detecting and predicting diverse phenomena in large-scale real-world data under minimal supervision. We focus on open-world settings where training data is limited, noisy, and highly imbalanced, conditions typical of sociotechnical systems research.
Our methods support inference over diverse content types in multimodal data from complex online environments, where traditional supervised learning approaches fail because labels are scarce and noisy.
Key methodological contributions include:
- Weakly supervised learning frameworks for content detection with limited labeled data
- Multi-task learning architectures that leverage shared representations across related detection tasks
- Imbalanced learning techniques for identifying rare but important content (e.g., hate speech, misinformation)
- Open-world classification methods that can detect novel content categories not seen during training
- Multimodal fusion approaches combining text, images, video, and network structure
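
To make the open-world item in the list above concrete, here is a minimal sketch of one common baseline: rejecting a prediction as "unknown" when the classifier's top softmax probability falls below a threshold. This is an illustration under stated assumptions, not the project's actual method. The function names, the label set, the example logits, and the 0.7 threshold are all hypothetical, and the sketch assumes a classifier that emits one logit per known category.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_world(logits, labels, threshold=0.7, unknown="unknown"):
    """Return (label, confidence) for the most probable known category.

    If the top probability is below `threshold` (an assumed, untuned value),
    the input is flagged as `unknown`, i.e. possibly a novel category.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return unknown, probs[best]
    return labels[best], probs[best]


# Hypothetical label set and logits, for illustration only.
labels = ["benign", "hate_speech", "misinformation"]
confident = predict_open_world([0.2, 4.0, 0.1], labels)   # one clear winner
ambiguous = predict_open_world([1.0, 1.1, 0.9], labels)   # no clear winner
print(confident[0], ambiguous[0])
```

In this sketch the first input is assigned to a known category, while the second is routed to "unknown" because no category dominates. In practice the threshold would need to be calibrated on held-out data, and softmax confidence is only a rough proxy for novelty.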
These computational methods underpin our empirical studies of online platforms, enabling measurement and analysis of phenomena at a scale that manual annotation cannot reach. Our frameworks are designed to be robust to the inherent noise and evolving nature of real-world social media data.
Architecture for open-world detection in complex sociotechnical systems.