Code/Datasets

Code

  • Conditional Probabilistic Constraints: Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning. V. W. Zheng and K. C.-C. Chang. In CIKM 2016, 2016. PDF BibTex Dataset Code.
  • FastPPV: Incremental and Accuracy-Aware Personalized PageRank through Scheduled Approximation. F. Zhu, Y. Fang, K. C.-C. Chang, and J. Ying. PVLDB, 6(6):481-492, 2013. In VLDB 2013. PDF Slides BibTex Dataset Code.
  • SubMatch: Metagraph matching, from the paper Semantic Proximity Search on Graphs with Metagraph-based Learning. Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C.-C. Chang, and X. Li. In ICDE 2016, pages 277-288, 2016. PDF Slides BibTex Code.

Datasets

  • EIE/NER/POS Data: Datasets for Entity Information Extraction (EIE), Named Entity Recognition (NER), and Part-Of-Speech tagging (POS), from the paper Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning. V. W. Zheng and K. C.-C. Chang. In CIKM 2016, 2016. PDF BibTex Dataset Code.
  • Twitter Data: This dataset contains 284 million following relationships, 3 million user profiles, and 50 million tweets, and was collected in May 2011.
  • Graph Data I: Two real-world graphs (DBLP and query log) for our work on RoundTripRank.
  • Graph Data II: Two real-world graphs (DBLP and LiveJournal) for our work on FastPPV.
  • Cross Task Data: The datasets used in our work on Cross-Task Document Scoring.
  • Entity-centric Data: Two datasets for our work on entity-centric document filtering.
  • Ego Network Data: A dataset of ego networks collected from LinkedIn.
  • SSL Data: Several standard semi-supervised learning datasets used in our work on graph-based smoothness.