Research

The Forward Data Lab focuses on data: integrating, managing, searching, and mining data everywhere, first in databases, then on the web, and now all over our social universe.

"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay!"

        – Sherlock Holmes, The Adventure of the Copper Beeches


Without doubt, data is an indispensable ingredient for enabling algorithmic productivity and intelligence, and we are fortunate to be immersed in a digital world with unprecedented availability of data. However, unlocking the potential of data means overcoming many barriers, and our Forward Data Lab enjoys tackling these challenges.

Our research overall aims at bridging structured and unstructured big data: bringing structured, semantically rich access to the myriad and massive unstructured data that accounts for most of the world's information. Our research therefore spans data mining, data management/databases, information retrieval, and machine learning, with current efforts focusing on interactive data management, social media analytics, social network mining, and entity-centric Web search and mining.
  • Our objectives: we aim to develop novel systems, principled algorithms, and formal theories that ultimately deliver real-world applications.
  • Our approach: we seek to be inspired by and learn from the data we are tackling; we believe the key to taming big data is to learn the wisdom hidden in its large scale.



http://dataspread.github.io/
DataSpread: Enabling Interactive Big Data Management. 
(2015 - Present) We aim to integrate the two disparate paradigms of accessing tabular data, databases and spreadsheets, through their marriage, enabling interactive access at the front end backed by powerful query and storage engines at the back end. (Demo: VLDB'15)
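To make the front-end/back-end split concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It assumes a deliberately simple cell-per-row table (`cells` with `row_idx`, `col_idx`, `value`); that schema and the helpers `create_sheet`, `set_cell`, `fetch_window`, and `column_sum` are hypothetical illustrations, not DataSpread's actual storage model or API. The idea it shows: the spreadsheet front end asks the database only for the window a user can see, while a column-wide aggregate such as a SUM is pushed down to the query engine.

```python
import sqlite3
from typing import Dict, Tuple


def create_sheet(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: one relational row per non-empty spreadsheet cell."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        " row_idx INTEGER NOT NULL,"
        " col_idx INTEGER NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (row_idx, col_idx))"
    )


def set_cell(conn: sqlite3.Connection, row: int, col: int, value: str) -> None:
    """Upsert a single cell, as a front-end edit would."""
    conn.execute(
        "INSERT OR REPLACE INTO cells (row_idx, col_idx, value) VALUES (?, ?, ?)",
        (row, col, value),
    )


def fetch_window(
    conn: sqlite3.Connection, top: int, left: int, height: int, width: int
) -> Dict[Tuple[int, int], str]:
    """Fetch only the cells inside the visible viewport, keyed by (row, col)."""
    cur = conn.execute(
        "SELECT row_idx, col_idx, value FROM cells"
        " WHERE row_idx BETWEEN ? AND ? AND col_idx BETWEEN ? AND ?",
        (top, top + height - 1, left, left + width - 1),
    )
    return {(r, c): v for r, c, v in cur}


def column_sum(conn: sqlite3.Connection, col: int) -> int:
    """Evaluate a spreadsheet-style SUM over a whole column inside the database."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(CAST(value AS INTEGER)), 0) FROM cells WHERE col_idx = ?",
        (col,),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_sheet(conn)
    for r in range(1, 10_001):
        set_cell(conn, r, 1, f"item{r}")
        set_cell(conn, r, 2, str(r * 10))
    # Scrolling to row 5000 only pulls a 2x2 window from the back end.
    print(fetch_window(conn, 5000, 1, 2, 2))
    print(column_sum(conn, 2))  # 500050000
```

A real system would also need positional indexing so that inserting a row does not renumber every cell below it; this sketch ignores that.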

Supported by NSF Award 1633755, $1,795,429, BIGDATA: F: Bringing Interactive Data Management to Scientists, Analysts, and the Masses: A Holistic Unification of Spreadsheets and Databases. PI: Kevin C.C. Chang; Co-PIs: Karrie Karahalios and Aditya Parameswaran.



BigSocial: Towards a Big Social Data Platform for Entity-Centric and User-Aware Analytics. (2012 - Present) As people are now connected in social networks and their voices are heard via social media, we aim to exploit these new and vast “human sensors” prevalent in our digital society to listen to the whole world and make sense of it. [SIGIR'12, KDD'12, VLDB'12, ICDE'13b, VLDB'13a, VLDB'13b, EDBT'14, WWW'14, ICML'14, KDD'14, BigComp'15, IJCAI'15, VLDBJ'15, AAAI'16, ICDE'16] (Demos: ICDE'12, ICDM'15)

Supported by NSF Award 1619302, $500,000, III: Small: Social Discovery of Users and Content in Social Media Through Similarity-Based and Graph-Based Inference of Attributes and Queries. PI: Kevin C.C. Chang.
Selected Publications
  • Graph-based Semi-supervised Learning: Realizing Pointwise Smoothness Probabilistically. Y. Fang, K. C.-C. Chang, and H. W. Lauw. In ICML 2014, 2014. (310/1238=25%).
  • User Profiling in an Ego Network: Co-profiling Attributes and Relationships. R. Li, C. Wang, and K. C.-C. Chang. In WWW 2014, pages 819-830, April 2014. (84/650 = 12.9%).
  • Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations. R. Li, S. Wang, H. Deng, R. Wang, and K. C.-C. Chang. In KDD 2012, 2012.



WISDM: Web Indexing and Search for Data Mining. (2007 - Present) The Web has gone far beyond a corpus of pages: it contains all sorts of "stuff." Can we search the Web for every "thing" (entities and their relations) that it contains? [CIDR'07, VLDB'07, WSDM'10]

Supported by NSF Award 1018723, $500,000, III: Small: Towards Agile Information Integration for Large Scale-- Data Aware Indexing and Search over Unstructured Data. PI: Kevin C.C. Chang.
Online Demo: Entity Search (a prototype system over 500 million English pages in the ClueWeb09 corpus, covering 10+ entity types and running on a PC cluster). Example queries: 1) Google founder #person; 2) bird flu #country; 3) high blood pressure treatment #drug; 4) kevin c chang #email
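The example queries pair ordinary keywords with a `#type` tag naming the kind of entity wanted. A toy sketch of that idea follows, assuming a hypothetical corpus of pages already annotated with typed entities; `parse_query`, `rank_entities`, and the `(text, {entity: type})` page format are illustrative inventions. It simply counts, per entity of the requested type, the pages that contain every keyword. This is a much cruder scoring than EntityRank and is not the demo's implementation.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class EntityQuery:
    keywords: Tuple[str, ...]  # lowercased context keywords
    target_type: str           # entity type requested via '#type'


def parse_query(text: str) -> EntityQuery:
    """Split e.g. 'bird flu #country' into keywords and one '#type' target."""
    tokens = text.split()
    types = [t[1:] for t in tokens if t.startswith("#") and len(t) > 1]
    if len(types) != 1:
        raise ValueError("expected exactly one #type token")
    keywords = tuple(t.lower() for t in tokens if not t.startswith("#"))
    return EntityQuery(keywords, types[0])


def rank_entities(
    query: EntityQuery, pages: Sequence[Tuple[str, Dict[str, str]]]
) -> List[str]:
    """Rank target-type entities by how many pages also contain every keyword.

    `pages` is a hypothetical pre-annotated corpus of (text, {entity: type}).
    """
    scores: Counter = Counter()
    for text, entities in pages:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(k in words for k in query.keywords):
            for entity, etype in entities.items():
                if etype == query.target_type:
                    scores[entity] += 1
    return [entity for entity, _ in scores.most_common()]


if __name__ == "__main__":
    print(parse_query("bird flu #country"))
    # EntityQuery(keywords=('bird', 'flu'), target_type='country')
```

Returning entities rather than pages is the key shift: the answer to "Google founder #person" is a list of people, aggregated across every page that supports them.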
Selected Publications
  • Unifying Learning to Rank and Domain Adaptation: Enabling Cross-Task Document Scoring. M. Zhou and K. C.-C. Chang. In KDD 2014, 2014. (151/1036 = 14.6%).
  • Towards Rich Query Interpretation: Walking Back and Forth for Mining Query Templates. G. Agarwal, G. Kabra, and K. C.-C. Chang. In WWW 2010, pages 1-10, 2010. (104/743=14%).
  • EntityRank: Searching Entities Directly and Holistically. T. Cheng, X. Yan, and K. C.-C. Chang. In Proceedings of the 33rd Very Large Data Bases Conference (VLDB 2007), pages 387-398, Vienna, Austria, September 2007. (91/538=16.9%).



MetaQuerier: Exploring and Integrating the Deep Web. (2001 - 2007) The Web has deepened dramatically: a significant and increasing amount of information is now hidden on the "deep Web," behind the query interfaces of searchable databases. Can we enable access to such dynamic data and integrate it? [KDD'02, ICDM'02, SIGMOD'03, SIGMODRecord'04, SIGMOD'04, KDD'04, TKDE'04, CIKM'04, VLDB'05, CIDR'05, KDD'05, TODS'06, CACM'07, VLDB'07, CIKM'08] (Demos: SIGMOD'04, SIGMOD'05, ICDE'05, ICDE'07)


Supported by NSF Award 0313260, $306,000.00, ITR: Shallow Integration over the Deep Web: A Holistic Approach. PI: Kevin C.C. Chang.


Supported by NSF Award 0133199, $300,078.00, CAREER: MetaQuerier: Dynamic Ad Hoc Information Integration Across the Internet. PI: Kevin C.C. Chang.

Selected Publications
  • Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the Second Conference on Innovative Data Systems Research (CIDR 2005), pages 44-55, Asilomar, Ca., January 2005. (26/86=30%).
  • Structured Databases on the Web: Observations and Implications. K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. SIGMOD Record, 33(3):61-70, September 2004.
  • Statistical Schema Matching across Web Query Interfaces. B. He and K. C.-C. Chang. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD 2003), pages 217-228, San Diego, California, June 2003. (52/342=15%).



AIM: Supporting Efficient Top-k Ranked Query Processing, AIMing for Top Query Answers. (2001 - 2007) Our goal is to support ranked queries, or top-k queries, which match data by "soft" conditions such as similarity, relevance, or preference in order to return the best k answers.
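The AIM papers develop their own techniques, such as minimal probing for expensive predicates and the RankSQL algebra. As a generic illustration of why top-k processing can stop before scanning all data, here is a sketch of Fagin, Lotem, and Naor's Threshold Algorithm (TA), a standard technique swapped in for illustration rather than the lab's method. It assumes each "soft" predicate supplies a list of every object sorted by score (descending), that any object's score can be looked up directly, and that the aggregate (`sum` by default) is monotone. The name `threshold_topk` is hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Scored = Tuple[Hashable, float]


def threshold_topk(
    lists: Sequence[Sequence[Scored]],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
) -> List[Scored]:
    """Return the k objects with the highest aggregate score, best first.

    Each list ranks every object by one soft predicate, sorted by score
    descending. `agg` must be monotone (e.g. sum) for the early stop to be safe.
    """
    if k <= 0 or not lists:
        return []
    lookup: List[Dict[Hashable, float]] = [dict(lst) for lst in lists]  # random access
    heap: List[Tuple[float, int, Hashable]] = []  # min-heap holding the current top k
    tiebreak = itertools.count()  # avoids comparing objects when scores tie
    seen = set()
    for depth in range(len(lists[0])):
        frontier = []  # last score seen under sorted access in each list
        for lst in lists:
            obj, score = lst[depth]
            frontier.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            total = agg([scores[obj] for scores in lookup])
            if len(heap) < k:
                heapq.heappush(heap, (total, next(tiebreak), obj))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, next(tiebreak), obj))
        # By monotonicity, no unseen object can score above agg(frontier).
        if len(heap) == k and heap[0][0] >= agg(frontier):
            break
    return [(obj, total) for total, _, obj in sorted(heap, reverse=True)]


if __name__ == "__main__":
    # Hypothetical hotels scored by two soft predicates.
    near_beach = [("a", 9), ("b", 8), ("c", 5), ("d", 2)]
    cheap = [("c", 9), ("a", 7), ("d", 6), ("b", 1)]
    print(threshold_topk([near_beach, cheap], k=2))  # [('a', 16), ('c', 14)]
```

In this example the loop stops after the third position of each list: the second-best total (14) already meets the threshold 5 + 6 = 11, so the last entries are never read.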
Selected Publications
  • Top-k Query Processing in Uncertain Databases. M. A. Soliman, I. F. Ilyas, and K. C.-C. Chang. In Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007), pages 896-905, Istanbul, Turkey, April 2007. (122/659=18%).
  • RankSQL: Query Algebra and Optimization for Relational Top-k Queries. C. Li, K. C.-C. Chang, I. F. Ilyas, and S. Song. In Proceedings of the 2005 ACM SIGMOD Conference (SIGMOD 2005), pages 131-142, Baltimore, Maryland, June 2005. (66/431=15%).
  • Minimal Probing: Supporting Expensive Predicates for Top-k Queries. K. C.-C. Chang and S.-W. Hwang. In Proceedings of the 2002 ACM SIGMOD Conference (SIGMOD 2002), pages 346-357, Madison, Wisconsin, June 2002. (42/239=18%).