Over 2,000 people attended the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014), a premier interdisciplinary conference that brings together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data.
Best paper awards went to both academic and industry papers, and this year's Industry & Government award went to CSE alumna Diane Hu (pictured far right, receiving the award). Hu and her co-authors were cited for their paper, "Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-commerce." Hu (M.S. '09, Ph.D. '12, advised by CSE Prof. Lawrence Saul) and her co-authors, Rob Hall and Josh Attenberg, all work at Etsy, Inc., the e-commerce website that bills itself as "the world's most vibrant" marketplace for handmade or vintage items and supplies. Etsy attracts developers with its slogan, "We believe in code as craft."
In the award-winning paper, Etsy data scientist Hu and her colleagues tackle the challenge of matching buyers to products "as the size and diversity of the marketplace increases." With over 30 million diverse listings, Etsy must capture shoppers' aesthetic preferences in order to steer them to items that fit their eclectic styles. In her talk, Hu described the methods and experiments underlying two new style-based recommendation systems on the Etsy site. One is based on Latent Dirichlet Allocation (LDA), a topic model that discovers trending categories and styles on Etsy, which are then used to describe a user's "style" profile. Hu and her colleagues also explored hashing methods to perform fast nearest-neighbor search on a map-reduce framework, in order to obtain recommendations efficiently. "These techniques have been implemented successfully at very large scale," concluded Hu, "substantially improving many key business metrics."
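For readers curious how hashing can speed up nearest-neighbor search over style profiles, here is a minimal single-machine sketch. It is not Etsy's implementation: it assumes random-hyperplane hashing (one common locality-sensitive hashing scheme), which the article does not confirm, and it skips the map-reduce layer entirely. The user names and three-number "style" vectors are made up for illustration.

```python
import math
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random hyperplanes; each one contributes one bit of the hash."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """Hash a vector to a bit tuple: which side of each hyperplane it falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(profiles, planes):
    """Group profile names into buckets keyed by their hash signature."""
    buckets = defaultdict(list)
    for name, vec in profiles.items():
        buckets[signature(vec, planes)].append(name)
    return buckets

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(query, profiles, planes, buckets, k=2):
    """Rank only the profiles that share the query's bucket, not all of them."""
    candidates = buckets.get(signature(query, planes), [])
    ranked = sorted(candidates, key=lambda n: cosine(query, profiles[n]), reverse=True)
    return ranked[:k]

# Hypothetical style profiles: proportions over three made-up styles.
profiles = {
    "alice": [0.7, 0.2, 0.1],
    "bob": [0.6, 0.3, 0.1],
    "carol": [0.1, 0.1, 0.8],
    "dave": [0.05, 0.15, 0.8],
}
planes = make_hyperplanes(num_bits=4, dim=3)
buckets = build_index(profiles, planes)
similar_to_alice = recommend(profiles["alice"], profiles, planes, buckets)
```

Vectors that point in similar directions tend to land in the same bucket, so a lookup compares the query against a handful of candidates rather than every profile. The trade-off is that a close neighbor can occasionally fall into a different bucket and be missed.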
Knock It Off
Current CSE faculty and students were also represented on the KDD program. Fifth-year Ph.D. student Matthew Der (M.S. '13, Ph.D. '15 expected) collaborated with his three advisors – Lawrence Saul, Stefan Savage and Geoffrey Voelker – on a paper titled "Knock It Off: Profiling the Online Storefronts of Counterfeit Merchandise." The team developed an automated system for classifying illegal online storefronts according to which "affiliate program" (or business) runs the store. Their approach was to extract features from the HTML source code of the Web pages; these features capture the similar underlying structure of storefronts that link to the same affiliate program. Experiments showed that the system is highly accurate in classifying the storefronts of "44 distinct affiliate programs that account, collectively, for hundreds of millions of dollars in illicit e-commerce," according to the paper.
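The idea of classifying storefronts by the structure of their HTML can be illustrated with a small sketch. This is not the system described in the paper: the feature set (tag names plus class and id tokens), the cosine-similarity nearest-neighbor rule, and the labels "program_a" and "program_b" are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter
from html.parser import HTMLParser

class StructureFeatures(HTMLParser):
    """Count tags and class/id tokens, ignoring the visible page text."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts["tag:" + tag] += 1
        for name, value in attrs:
            if name in ("class", "id") and value:
                for token in value.split():
                    self.counts[name + ":" + token] += 1

def extract_features(html):
    """Return a bag of structural features for one page."""
    parser = StructureFeatures()
    parser.feed(html)
    parser.close()
    return parser.counts

def cosine(a, b):
    """Cosine similarity between two sparse feature counts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(html, labeled_pages):
    """Assign the label of the most structurally similar known page."""
    features = extract_features(html)
    best_label, _ = max(
        labeled_pages,
        key=lambda pair: cosine(features, extract_features(pair[1])),
    )
    return best_label

# Hypothetical templates standing in for two affiliate programs.
labeled_pages = [
    ("program_a", '<div class="shop-grid"><ul id="items">'
                  '<li class="item">Bag</li><li class="item">Watch</li></ul></div>'),
    ("program_b", '<table class="catalog"><tr><td class="prod">Shoes</td></tr>'
                  '<tr><td class="prod">Boots</td></tr></table>'),
]
new_page = '<div class="shop-grid"><ul id="items"><li class="item">Scarf</li></ul></div>'
predicted = classify(new_page, labeled_pages)
```

Because the features ignore product names and other visible text, two storefronts selling different goods still look alike if they were generated from the same template, which is the intuition the paragraph above describes.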