Film buff, rock music enthusiast, amateur photographer, and not-so-regular backpacker. I take a special interest in philosophy, geopolitics, and modern history. I love playing with raw data.
In collaboration with the Voice Commerce team, built a knowledge base of grocery items in Walmart's catalog to support meaningful training-data generation and question answering. Also implemented a fuzzy search technique for the team's recipe-to-ingredient use case.
Working in the Innovative Database and Information Systems Research Laboratory (IDIR) under Dr. Chengkai Li.
Graded assignments, projects, and tests; proctored exams; and advised students in the following courses:
I worked on a team developing CRM software that uses NLP, AI, and visualization engines to analyze communications between parties, automate data capture, build an opportunity pipeline, and surface intelligence to improve deal management.
GPA: 3.89
Orion is a visual interface for querying ultra-heterogeneous graphs. It iteratively assists users in constructing query graphs by making suggestions based on data mining methods. In its active mode, Orion automatically suggests the top-k edges to add to a query graph. In its passive mode, the user adds a new edge manually, and Orion suggests a ranked list of labels for that edge. Orion’s edge ranking algorithm, Random Decision Paths (RDP), uses co-occurring edge sets to rank candidate edges by how likely they are to match the user’s query intent.

Extensive user studies on Freebase showed that Orion users had a 70% success rate in constructing complex query graphs, a significant improvement over the 58% success rate of users of a baseline system resembling existing visual query builders. Furthermore, using active mode only, RDP was compared with several methods adapted from other data mining algorithms, such as random forests and naïve Bayes classifiers, as well as class association rules and collaborative filtering based on singular value decomposition. On average, RDP required 40 suggestions to reach the target query graph, while the other methods required 1.5–4 times as many.
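As a rough illustration of ranking candidate edges by co-occurrence, here is a minimal sketch. It is a plain frequency count, not RDP itself, whose internals are not described here. It assumes that each entity in the data graph is represented by the set of edge labels incident to it. The function name `rank_candidate_edges` and the Freebase-style toy labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def rank_candidate_edges(
    edge_sets: Iterable[frozenset[str]],
    query_edges: set[str],
    k: int = 3,
) -> list[tuple[str, int]]:
    """Naive co-occurrence ranking (a hypothetical sketch, NOT the RDP algorithm).

    edge_sets: one set of edge labels per entity (assumed data model).
    query_edges: edge labels already present in the user's query graph.
    Returns up to k (label, co-occurrence count) pairs, highest count first.
    """
    counts: Counter[str] = Counter()
    for edges in edge_sets:
        # Only entities that contain every edge already in the query contribute.
        if query_edges <= edges:
            # Count each edge label that is not yet in the query.
            counts.update(edges - query_edges)
    # Sort by count descending, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up toy data: each frozenset holds the edge labels of one entity.
toy_data = [
    frozenset({"film.director", "film.genre", "film.release_date"}),
    frozenset({"film.director", "film.genre"}),
    frozenset({"film.director", "film.release_date", "film.country"}),
    frozenset({"person.birthplace", "person.profession"}),
]
print(rank_candidate_edges(toy_data, {"film.director"}))
```

Because this naive version requires every current query edge to be present, the pool of matching entities shrinks quickly as the query grows and can become empty. Any practical ranking method has to cope with that sparsity in some way.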