Center for Advanced Computer Studies
The Laboratory for Internet Computing (LINC) was established in 1997, when its Co-Directors, Henry Chu and Vijay Raghavan, received funding from the State of Louisiana to create a joint research effort with Southern University at Baton Rouge. The name, LINC, and the slogan "Harnessing Distributed Heterogeneous Sources" described its initial research theme. Shortly after, LINC focused on studying the management of information related to energy and environmental issues, conducted under the umbrella of the Energy & Environmental Information Resource Center (EEIRC) with funding from the DoE. At the beginning of the 21st century came multi-year, multi-million-dollar research support from the Louisiana Governor's IT Initiative.

The development of numerous industry-oriented and innovative tech entrepreneurship projects, in turn, led in 2009 to the first steps in establishing the Center for Visual and Decision Informatics (CVDI), an Industry-University Cooperative Research Center (IUCRC) sponsored by the National Science Foundation. After three years of struggle and preparation, the LINC co-founders, with strong support from Ramesh Kolluru and Raju Gottumukkala, kicked off CVDI in 2012, with Drexel University as our partner. Much of the progress during 2001-2009 was made possible by the great efforts of three research scientists: Drs. Ryan Benton, Biren Shah, and Zonghuan Wu. Five years later, in 2017, we received the NSF award for Phase 2 of CVDI, with five research sites: four in the U.S. and one in Finland. In addition, our ongoing research on the understanding of protein and drug interactions is highly relevant to the pharmaceutical industry. Come and learn about our accomplishments and activities via CVDI and the exciting progress we are making in bioinformatics!
Last updated: April 24, 2020
A list of current and past students of the LINC lab.
Current Students
Jing Chen
She is a Computer Science PhD student at the University of Louisiana at Lafayette. Her broad research interests include big data, data mining, machine learning, information retrieval, and data security. She is currently doing research on data mining over social-study data, such as survey data. One of her works has been published at the International Joint Conference on Neural Networks. She holds a Master of Science in Computer Science from the University of Louisiana at Lafayette.
Contact Information: jing.chen@louisiana.edu, +1 (337)482-0510
Harshitha Rao Thanneru
She is a PhD researcher in Computer Science at the University of Louisiana at Lafayette. She is also working as a Full Stack Developer at CGI Technologies Inc., Lafayette, LA. Her research interests include machine learning and data mining. She is currently working on a travel plan recommendation system using geo-tagged photos.
Classical recommender systems rank single entities and provide a list relevant to the user's preferences. Such a list is not very useful for travel planning. Harshitha is currently working on an algorithm that creates bundles of ranked lists instead of lists of single entities. Her aim is to provide users with bundles that are readily usable and meet their time, cost, and other criteria.
She is using geo-tagged photos from various social platforms to create paths through interesting travel destinations using density-based clustering algorithms. Geo-tagged photos provide a means to estimate when and where people travel. Her goal is to improve the relevance and diversity of travel plan recommendations through this research.
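As a rough illustration of the density-based step, the sketch below clusters toy photo coordinates with DBSCAN to recover candidate points of interest. The coordinates, distance threshold, and parameters are illustrative assumptions, not her actual pipeline.

```python
# Hypothetical sketch: clustering geo-tagged photo coordinates with DBSCAN
# to discover candidate points of interest. Data and parameters are
# illustrative assumptions only.
import numpy as np
from sklearn.cluster import DBSCAN

# Each row: (latitude, longitude) of one geo-tagged photo.
photos = np.array([
    [30.2241, -92.0198], [30.2243, -92.0201], [30.2239, -92.0196],  # spot A
    [30.2300, -92.0400], [30.2302, -92.0398],                        # spot B
    [30.5000, -92.5000],                                             # noise
])

# The haversine metric expects radians; eps here is roughly 200 m.
earth_radius_m = 6_371_000
db = DBSCAN(eps=200 / earth_radius_m, min_samples=2, metric="haversine")
labels = db.fit_predict(np.radians(photos))
print(labels)  # e.g. [0 0 0 1 1 -1]; -1 marks noise photos
```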
Contact Information: harshitha.thanneru1@louisiana.edu, +1(405)-714-4535
Sarika Kondra
Our current research is a collaboration between Computer Science and Bioinformatics in the area of drug development using protein and drug 3D structures. We apply techniques from data mining, machine learning, and deep learning in our classification and clustering models. The goal of the research is precision medicine: gather a patient's protein data and combine it with our already-trained protein-drug interaction models to prescribe drugs precisely for that patient. We developed a TSR-based method for protein 3D structural comparison, with applications to protein classification and motif discovery within protein groups, and we designed distributed flat and hierarchical classification models for protein functionality on LONI clusters.
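A minimal sketch of the triangle-key idea behind a TSR-style comparison is shown below: each triplet of C-alpha coordinates becomes a triangle, each triangle is summarized by discretized side lengths, and two structures are compared by the overlap of their key multisets. The binning scheme and key definition here are simplified assumptions, not the published TSR key formula.

```python
# Simplified TSR-style triangle keys; the key definition is an assumption
# for illustration, not the lab's published formula.
from itertools import combinations
from collections import Counter
import numpy as np

def triangle_keys(ca_coords, bin_size=1.0):
    keys = Counter()
    for i, j, k in combinations(range(len(ca_coords)), 3):
        a, b, c = ca_coords[i], ca_coords[j], ca_coords[k]
        # Sort side lengths so the key is independent of vertex order.
        sides = sorted([np.linalg.norm(a - b), np.linalg.norm(b - c),
                        np.linalg.norm(a - c)])
        keys[tuple(int(s / bin_size) for s in sides)] += 1
    return keys

def similarity(keys1, keys2):
    # Generalized Jaccard similarity over the two key multisets.
    shared = sum((keys1 & keys2).values())
    total = sum((keys1 | keys2).values())
    return shared / total if total else 0.0

p1 = np.random.default_rng(0).normal(size=(20, 3)) * 10   # toy "structure"
p2 = p1 + np.random.default_rng(1).normal(scale=0.2, size=(20, 3))
print(similarity(triangle_keys(p1), triangle_keys(p2)))
```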
She is also fascinated by the latest developments in Natural Language Processing (NLP) research, especially BERT (Bidirectional Encoder Representations from Transformers). She is interested in text summarization, LDA and LSA models, sentence embeddings, semantic understanding, and visualization.
Teaching Assistant: CSCE 561 - Information Retrieval and Storage
Contact Information: c00219805@louisiana.edu, +1 (337) 806-6587
LinkedIn

Shaaban Abbady
He is a PhD candidate in Computer Science at the University of Louisiana at Lafayette. His research interests include anomaly detection, big data, stream mining, and graph data mining. Shaaban has been a member of the Center for Visual and Decision Informatics (CVDI), an NSF center for big data, for four consecutive years. He participated in four projects around data mining and visualization of big data, and helped his team write winning proposals that were funded by the board of CVDI's sponsors. These projects resulted in four academic papers published in reputable journals and conferences.
Shaaban is a data science enthusiast with practical experience in a variety of data science tools, including Python, R, Hadoop, and Spark. He also has working experience in databases, business intelligence, and software engineering, has good knowledge of blockchain technology, and is AWS and Tableau certified.
Contact Information: shaabanmahran@hotmail.com
LinkedIn | Google Scholar

Titli Sarkar
My broad research interests include information retrieval, data mining, machine learning, and large-scale distributed data analytics in the field of bioinformatics.
Presently I am working on developing a novel alignment-free method for protein 3D structure analysis and comparison using a geometric concept called the Triangular Spatial Relationship (TSR), which considers spatial relationships among non-collinear objects in a 2D scene and which we have extended to 3D. We have developed a tool for researchers in bioinformatics, hosted on the University of Louisiana at Lafayette Computer Science department server at https://tsr3dsystem.cmix.louisiana.edu/ (built with the Django MVC framework and PostgreSQL), which helps them better understand unsupervised structure-based protein clustering, motif discovery, and protein-protein interactions. We use an undirected graph-based approach with Spark cluster computing to find the largest connected component in huge graphs, which is useful for novel motif discovery. An ongoing direction of my research applies association-mining algorithms such as Apriori and FP-growth, implemented in Apache Spark, to find frequent itemsets as part of the graph-based unsupervised motif-finding problem. Neo4j was used to create the graphs and find the longest paths.
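As a hedged illustration of the frequent-itemset step, the sketch below runs Spark's built-in FP-growth on toy "transactions" (here, made-up triangle keys per protein); the data and thresholds are assumptions for demonstration, not our actual motif pipeline.

```python
# FP-growth in Apache Spark on toy data; item values and thresholds are
# invented for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("motif-fpgrowth").getOrCreate()

# Toy "transactions": each row lists the triangle keys seen in one protein.
data = spark.createDataFrame(
    [(0, ["k1", "k2", "k5"]), (1, ["k1", "k2"]), (2, ["k1", "k5"])],
    ["id", "items"],
)
fp = FPGrowth(itemsCol="items", minSupport=0.6, minConfidence=0.6)
model = fp.fit(data)
model.freqItemsets.show()  # frequent key sets shared across proteins
```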
Other than research, I like travelling, writing, and reciting in my spare time.
Tools: Python, with its machine learning and parallel processing libraries, is the major tool for writing most of our code. We use RStudio and MATLAB for preliminary data analysis and visualization, and we use an HPC cluster and Apache Spark, with GPGPU acceleration, to scale our code through multiprocessing.
Contact Information: c00222141@louisiana.edu
LinkedIn

Past Students
Mohammad Amir Sharif
Our research interests include the processing of large-scale web usage data using machine learning and data mining techniques for recommendation and personalization. We use a machine learning approach for our prediction and recommendation tasks. To overcome the performance bottleneck of large-scale data handling, we use the map-reduce paradigm. In particular, we are motivated to parallelize machine learning algorithms, such as logistic regression, Poisson regression, and pair-wise learning methods, using map-reduce-based programming to find the parameters of an efficient recommendation system's hypothesis. Moreover, the sparsity of the data needs to be handled efficiently, and the temporal aspects of users' preferences require the recommender system to stay up to date. Many map-reduce implementations can process streaming data, so we can take in users' new preferences in real time and update the system quickly. Although this updating is usually done as a batch process, we can consume the streaming preferences in small time windows and update the recommender system afterwards. For these reasons we prefer map-reduce-based implementations of machine learning tools for building recommendation systems over tools like MATLAB or R.
Currently we are working with the Yahoo! Front Page Today Module User Click Log Dataset. This dataset contains a fraction of the user click log for news articles displayed in the Featured Tab of the Today Module on the Yahoo! Front Page (http://www.yahoo.com). The dataset has 45,811,883 user visits to the Today Module; from each visit we extract the user ID, page ID, and click information. To build a recommender system exploiting all these user details, we use Apache Mahout, a large-scale machine learning library that can run in a distributed environment. Mahout can implement many distributed machine learning algorithms besides recommender systems; for large-scale distributed computation, it uses Apache Hadoop for its map-reduce-based implementation, which has streaming data processing capabilities as well. Distributed Mahout programs run on top of Hadoop. In our current work we are trying to enhance the performance of the item-based collaborative recommender algorithm by utilizing the contents of each page. In item-based recommender algorithms, item similarities are calculated based on whether similar people liked those items: if similar people like two pages, those two pages get a higher similarity value. Mahout uses a co-occurrence-based similarity measure between items. In our algorithm we also incorporate the page content into the page similarity measure, seeking a balanced combination of co-occurrence-based and content-based similarity for efficient recommendation. We run our map-reduce-based programs on a Hadoop cluster in our lab.
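The sketch below illustrates, in plain NumPy rather than Mahout, one way such a blended measure could be formed: co-occurrence similarity from the click matrix combined with content-based cosine similarity under an assumed weight alpha. The data and the weighting are toy assumptions.

```python
# Illustrative blend of behavior-based and content-based page similarity;
# not the Mahout implementation, just the idea.
import numpy as np

# Rows = users, columns = pages; 1 means the user clicked the page.
clicks = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
# Co-occurrence counts between pages, normalized to cosine similarity.
co = clicks.T @ clicks
norms = np.sqrt(np.diag(co))
co_sim = co / np.outer(norms, norms)

# Toy content vectors (e.g., tf-idf of page text), also cosine-compared.
content = np.random.default_rng(0).random((4, 8))
content /= np.linalg.norm(content, axis=1, keepdims=True)
content_sim = content @ content.T

alpha = 0.7  # assumed blending weight between behavior and content
page_sim = alpha * co_sim + (1 - alpha) * content_sim
print(np.round(page_sim, 2))
```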
LinkedIn

Elshaimaa Ali
Traditionally, people in the network management field use a regular database or a Management Information Base (MIB) to store and retrieve the data captured by network monitoring tools. Whenever the data is needed by an application built on top of the database or MIB, it is retrieved; thus the main purpose of the MIB or database is to serve as a lookup table. She proposed the use of an ontology in place of a regular database or MIB. An ontology is defined as an explicit specification of a conceptualization of a domain. Here the domain is network management: she defines all the concepts and properties (relationships between the concepts) in that domain and then adds instances to it, which yields a knowledge base. She also defines rules and feeds them into the ontology. These rules are generated with the WEKA tool, and they can also be framed from input gathered by interviewing the domain expert (here, the network administrator). The ontology then acts not only as the database but also as an important part of an expert system, since it has been fed rules from both WEKA and the domain expert. She therefore proposes that an ontology-based network management system would be more beneficial than a regular database or MIB.
Tools: All programs are written in Java; we use the Lucene API for indexing and similarity matching between documents, and the R language for clustering domain documents.
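As a language-agnostic illustration of the "ontology as knowledge base" idea (the original work used Java-based tooling), here is a small Python sketch with rdflib; the namespace, classes, and instances are invented for demonstration.

```python
# Tiny network-management ontology as a queryable knowledge base; all
# names below are hypothetical, invented for illustration.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

NM = Namespace("http://example.org/netmgmt#")
g = Graph()
g.bind("nm", NM)

# Concepts (classes) and a property linking them.
g.add((NM.Router, RDF.type, RDFS.Class))
g.add((NM.Alarm, RDF.type, RDFS.Class))
g.add((NM.raisesAlarm, RDFS.domain, NM.Router))
g.add((NM.raisesAlarm, RDFS.range, NM.Alarm))

# An instance captured from a (hypothetical) monitoring tool.
g.add((NM.router42, RDF.type, NM.Router))
g.add((NM.router42, NM.raisesAlarm, NM.highCpuAlarm))
g.add((NM.highCpuAlarm, RDFS.label, Literal("CPU usage above 90%")))

# Query the knowledge base instead of a plain lookup table.
q = "SELECT ?r ?msg WHERE { ?r nm:raisesAlarm ?a . ?a rdfs:label ?msg }"
for row in g.query(q, initNs={"nm": NM, "rdfs": RDFS}):
    print(row.r, row.msg)
```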
Maria Bala Duggimpudi
Robust ontologies are available for only a few specific domains, and even those are often outdated, because ontologies change over time and need continual maintenance and annotation regeneration. Our research is focused on building a unified model for the semantic annotation of web documents within a specific domain. We first designed a Wikipedia structure-based annotation schema in the ontology language OWL, where the definition of annotation entities is based on the structure of Wikipedia; we then describe how this structure can be used to generate the domain vocabulary needed to build an ontology for annotation. As a step towards a prototype for generating domain-specific instances of this ontology, we developed a knowledge-based classification method that identifies the category of a web document, and a method for parsing the Wikipedia link structure to extract the domain vocabulary associated with a given set of input documents from a specific domain. Given a domain-specific corpus of documents, we use the link structure of Wikipedia to extract all related concepts. A weighted adjacency matrix between the extracted concepts is created and used to build clusters of related concepts. Each cluster represents highly related concepts, which are candidates for applying NLP tools to extract named relations.
Tools: Sesame 2.3.0 API, Sesame server, Java 1.7, Eclipse 3.0, Apache Tomcat 6.0, Protege 2.3.4, Java Server pages 2.1, WEKA 3.4.19, HTML 2.0.
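A speculative sketch of the adjacency-matrix-and-clustering step is given below, using shared Wikipedia links as edge weights and hierarchical clustering; the concepts, link sets, and distance transform are toy assumptions rather than the project's actual Sesame/Java pipeline.

```python
# Toy weighted adjacency matrix over extracted concepts, clustered
# hierarchically; all data and choices here are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

concepts = ["router", "firewall", "packet", "ontology", "OWL", "RDF"]
# links[c] = set of Wikipedia pages concept c links to (toy data).
links = {
    "router": {"network", "packet"}, "firewall": {"network", "security"},
    "packet": {"network"}, "ontology": {"semantics", "OWL"},
    "OWL": {"semantics", "RDF"}, "RDF": {"semantics"},
}
n = len(concepts)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:  # weight = number of shared outgoing links
            adj[i, j] = len(links[concepts[i]] & links[concepts[j]])

# Convert shared-link counts to distances and cluster hierarchically.
dist = 1.0 / (1.0 + adj)
condensed = dist[np.triu_indices(n, k=1)]
labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
print(dict(zip(concepts, labels)))  # networking vs. semantic-web clusters
```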
Contact Information: mbd9259@louisiana.edu, +1 (337) 852-6345
Google Scholar | LinkedIn

Murat Seckin Ayhan
He is a PhD candidate in Computer Science at the University of Louisiana at Lafayette. His research interests lie in machine learning and its applications. One particular example is the computerized diagnosis of Alzheimer's disease (AD), a major cause of dementia. Neuroimaging procedures, such as Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI), usually generate high-dimensional data. This complicates statistical analysis and modeling, resulting in high computational complexity and typically more complicated models. Furthermore, labeled data is costly, since gathering it involves expensive imaging procedures and domain experts. As a result, sample sizes are small, a well-recognized problem in applied machine learning.
Mr. Ayhan has investigated the potential of feature selection, dimensionality reduction, and classification algorithms to tackle the problem above. His efforts have led to a series of publications at international conferences and in journals. In his recent work, he has proposed a domain-knowledge-driven feature-selection strategy that not only promotes the simplicity of the model but also maintains or improves the accuracy of diagnosis. During his studies, he has used both open-source and commercial tools, including WEKA, MATLAB/Octave, and R. He also appreciates the prevalence of Python due to its support for statistics and machine learning and its acceptance in the research community. Lastly, he is a believer in life-long learning and a follower of open courseware. He is a Courserian! He has completed a handful of courses with distinction.
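A minimal, generic sketch of this kind of pipeline (feature selection followed by classification on high-dimensional, small-sample data) appears below; the synthetic dataset and parameters stand in for the actual PET/MRI features and are not his domain-knowledge-driven selection method.

```python
# Feature selection + classification under cross-validation on synthetic
# high-dimensional data; a generic stand-in, not the actual AD pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 60 subjects, 2,000 voxel-like features: n << p, as in neuroimaging.
X, y = make_classification(n_samples=60, n_features=2000,
                           n_informative=20, random_state=0)

# Selection happens inside the pipeline so each CV fold selects its own
# features; doing it outside would leak information across folds.
clf = make_pipeline(SelectKBest(f_classif, k=50),
                    LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```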
Contact Information: msayhan@gmail.com, +1(337) 412-0643
Google Scholar | Personal Webpage | LinkedIn

Satya Siva Kumar Katragadda
Social media tools have profoundly changed the way people interact with one another and the world around them. Twitter and Facebook are classic examples of this phenomenon, and, coupled with the advent of mobile technology, they have made sharing news and thoughts simple and instant. Incidents like the flight landing on the Hudson, the Fort Hood shooting, and the presidential debates show that important events are shared instantly through social media. Public opinion on these incidents can be gauged by looking at how people respond to them.
Our research is currently focused on identifying topic emergence in social media streams. This can be achieved by filtering out ongoing discussions and topics that were already popular. It is usually done by monitoring the stream and computing divergence, using models such as Kullback-Leibler divergence or divergence from random measures. Another method is to identify temporal patterns using clustering and to compare clusters over time to identify "strong clusters".
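As a small illustration of the divergence-based approach, the sketch below computes a smoothed Kullback-Leibler divergence between a current time window's word distribution and the background; the token counts are toy values, and real streams would use far larger vocabularies.

```python
# KL divergence between word distributions of two time windows; high
# divergence flags an emerging topic. Counts are toy values.
import numpy as np
from scipy.stats import entropy

vocab = ["game", "vote", "debate", "storm", "flood"]
background = np.array([40, 30, 25, 3, 2], dtype=float)  # earlier windows
window = np.array([5, 4, 3, 45, 43], dtype=float)        # current window

def smoothed_dist(counts, eps=1e-6):
    p = counts + eps          # avoid zero probabilities
    return p / p.sum()

# High KL(window || background) flags the "storm"/"flood" emergence.
kl = entropy(smoothed_dist(window), smoothed_dist(background))
print(round(kl, 3))
```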
Gephi is a graph analysis and visualization tool, which we use in our project to identify clusters of importance. It offers many built-in clustering techniques, along with the ability to develop and plug in our own algorithms.
Contact Information: satyasivakumar@gmail.com, +1(757) 513-0166
Google Scholar | LinkedIn

Sumi Singh
She is a Computer Science PhD student at the University of Louisiana at Lafayette. Her broad research interests include bioinformatics, data mining, machine learning, and information retrieval. She is currently doing research on computational methods for protein structure comparison and prediction. One of her joint works, on deployable classifiers, has been published by Springer. She holds a Master of Science in Computer Science from the University of Louisiana at Lafayette and a Master of Engineering in Biomedical Engineering from Louisiana Tech University. When not working, she likes to travel and paint.
Contact Information: forsumi2@gmail.com
Google Scholar | LinkedIn

Please find below the schedule for LINC meetings, which take place every Friday from 10 A.M. to 11 A.M. in Room 108.
LINC Meetings schedule - Fall 2017
| Topic | Presenter | Date | Resources |
|---|---|---|---|
| Kick-off meeting / Learning User Friendly Type-Error Messages | Dr. Raghavan / Baijun Wu | 22-Sep-2017 | Baijun Wu |
| Text Summarization using LSA | Sarika Kondra | 29-Sep-2017 | |
| [TBD] | [TBD] | 06-Oct-2017 | |
| [TBD] | [TBD] | 13-Oct-2017 | |
| [TBD] | [TBD] | 20-Oct-2017 | |
| Cancelled | Cancelled | 27-Oct-2017 | |
| [TBD] | [TBD] | 03-Nov-2017 | |
| [TBD] | [TBD] | 10-Nov-2017 | |
| [TBD] | [TBD] | 17-Nov-2017 | |
| [TBD] | [TBD] | 24-Nov-2017 | |