Research - Current Projects

My research interests range from applied data mining, databases, and support vector machines to image processing for electron microscopy. Over the last seven years I have developed four successful research tracks. At least one of them crosses disciplinary boundaries and involves strong collaborations with researchers from other STEM fields.

A brief summary of these research projects follows:

  • Efficient Clustering Algorithms for Real-time Streaming Data: This project develops algorithms for clustering household-level consumption patterns in residential energy consumption data. Analyzing this largely untapped data can help utility providers predict future load and deploy dynamic pricing strategies that reduce the overall cost of electricity distribution and make its use more efficient. These time series frequently contain missing, unknown, or corrupted values, which poses a major challenge. We expect that well-designed preprocessing steps, such as data smoothing and data reduction, combined with appropriate similarity metrics, will allow clustering methods to provide robust segmentation and prediction of residential energy consumption patterns.
  • Sparsity-Based Image Enhancement for Electron Microscopy and Chemical Mapping: This project develops image processing methods for denoising and for increasing the spatial resolution of microscopy modalities affected by Poisson noise, additive white Gaussian noise, and blurring. The methods exploit the sparsity of these images by applying patch-based algorithms and compressed-sensing techniques. These techniques are particularly relevant when the required acquisition speed or limits on sample exposure during imaging rule out the usual ways of increasing the signal-to-noise ratio (SNR), such as lengthening the acquisition dwell time at each pixel.
  • Mining Software Repositories: This project seeks improved methods for automatically detecting duplicate bug reports using textual similarity features and binary classification. Using a set of new textual features inspired by recent text similarity research, we trained several binary classification models. We evaluated the effectiveness of the improved method in a case study on three open-source systems: Eclipse, OpenOffice, and Mozilla. We also compare the method with current state-of-the-art approaches, highlighting similarities and differences. The results indicate that the proposed method is more accurate than previously reported approaches on all three systems.
  • Predicting Areas of Interest in Code Reading: Novice programmers typically begin by writing code snippets modeled on examples presented to them. It may be more effective to have them spend more time reading and understanding code in the early stages of learning. Previous eye-tracking studies have shown that novices and experts read code differently, and that people read code differently from natural-language text. Our main goal in this project is to predict eye fixations on source code stimuli. Such predictions would (1) automatically identify the key elements in a source code snippet and (2) help novices read code more effectively by recommending the next best place to look. We also aim to predict task difficulty and programmer expertise from the same eye-tracking data.
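The preprocessing-plus-clustering idea in the energy consumption project can be illustrated with a minimal sketch. It is not the project's actual algorithm; it assumes hourly readings stored as Python lists with None marking missing values, fills gaps by linear interpolation, smooths with a centered moving average, reduces each series with Piecewise Aggregate Approximation (PAA), and clusters the reduced vectors with plain k-means using Euclidean distance. All function names (interpolate_missing, moving_average, paa, kmeans, segment_households) and the example households are hypothetical.

```python
import math
import random


def interpolate_missing(series):
    """Fill None entries by linear interpolation; leading/trailing gaps copy the nearest reading."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def moving_average(series, window=3):
    """Centered moving average; the window is truncated at the series edges."""
    half = window // 2
    n = len(series)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def paa(series, segments):
    """Piecewise Aggregate Approximation: replace each equal-length chunk by its mean."""
    if len(series) % segments:
        raise ValueError("this sketch requires len(series) divisible by segments")
    size = len(series) // segments
    return [sum(series[j * size:(j + 1) * size]) / size for j in range(segments)]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with Euclidean distance and farthest-first initialization."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def segment_households(readings, k, window=3, segments=4):
    """Hypothetical pipeline: impute -> smooth -> reduce -> cluster."""
    features = [paa(moving_average(interpolate_missing(r), window), segments) for r in readings]
    return kmeans(features, k)


# Hypothetical 24 hourly readings per household: morning-heavy vs evening-heavy use.
morning = [3.0] * 12 + [0.5] * 12
evening = [0.5] * 12 + [3.0] * 12
households = [
    morning[:5] + [None] + morning[6:],          # one missing reading
    [v + 0.2 for v in morning],
    evening[:10] + [None, None] + evening[12:],  # two missing readings
    [v * 1.1 for v in evening],
]
labels, _ = segment_households(households, k=2)
print(labels)
```

PAA is used here only because it is a simple, well-known reduction for time series; the metric and reduction step are exactly the parts such a project would tune, for example by swapping Euclidean distance for a shape-based measure.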
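The sparsity idea behind the microscopy image enhancement project can be shown in its simplest form with classic wavelet shrinkage. This is a swapped-in, much simpler technique than the patch-based and compressed-sensing methods the project uses, and it works on a 1-D toy signal rather than an image: the signal is transformed with an orthonormal Haar wavelet, small detail coefficients (assumed to be mostly noise) are soft-thresholded to zero, and the result is transformed back. It assumes additive Gaussian noise with known standard deviation sigma; Poisson noise is commonly handled by first applying a variance-stabilizing transform such as the Anscombe transform. The function names and the test signal are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Multi-level orthonormal Haar transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    c = list(x)
    while n > 1:
        half = n // 2
        approx = [(c[2 * i] + c[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(c[2 * i] - c[2 * i + 1]) / SQRT2 for i in range(half)]
        c[:n] = approx + detail  # coarse part first, details after
        n = half
    return c


def haar_inverse(c):
    """Inverse of haar_forward: rebuild from the coarsest level upward."""
    x = list(c)
    n = 2
    while n <= len(x):
        half = n // 2
        rebuilt = []
        for a, d in zip(x[:half], x[half:n]):
            rebuilt += [(a + d) / SQRT2, (a - d) / SQRT2]
        x[:n] = rebuilt
        n *= 2
    return x


def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become exactly zero (sparsity)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def universal_threshold(sigma, n):
    """Donoho-Johnstone universal threshold sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise(signal, threshold):
    """Soft-threshold every detail coefficient; keep the overall average coefficient c[0]."""
    c = haar_forward(signal)
    c = [c[0]] + [soft_threshold(v, threshold) for v in c[1:]]
    return haar_inverse(c)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
clean = [0.0] * 32 + [4.0] * 32  # hypothetical step-shaped intensity profile
sigma = 0.5
noisy = [v + rng.gauss(0.0, sigma) for v in clean]
denoised = denoise(noisy, universal_threshold(sigma, len(noisy)))
print(f"MSE noisy: {mse(noisy, clean):.3f}  denoised: {mse(denoised, clean):.3f}")
```

A piecewise-constant profile is exactly sparse in the Haar basis, which is why shrinkage helps here; real micrographs are only approximately sparse, which is one motivation for the adaptive, patch-based dictionaries mentioned in the project description.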
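The general shape of duplicate bug report detection described in the Mining Software Repositories project (textual similarity features fed to a binary classifier) can be sketched as follows. This is not the project's feature set or its trained models: it assumes each report is reduced to a short summary string, uses only two standard features (cosine similarity of term-frequency vectors and Jaccard overlap of token sets), and trains a hand-rolled logistic regression with stochastic gradient descent. The function names and the labeled report pairs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of raw term-frequency vectors."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Size of the shared token set divided by the size of the combined token set."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pair_features(a, b):
    return [cosine_similarity(a, b), jaccard_similarity(a, b)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic (log) loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def duplicate_probability(model, a, b):
    w, bias = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(a, b))) + bias)


train = [  # hypothetical labeled pairs: 1 = duplicate, 0 = not a duplicate
    ("Crash when opening large PDF file", "Application crashes opening a large PDF file", 1),
    ("Toolbar icons missing after update", "Missing toolbar icons after latest update", 1),
    ("Crash when opening large PDF file", "Spell checker ignores German dictionary", 0),
    ("Toolbar icons missing after update", "Slow startup on Windows 10", 0),
]
model = train_logistic_regression(
    [pair_features(a, b) for a, b, _ in train], [label for _, _, label in train]
)
p_dup = duplicate_probability(model, "Editor freezes when saving file", "Editor freezes while saving a file")
p_new = duplicate_probability(model, "Editor freezes when saving file", "Wrong colors in print preview")
print(f"likely duplicate: {p_dup:.2f}  unrelated: {p_new:.2f}")
```

Framing detection as classification of report pairs means richer similarity measures can be added as extra columns of pair_features without changing the classifier.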