Research - Current Projects
My research interests range from applied data mining and
databases, through support vector machines, to image
processing for electron microscopy. Over the last seven
years I have developed four successful research tracks. At
least one of them crosses disciplinary borders and involves
strong collaborations with researchers from other STEM
fields.
A brief summary of these research projects follows:
- Efficient Clustering Algorithms for Real-time Streaming Data: This project develops algorithms for clustering household-level consumption patterns from residential energy consumption data. Analysis of this largely untapped data can give utility providers better predictability of future load, as well as the ability to deploy dynamic pricing strategies aimed at decreasing the overall cost of electric energy distribution and increasing the efficiency of its use. These time series frequently contain missing, unknown, or corrupted values, which presents a major challenge. It is envisioned that well-designed preprocessing steps, such as data smoothing and data reduction, combined with appropriate metrics, will allow clustering methodologies to provide robust segmentation and prediction of residential energy consumption patterns.
- Sparsity Based Image Enhancement for Electron
Microscopy and Chemical Mapping: This project
focuses on the development of image processing methods for
denoising and increasing the spatial resolution for
different microscopy modalities affected by Poisson noise,
additive white Gaussian noise and blurring effects. The
methods developed exploit the sparse character of these
images to apply patch-based algorithms and
compressed-sensing techniques. These techniques are
particularly relevant when the speed at which data must be
acquired, or the limits on sample exposure during imaging,
preclude increasing the SNR by typical approaches, such as
lengthening the acquisition dwell time at each pixel in
the image.
- Mining Software Repositories: The goal of this project is to find improved methods for automatic duplicate bug report detection based on textual similarity features and binary classification. Using a set of new textual features, inspired by recent text similarity research, we trained several binary classification models. A case study was conducted on three open source systems (Eclipse, Open Office, and Mozilla) to determine the effectiveness of the improved method. A comparison is also made with current state-of-the-art approaches, highlighting similarities and differences. Results indicate that the accuracy of the proposed method exceeds that of previously reported approaches on all three open source systems.
- Predicting Areas of Interest in Code Reading: When novices start learning to program, they begin by writing snippets of code based on models presented to them. It may be better for novices to spend more time reading and understanding code at the beginning of their learning. Previous eye-tracking studies show that novices and experts read code differently. It is also known that people read code differently from the way they read natural-language text. Our main goal for this project is to predict eye fixations in source code stimuli. The main benefits of this prediction are: (1) to identify the key elements in a source code snippet automatically and (2) to help novices read code better by recommending the next best place to look. We are also looking to predict task difficulty and expertise based on the same eye-tracking data.
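To make the preprocessing steps named in the streaming-data clustering project more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual pipeline: linear interpolation for missing readings, a centered moving average for smoothing, piecewise aggregate approximation for data reduction, and Euclidean distance as the metric are all choices made here for the example. The function names (`fill_missing`, `smooth`, `reduce_paa`, `euclidean`) are hypothetical.

```python
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation (assumed strategy).

    Gaps at either end copy the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            t = (i - left) / (right - left)
            filled.append(series[left] + t * (series[right] - series[left]))
    return filled


def smooth(series: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def reduce_paa(series: List[float], segments: int) -> List[float]:
    """Data reduction: replace each equal-length segment by its mean."""
    if segments <= 0 or len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    size = len(series) // segments
    return [sum(series[i:i + size]) / size
            for i in range(0, len(series), size)]


def euclidean(a: List[float], b: List[float]) -> float:
    """Distance between two reduced profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical hourly readings for one household, with gaps.
readings = [1.0, None, 3.0, 4.0, None, None, 7.0, 8.0]
profile = reduce_paa(smooth(fill_missing(readings)), segments=4)
```

The reduced `profile` vectors, one per household, could then be passed to any clustering method together with `euclidean` or another metric better suited to the data.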