Applied Data-Centric AI

Project: Prediction of Harvest Time of Apple Trees: An RNN-Based Approach

Duration: 03.2020 - 02.2022

Funding agency: CAPES, Brazil 

Description: In this project, we designed and implemented PredHarv, a machine learning model that uses Recurrent Neural Networks (RNNs) to predict the start date of the apple harvest from relevant weather conditions, such as temperature. The model takes a multivariate approach, combining phenological and meteorological time series, and makes its predictions starting at the beginning-of-flowering phenological phase. By anticipating harvest dates, PredHarv lets growers plan their activities better, reduce costs, and increase productivity. We built a prototype of the model and ran experiments on real datasets from agricultural institutions. Evaluation with several metrics showed that the model is efficient, generalizes well, and significantly improves prediction accuracy.


Project: An Analysis of Privacy Enhancing Techniques to Real-world Datasets

Duration: 03.2020 - 02.2022

Industry: Research and Development, COIKOSITY

Description: In this project, we examined existing privacy-preserving techniques and their suitability for real-world data. The effectiveness of such techniques is known to depend on the attributes of the data and on the manner in which the data is released. We therefore investigated how syntactic and semantic privacy preservation techniques apply to a dataset and evaluated their impact. Our goal was to understand the effectiveness and limitations of these techniques in safeguarding privacy while still enabling the use of sensitive data. We also considered the challenges and trade-offs involved in implementing these techniques in practical scenarios. Through this analysis, we hope to provide insights into the selection and deployment of appropriate privacy preservation techniques for different types of data and use cases.
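As an illustration of what a syntactic technique measures, the following minimal sketch checks k-anonymity on a toy table before and after generalizing one attribute. The description above does not name the specific techniques studied, so k-anonymity is chosen here only as a common example of the syntactic family; the records, field names, and helper functions are all hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on all quasi-identifier attributes."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize_age(record, width=10):
    """Replace an exact age with a coarse range, e.g. 34 -> '30-39'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Hypothetical toy records (not taken from the project dataset).
records = [
    {"age": 31, "zip": "130", "diagnosis": "flu"},
    {"age": 34, "zip": "130", "diagnosis": "asthma"},
    {"age": 38, "zip": "130", "diagnosis": "flu"},
    {"age": 52, "zip": "148", "diagnosis": "diabetes"},
    {"age": 57, "zip": "148", "diagnosis": "flu"},
]
quasi_identifiers = ["age", "zip"]

# With exact ages every record is unique on (age, zip).
k_before = k_anonymity(records, quasi_identifiers)

# After generalizing age into 10-year ranges, records fall into larger groups.
generalized = [generalize_age(r) for r in records]
k_after = k_anonymity(generalized, quasi_identifiers)

print(k_before, k_after)
```

The trade-off mentioned above is visible even here: generalizing age raises k, but the released data no longer carries exact ages.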


Project: Vehicle's OBD-II Data Statistical Study and Visualization

Duration: 03.2018 - 02.2020

Industry: VEStellaLab 

Description: In this project, we aim to gain a better understanding of car health patterns through the analysis and visualization of large quantities of OBD-II data. By integrating and examining various car sensor data sets and creating graphical statistics, we aim to provide insights into car performance and to benchmark against competitors in the second-hand car market. We also hope to use these findings to optimize engine service performance. To achieve these goals, we have employed various visualization techniques, such as graphs, maps, and dashboards, to present and interpret the data clearly. By providing a comprehensive view of car health patterns, we hope to enhance the overall performance and reliability of car engines.

Industrial project: Successfully completed and delivered

Project: Real-time Car Data collection, Management, Analysis and Prediction 

Duration: 03.2016 - 02.2018 

Industry: MtoV Inc.  

Description: In this project, we used an On-Board Diagnostics II (OBD-II) scanner device to collect car data from fifteen drivers over the course of one month, resulting in a total of 15,000 data points. Our aim is to identify trends and patterns in the data that can inform the development of advanced intelligent vehicle systems and help prevent accidents by evaluating how the quality of cars changes over time. We have also sought to improve drivers' skills by analyzing the meta features of automotive vehicles. To achieve these goals, we have applied data science and machine learning algorithms to analyze and predict car data for driver safety and economic driving metrics in real-world scenarios. Through this project, we hope to contribute to the advancement of intelligent vehicle systems and enhance the safety and efficiency of car travel.


Pamul Yadav, Sangsu Jung, Dhananjay Singh, "Machine learning based real-time vehicle data analysis for safe driving modeling", SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, April 2019, Pages 1355–1358.

Industrial project: Successfully completed and delivered

Project: AVL/GPS Bus Data Study during Hajj 

Duration: 03.2016 - 02.2019 and 03.2019 - 02.2022

Industry: Umm Al-Qura University, Saudi Arabia 

Description: In this project, we analyzed a large dataset of AVL (automatic vehicle location) data to understand movement patterns during the Hajj 2018 event. Using OpenStreetMap, we identified the bus routes that the GPS records showed to be highly congested. Our data sample consists of 8,134,015 records with 11 attributes, which we used to classify the data by bus number and to calculate journey distances. By analyzing the speeds of long-route and short-route buses and identifying congestion points along those routes, we gained valuable insights into bus movement patterns and the factors that contribute to traffic congestion. This information has proven useful in optimizing the overall quality of bus services and improving traffic flow.
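The journey-distance step can be sketched roughly as follows: group GPS fixes by bus, order them by time, and sum the great-circle (haversine) distance between consecutive fixes. This is only an illustrative sketch, not the project's actual pipeline; the record layout (bus_id, timestamp, lat, lon), the function names, and the sample coordinates are assumptions, since the 11 attributes of the dataset are not listed here.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def journey_distances(records):
    """Sum the distance between consecutive GPS fixes for each bus.

    records: iterable of (bus_id, timestamp, lat, lon) tuples (assumed layout).
    Returns a dict mapping bus_id to total distance in kilometres.
    """
    by_bus = defaultdict(list)
    for bus_id, timestamp, lat, lon in records:
        by_bus[bus_id].append((timestamp, lat, lon))

    totals = {}
    for bus_id, fixes in by_bus.items():
        fixes.sort()  # order fixes by timestamp
        totals[bus_id] = sum(
            haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
        )
    return totals

# Hypothetical sample: bus "A" has three fixes (given out of order), bus "B" has one.
sample = [
    ("A", 2, 2.0, 0.0),
    ("A", 0, 0.0, 0.0),
    ("B", 0, 21.4, 39.8),
    ("A", 1, 1.0, 0.0),
]
distances = journey_distances(sample)
print(distances)
```

Dividing each segment's distance by the time between its two fixes would give per-segment speeds, which is one plausible way to flag slow, congested stretches of a route.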


Nouh, R.; Singh, M.; Dhananjay Singh, "SafeDrive: Hybrid Recommendation System Architecture for Early Safety Prediction Using Internet of Vehicles". Sensors 2021, 21, 3893. 

Industrial project: Successfully completed and delivered