Hello World!!

This is Imran, born and raised in the city of mosques, Dhaka, Bangladesh. I am a passionate programmer, Data Science/Machine Learning professional, and researcher. I help companies make impactful data-driven decisions by applying and productionizing AI and Data Science technologies. I’m a goal-oriented, seasoned Data Science and Machine Learning professional with a deep affection for data and proven expertise in developing and deploying end-to-end, highly scalable ML models and services. Predicting unknowns, discovering patterns, and revealing useful insights from data excite me the most. I’m dynamic in personality and a rapid learner who is always eager for knowledge and wisdom.

Technical Skills

Publications

Rifat, Md Rifatul Islam, and Abdullah Al Imran. “Incorporating Transformer Models for Sentiment Analysis and News Classification in Khmer.” International Conference on Computational Data and Social Networks. Springer, Cham, 2021. https://doi.org/10.1007/978-3-030-91434-9_10.

Abstract: In recent years, natural language modeling has achieved a major breakthrough with its sophisticated theoretical and technical advancements. Leveraging the power of deep learning, transformer models have had a disruptive impact in the domain of natural language processing. However, the benefits of such advancements are still confined to a few highly resourced languages such as English, German, and French. Low-resource languages such as Khmer are still deprived of these advancements due to a lack of technical support for them. In this study, our objective is to apply state-of-the-art language models to two empirical use cases, Sentiment Analysis and News Classification, in the Khmer language. To perform the classification tasks, we have employed FastText and BERT for extracting word embeddings and carried out three different types of experiments: FastText, BERT feature-based, and BERT fine-tuning-based. A large text corpus including over 100,000 news articles has been used for pre-training the transformer model, BERT. The outcome of our experiment shows that in both use cases, a pre-trained and fine-tuned BERT model produces the best results.

Imran, Abdullah Al, and Md Nur Amin. “Deep Bangla Authorship Attribution Using Transformer Models.” International Conference on Computational Data and Social Networks. Springer, Cham, 2021. https://doi.org/10.1007/978-3-030-91434-9_11.

Abstract: Authorship attribution is one of the renowned problems in the domain of Natural Language Processing (NLP). Leveraging the state-of-the-art (SOTA) techniques of NLP such as transformer models, this problem domain has achieved a considerable advancement. However, this progress is unfortunately only bound to the well-resourced languages like English, French, and German. Under-resourced language like Bangla is yet to leverage such SOTA techniques to make a breakthrough in this domain. In this study, we address this research gap and aim to contribute to the Bangla authorship attribution problem by building highly accurate models using several SOTA variants of transformer models like mBERT, bnBERT, bnElectra, and bnRoBERTa. Using the pre-trained weights of these models we have performed fine-tuning and tackled the task of authorship attribution of 16 prominent Bangla writers. Outcomes show that our bnBERT model can classify the authors with superior accuracy of 98% and also outperform all the existing models available in the literature.

Imran, Abdullah Al, Md Shamsur Rahim, and Tanvir Ahmed. “Mining the Productivity Data of the Garment Industry.” International Journal of Business Intelligence and Data Mining. Inderscience Publishers, 2021. https://doi.org/10.1504/ijbidm.2021.118183.

Abstract: The garment industry is one of the key examples of the industrial globalisation of this modern era. It is a highly labour-intensive industry with lots of manual processes. Satisfying the huge global demand for garment products is mostly dependent on the production and delivery performance of the employees in the garment manufacturing companies. So, it is highly desirable among the decision makers in the garments industry to track, analyse and predict the productivity performance of the working teams in their factories. This study explores the application of state-of-the-art data mining techniques for analysing industrial data, revealing meaningful insights and predicting the productivity performance of the working teams in a garment company. As part of our exploration, we have applied eight different data mining techniques with six evaluation metrics. Our experimental results show that the tree ensemble model and gradient boosted tree model are the best performing models in the application scenario.

Imran, Abdullah Al, and Md Nur Amin. “Loan Charge-Off Prediction Including Model Explanation for Supporting Business Decisions.” Advances in Intelligent Systems and Computing. Springer International Publishing, 2021. https://doi.org/10.1007/978-3-030-71187-0_119.

Abstract: The rapid growth of taking loans and digitizing the financial sector is increasing the rate of loan charge-offs as well as the volume of data that represents customer behavior. Nowadays, Machine Learning (ML) technology is helping financial institutions utilize this huge amount of data and build some black-box prediction models for predicting loan charge-offs with decent accuracy. Yet, the amount of risk involved in such financial decisions is very high and should not be taken only based on an opaque decision of a black-box model. In this study, we propose a system for building accurate models using interpretable state-of-the-art (SOTA) ML algorithms as well as utilizing the Explainable AI (XAI) techniques to explain individual instances for supporting business decisions.

Imran, Abdullah Al, Zaman Wahid, and Tanvir Ahmed. “BNnet: A Deep Neural Network for the Identification of Satire and Fake Bangla News.” Computational Data and Social Networks. Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-66046-8_38.

Abstract: Misleading and fake news in rapidly increasing online news portals in Bangladesh has become a major concern to both the government and public lately, as a substantial amount of incidents have taken place in different cities due to unwarranted rumors over the last couple of years. However, the overall progress of research and innovation in detecting fake and satire Bangla news is yet unsatisfactory considering the prospects it would bring to the decision-makers of Bangladesh. In this study, we have amalgamated both fake and real Bangla news from quite a pool of online news portals and applied a total of seven prominent machine learning algorithms to identify real and fake Bangla news, proposing a Deep Neural Network (DNN) architecture. Using a total of five evaluation metrics: Accuracy, Precision, Recall, F1 score, and AUC, we have discovered that DNN model yields the best result with an accuracy and AUC score of 0.90 respectively while Decision Tree performs the worst.

Parves, Abdul Bari, Abdullah Al Imran, and Md. Riazur Rahman. “Incorporating Supervised Learning Algorithms with NLP Techniques to Classify Bengali Language Forms.” Proceedings of the International Conference on Computing Advancements. ACM, January 10, 2020. https://doi.org/10.1145/3377049.3377110.

Abstract: Every language has its own root, form, and grammar, and so does Bengali. The Bengali language has two core forms, "Sadhu-bhasha" and "Cholito-bhasha", which have been widely used from regular communication to literary publications. At present, Sadhu-bhasha can be found only in old books and literary publications, whereas Cholito-bhasha is used almost everywhere. However, many Bengali linguists are still researching these two forms to preserve the language's roots, understand and develop Bengali, and extract knowledge from the historical publications that were mainly written in Sadhu-bhasha. Unfortunately, they do not yet have any digital tool that can assist their research by automatically identifying these core forms of Bengali in the large archive of Bengali literature. This study aims to build such an automatic intelligent system that can accurately identify these two language forms by harnessing the power of Natural Language Processing (NLP). In this study, we have applied advanced NLP techniques and six supervised learning algorithms to classify "Sadhu-bhasha" and "Cholito-bhasha" from text corpora. Results of this study show that all six models yielded very promising results; however, Multinomial Naive Bayes outperformed all the other models with 99.5% accuracy, 99.0% precision, 100% recall, 0.995 AUC score, and 0.995 F1 score. Additionally, this study performs a qualitative analysis using the t-SNE algorithm to visualize the difference between Sadhu-bhasha and Cholito-bhasha.

Imran, Abdullah Al, and Md Nur Amin. “Predicting the Return of Orders in the E-Tail Industry Accompanying with Model Interpretation.” Procedia Computer Science. Elsevier BV, 2020. https://doi.org/10.1016/j.procs.2020.09.113.

Abstract: Electronic Retailing (E-tailing) is one of the most impactful technology trends of recent times. This industry has dramatically enhanced the quality of human lives allowing people to shop online while having the comfort of their homes. In developing countries like Bangladesh, this industry is still rising and creating a significant economic impact. However, there exist a lot of challenges such as the return of orders that affects the growth of an E-tailer and causes revenue losses. This study addresses this most common business challenge in the E-tail industry and performs predictive modeling using 4 different state-of-the-art data mining techniques to help the industry smoothen its curve of growth. Along with predictive modeling, this study also aims to find out the most important features that influence the return of orders.

Al Imran, Abdullah, Md. Rifatul Islam Rifat, and Rafeed Mohammad. “Enhancing the Classification Performance of Lower Back Pain Symptoms Using Genetic Algorithm-Based Feature Selection.” Proceedings of International Joint Conference on Computational Intelligence. Springer Singapore, July 4, 2019. https://doi.org/10.1007/978-981-13-7564-4_39.

Abstract: Lower Back Pain (LBP) is one of the leading causes of disability around the world and affects several important parts of the human body such as the muscles, nerves, and bones of the back. Only early diagnosis and proper treatment can prevent acute LBP from progressing into chronic LBP. The aim of this study is to enhance the classification performance of LBP by identifying the most relevant feature subset from a broader feature space of an LBP dataset. To serve this aim, we have proposed a Genetic Algorithm (GA)-based feature selection approach that has been shown to significantly improve the classification performance of LBP. For the purpose of classification, we have used seven different classification algorithms, namely Logistic Regression, Ridge Regression, Gaussian Naive Bayes, Random Forest, Decision Tree, k-Nearest Neighbors (KNN), and Support Vector Machine (SVM). After applying our proposed GA-based feature selection approach along with the base classifiers, we have obtained a significant average increase in accuracy, precision, recall, f1-score, and AUC score of 3.1%, 0.64%, 4.37%, 2.64%, and 3.83% respectively. The k-Nearest Neighbors model outperforms the other models with the highest accuracy (=85.2%), precision (=89.9%), and f1 score (=88.9%).

Islam Rifat, Md Rifatul, Abdullah Al Imran, and A. S. M. Badrudduza. “Educational Performance Analytics of Undergraduate Business Students.” International Journal of Modern Education and Computer Science. MECS Publisher, July 8, 2019. https://doi.org/10.5815/ijmecs.2019.07.05.

Abstract: Educational data mining (EDM) is an emerging interdisciplinary research area concerned with analyzing and studying data from academic databases to better understand the students and the educational settings. In most of the Asian countries, it is a challenging task to perform EDM due to the diverse characteristics of the educational data. In this study, we have performed students’ educational performance prediction, pattern analysis and proposed a generalized framework to perform rigorous educational analytics. To validate our proposed framework, we have also conducted extensive experiments on a real-world dataset that has been prepared by the transcript data of the students from the Marketing department of a renowned university in Bangladesh. We have applied six state-of-the-art classification algorithms on our dataset for the prediction task where the Random Forest model outperforms the other models with accuracy 94.1%. For pattern analysis, a tree diagram has been generated from the Decision Tree model.

Islam Rifat, Md Rifatul, Abdullah Al Imran, and A. S. M. Badrudduza. “EduNet: A Deep Neural Network Approach for Predicting CGPA of Undergraduate Students.” 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT). IEEE, May 2019. https://doi.org/10.1109/icasert.2019.8934616.

Abstract: Educational Data Mining (EDM) is an emerging research field concerned with the application of data mining, machine learning, and statistics in the discipline of education. Many researchers have already focused on EDM and exploring the educational data using several traditional data mining techniques to improve the educational performance of the students by extracting the concealed patterns and predicting the final outcome. In this study, we aim to propose a Deep Neural Network (DNN) based model to predict the final CGPA of the undergraduate business students with a minimal error than the traditional approaches. We have considered the performance of a decision tree model as the baseline performance. Experiments in this study have shown that our proposed DNN model can predict the CGPA with a significantly minimal error rate. To measure the performance of our model we have considered the three evaluation metrics namely Mean Squared Error (=0.008), Mean Absolute Error (=0.067), and Mean Absolute Percentage Error (=2.074). Our proposed model has successfully shown a promising prediction performance by reducing the MSE, MAE, and MAPE by 0.0146, 0.0431, and 6.043 respectively, compared to the baseline model.

Imran, Abdullah Al, Md Nur Amin, Md Rifatul Islam Rifat, and Shamprikta Mehreen. “Deep Neural Network Approach for Predicting the Productivity of Garment Employees.” 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT). IEEE, April 2019. https://doi.org/10.1109/codit.2019.8820486.

Abstract: The garment industry is one of the most dominating industries in this era of industrial globalization. It is a highly labor-intensive industry that requires a large number of human resources to produce its goods and fill up the global demand for garment products. Because of the dependency on human labor, the production of a garment company comprehensively relies on the productivity of the employees who are working in different departments of the company. A common problem in this industry is that the actual productivity of the garment employees sometimes does not meet the targeted productivity that was set for them by the authorities to meet the production goals in due time. When the productivity gap occurs, the company faces a huge loss in production. This study aims to solve this problem by predicting the actual productivity of the employees. To achieve this aim, a Deep Neural Network (DNN) model has been proposed to predict the actual productivity of the employees. The experimental results of this study have shown that the proposed model yields a promising prediction performance with a minimal Mean Absolute Error (=0.086) which is less than the baseline performance error (=0.15). Such prediction performance can indisputably help the manufacturers to set an accurate target, minimize the production loss and maximize the profit.

Rafsunjani, Siam, Rifat Sultana Safa, Abdullah Al Imran, Shamsur Rahim, and Dip Nandi. “An Empirical Comparison of Missing Value Imputation Techniques on APS Failure Prediction.” International Journal of Information Technology and Computer Science. MECS Publisher, February 8, 2019. https://doi.org/10.5815/ijitcs.2019.02.03.

Abstract: The Air Pressure System (APS) is a type of function used in heavy vehicles to assist braking and gear changing. The APS failure dataset consists of the daily operational sensor data from failed Scania trucks. The dataset is crucial to the manufacturer as it allows to isolate components which caused the failure. However, missing values and imbalanced class problems are the two most challenging limitations of this dataset to predict the cause of the failure. The prediction results can be affected by the way of handling these missing values and imbalanced class problem. In this paper, we have examined and presented the impact of five different missing value imputation techniques namely: Expectation Maximization, Mean Imputation, Soft Impute, MICE, and Iterative SVD in producing significantly better results. We have also performed an empirical comparison of their performance by applying five different classifiers namely: Naive Bayes, KNN, SVM, Random Forest, and Gradient Boosted Tree on this highly imbalanced dataset. The primary aim of this study is to observe the impact of the mentioned missing value imputation techniques in the enhancement of the prediction results, performing an empirical comparison to figure out the best classification model and imputation technique. We found that the MICE imputation and the random under-sampling techniques are the highest influential techniques for improving the prediction performance and false negative rate.

Wahid, Zaman, A. K. M. Zaidi Satter, Abdullah Al Imran, and Touhid Bhuiyan. “Predicting Absenteeism at Work Using Tree-Based Learners.” Proceedings of the 3rd International Conference on Machine Learning and Soft Computing - ICMLSC 2019. ACM Press, 2019. https://doi.org/10.1145/3310986.3310994.

Abstract: Absenteeism at the workplace plays a crucial role in demonstrating the productive and profitable capacity of a company. Thus, knowledge of employees' absenteeism becomes the foundation for an organization in its multiple dimensions, because the proper determination of employees' profiles allows the identification of excess occurrences of certain morbidities. Early absenteeism research primarily focused on predicting the characteristics and the categories of diseases of employees that lead to higher absenteeism at the workplace. However, predicting the absenteeism time of employees using different machine learning classifiers can give the research a new dimension, in line with the intention of revealing the underlying causes and patterns of absenteeism. In this paper, we have applied 4 prominent machine learning algorithms, namely Decision Tree, Gradient Boosted Tree, Random Forest, and Tree Ensemble, on the absenteeism dataset of a courier company in Brazil in order to predict the absenteeism time of employees at work as well as to identify the best classifier. Based on 7 evaluation metrics, namely True Positive, True Negative, False Positive, False Negative, Sensitivity, Specificity, and Accuracy, we found that Gradient Boosted Tree produced the best result with an accuracy rate of 82%, whereas Tree Ensemble performed the worst with an accuracy rate of 79%.

Imran, Abdullah Al, Md Nur Amin, and Fatema Tuj Johora. “Classification of Chronic Kidney Disease Using Logistic Regression, Feedforward Neural Network and Wide & Deep Learning.” 2018 International Conference on Innovation in Engineering and Technology (ICIET). IEEE, December 2018. https://doi.org/10.1109/ciet.2018.8660844.

Abstract: Chronic kidney disease (CKD) is a global health burden that affects approximately 10% of the adult population in the world. It is also recognized as one of the top 20 causes of death worldwide. Unfortunately, there is no cure for CKD; however, it is possible to slow down its progression and mitigate the damage through early diagnosis of the disease. Due to a limited number of nephrologists, early diagnosis of CKD is often not possible for most people. Therefore, the use of modern computer-aided methods is necessary to help the traditional CKD diagnosis system be more efficient and accurate. In this research, our primary focus was to apply 3 modern machine learning techniques, namely logistic regression, feedforward neural networks, and wide & deep learning, to diagnose CKD as well as to find the best performing technique by evaluating their diagnosis performance. To evaluate their performance, f1-score, precision, recall, and AUC score were used for logistic regression, and an additional loss score was considered for the feedforward neural networks and the wide & deep model. We found the feedforward neural network to be the best performing technique for CKD diagnosis with 0.99 f1-score, 0.97 precision, 0.99 recall, and 0.99 AUC score. Logistic regression produced the lowest result among all, and wide & deep learning with a larger number of hidden layers and neurons was found to be effective for larger datasets.

Al Imran, Abdullah, Ananya Rahman, Humayoun Kabir, and Shamsur Rahim. “The Impact of Feature Selection Techniques on the Performance of Predicting Parkinson’s Disease.” International Journal of Information Technology and Computer Science. MECS Publisher, November 8, 2018. https://doi.org/10.5815/ijitcs.2018.11.02.

Abstract: Parkinson’s Disease (PD) is one of the leading causes of death around the world. However, there is no cure for this disease yet; only treatments after early diagnosis may help to relieve the symptoms. This study aims to analyze the impact of feature selection techniques on the performance of diagnosing PD by incorporating different data mining techniques. To accomplish this task, identifying the best feature selection approach was the primary focus. In this paper, the authors applied five feature selection techniques, namely Gain Ratio, Kruskal-Wallis Test, Random Forest Variable Importance, RELIEF, and Symmetrical Uncertainty, along with four classification algorithms (K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine) on the PD dataset collected from the UCI Machine Learning repository. The result of this study was obtained by taking four different subsets (Top 5, 10, 15, and 20 features) from each feature selection approach and applying the classifiers. The obtained result showed that in terms of accuracy, the Random Forest Variable Importance, Gain Ratio, and Kruskal-Wallis Test techniques achieved the highest score of 89%. On the other hand, in terms of sensitivity, the Gain Ratio and Kruskal-Wallis Test approaches produced the highest score of 97%. The findings of this research clearly indicated the impact of feature selection techniques on predicting PD, and our applied methods outperformed the state-of-the-art performance.

How to reach me?