  • Open Access
  • Article
Online Learning of Bayesian Classifiers with Nonstationary Data Streams
  • Peng Wu 1, 2, *,   
  • Ning Xiong 1

Received: 31 Mar 2023 | Accepted: 13 Jul 2023 | Published: 26 Sep 2023

Abstract

Advances in Internet of Things and sensor technologies enable data to be generated continuously at high rates, i.e., as data streams. It is practically infeasible to store streaming data on disk and then apply a traditional batch learning method to extract a relevant knowledge model from them. This paper studies online incremental learning from data streams, in which one sample is processed at a time to update the existing model. The Bayesian classifier is adopted as the learning target because it is computationally economical and easy to deploy for online processing on edge nodes or devices. Using each newly arriving example, we first present an online learning algorithm that incrementally updates the classifier parameters in a way equivalent to its offline learning counterpart. To adapt to concept drift in nonstationary environments, the online algorithm is then improved so that recent examples have greater impact during the sequential learning procedure. Preliminary simulation tests show that the improved online algorithm adapts the model faster than the unimproved online algorithm when data drift occurs. On presumably stationary data streams without drift, the improved algorithm remains competent, performing at least as well as (and sometimes better than) the unimproved algorithm.
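To make the two ingredients of the abstract concrete, the following is a minimal, hypothetical Python sketch of a Bayesian (Gaussian naive Bayes) classifier that is updated one sample at a time and uses an exponential forgetting factor. The class name IncrementalGaussianNB, the partial_fit/predict interface, and the forgetting parameter are illustrative assumptions, not the authors' implementation: with forgetting equal to 1 the running statistics coincide with the batch maximum-likelihood estimates (the offline-equivalent case), while a value below 1 down-weights old statistics so that recent examples are more impactful, mirroring the drift-adaptation idea described above.

```python
import numpy as np

class IncrementalGaussianNB:
    """Gaussian naive Bayes updated one sample at a time (illustrative sketch).

    forgetting = 1.0 reproduces batch maximum-likelihood estimates of the
    class priors, means, and variances; forgetting < 1.0 decays old
    statistics so the model can track concept drift.
    """

    def __init__(self, n_classes, n_features, forgetting=1.0):
        self.forgetting = forgetting
        self.weight = np.zeros(n_classes)               # effective sample weight per class
        self.mean = np.zeros((n_classes, n_features))   # running per-class feature means
        self.var = np.zeros((n_classes, n_features))    # running per-class feature variances

    def partial_fit(self, x, y):
        # Decay every class's accumulated weight, then add the new example.
        self.weight *= self.forgetting
        self.weight[y] += 1.0
        # Exponentially weighted update of mean and variance for class y.
        lr = 1.0 / self.weight[y]
        delta = x - self.mean[y]
        self.mean[y] += lr * delta
        self.var[y] = (1.0 - lr) * (self.var[y] + lr * delta ** 2)

    def predict(self, x):
        prior = np.log(self.weight + 1e-9)              # log prior from decayed class weights
        var = self.var + 1e-6                           # small floor avoids zero variance
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)
                                + (x - self.mean) ** 2 / var, axis=1)
        return int(np.argmax(prior + log_lik))

# Example usage: stream samples one by one; forgetting=0.99 favors recent data.
model = IncrementalGaussianNB(n_classes=2, n_features=3, forgetting=0.99)
rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.integers(2)
    x = rng.normal(loc=y, size=3)
    model.partial_fit(x, y)
print(model.predict(np.array([1.0, 1.0, 1.0])))         # likely class 1
```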

How to Cite
Wu, P.; Xiong, N. Online Learning of Bayesian Classifiers with Nonstationary Data Streams. International Journal of Network Dynamics and Intelligence 2023, 2 (3), 100009. https://doi.org/10.53941/ijndi.2023.100009.
Copyright & License
Copyright (c) 2023 by the authors.