1,4-Butynediol (BYD), an essential intermediate for fine chemicals and polymer production, is primarily synthesized via the reaction of formaldehyde with acetylene. Kinetic experiments were conducted in a quasi-industrial slurry bed reactor at 55–85 °C, reaction times of 0–10.5 h, and pH 5–9 to obtain the time-resolved yield of 1,4-butynediol and the conversion of formaldehyde. The experimental results revealed that pH critically influences reaction performance, but it cannot be directly incorporated into mechanistic kinetic models. Therefore, four machine learning models, namely random forest (RF), extremely randomized trees (Extra Trees, ET), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost), were employed to establish data-driven models that directly capture the influence of pH. The 84 experimental data points were augmented to 1023 samples by interpolation and extrapolation, and the dataset was then split into training, validation, and testing subsets in a 6:2:2 ratio. The results demonstrated that the XGBoost model exhibited the best generalization ability and stability, achieving the highest average coefficient of determination (R²) for formaldehyde conversion (0.9847 ± 0.0022) and 1,4-butynediol yield (0.9773 ± 0.0035), with a mean absolute error below 0.027 for both targets. Finally, the XGBoost model was coupled with Bayesian optimization to search for the optimal process parameters.
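
As an illustration of the 6:2:2 partition of the 1023 augmented samples, the following is a minimal sketch using only the Python standard library. The function name `split_indices`, the random shuffle, the seed, and the rounding rule (floor for the training and validation sets, remainder assigned to the test set) are assumptions made for this example; the abstract does not state how the split was actually performed.

```python
import random


def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and partition them by integer ratios.

    Hypothetical helper: the shuffling and rounding rules are assumptions,
    not taken from the study.
    """
    total = sum(ratios)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    # Floor the first two subset sizes; the test set takes the remainder.
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1023)
print(len(train_idx), len(val_idx), len(test_idx))  # 613 204 206
```

Under these assumptions the subsets contain 613, 204, and 206 samples. Assigning the remainder to the test set is one arbitrary choice; other rounding conventions shift a sample or two between subsets.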



