Optimizing Machine Learning Models: Cost-Effective Feature Selection for Enhanced Business Performance

  • Radoslav Georgiev Keremidchiev, TU-Sofia
Keywords: feature selection, cost-efficiency, machine learning, dimensionality reduction, operational efficiency

Abstract

Businesses adopting machine learning often assume that acquiring more data will always improve predictive performance. This paper challenges that assumption by demonstrating that a smaller, well-curated subset of features can be just as effective as a larger one, if not more so, for building accurate models. By focusing on 30 carefully selected features, companies can achieve similar or even superior performance compared to larger feature sets, while also reducing the costs of data acquisition, storage, and processing. Through a series of experiments, we show how feature selection techniques such as SelectKBest can drastically reduce the dimensionality of the problem, leading to faster model training, lower resource consumption, and improved computational efficiency. The paper also highlights the operational and economic benefits of focusing on high-value features, including enhanced model interpretability and quicker time-to-market. These findings provide actionable insights for businesses seeking to optimize their machine learning strategies without expensive or extensive datasets. We conclude that a strategic approach to feature selection not only balances cost and performance but also fosters more sustainable machine learning practices, encouraging businesses to prioritize data quality over data quantity.
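As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch using scikit-learn's SelectKBest to keep 30 features. The synthetic dataset, the f_classif scoring function, the logistic regression classifier, and the helper name mean_cv_accuracy are all assumptions made for this example; the abstract does not state which data, scoring function, or models the experiments used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the paper's dataset):
# 200 features, of which only 20 are informative.
X, y = make_classification(
    n_samples=500,
    n_features=200,
    n_informative=20,
    n_redundant=10,
    random_state=0,
)


def mean_cv_accuracy(k):
    """Mean 5-fold accuracy when keeping the k best features ("all" keeps every feature)."""
    # Selection sits inside the pipeline so it is refit on each training fold,
    # which avoids leaking information from the held-out fold.
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(model, X, y, cv=5).mean()


acc_all = mean_cv_accuracy("all")
acc_30 = mean_cv_accuracy(30)

# Fit the selector once on the full data to see which columns it keeps.
selector = SelectKBest(score_func=f_classif, k=30).fit(X, y)
kept = selector.get_support(indices=True)
X_reduced = selector.transform(X)

print(f"accuracy with all {X.shape[1]} features: {acc_all:.3f}")
print(f"accuracy with {X_reduced.shape[1]} selected features: {acc_30:.3f}")
print("indices of kept features:", kept.tolist())
```

Whether the 30-feature model matches or beats the full model depends on the data, so the two accuracies should be compared on the dataset at hand rather than assumed.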

Published
2024-12-25
How to Cite
Keremidchiev, Radoslav. (2024). Optimizing Machine Learning Models: Cost-Effective Feature Selection for Enhanced Business Performance. Vanguard Scientific Instruments in Management, 20, 223-237. Retrieved from https://vsim-journal.info/index.php?journal=vsim&page=article&op=view&path[]=531