Optimizing Machine Learning Models: Cost-Effective Feature Selection for Enhanced Business Performance
Abstract
In the evolving field of machine learning, businesses often assume that acquiring more data always leads to better predictive performance. This paper challenges that assumption by demonstrating that a smaller, well-curated subset of features can be just as effective, if not more so, at producing accurate models. By focusing on 30 carefully selected features, companies can match or exceed the performance of models trained on much larger feature sets while reducing the costs of data acquisition, storage, and processing. Through a series of experiments, we show how feature selection techniques such as SelectKBest drastically reduce the dimensionality of the problem, yielding faster model training, lower resource consumption, and improved computational efficiency. The paper also highlights the operational and economic benefits of concentrating on high-value features, including better model interpretability and quicker time-to-market. These findings provide actionable insights for businesses seeking to optimize their machine learning strategies without expensive or extensive datasets. We conclude that a strategic approach to feature selection not only balances cost and performance but also fosters more sustainable machine learning practice, encouraging businesses to prioritize data quality over quantity.
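To make the SelectKBest workflow concrete, the following minimal Python sketch selects the 30 highest-scoring features before fitting a classifier, using scikit-learn. The synthetic dataset, its dimensions, the ANOVA F-test scoring function, and the logistic regression model are illustrative assumptions standing in for the paper's proprietary data and experimental setup, not a reproduction of them.

# Minimal sketch: reduce a wide dataset to 30 features with SelectKBest.
# The dataset here is synthetic; a real business dataset would replace it.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 500 candidate features, of which only a few carry signal.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=25, n_redundant=25,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Keep only the 30 features with the highest ANOVA F-statistic (an assumed
# scoring function; chi2 or mutual_info_classif are common alternatives).
selector = SelectKBest(score_func=f_classif, k=30)
X_train_30 = selector.fit_transform(X_train, y_train)
X_test_30 = selector.transform(X_test)

# A simple downstream model trained on the reduced feature set.
clf = LogisticRegression(max_iter=1000).fit(X_train_30, y_train)
print("Accuracy with 30 selected features:",
      accuracy_score(y_test, clf.predict(X_test_30)))

Note that the selector is fit on the training split only and then applied to the test split, which avoids leaking test-set information into the feature-selection step.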
