Optimization of Scalable Machine Learning Pipelines for Big Data Analytics in Distributed Systems

Authors

  • Tian Qi, Shopify (USA) Inc., New York, USA

DOI:

https://doi.org/10.70088/r0a1xh16

Keywords:

machine learning, distributed systems, big data, scalability, optimization, performance

Abstract

This paper proposes an approach for optimizing machine learning pipelines in distributed systems, with the goal of improving scalability and performance for big data analytics. The approach addresses key challenges such as data partitioning, load balancing, resource management, and fault tolerance. Experimental results show improvements in throughput, latency, scalability, and resource utilization, with up to a 43% increase in throughput and a 35% reduction in resource consumption. The optimized pipeline performs better as dataset sizes and node counts increase, and it also exhibits improved fault tolerance and cost efficiency. The study offers insights for making machine learning pipelines more efficient and effective in distributed environments, particularly for large-scale data processing and analysis.

Published

2023-01-24

Section

Article

How to Cite

Optimization of Scalable Machine Learning Pipelines for Big Data Analytics in Distributed Systems. (2023). Insights in Computer, Signals and Systems, 1(1), 24-35. https://doi.org/10.70088/r0a1xh16