Optimization of Scalable Machine Learning Pipelines for Big Data Analytics in Distributed Systems

Tian Qi

doi:10.70088/r0a1xh16

Authors

Tian Qi, Shopify (USA) Inc., New York, USA

Keywords:

machine learning, distributed systems, big data, scalability, optimization, performance

Abstract

This paper proposes an optimization approach for machine learning pipelines in distributed systems, aimed at improving scalability and performance for big data analytics. The approach addresses key challenges such as data partitioning, load balancing, resource management, and fault tolerance. Experimental results show improvements in throughput, latency, scalability, and resource utilization, with up to a 43% increase in throughput and a 35% reduction in resource consumption. The optimized pipeline performs better as dataset sizes and node counts increase, and it also exhibits enhanced fault tolerance and cost efficiency. The study contributes to the efficiency and effectiveness of machine learning pipelines in distributed environments and offers insights for large-scale data processing and analysis.
