https://doi.org/10.71352/ac.55.019
Performance impact of network security features on log processing with Spark
Abstract.
Various industries maintain a large number of machines to run their production lines and services. Such systems process
and produce massive amounts of data to provide high quality and availability for their customer services. Therefore, these systems
should be inspected constantly, not only to maintain the service levels already achieved but also to identify the upgrades needed
to keep up with market competition. Our aim is to examine Apache Spark and to find a configuration that performs well on our
log-processing tasks and can be applied in real, live scenarios. In addition, although several studies in this field
have already been carried out, none of them considers Spark's security features when predicting run time.
The presented work tests Apache Spark for log processing in standalone cluster setups with a varying number of workers
and different submitted tasks. We also examine the performance impact of enabling authentication in the network communication between
cluster nodes in these setups. Our results show that increasing the number of executor nodes and simplifying the underlying
algorithm do not always improve performance as expected. Furthermore, securing network communication between
Spark processes noticeably increases the overall execution time of submitted jobs.
