Log Analysis Using PySpark is a comprehensive big data analytics project aimed at monitoring and securing server infrastructure by analyzing large-scale log data. The system processes and transforms raw logs using PySpark, leveraging distributed computation for efficient handling of massive datasets.
It detects recurring downtime patterns, unusual IP activity, and potential security threats, providing actionable insights for system optimization. Filtering, aggregation, and anomaly-detection techniques are applied to improve the accuracy, reliability, and interpretability of results.
Visualizations and summary reports support decision-making and incident response. The project demonstrates practical applications of data engineering, distributed computing, and cybersecurity analytics.
Tools used include Python, PySpark, Pandas, and Jupyter Notebook.





