Análisis de datos y extracción de conocimiento utilizando Big Data (Data Analysis and Knowledge Extraction Using Big Data)
Date
2018-07-20
Abstract
This final-year degree project presents a theoretical and practical study of Big Data and data processing technologies. The theoretical part gives an in-depth description of Big Data and of the algorithms most commonly used in data processing, and examines Spark technology in detail.
In the practical part, the working environment was prepared with all the necessary Spark installations and configurations, and a cluster with several nodes was created. This was done in two scenarios: virtual machines on a laptop, as an experimental setup, and physical computers, where the real tests were run by executing an algorithm written in Python that includes Machine Learning techniques.
The effectiveness of Spark is evaluated through a comparative study of execution times, taking into account the distribution of tasks.