Introduction to the main tools and procedures for obtaining structured and unstructured data with the Python and R programming languages. Covers the retrieval of data from various sources (web pages, social networks and platforms, databases, etc.) and the main data representation formats, such as CSV, JSON and XML.
Introduction to tools for creating and maintaining data repositories that consolidate information from multiple sources for later use in Data Science projects. In particular, the course focuses on the technological solutions Elasticsearch (with Kibana for visualization) and Solr.
Introduction to the main components of the Apache Spark technology stack as an integrated solution for data processing and analysis. Includes a detailed practical presentation of the main APIs in this environment for working with structured data (DataFrames, Spark SQL), data streams (Spark Streaming) and machine learning (ML/MLlib).
Introduction to the tools for scientific programming in Python, based on the Anaconda distribution and Jupyter notebooks. Covers libraries for scientific programming (NumPy/SciPy, Pandas) and advanced visualization tools (Seaborn, Bokeh).
Introduction to the R statistical programming language and environment, and to the RStudio software for data analysis with R. Covers data cleaning and preparation techniques as well as basic visualization tools.
Presentation of the R Commander and Rattle graphical interfaces, which make R
libraries easier to use for users with little programming experience, whether for
statistical analysis or for implementing data mining processes.
Theoretical and practical presentation of the main tools for developing Data
Science projects using Cloud Computing resources. In particular, the course focuses
on solutions for the Amazon Web Services (AWS) environment, including the use of
containers (Docker).
The main objective of this Data Mining and Machine Learning course is to present
classical methodologies and algorithms for learning from the information contained
in data, extracting patterns from it and predicting relevant characteristics of
the data.
The main objective of this Clustering in R course is to present the typical
clustering methodologies, the classical hierarchical and non-hierarchical
algorithms, their extensions, and
practical applications and use cases in different application areas.
Presentation of the main libraries and tools for Textual Data Mining and Natural Language Processing in Python and R. Includes practical examples of analyzing information from the web and social networks.
Introduction to advanced tools for data visualization, communication and effective presentation of Data Science project results. Covers specific tools such as Tableau and Plotly, as well as libraries in Python (Bokeh, Seaborn) and R (Shiny web interfaces).
Introduction to the main tools, protocols and legal regulations that must be
known to guarantee an adequate level of privacy, security and control of users
and identities in Data Science projects. Special emphasis is placed on anonymization
and data aggregation tasks.
This course offers a practical review of the main statistical methods involved in performing a meta-analysis. Different algorithms for combining information are studied, using the main meta-analysis libraries in R. Real cases from the health and social sciences are presented and discussed.