Courses and seminars

Data collection

Introduction to the main tools and procedures for obtaining structured and unstructured data with the Python and R programming languages. The course covers data retrieval from various sources (web pages, social networks and platforms, databases, etc.) and the main data representation formats, such as CSV, JSON and XML.
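As a rough illustration of the three formats named above, the following Python sketch writes the same small set of records as CSV, JSON and XML and reads them back using only the standard library. The sample records, the field names and the XML layout (`records`/`record`/`city`) are assumptions made up for this example and are not taken from the course material.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": "1", "city": "Madrid"},
    {"id": "2", "city": "Sevilla"},
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "city"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of plain dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows)


def from_json(text):
    """Parse a JSON array of objects back into Python dictionaries."""
    return json.loads(text)


def to_xml(rows):
    """Serialize the rows as XML: id as an attribute, city as a child element."""
    root = ET.Element("records")
    for row in rows:
        item = ET.SubElement(root, "record", id=row["id"])
        ET.SubElement(item, "city").text = row["city"]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML layout produced by to_xml back into dictionaries."""
    root = ET.fromstring(text)
    return [
        {"id": rec.get("id"), "city": rec.findtext("city")}
        for rec in root.findall("record")
    ]
```

Note that CSV carries no type information, so every value is kept as a string here to make the three round trips comparable.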

Data Management and Processing

Introduction to tools for creating and maintaining data repositories that consolidate information from multiple sources for later use in Data Science projects. In particular, the course focuses on Elasticsearch (with Kibana for visualization) and Solr.

Introduction to Apache Spark

Introduction to the main components of the Apache Spark technology stack as an integrated solution for data processing and analysis. It includes a detailed practical presentation of the main APIs in this environment for working with structured data (DataFrames, Spark SQL), data streams (Spark Streaming) and machine learning (ML/MLlib).

Scientific Programming in Python

Introduction to the tools for scientific programming in Python, based on the Anaconda distribution and Jupyter notebooks. The course covers libraries for scientific programming (NumPy/SciPy, Pandas) and advanced visualization tools (Seaborn, Bokeh).

Reproducible data analysis with R

This course offers an introduction to R, the statistical programming language and environment, and to RStudio, the software used for data analysis with R. It also covers data cleaning and preparation techniques and basic visualization tools.

Graphical interfaces for R

Presentation of the R Commander and Rattle graphical interfaces, which make R libraries for statistical analysis and data mining accessible to users with little programming experience.

Cloud Computing for Data Science

Theoretical and practical presentation of the main tools for developing Data Science projects with Cloud Computing resources. In particular, the course focuses on solutions for the Amazon Web Services (AWS) environment, including the use of containers (Docker).

Machine Learning

The main objective of this Data Mining and Machine Learning course is to present classical methodologies and algorithms for learning from data, extracting patterns and predicting relevant characteristics of the data.

Clustering with R

The main objective of this Clustering with R course is to present the typical clustering methodologies, the classical hierarchical and non-hierarchical algorithms and their extensions, and practical applications and use cases in different application areas.

Text Mining and Natural Language Processing

Presentation of the main libraries and tools for Text Mining and Natural Language Processing in Python and R. Includes practical examples of analyzing information from the web and social networks.

Advanced Data Visualization

Introduction to advanced tools for data visualization, communication and effective presentation of Data Science project results. The course covers specific tools such as Tableau and Plotly, as well as libraries in Python (Bokeh, Seaborn) and R (Shiny web interfaces).

Data privacy

Introduction to the main tools, protocols and legal regulations that must be known in order to guarantee an adequate level of privacy, security and control of users and identities in Data Science projects. Special emphasis is placed on anonymization and data aggregation tasks.

Meta-analysis

This course offers a practical review of the main statistical methods involved in performing a meta-analysis. Different algorithms for combining information are studied, working with the main meta-analysis libraries in R. Real cases from the health and social sciences are presented and discussed.
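To give an idea of what combining information across studies can look like, the sketch below implements one common approach, fixed-effect inverse-variance pooling, in plain Python. The description above does not say which combination algorithms the course covers, and the course itself works with R libraries, so both the choice of method and the function name `fixed_effect_pool` are assumptions made for this illustration.

```python
import math


def fixed_effect_pool(effects, variances):
    """Pool study effects with inverse-variance weights (fixed-effect model).

    effects:   per-study effect estimates (e.g. mean differences)
    variances: per-study sampling variances, all strictly positive
    Returns the pooled estimate and its standard error.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error
```

For instance, with two hypothetical studies reporting effects 0.2 and 0.4 and variances 0.01 and 0.04, the weights are 100 and 25, so the pooled estimate is (20 + 10) / 125 = 0.24, closer to the more precise study.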