Secondments 23-26: From Faculty of Pharmacy – University of Belgrade to Fondazione Bruno Kessler


“I spent two months (from July 2nd to August 31st, 2024) at the Institute for Artificial Intelligence at Fondazione Bruno Kessler (FBK) in Trento, Italy. Our supervisor was Dr. Marco Chierici from the Data Science for Health (DSH) Research Unit at FBK. During this time, we analyzed several articles from Dr. Chierici’s group and discussed various aspects of the machine learning methods they implemented. Dr. Chierici gave us several lectures on machine learning procedures and techniques, along with an introductory lecture on the Python programming language. He also pointed us to resources and online courses to enhance our Python skills, particularly for data analysis and statistical applications.

During my stay, I completed a basic Python programming course introduced by Dr. Chierici. We also discussed the potential to develop algorithms for complex statistical analyses using Python’s predefined statistical method libraries. As a result of the knowledge I gained at FBK, I have applied for an advanced programming course in Belgrade.

Beyond my professional experience, I had the opportunity to explore the beautiful town of Trento and its surroundings, visiting many lakes, towns, and villages, which allowed me to immerse myself in Italian culture and lifestyle. I also formed strong friendships at FBK, including with a Serbian colleague, IT engineer Marina Andric, who became a valuable connection during my stay.”

Dr Jelena Kotur-Stevuljević, Full Professor at the Faculty of Pharmacy – University of Belgrade

“From July 2 to August 31, 2024, I was at the Institute for Artificial Intelligence of the Fondazione Bruno Kessler (FBK) in Trento, Italy. Our supervisor was Dr. Marco Chierici from the Data Science for Health (DSH) research unit at FBK. Dr. Chierici gave us several lectures on machine learning methods and techniques as well as an introductory lecture on the Python programming language. He also pointed us to various resources and online courses to improve our Python skills, especially in the area of data analysis and statistical applications using this programming language.

We also discussed one of our manuscripts that had certain limitations and developed ideas on how it could be improved. This conversation prompted us to plan a joint master’s thesis for an Italian master’s student who will perform complex analysis with a database from Serbian research. We also explored the possibility of a visit by Dr. Chierici to Serbia to teach Python programming and machine learning methods to PhD students at the Faculty of Pharmacy, University of Belgrade.

Overall, my experience at FBK was characterised by a dynamic and supportive environment, with friendly and accommodating people who helped us complete our tasks efficiently and to our mutual satisfaction.”

Dr Nataša Bogavac Stanojević, Full Professor at the Faculty of Pharmacy – University of Belgrade

“During my one-month stay at Fondazione Bruno Kessler in Trento, Italy, I learned how to apply basic machine learning techniques to the analysis of multiomic data. I learned which techniques to apply for single-layer analysis and which are appropriate for multi-layer analysis in a multiomic approach. This is the main type of statistical analysis applied in the CardioSCOPE project. I attended a basic artificial intelligence and machine learning (AI-ML) course, which covered the main concepts of using AI-ML. We covered the basic principles of the machine learning paradigm, from training to prediction, and the appropriate use of training and test data sets. Furthermore, I learned the basic concepts and uses of supervised, unsupervised and reinforcement learning. We started with unsupervised learning, which uses statistical tools to better understand n observations with a set of p features without being guided by a response variable y. The goal was to identify groups in data sets, starting from data with no labels, by grouping the data into clusters (clustering) and by reducing the number of features (dimensionality reduction). For clustering and for reducing the observation space, we used a broad set of algorithms for finding subgroups of observations within a data set. In particular, I learned k-means, hierarchical and spectral approaches, as well as partitioning around medoids (PAM) and clustering large applications (CLARA). Typical uses of unsupervised learning include finding common patient traits, voter profiles and customer segments.
As for k-means, I first learned how to define a distance to compute similarities between pairs of observations, using Euclidean, Manhattan and correlation-based (Pearson, Spearman, …) distance measures, and then how to reach a greedy local optimum by randomly assigning each observation to an initial cluster and iterating until the cluster assignments stop changing. I also learned how to determine the optimal value of k by deterministic resource allocation or descriptive understanding. For dimension reduction (reduction of the feature space), I learned principal component analysis (PCA) and related methods (t-SNE and Uniform Manifold Approximation and Projection, UMAP), factor analysis, matrix factorization and autoencoders. PCA is used to emphasize variation and similarity and to highlight strong patterns in a dataset. Its general algorithm is as follows: the first principal component has the largest possible variance (and thus explains the most variability), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. I also applied t-distributed Stochastic Neighbor Embedding (t-SNE), a non-linear method that preserves local distances between data points in the lower-dimensional space (optimal separation in 2D).
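The k-means procedure described above (random initial assignment, then iteration until the cluster assignments stop changing) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the course material: the function names `euclidean`, `manhattan` and `kmeans`, the `seed` and `max_iter` parameters, and the re-seeding of empty clusters are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, distance=euclidean, max_iter=100, seed=0):
    """Greedy k-means: returns (labels, centroids).

    Centroids are coordinate-wise means, which is the natural choice for
    Euclidean distance; using them with another distance is a simplification.
    """
    rng = random.Random(seed)
    # Step 1: randomly assign each observation to an initial cluster.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2: recompute each centroid as the mean of its members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption for this sketch: re-seed an empty cluster
                # with a randomly chosen observation.
                members = [rng.choice(points)]
            centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        # Step 3: reassign every observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Stop when the cluster assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Because the result is only a local optimum, running the sketch with several `seed` values and keeping the solution with the smallest within-cluster distances is a common precaution.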

Furthermore, for supervised learning we covered the main concepts: with labelled data, an “external teacher” supervises the training; from historical labelled data, a rule is found to predict the label of future data; and finally, a function is learned that produces an appropriate output when given new unlabelled data (e.g. regression, classification). I applied k-Nearest Neighbors (k-NN), in which predictions on new data are based on stored, labelled instances (instance-based learning). It needs a distance metric (similarity measure), such as Manhattan or Euclidean, to quantify the distance between the stored data and the new instances. This method is used for both classification and regression. I also learned hyperparameter optimisation and logistic regression. I learned how to evaluate a trained model using accuracy, recall, specificity and F1 score for classification, when the target variable is categorical, and mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percent error (MAPE) for regression, when the target variable is numerical.
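The instance-based idea behind k-NN classification can be illustrated with a short plain-Python sketch. It is an assumption-laden example rather than the code used in the course: the names `knn_predict`, `train_X` and `train_y` are hypothetical, and ties between classes are simply resolved in favour of the class encountered first among the nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote of its k nearest
    stored, labelled instances."""
    # Rank the stored instances by their distance to the new instance.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: distance(pair[0], query),
    )
    # Keep the k closest and count the labels among them.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here `k` and the choice of `distance` are the hyperparameters; in practice they would be tuned on held-out data, not on the test set.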

Finally, after the extensive ML course, I started writing the manuscript draft for the CardioSCOPE project, analysing sex differences in atherosclerosis and applying AI-ML concepts to the analysis of omics data related to sex differences.

In addition, I would like to add that this experience was made much more pleasant by the great people and team at the Bruno Kessler Institute, who made me feel at home and created a nice, productive learning atmosphere.”

Dr Jelena Munjas, Assistant professor at the Faculty of Pharmacy – University of Belgrade

“During my three-month stay at the Fondazione Bruno Kessler Institute in Trento, Italy, I gained valuable experience applying basic machine learning techniques to analyze multiomic data. I learned which methods are suitable for single-layer analysis versus those appropriate for multi-layer analysis, a key focus of the CardioSCOPE project. By the end of my course, I was able to participate, alongside colleagues from the institute, in analyzing data and providing a list of ACS-specific molecular signatures, including specific genes, miRNAs, proteins, and metabolites related to ACS and MACE. The main skill I developed during these three months was knowledge of data integration and the application of ML algorithms. My stay in Trento began with attending an artificial intelligence and machine learning (AI-ML) course that introduced essential concepts, covering the principles of the machine learning paradigm, the process from training to prediction, and the correct usage of training and test datasets. I learned about the fundamentals of supervised, unsupervised, and reinforcement learning. Unsupervised learning was particularly interesting to me, as it involves using statistical tools to analyze a dataset with n observations and p features without a response variable. The aim was to identify groupings within the data, utilizing techniques such as clustering and dimensionality reduction. Unsupervised techniques are commonly employed to identify shared characteristics among patients, voters, and customers. To effectively use these techniques, I learned various clustering algorithms, including k-means, hierarchical clustering, partitioning around medoids (PAM), and methods for clustering large applications (CLARA). I also studied how to define distance metrics and correlation-based measures to compute similarities between observations and how to determine the optimal number of clusters, k.
For dimensionality reduction, I explored techniques such as principal component analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), factor analysis, and autoencoders. PCA focuses on maximizing variance to reveal strong patterns in a dataset, while t-SNE helps maintain local distances in lower-dimensional representations. In the second month, I learned to work with labeled data, where an “external teacher” guides the training process. This includes finding rules to predict labels for new data and applying methods like k-Nearest Neighbors (k-NN), which requires a distance metric for instance-based learning. I also applied these techniques to several datasets provided by my colleagues from the institute. In the final month, model evaluation became another key area of focus. I utilized metrics such as accuracy, recall, specificity, and F1 score for classification, as well as mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percent error (MAPE) for regression tasks. Additionally, in September, my colleagues received a transcriptomic dataset as part of the CardioSCOPE project, and I was able to assist them using the knowledge and techniques I had previously acquired. During the course, I also participated in the preparation of two papers on sex differences in atherosclerosis with my international colleagues.”

Tamara Ratković, PhD student at the Faculty of Pharmacy – University of Belgrade
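The evaluation metrics named in the testimonials above have simple definitions that a short plain-Python sketch can make concrete. The function names, the binary `positive` label convention, and the choice to return 0.0 when a ratio is undefined are assumptions made for this illustration only.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, specificity and F1 for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and MAPE (in percent) for a numerical target.

    MAPE is undefined when a true value is zero; this sketch assumes
    all true values are non-zero.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }
```

Accuracy alone can mislead when one class is rare, which is why recall, specificity and F1 are usually reported alongside it.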
