Book Lovers’ Day 2022

Yesterday was Book Lovers’ Day! Thanks to Brady Todd for reminding me.

Here is a list of books I have read recently and recommend.

Maverick by Ricardo Semler: this is a great classic on leadership and on how to organize a company. It tells the story of Semco, a Brazilian company where all employees self-organize, everybody is free to set their own salary and working hours, and there is almost no hierarchy. I am planning to write a longer post about this book, because I believe many of these practices were in use at GSK when I joined, when the company was organized into small units resembling self-organizing startups. It’s amazing that this book was written more than 30 years ago!

The Phoenix Project by Gene Kim: A classic Agile book. It tells the story of a company that slowly evolves from a legacy Waterfall strategy to Agile. It is written from the perspective of a developer, and towards the end, I was almost crying as I was reading the pages. It was so painful to read about all the crashes and big security holes when they released the software for the first time, and so fun to see how they improved the process! Will they be able to do ten deployments per day, or crash in the process? This is definitely a nerdish read!

Game Wizards by Jon Peterson: The story behind the creation of Dungeons & Dragons, the first role-playing game, almost 50 years ago. The book tells the horror story of the company founded by Gary Gygax, how it was mismanaged, and what a terrible working environment it was. I’m glad D&D survived to this day, but it is sad to read how it was developed. At least there are many things to learn from this story.

Bad Blood by John Carreyrou: Another horror story about a start-up. This is the story of Theranos, the company founded by Elizabeth Holmes. Apart from being an amazing piece of journalism, this book is an example of everything you should not do when managing a company, and a collection of the worst leadership practices a company could have. I got very addicted to this book, and started watching videos on YouTube to learn more about the story.

The Power of Ethics by Susan Liautaud: A collection of stories and points of reflection about ethics in current times, ranging from the scandal of human gene editing by a Chinese scientist to the Boeing 737 Max 8 scandal, and much more. The book proposes a framework to understand the impact of ethical choices and the way information and consequences spread in the community. Susan joined the board of BenevolentAI, the company I work for, which is great news for us.

Between Ape and Human by Gregory Forth: This is a book by an anthropologist who collected evidence that Homo floresiensis may actually still be alive, according to tales and legends from people living on Flores Island. The evidence unfortunately is not very strong, but there may be some truth in it. This book made me want to leave everything and depart for Flores Island 🙂

Humankind by Rutger Bregman: I am halfway through this book, and I am already enjoying it. Essentially, it promotes the idea that humans are intrinsically good. Current society and culture make us think that we are more selfish and aggressive than we are, but if you look at the evidence case by case, this turns out not to be true. Recommended read!

A Bioinformatician in Big Pharma

The last 18 months have been quite a radical career change for me. This is because I made the infamous move: leaving academia and starting to work in industry.

My career from Bologna to Barcelona, London, and the British countryside. Yes, England really is rainier than Barcelona.

To be honest, I am quite happy with the change. I’ve learned many things, discovered another way to do science, and possibly made some contributions. Moving from academia to industry sometimes has a bad reputation, but these months taught me that developing a drug requires many resources: not only a smart idea in the lab, but also a lot of validation, regulation, planning, marketing, budgeting, understanding the impact on the patients, and much more.

Where am I working exactly?

I am in the pre-clinical department of a big pharma company, GSK. More specifically, my department is called Target Sciences, and its main scope is to identify and validate new targets (in layman’s terms: genes or biological entities) to treat indications (in layman’s terms: diseases or phenotypes).

The R&D department of GSK is structured in several Discovery Performance Units (DPUs), which are small independent units, each working on a specific therapy area. For example, there could be a DPU focused on oncology, and another on asthma and respiratory diseases. These DPUs are like small start-ups within the company, and each of them carries a few drug targets through the drug discovery process.

Drug discovery process – I am in the first phase. Source: https://www.slidegeeks.com/shapes/product/business-steps-powerpoint-templates-marketing-drug-discovery-process-ppt-slides

My department helps all these DPUs identify and evaluate drug targets, providing computational biology expertise together with genetics, stats, and experimental validation. It’s like a center of excellence that interacts with the rest of R&D.

Identifying the correct target is important because it is the first decision in the drug development process, and an error in this step can be quite expensive. Imagine what happens when a clinical trial fails in phase III because the original drug was targeting the wrong gene: it is quite a big waste of resources, not only for the company but also and more importantly for the patients.

What is target identification, and what is my role

In layman’s terms, identifying a drug target involves answering the following question: if I want to treat disease X, which would be the best genes to target?

From a computational point of view, there are several ways to answer such a question. You may simply go to the literature (e.g. PubMed) and search for relevant articles. Other approaches involve looking at information from several sources, like gene expression, protein interactions, involvement in pathways, and much more. It is usually a matter of data integration, or data science.
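To give a rough idea of what "data integration" means here, below is a minimal Python sketch. It is not how any real pipeline works: the gene names, the scores, and the `rank_targets` function are all invented for illustration, and averaging the scores is just the simplest possible way to combine sources.

```python
from collections import defaultdict

# Hypothetical evidence scores (0 to 1) linking genes to a disease X,
# one dictionary per data source. All names and values are invented.
evidence = {
    "literature": {"GENE_A": 0.9, "GENE_B": 0.4},
    "gene_expression": {"GENE_A": 0.6, "GENE_C": 0.8},
    "protein_interactions": {"GENE_B": 0.7, "GENE_C": 0.5},
}


def rank_targets(evidence_by_source):
    """Average each gene's scores over all sources; a missing source counts as 0."""
    totals = defaultdict(float)
    for scores in evidence_by_source.values():
        for gene, score in scores.items():
            totals[gene] += score
    n_sources = len(evidence_by_source)
    ranked = [(gene, total / n_sources) for gene, total in totals.items()]
    # Best-supported genes first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


ranking = rank_targets(evidence)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

Counting a missing source as zero rewards genes supported by several independent lines of evidence, which is one possible design choice among many; in practice the weighting of each source is a big part of the problem.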

If you want to get a more general idea of the types of sources used for target identification, you can have a look at the Open Targets Platform; this is a pre-competitive effort to curate and integrate data sources, supported by the EBI, GSK and other pharmas.

My role, in particular, is more focused on data integration and management than on pure analysis. It is about making the best use of the datasets we have access to, and understanding the value of acquiring a new dataset. It is also about improving communication about data usage, and discovering new technologies and methods to make use of the data.

What is good about working in a pharma, compared to academia?

Let’s say three things:

  • Team working. This is the answer that hurts the most, especially for me.
    If you look at the previous posts in this blog, you can see how much I care about doing science in an agile way, planning properly and sharing information. The problem is that in academia, the pressure of having to publish first-author papers ruins it all.
    In the academic world there is a lot of collaboration, especially online, and there are team meetings and journal clubs; but at the end of the day, your long-term prospects all depend on your own reputation in the scientific world. This is fair enough, but difficult to reconcile with real team working.
  • Lots to learn: everybody is usually involved in more diverse projects, and interacts with more people from different backgrounds. Thus, you tend to specialize less in a specific area, and learn a bit of everything. To be honest, I prefer this approach as it keeps my attention higher. I am glad that I did a PhD, during which I spent several years specializing in a single area, human genetics; however, now that I am older, I like learning more about different fields.
  • Possibility to grow. You are generally more pampered and cared for than in academia. You are actively encouraged to follow courses and learn new technologies, and my line manager complains if I am still in the office after 6 pm (to be honest, my PhD supervisor did too). There are opportunities to do secondments in other parts of the company, and learn about clinical trials, finance, or anything related. Every year you define a list of objectives with your line manager, and you are evaluated in a fair process on how well you reach them, so your efforts and accomplishments are recognized.

What is Bad?

  • Politics. Unfortunately politics is everywhere, especially in a big international company. Luckily I am still unimportant enough that this doesn’t affect me much.
  • Simplification. Interacting with people from different backgrounds means that you need to simplify, and learn to explain complex biological concepts in a way that is easy to understand. This is not easy and sometimes leads to funny effects, e.g. when you start hearing buzzwords and oversimplifications. On the bright side, at least I am improving my communication skills.

What’s next?

For personal reasons I haven’t written much in this blog lately, and I may not be able to write much in the near future. However, hopefully I’ll be able to write more about this new adventure, and describe how science is done from the industry side.

Hacking Global Health London 2016

A few months ago I participated in a hackathon organized by Open Data Science on data from the Healthy Birth, Growth and Development knowledge integration (HBGDki) initiative by the Bill & Melinda Gates Foundation.


The aim of this initiative is to collect data on child growth and development from several sources, to study which factors influence child growth and how to better intervene when there are risks. Currently the data comes from manual annotation of several publications, but future plans include launching a global effort to collect data systematically, and one of the objectives of the hackathon was actually to guide the planning of this effort.

I had a lot of fun during the hackathon and learned a lot. For me personally, it was an opportunity to learn more about the caret R package, which is a must-know library for doing machine learning in R. My plan for the hackathon was actually to do a trajectory clustering, to see if there were different trajectories of growth of the baby during pregnancy, but unfortunately the analysis didn’t return very interesting results 🙂

See my GitHub repo for some Jupyter notebooks, and the slides on SlideShare for more info.

Published a “Post Publication Review” on Publons

A while ago I posted in this blog an analysis of fitness genes, illustrating a use of the Bioconductor data packages and based on a recently published paper (Are fitness genes more conserved across species?).

This week I was contacted by the Publons team and asked to post the same analysis on their platform as a “Post Publication Review”. Of course I accepted: Post Publication Review of High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities

Publons is a social network for peer reviewers, where you can list the papers you have reviewed, get credit for them, and even post new reviews of published papers. I personally like the idea of Publons very much, because I think that reviewing papers is an important part of science, which unfortunately doesn’t get the recognition it deserves.

Hiding cows in the genome (a.k.a. an introduction to bash programming)

Preparing the materials for a workshop on bash programming is very difficult, because you never know which level of skill to expect from the people attending it.

Click on the image to access the slideshow.

Most of the time the class will be a mix of absolute beginners and expert Unix users, and it is not easy to prepare a presentation that will interest both. If the materials are too advanced, the beginners will get frustrated and stop paying attention. If the materials are too simple, expert users will soon get bored and distracted, and start working on their own things and checking Facebook.

In an attempt to avoid these issues, I decided to go for a trick that would hopefully get the attention of even the most advanced bash guru, which is: hiding cows in the genome.

More precisely, for a workshop at the Programming for Evolutionary Biology conference held this year in Belgrade, I designed the exercises so that the instructions for the next step can be retrieved using the correct bash commands. Students start with a file of randomly generated text, and they have to use grep and other Unix tools to proceed to the next exercise. If the exercise is done correctly, they also see a cow.

I think it worked decently, because the students liked the idea, and finding cows in the FASTA and BED files was fun.
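If you want to try something similar, here is a minimal Python sketch of how such an exercise file could be generated. This is not the script used for the workshop: the `make_exercise` function, the `COW` marker, and the example instructions are all hypothetical.

```python
import random
import string


def make_exercise(instructions, n_noise=200, width=60, marker="COW", seed=42):
    """Return lines of random lowercase text with marked instruction lines hidden inside."""
    rng = random.Random(seed)
    # Noise is lowercase only, so it can never contain the uppercase marker.
    lines = [
        "".join(rng.choices(string.ascii_lowercase, k=width))
        for _ in range(n_noise)
    ]
    # Choose sorted positions in the noise so the instructions keep their order.
    positions = sorted(rng.sample(range(n_noise), len(instructions)))
    # Insert from the last position backwards so earlier indices stay valid.
    for pos, text in reversed(list(zip(positions, instructions))):
        lines.insert(pos, f"{marker} {text}")
    return lines


# Hypothetical instructions for the next step of the exercise.
steps = [
    "Well done! Now count the lines of the next file with: wc -l",
    "Then look at its first lines with: head",
]
lines = make_exercise(steps)

# This is what a student would recover by grepping for the marker.
found = [line for line in lines if "COW" in line]
print("\n".join(found))
```

Writing `lines` to a text file, one per line, gives a file where only grepping for the marker reveals the instructions in the right order. An ASCII cow can be hidden with the same trick, one marked line per row of the drawing.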

The workshop’s materials are below (if the iframe doesn’t work, click here). If you are a teacher and organize workshops on bash programming, I am officially challenging you to include something similar in your next presentation 🙂

[iframe src=”https://nbviewer.jupyter.org/format/slides/github/dalloliogm/belgrade_unix_intro/blob/master/PEB%20Bash%20Workshop.ipynb#” width=”100%” same_height_as=”window” scrolling=”yes”]

Data Annotation Packages in BioConductor

Bioconductor contains not only analysis packages, but also a good suite of data packages, frozen snapshots of the most important data sources for bioinformatics (e.g. EBI, NCBI, UCSC, etc.).

These data packages are useful because they give quick access to biologically relevant data, without having to download it manually from the web. They are used internally by several analysis packages (e.g. to calculate ontology enrichment, get gene coordinates, etc.), and in a way they improve the reproducibility of your analysis, because by updating them within R you get access to the same frozen version of the data as anybody else using them.

This slideshow provides a quick summary of all the data annotation packages available, how to use them, and how this part of Bioconductor is evolving.

Click on the screenshot or here to access the slideshow.

I prepared this slideshow for the second workshop I presented this year at Programming for Evolutionary Biology in Belgrade. It is probably less glamorous than the Bash slideshow, as there are no hidden cows; however, it may be more useful, especially if you use Bioconductor regularly.

Disclaimer: I am not a Bioconductor developer, just a user. So apologies if I wrote anything wrong 🙂

Interviewed by LabWorm for NCG

Our group was interviewed by LabWorm about our recent publication on Network of Cancer Genes 5.0.

I absolutely love the “artist impression” they made of our team:

The NCG team sketched by LabWorm. Thanos Mourikis, me, and Omer An.

LabWorm is a collaborative platform for sharing tools and links related to bioinformatics. They have a very modern and interactive user interface, and they are very active in adding new links and involving people in the platform.

Over my too many years of experience in the bioinformatics field, I have seen many attempts at creating collections of bioinformatics tools. Unfortunately, many of these failed because of lack of interest or lack of maintenance. However, LabWorm seems to be doing things right for the moment, as they work really hard to engage people in their community, and they even publish blog interviews with researchers.

The bioinformatics community really needs an effective way to share tools and links, and I really hope that LabWorm will be successful in their attempt.