A way to explain what Blast is to young students or non-scientists is to say that ‘Blast is the equivalent of Google for searching sequences’.
This analogy is controversial, and not all bioinformaticians would agree with it, but it is one of my favorite ways to explain what Blast is to people outside science, and it is the explanation I use during Open Days or Science Meets Society events.
In this post I will discuss the pros and the cons of explaining Blast as the counterpart of Google for scientists. It is up to you to judge, and to decide whether to start using the analogy yourself.
– Both are for searching.
The most common usage for Blast is to search for a sequence, to see whether it exists or if a similar sequence is already known. In this sense, a Blast query is equivalent to a Google query, where you type a word or a phrase and you want to know what is already known about it.
Let’s say you have the sequence ACGAGGGCATCGATCGACCTATCTCTTTCTAGGCAATC: what would be the first thing to do to learn its function and role? Just blast it, and see which results come out. Similarly, let’s say that you encounter a phrase whose meaning you don’t know, like ‘Asparagine N-Glycosylation’: how can you find out what it means? The easiest solution is just to google the phrase and see what comes out. I think that it is important for a student to understand this analogy: Blast can be used to find out what is known about a sequence of nucleotides or amino acids.
– Both are used because they are popular.
What makes Blast the most used alignment engine, when there are a lot of alternatives available?
The main reason is that Blast is the most popular and best-known tool of its kind, so many researchers just use it as the default choice. Moreover, it is robust and people trust the results it returns, just as they do with Google.
This does not mean that Blast is the best alignment tool for everything: for certain tasks it may be better to try an alternative, just as there are alternatives to Google.
– Both of them give non-reproducible results.
Neither Blast nor Google returns reproducible results, in the sense that if you try the same search twice, you may get two different sets of results. If you don’t believe that, just try to run the same Google query from two different browsers: the results will be different, especially after the second page.
The same applies to Blast in practice: the alignment is computed with a heuristic algorithm, and the outcome depends on the program version, the parameters and the state of the database, so two searches can give slightly different results. It is very important for a researcher to keep this in mind, and the Google-Blast analogy can help you remember it.
– You should record the database and the version used, or the date of the search
As I explained above, the results from Blast and Google are never completely reproducible, but they tend to be similar, so from a certain point of view they can be considered partially reproducible. However, it is also important to remember that results depend on the database and on the version of the data indexed when you run the search. With Google, you can’t choose the database used for the search, and its indexes are known to be updated every few days. With Blast, if you want to describe the results of a search, you have to report the database used and its version in order to make the search reproducible. If you run a Blast search on the NR database today, you may not get the same results if you repeat it a month later, so always take this into consideration when writing a scientific paper.
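As a minimal sketch of this habit, the snippet below bundles the details of a search into a small record that can be saved next to the results. The function name `search_record`, the field names and all the example values (version strings, parameters, date) are hypothetical placeholders chosen for illustration, not part of Blast itself.

```python
import json
from datetime import date


def search_record(program, program_version, database, database_version,
                  parameters, search_date=None):
    """Collect what a reader would need to repeat a search, as a plain dict."""
    return {
        "program": program,
        "program_version": program_version,
        "database": database,
        "database_version": database_version,
        # Fall back to today's date when no explicit date is given.
        "search_date": (search_date or date.today()).isoformat(),
        "parameters": dict(parameters),
    }


# Example values only: replace them with what you actually used.
record = search_record(
    program="blastp",
    program_version="x.y.z",              # placeholder version string
    database="nr",
    database_version="release unknown",   # placeholder, fill in the real one
    parameters={"evalue": 1e-5, "word_size": 3},
    search_date=date(2024, 1, 31),
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving a record like this together with the output makes it much easier to write the methods section of a paper later on.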
– Both of them can only return results about known things.
A Google query will only return something if there are already some existing documents about what you are looking for. In the same way, Blast will return empty results if nothing is known about the sequence you are looking for.
If there are no exact matches for a sequence, it is possible to make guesses and predictions based on which sequences are similar enough. With Google, you can do something similar: if you don’t know the meaning of the phrase you are looking for, and you don’t find anything on the Internet, you may guess it by looking at similar results; but in both cases, you will never know for sure whether the similarity is sufficient to infer the meaning of what you are looking for.
– Both are based on very complex statistics and algorithms
One objection that I got when explaining this analogy on nodalpoint some years ago was that Blast is based on very complex statistics. Well, the same applies to Google, except that we do not know the details of these statistics.
Moreover, both are based on very smart algorithms, even if from different families. Google uses graphs, and Blast uses a table and a heuristic algorithm. In any case, it is not really necessary to know the inner details of the algorithms behind them in order to use them; just be aware that neither of them is guaranteed to return the best result, only a set of results that are close to the optimal solution.
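To give a feeling for what "a table and a heuristic" means, here is a toy sketch of the seed-and-extend idea: index every short word of the database sequence in a table, look up the words of the query, and grow each match in both directions. This is a deliberately simplified illustration and not the real Blast algorithm: it only accepts exact matches, and it has no scoring matrix, no gapped extension and no statistics. All function names are made up for this example.

```python
from collections import defaultdict


def build_word_table(subject, k):
    """Map every k-letter word in the subject to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(subject) - k + 1):
        table[subject[i:i + k]].append(i)
    return table


def extend_seed(query, subject, q_pos, s_pos, k):
    """Grow an exact k-letter match left and right while the letters keep matching."""
    q_start, s_start = q_pos, s_pos
    while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
        q_start -= 1
        s_start -= 1
    q_end, s_end = q_pos + k, s_pos + k
    while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
        q_end += 1
        s_end += 1
    return q_start, s_start, q_end - q_start


def toy_search(query, subject, k=4):
    """Return distinct (query_start, subject_start, length) matches, longest first."""
    table = build_word_table(subject, k)
    hits = set()
    for q_pos in range(len(query) - k + 1):
        # Only positions that share a whole word with the query are examined.
        for s_pos in table.get(query[q_pos:q_pos + k], []):
            hits.add(extend_seed(query, subject, q_pos, s_pos, k))
    return sorted(hits, key=lambda hit: -hit[2])


subject = "TTTACGAGGGCATCGATTT"
query = "GGACGAGGGCATCCC"
for q_start, s_start, length in toy_search(query, subject):
    print(q_start, s_start, query[q_start:q_start + length])
```

The shortcut is also the weakness: a region that is similar but never shares a full word of length k with the query is never examined, which is why a heuristic search is fast but cannot promise the optimal alignment.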
– Google and BLAST are both used as verbs, because they are such widely used search programs (by Jessica Newman).
– Google is not the Internet, and BLAST is not a database (by Jessica Newman).
– You can set limits for both kinds of searches, e.g. Google to “site:edu,” and BLAST to “organism=Rat” (by Jessica Newman).
– Google does not return a p-value
Blast returns a score and an E-value for each result (from which a p-value can be derived), to quantify how good the alignment between the query sequence and the target is. Google does not do anything like that: it just gives you a list of results. However, it is possible that somewhere inside the Google algorithm each result has a score or a p-value, which is used for ranking results but not shown.
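The E-value and the p-value are two views of the same statistic. Under the Poisson assumption normally used for Blast statistics, the probability of finding at least one chance hit with a given score or better is P = 1 - exp(-E). A minimal sketch of the conversion follows; the function name is my own and is not part of any Blast package.

```python
import math


def evalue_to_pvalue(evalue):
    """Probability of at least one chance hit this good: P = 1 - exp(-E)."""
    if evalue < 0:
        raise ValueError("an E-value cannot be negative")
    # expm1 keeps precision for the tiny E-values typical of strong hits.
    return -math.expm1(-evalue)


for e in (1e-50, 0.01, 1.0, 10.0):
    print(e, evalue_to_pvalue(e))
```

For very small E-values the two numbers are practically identical, which is one reason why showing only the E-value loses little information.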
– There are a lot of things you can do with Blast apart from searching for a sequence
This is a common objection that has been made to me… but really, can you give an example of using Blast that cannot also be described as a generic search?
– Google is not open source and free, while Blast is.
The NCBI version of Blast is free: it can be downloaded at no cost and run on a local computer. The same is not true for Google: you can’t create your own personal instance of Google and run it on your laptop.
– BLAST searches very specific, curated, selectable online databases, while Google searches the Internet. You can (and should) control what BLAST searches, but you can not control what Google searches. I don’t think many new users understand that, and I think it is important (by Jessica Newman).
– Google comes in one-size-fits-all, but there are several BLAST algorithm options which accomplish different things. I think this is another point of confusion for new users (by Jessica Newman).
– Whereas NCBI’s BLAST interface is super confusing, and locally installing BLAST, setting up databases, performing searches and understanding what the different result elements mean is quite challenging! (by Yannick Wurm).
Anything else? I will think about this and see if I can improve this list. Feedback is very welcome 🙂