How to use Wikipedia to boost your discovery skills

Discovering good tech and business information on the web, such as new R&D, companies, or markets, can be quite challenging. It requires discovery skills. Essentially, the problem usually is this: how can you come up with good search queries that deliver useful results? Or, put a little differently, what terms might other people use to describe something that is interesting to you? This can vary quite a bit. For example, somebody writing a scientific paper probably describes a concept in different terms than somebody writing about the latest venture-backed company. This article is about how Wikipedia helps you with all this.

Before diving into details, here is the post-breakdown:

  1. Highly specialized technical people in particular tend to use very specific and technical search queries
  2. “Associating” is a discovery skill cultivated by great innovators
    1. Expanding
    2. Zooming in (sometimes)
  3. How Wikipedia helps you both zoom in and expand
    1. Using Wikipedia to zoom in
    2. Using Wikipedia to expand
    3. Applications
    4. People
    5. Companies
    6. Other Wikipedia contents for expanding a query
  4. Can’t a computer do all this?

Now, let’s dive in.

Besides building our software, we also work with many of our customers directly, and support them in their discovery endeavors. These discovery endeavors come in many different flavors, with different goals and scopes. Our customers have various professional backgrounds, operate across various industries, and in various roles within their organizations. In other words, we talk to many different people who explore many different things at the intersection of technology and business.

Yet, despite all these differences, there is one particular issue that sticks out:

Highly specialized technical people in particular tend to use very specific and technical search queries

Of course, this is to be expected. If you have a lot of expertise in an area, using your expertise is like riding a bicycle. You don’t think about it, you just do it.

But the problem is that searches that are too narrow or too technical tend to deliver results with very high precision but very low recall: almost everything you find is relevant, but you find only a small fraction of what is out there. And “high precision but low recall” is usually counterproductive when you want to discover and explore. This is because while you might hit upon a few science publications and perhaps also some patents, you usually miss out on venture investments or other commercial activities. When people write about such activities, they tend to use less specific or technical language. So the danger is that there might be highly relevant commercial activity that you don’t see because your search query is too narrow or too technical.
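To make “high precision but low recall” concrete, here is a minimal sketch in Python. The document ids and the two result sets are made up for illustration; they are not real search results. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: 10 relevant documents exist in total.
relevant = {f"doc{i}" for i in range(10)}

# A narrow, technical query: 2 results, both relevant.
narrow = {"doc0", "doc1"}

# A broader query: 12 results, 8 of them relevant, 4 of them noise.
broad = {f"doc{i}" for i in range(8)} | {"noise1", "noise2", "noise3", "noise4"}

print(precision_recall(narrow, relevant))  # (1.0, 0.2)
print(precision_recall(broad, relevant))
```

In this toy example the narrow query is “perfect” in terms of precision but misses 8 of the 10 relevant documents. The broader query brings in some noise but surfaces most of what matters, which is usually the better trade-off for discovery.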

There is another problem, too. Very specific or technical queries usually search for methods. But what if someone “out there” uses a completely different method that is better than the method you search for? Or what if they do something relevant but do not mention the method or technology explicitly? For example, you might search for “deep reinforcement learning”, which is a variant of machine learning. But what if somebody else out there solves the same problem that you’re trying to solve, but using something other than deep reinforcement learning? In order to find this “somebody else with the new method”, searching for problems or applications might be a better approach.

How can we do this in a way that is systematic but also flexible and iterative at the same time? How can we boost our discovery skills so that we improve our chances of discovering things that are relevant but that we did not even know might exist? Applied to the “deep reinforcement learning” example, how can we discover things that are not called “deep reinforcement learning” but that are still relevant?

“Associating” is a discovery skill cultivated by great innovators

When you google, bing, or yandex “discovery skill”, one of the top-ranked results is a 2011 article, “Five discovery skills that distinguish great innovators”. One of these five skills they call associating:

Associating (…) helps innovators discover new directions by making connections across seemingly unrelated questions, problems, or ideas. Innovative breakthroughs often happen at the intersection of diverse disciplines and fields.

from Five discovery skills that distinguish great innovators. Article by Jeff Dyer, Hal Gregersen, and Clayton M. Christensen.

Sure, but how can you implement this? And is this something that can be learned?

We think yes, this can be learned, at least to an extent. In another article, “What makes a good innovation analyst?”, we discuss this as well. There we also provide some links to additional resources that help you build your innovation analyst skill set.

Let’s break down “associating” into two things: (A) expanding our search, and, depending on the number of results we got initially, (B) zooming in:

(A) Expanding

Expanding means that we also want to find ideas, companies, people, or markets that are relevant to our interest but that do not refer to our interests explicitly. For example, there could be a company that works on an application of deep reinforcement learning (e.g. computer vision). This company is probably relevant to us, even if they do not say what methods they use exactly. Or if a known expert on deep reinforcement learning publishes a new paper about something else, this paper might still be interesting to us.

(B) Zooming in (sometimes)

If an initial search is so specific that it does not return a large number of hits, there is no need to zoom in, of course. But sometimes we need to zoom in, in addition to expanding. For example, I just searched Mergeflow’s scientific publications data set for the exact phrase “deep reinforcement learning”. I got more than 2,300 unique hits over the last five years alone (hits that explicitly mention “deep reinforcement learning”; if a paper says “deep learning” but not “deep reinforcement learning”, it is not included in my results set). And as the chart from Mergeflow below shows, numbers have been rising.

Monthly numbers of new scientific publications on "deep reinforcement learning". Data from Mergeflow.

Obviously, I cannot read 2,300 papers. If I had to pick 5 or 10 papers, how should I make this selection? Picking the most-referenced ones could be an option, sure. But then we will hardly be at the cutting edge of things. After all, it always takes a while until a paper is referenced a lot. And we would much rather cut in line.

Now let’s see how Wikipedia helps us with both expanding and zooming in.

How Wikipedia helps you both zoom in and expand

In our experience, Wikipedia is a fantastic resource for both zooming in and for expanding. There is almost no topic that Wikipedia does not cover, and it is updated all the time. This makes it a more or less universal discovery skills booster.

Let’s make this more concrete. Here I describe how we could use Wikipedia to boost our “deep reinforcement learning” discovery journey.

First, we search Wikipedia for “deep reinforcement learning”, and check the results page:

Wikipedia page on “deep reinforcement learning”.

There already, we get some input that can help us both zoom in and expand. Let’s start with the zooming in part.

Using Wikipedia to zoom in

Like I mentioned above, there are thousands of scientific publications on “deep reinforcement learning”. We can now use the applications mentioned in Wikipedia (robotics, video games, NLP, etc.) to zoom in. For example, “deep reinforcement learning” AND “natural language processing”. This helps us not only make the amount of information more manageable. It also helps us organize the big pile of publications, and thus discover results that are more specific and concrete. For example, if we zoom in on NLP, we get results like this:

Image captioning based on reinforcement learning

Such concrete findings make it a lot easier to form an opinion. And they usually trigger a more creative thought process.
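The zooming-in step can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the `AND` syntax with quoted phrases is the query style used in this article, which may differ from what your search tool expects.

```python
def quote(term):
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def zoom_in(topic, applications):
    """Combine a broad topic with each application to get narrower queries."""
    return [f"{quote(topic)} AND {quote(app)}" for app in applications]


# Applications taken from the Wikipedia page, as mentioned above.
applications = ["robotics", "video games", "natural language processing"]

for query in zoom_in("deep reinforcement learning", applications):
    print(query)
```

Each resulting query covers one slice of the big pile of publications, so you can work through the slices one by one instead of facing thousands of hits at once.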

Using Wikipedia to expand

Now let’s expand. There are a number of good ways to do this. How exactly you do this depends on your topic and on your goals. Here, I decided to expand to one of the categories, “deep learning”:

Wikipedia categories from the "deep reinforcement learning" page.

So now we go to the “deep learning” Wikipedia page. This page is rather long, with lots of contents. A great starting point is the content box in the upper right:

Wikipedia page on "deep learning", upper part.

This box has some great content for expanding our query. For example, we could search for “supervised learning” (deep learning is often used for supervised learning, although it can also be unsupervised or semi-supervised). Or for applications such as “anomaly detection”. Such searches can help us discover contents that are relevant but that do not make explicit reference to “deep reinforcement learning”.

Mergeflow offers a range of tools that support your discovery endeavors, such as:

Discover Ideas & Players helps you discover and track people, organizations, concepts, materials, and other entities.
Markets & Investments extracts the latest market data, venture investments, and research fundings from news and other sources.
Emerging Technologies enables you to track disruptive technologies from across various industries, live and continuously.

Applications

Most Wikipedia pages have a table of contents. Here is the one for the “deep learning” page:

Table of contents for the "deep learning" Wikipedia page.

Many Wikipedia pages on technologies have an explicit “applications” section, and these provide great contents for expanding a query. Which of these contents work for you depends on your context, of course. But if in doubt, I would recommend being inclusive rather than exclusive. For example, if your home turf is “natural language processing”, you should probably also explore “drug discovery”. Both fields share some of the underlying technologies, such as “sequence analysis”. For natural language processing, the sequences are words or characters; for drug discovery, sequences could be amino acids. It could also be a good idea to try combinations, e.g. “deep learning” AND “drug discovery”.

Also, if you want to dig deeper on something, you should explore relevant sections in the Wikipedia article. For example, if we go to the “drug discovery and toxicology” section…

“Drug discovery and toxicology” section of the Wikipedia “deep learning” page.

…there is some additional query inspiration. For example, you could zoom in on “biomolecular targets” AND “deep learning”. Or you could search for “AtomNet”.

People

Above, I mentioned that other work by known experts can be very interesting for query expansion as well. Even if these publications or articles are not explicitly about your topic, they might still provide relevant insight in your context.

Now, many Wikipedia pages do not explicitly have a “people list”. But in many cases, Wikipedia pages have a “history” section, and this is very often a good place to find people. For example, here is a part of the history section of the “deep learning” page:

Part of the "history" section of the "deep learning" Wikipedia page.

So you could search for “Rina Dechter” or for “Yann LeCun”, for example.

Companies

Some Wikipedia pages also have a “commercial activity” section. This is a great place for finding companies to which you could expand your discovery endeavors:

"Commercial activity" section of the "deep learning" Wikipedia page.

Depending on the company, different company queries make sense. For example, if a company is likely to specialize in your topic, it makes sense to just search for that company. Here this could be DeepMind Technologies, for example. By contrast, “broad” companies, such as Facebook, are better combined with your topic in a query (“Facebook” AND “deep learning”).
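The expansion rules for applications, people, and companies can be put together in a small Python sketch. The function and its parameters are hypothetical, written just for this illustration. Note that whether a company counts as “specialized” is a human judgment that you supply yourself; the code does not decide it for you.

```python
def quote(term):
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def expand(topic, applications=(), people=(), companies=()):
    """Build expanded queries from terms collected on a Wikipedia page.

    companies is an iterable of (name, is_specialized) pairs.
    """
    queries = []
    for app in applications:
        # Search the application on its own and in combination with the topic.
        queries.append(quote(app))
        queries.append(f"{quote(topic)} AND {quote(app)}")
    for person in people:
        # Other work by known experts may be relevant even without the topic.
        queries.append(quote(person))
    for name, is_specialized in companies:
        if is_specialized:
            queries.append(quote(name))
        else:
            # Broad companies are combined with the topic to limit noise.
            queries.append(f"{quote(name)} AND {quote(topic)}")
    return queries


queries = expand(
    "deep learning",
    applications=["drug discovery"],
    people=["Rina Dechter", "Yann LeCun"],
    companies=[("DeepMind Technologies", True), ("Facebook", False)],
)
for query in queries:
    print(query)
```

The point of writing it down like this is less about automation and more about being systematic: for every term you pick up on a Wikipedia page, you decide once which kind of term it is, and the query pattern follows from that.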

Other Wikipedia contents for expanding a query

Wikipedia pages sometimes have other sections as well that can inspire query expansions. For example, the deep learning page also has a “criticism and comment” section. Here is a part of it:

"Criticism and comment" section of the "deep learning" Wikipedia page.

Contents from these sections do not always provide suitable “direct” query expansions. But in this case, simply searching for “black box” (or “black box” AND “deep learning”) may produce interesting results. When I did this search just now in Mergeflow, I got blog posts such as…

Inside the ‘black box’ of a neural network

…or scientific publications such as these:

Cyclic Boosting — an explainable supervised machine learning algorithm

Improving transparency of deep neural inference process

Can’t a computer do all this?

Well, perhaps parts of it. Sure, Wikipedia provides highly structured data. This enables computers, for example, to “know” that “deep learning” is a form of “artificial intelligence” and an “emerging technology”. You can see this in the “Categories” section at the bottom of the page:

"Categories" section of the "deep learning" Wikipedia page.
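As a rough illustration of how a computer can read these categories, here is a minimal Python sketch that uses the public MediaWiki API (`action=query` with `prop=categories`). The helper names and the `User-Agent` string are my own choices. The sample response below is hand-written to show the general shape of the JSON, with made-up ids and an illustrative category list; it is not actual API output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_category_url(title, limit=50):
    """Build a MediaWiki API URL that asks for the categories of one page."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": limit,
        "clshow": "!hidden",  # skip hidden maintenance categories
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_categories(response):
    """Extract category names (without the 'Category:' prefix) from a response."""
    names = []
    for page in response.get("query", {}).get("pages", {}).values():
        for category in page.get("categories", []):
            names.append(category["title"].split(":", 1)[-1])
    return names


def fetch_categories(title):
    """Fetch and parse the categories of a page. Requires network access."""
    request = Request(
        build_category_url(title),
        headers={"User-Agent": "discovery-skills-example/0.1"},  # assumed name
    )
    with urlopen(request, timeout=10) as resp:
        return parse_categories(json.load(resp))


# Hand-written sample in the shape of an API response (illustrative only).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "title": "Deep learning",
                "categories": [
                    {"ns": 14, "title": "Category:Artificial intelligence"},
                    {"ns": 14, "title": "Category:Emerging technologies"},
                ],
            }
        }
    }
}

print(parse_categories(SAMPLE_RESPONSE))
```

To try it against the live page, you would call `fetch_categories("Deep learning")`. The returned category names can then serve as candidate terms for expanding a query, just like the ones we picked by hand above.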

In fact, using Wikipedia to automatically construct knowledge graphs is a whole research field. Here, for example:

WISER: A semantic approach for expert finding in academia based on entity linking

But I would argue that there is something to be said for human discovery skills. For example, when I see a company name in some context, I can tell very quickly if this company might be interesting. And at least so far, I’d say that algorithms cannot really do this.


Featured photo by Colin Moldenhauer on Unsplash.
