The PeARS search engine

Find out about PeARS at http://pearsearch.org/.

Imagine a new kind of Internet search. When you have a query, instead of going to a big, centralised search engine, well, you just ask someone! “Kim, where can I find some cool fabric for my new dress?”, “Karim, how do I get from Orly airport to the Gare du Nord in Paris?”, “Jane, what is a cruck barn?” Of course, Kim, Karim and Jane are not waiting in front of their computers, ready to answer your queries. But by virtue of being a fashion buff, a keen traveller and an architecture enthusiast, they might just know which website could help you. Leveraging this human knowledge in a completely automated fashion is the idea behind PeARS.

PeARS (Peer-to-peer Agent for Reciprocated Search) is a lightweight, distributed search engine. It relies on people going about their normal business and browsing the web. While they do so, the pages they visit are indexed in the background and assigned a ‘meaning’ (is this page about cats, fashion, ancient history, Python programming, etc.?). From time to time, they can choose to share some or all of these meanings with others, providing the building blocks of a giant search engine network, distributed across people.

By linking page meanings with real people doing real browsing, PeARS ensures that the nodes in the network are topically coherent. An individual interested in architecture will probably have indexed a lot of webpages on art, construction and engineering topics. A dog trainer may have spent time buying equipment from online companies she trusts. By sharing the relevant part of their history, they make other people on the PeARS network able to use their specialised knowledge.

Think of PeARS as a layer of virtual agents underlying a community of real people. Your virtual agent is responsible for sharing your Web knowledge in the way you choose, and for contacting other people’s agents to help you answer your queries. This behaviour is very similar to the way people behave offline, both in terms of advertising particular specialisations and of looking for relevant sources when seeking information.

Why PeARS?

Recently, data has been dubbed, rather nastily, ‘the new oil’. Experts predict that the wide availability of huge amounts of information is about to change the way we’ve been living since the industrial revolution. It is, however, unclear how well the ‘data revolution’ is going to pan out. The new state of affairs requires a revised roadmap for everything, from technology to legal frameworks. It is down to us, the people, to make this revolution happen the way we want it (and hopefully, to make it as little oil-like as possible).

Freedom of search: While some communities are doing great work in favour of the open licensing of data, one important issue is often left aside: is the process by which we access virtual content open? If, tomorrow, the entire Web entered the public domain, would I truly have free access to the world’s knowledge? I believe not. Having billions of free documents at our disposal means nothing if we don’t own the means to search them. On this point, the status quo is far from encouraging. Web search is owned by a handful of commercial companies. Finding the content we need is mediated by algorithms we have no control over. This has two consequences. First, we cannot partake in the design of search rankings, turn off ads, or be sure that our queries are not collected by a third party. Second, we are kept away from fully engaging with the Web. The Internet is an extraordinary datastore which promises to be a learning ground for all sorts of new technologies. Using this data to its full extent, in a fully democratic fashion, is only possible with direct, uncompromised access to it.

Privacy: A lot of people are concerned about the privacy of their personal data – rightly so. Most of what we do on the Internet can be captured, stored and analysed. Our web searches, in particular, tell a lot about us. Searching for flights to Beijing is an indicator we might be going to China. Searching for remedies against migraines reveals we might suffer from headaches. And so on. Typically, such queries are gathered by one centralised company which, by providing us with a search service, collects in return a wealth of personal information. One way out of this undesirable situation is to move towards local search solutions, where most of the work is done on our personal machines.

How it works

In PeARS, search is distributed over a network of peers (or ‘pears’, as we prefer to call them). Each pear is a set of files stored somewhere on the Internet, which represent the web pages that a real person has visited (and is willing to share with the world). Such a file might look like this:

0:190.826601 39.598095 39.237190 111.530819 86.312570 78.477238 59.353659 70.806475 49.927581 […]

1:75.154387 19.989724 35.188764 33.998277 65.646423 18.362618 32.714728 57.627935 25.953652 […]

where the line labelled 0 is a representation (a list of numbers) of the Wikipedia article about cats, and the line labelled 1 a representation of the Wikipedia article about dogs.
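
To make the file layout above concrete, here is a minimal sketch of how such lines could be read into memory. It assumes each line has the form `id:value value value ...`, as in the sample; the function name `parse_pear_lines` and the shortened sample are illustrative, not part of PeARS itself.

```python
from typing import Dict, List


def parse_pear_lines(text: str) -> Dict[int, List[float]]:
    """Parse lines of the assumed form 'id:v1 v2 v3 ...' into {id: [v1, v2, ...]}."""
    pages: Dict[int, List[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        page_id, _, values = line.partition(":")
        pages[int(page_id)] = [float(v) for v in values.split()]
    return pages


# Shortened sample: only the first three numbers of each line shown above.
sample = """0:190.826601 39.598095 39.237190

1:75.154387 19.989724 35.188764"""

pages = parse_pear_lines(sample)
```

Each page then becomes a plain list of floats keyed by its identifier, which is all a later comparison step would need.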

When someone queries the Internet through PeARS (we call it ‘pear-picking’, or in short, ‘picking’), the engine tries to find the pears most likely to know about that query – that is, the files ‘donated’ by people interested in relevant topics. This is how we get access to people’s best Internet tips — and cut the search space at the same time!
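
The picking step can be sketched as follows. This is only an illustration under stated assumptions: that a query can be turned into a vector in the same space as the pages, that a pear is summarised by the average of its page vectors, and that cosine similarity is the comparison measure. The text above does not specify any of these, and the pear names and three-dimensional vectors are made-up toy data.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Average of a pear's page vectors (assumed summary of the pear)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pick_pears(query_vec: Vector, pears: Dict[str, List[Vector]], top_k: int = 1) -> List[str]:
    """Return the names of the top_k pears whose centroid is closest to the query."""
    scored = [(cosine(query_vec, centroid(page_vecs)), name)
              for name, page_vecs in pears.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy data: hypothetical pears with tiny, invented page vectors.
pears = {
    "architecture_pear": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog_training_pear": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
query = [0.85, 0.15, 0.05]  # stands in for a query such as 'what is a cruck barn?'

best = pick_pears(query, pears)
```

Only the pears returned by this step would then need to be searched page by page, which is how the search space gets cut.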

PeARS is powered by distributional semantics. Check out my light introduction to the technique.

Advantages

  • Privacy: search takes place on your local computer, so you avoid the situation where one large company accumulates information about your searches.
  • Digital self-sufficiency: it is well known that pears from your own garden taste better. They also ensure that you are in control of your food supply. PeARS does the same for web search.
  • No crawl necessary: the shared pages are those that people visit anyway.
  • No messing about with links: because pages come from a real person’s browsing history, the whole job of selecting ‘the right links’ and coming up with ‘authorities’ has already been done by a human.
  • Social aspect: it is possible to make one’s pear(s) more or less anonymous. The social network buffs out there will like building their own search communities.