Most people these days probably start exploring something unknown by opening a search engine in their browser, entering a few queries and then surfing through the available results (well, the first one or two pages of the hundreds of thousands returned). The trouble is that there is a good chance some malicious technique has affected how the information retrieval model retrieved the documents for your query.
Search engines usually decide how well a document matches a given search query using two metrics: relevance and importance. Relevance refers to the textual similarity of the page to the query. It is expressed as a numerical value, and a higher value means more relevance. Importance, on the other hand, refers to the global popularity of the page. It is independent of the query and is measured by the inbound links to that page: pages with many incoming links are considered more important.
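To make the two metrics concrete, here is a minimal sketch. It assumes a deliberately crude notion of relevance (the share of query terms that occur in the page) and of importance (a plain count of distinct inbound links). The function names and the toy link list are made up for illustration; real engines use far more elaborate models.

```python
from collections import Counter

def relevance(query, page_text):
    """Fraction of distinct query terms that occur in the page (0.0 to 1.0)."""
    query_terms = set(query.lower().split())
    page_terms = set(page_text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & page_terms) / len(query_terms)

def importance(links):
    """Count distinct inbound links per page.

    links is a list of (source, target) pairs; duplicates and
    self-links are ignored.
    """
    return Counter(target for source, target in set(links) if source != target)

# Hypothetical toy web: pages "a" and "b" both link to "c", "c" links back to "a".
links = [("a", "c"), ("b", "c"), ("c", "a"), ("a", "c")]
scores = importance(links)
print(scores["c"], scores["a"])  # inbound link counts for "c" and "a"
print(relevance("search spam", "how a search engine ranks pages"))
```

Note that importance never looks at the query, while relevance never looks at the links, which is exactly the split described above.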
Of course everyone wants to be the top dog: the site where everyone goes for all their time-wasting purposes. The art of improving your site’s standing within the results is called Search Engine Optimization (SEO). Some of these techniques are legit, and some are firmly in the gray or even darker zones of the interconnected world.
Most search engines use some variation of the term frequency–inverse document frequency (TF–IDF) metric. The acronym stands for the two key quantities involved:
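TF: The frequency of the term t in document d.
IDF: The inverse document frequency of term t, that is, the logarithm of the total number of documents divided by the number of documents in which t appears. For example, if the term t appears in 10 documents and there are a total of 100 documents, then its IDF will be log(100/10) = 1 (using a base-10 logarithm).
The TF–IDF score of t in d is the product of the two, so a term scores highly when it is frequent in the document but rare across the collection.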
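A small sketch of the computation may help. It assumes the common textbook form of the metric: TF as a raw count, IDF as the base-10 logarithm of the total number of documents over the number of documents containing the term, and the score as their product. The function names and the toy corpus are illustrative only, and actual search engines use their own variants.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Raw number of times the term occurs in a tokenized document."""
    return Counter(doc_tokens)[term]

def idf(term, corpus):
    """log10(N / df), where df is the number of documents containing the term.

    Returns 0.0 when the term appears in no document, to avoid dividing by zero.
    """
    df = sum(1 for doc in corpus if term in doc)
    return math.log10(len(corpus) / df) if df else 0.0

def tf_idf(term, doc_tokens, corpus):
    """Product of term frequency and inverse document frequency."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Toy corpus: 100 tokenized documents, 10 of which contain the term "spam".
corpus = [["spam", "offer"]] * 10 + [["news", "weather"]] * 90

print(idf("spam", corpus))                       # log10(100 / 10)
print(tf_idf("spam", ["spam", "spam", "offer"], corpus))
```

With 10 of 100 documents containing the term, the IDF comes out as log10(100/10), and a document mentioning the term twice gets twice that score.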
With careful preparation of meta tags, spam words and a host of other techniques, you could spoof the search engine AIs and manipulate the results of the queries.
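As a rough illustration of why spam words work, the sketch below assumes a naive scorer that simply sums raw term counts, with no length normalization or spam filtering. The scorer and both example pages are invented for this illustration and do not represent how any particular search engine ranks pages.

```python
from collections import Counter

def raw_tf_score(query, page_text):
    """Sum of the raw counts of each query term in the page (no normalization)."""
    counts = Counter(page_text.lower().split())
    return sum(counts[term] for term in query.lower().split())

# Two made-up pages: one honest, one stuffed with the query terms.
honest = "cheap flights to rome with honest price comparison"
stuffed = "cheap flights cheap flights cheap flights buy pills here"

query = "cheap flights"
print(raw_tf_score(query, honest))   # each query term occurs once
print(raw_tf_score(query, stuffed))  # each query term occurs three times
```

Under this naive scoring the stuffed page outranks the honest one purely by repetition, which is why practical rankers dampen or normalize term frequency.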
What is more interesting (given that CAIC is not at all about White Hat & Black Hat SEO) is that these same techniques could serve as analogues for spoofing other AIs big time.
Saad Farooq: A Survey on Adversarial Information Retrieval on the Web