The Problem Of Demoting Spam On The Internet: Yahoo!'s TrustRank Approach
TrustRank is an attempt to counter the web spamming activities that threaten to deceive search engines' ranking algorithms. It propagates trust among web pages in the same manner that PageRank propagates authority. Tests suggest, however, that combining trust and distrust values demotes spam sites more effectively than using trust values alone.

The Assumption

A link between two pages carries an implied conveyance of trust from the source page to the target page. Linking to a page is a vote of confidence from the source that the target provides content of value to the user. The approach rests on the ideal that good sites point only to similarly good sites and will not knowingly refer people to spam sites. These good sites hold people's trust, which is then propagated through the link structure of the web.

TrustRank uses a set of highly trusted seed sites to help demote web spam. The approach assigns a non-zero initial trust score to the seed sites and an initial score of zero to all other sites. A biased PageRank algorithm then propagates these initial trust scores along outgoing links; after convergence, good sites are expected to receive a decent trust score while spam sites are likely to receive lower ones.

The likelihood that a page points to a spam page increases with the number of links on it. For this reason it has been proposed that the trust score of a parent page be split equally among its child pages. This raises the question of how to score a child page whose multiple parents pass on different amounts of trust. TrustRank resolves it by simple summation, which has not been very effective in curtailing spam sites' efforts to raise their ranking.

The conveyance of distrust emerged as a natural extension of the conveyance of trust between links.
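The trust propagation described earlier (non-zero scores for seed pages, an equal split of a parent's score among its children, and summation over multiple parents) can be sketched as a biased PageRank iteration. This is only an illustration under stated assumptions: the function name `trustrank`, the damping factor of 0.85, the fixed iteration count, and the tiny example graph are all hypothetical and are not taken from the article.

```python
def trustrank(out_links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along outgoing links.

    out_links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not prescribed ones.
    """
    pages = set(out_links) | set(seeds)
    for targets in out_links.values():
        pages.update(targets)

    # Seed pages share the initial trust; every other page starts at zero.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)

    for _ in range(iterations):
        received = {p: 0.0 for p in pages}
        for parent, children in out_links.items():
            if not children:
                continue
            share = trust[parent] / len(children)  # equal split among children
            for child in children:
                received[child] += share  # summation over all parents
        # Biased step: the "teleport" mass returns only to the seed pages.
        trust = {
            p: damping * received[p] + (1 - damping) * seed_score[p]
            for p in pages
        }
    return trust


# Hypothetical four-page web: one trusted seed, one good page, two spam pages.
example_links = {
    "seed": ["good"],
    "good": ["seed"],
    "spam": ["good", "spam2"],
    "spam2": ["spam"],
}
scores = trustrank(example_links, seeds={"seed"})
```

In this toy graph no trusted page links to the spam pages, so they receive no trust at all, while the page the seed links to does. Real link graphs are far less tidy, which is where the coverage and summation concerns raised in the text come in.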
Distrust may indicate a lack of confidence in a source page because it links to an untrustworthy page. Thus, when a page links to a known spam page, the trust judgment of that source page cannot be considered valid.

TrustRank, as originally conceived, proposed that trust should be reduced as we move further away from the seed set of trusted pages. However, the limited number of seed pages makes it impossible for propagation to reach the whole web. A well-performing algorithm is needed to produce trust judgments for at least a larger fraction of web pages.

The seed sets used may not sufficiently represent the different topics of the web. TrustRank tends to show a bias towards larger communities, which can be remedied by using topical information to divide the seed set and calculate trust scores separately for each topic. Using pages listed in well-maintained topic directories can help resolve the coverage issue. Seed filtering may be done to remove low-quality pages, or even spam pages, that may inadvertently have been included in the pool of seed pages.

Much work is being done on methods that don't rely heavily on human judgment to identify spam-free pages. As it is, searchers are hard pressed to locate pages that serve their needs rather than pages built only to rank highly in search engines. Sites that provide no value to users are too numerous to ignore.

Semantic Cloaking on the Web

Semantics is the study of meaning in language: it takes words, compares them with other words or symbols, and determines the relevance and relationship between them. Semantic cloaking is the practice of supplying different versions of a web page to search engines and to browsers. The content provider's purpose is to hide the real content of the page from search engines.
The difference in meaning between the pages is intended to deceive search engines' ranking algorithms. Cloaking is one type of search engine spamming technique that makes it possible for non-relevant pages to occupy top rankings in searches.

People use search engines to find the most relevant responses to their queries. Users typically view just the first page of results, so sites compete hard for the top rankings, particularly for popular queries. For a commercial website, increased traffic generally means more profit. Reputable content providers work hard to produce high-quality web pages in order to earn a high ranking. Unfortunately, not all content providers hold the same view. Some try to reach a high ranking by manipulating the web page features that search engines use as the basis for their ranking algorithms.

Ranking algorithms assume that page content is real, meaning that the content seen by search engines is identical to that seen by actual users with browsers. With cloaking, different versions are successfully supplied to each, causing a great deal of confusion and disappointment for users. Cloaking falls under the page-hiding spam category of search engine spamming techniques.

Some cloaking behavior is considered acceptable. Cloaking is of two types: syntactic and semantic. Syntactic cloaking includes all situations in which different content is sent to a crawler and to a real user. Semantic cloaking is a narrower form of syntactic cloaking in which differences in meaning between the pages are used to deceive the ranking algorithms of search engines. Syntactic cloaking may be acceptable in cases such as web servers that put session identifiers in URLs for copies sent to browsers but omit them from copies sent to crawlers. Web servers use these identifiers to differentiate their users.
Search engines may interpret these identifiers as a change in the page.

The cloaking behavior that needs to be penalized is semantic cloaking. There are various proposals for countering the problem. One proposal suggests comparing copies of a page from both the browser's perspective and the crawler's perspective. It may be necessary to get two or more copies from each side to be able to detect cloaking. Another suggests a two-step process that would require fewer resources. The first step uses heuristics as a filter to eliminate web pages that cannot demonstrate cloaking. All pages that have not been eliminated go through the second step for inspection: features are extracted from about four copies, and a classifier determines whether semantic cloaking is taking place.

The reality remains, however, that no ideal solution has yet been found to curb semantic cloaking effectively. It is a technique that should not be practiced by anyone who wants to maintain good business ethics. The practice continues to undermine search engines' attempts to provide users with the information they actually need.
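The two-step proposal described above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the word-set heuristic in the first step, the two features in the second step, and the fixed threshold that stands in for a trained classifier are hypothetical choices, not the method the proposal actually uses.

```python
import re


def terms(page_text):
    """Lowercased word set of a page copy (a crude stand-in for real parsing)."""
    return set(re.findall(r"[a-z0-9]+", page_text.lower()))


def passes_filter(crawler_copy, browser_copy):
    """Step 1 (assumed heuristic): copies with identical terms cannot be cloaked."""
    return terms(crawler_copy) != terms(browser_copy)


def extract_features(crawler_copies, browser_copies):
    """Step 2a: count terms shown consistently to one side and never to the other."""
    crawler_sets = [terms(c) for c in crawler_copies]
    browser_sets = [terms(b) for b in browser_copies]
    crawler_common = set.intersection(*crawler_sets)
    browser_common = set.intersection(*browser_sets)
    crawler_any = set.union(*crawler_sets)
    browser_any = set.union(*browser_sets)
    return {
        "crawler_only": len(crawler_common - browser_any),
        "browser_only": len(browser_common - crawler_any),
    }


def looks_cloaked(features, threshold=3):
    """Step 2b: hypothetical threshold rule standing in for a trained classifier."""
    return (features["crawler_only"] >= threshold
            or features["browser_only"] >= threshold)


def detect(crawler_copies, browser_copies):
    """Run the cheap filter first; inspect only the pages that survive it."""
    if not passes_filter(crawler_copies[0], browser_copies[0]):
        return False
    return looks_cloaked(extract_features(crawler_copies, browser_copies))
```

Fetching two copies from each side, four in total, helps separate ordinary dynamic content, which changes between any two fetches, from content that is served consistently to crawlers and never to browsers. A real system would replace the threshold with a classifier trained on labeled pages.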