06/09/2009

Docs Are Old-School, We Need PageRank for People

I'm going to let you in on the Search industry's dirty secret...

Google is slipping.
Google's big innovation was in realizing that a link to content is the same as a vote. By tracking all the links pointing to a page of content, Google assesses how influential that page is - its reputation. Google calls this 'PageRank' and it's old tech.
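
For readers who haven't seen the "link as a vote" idea written down, here is a minimal sketch of the textbook version of PageRank. It is a simplification, not Google's production system: the function name, the iteration count, and the toy link graph are assumptions made for illustration, and 0.85 is just the commonly cited damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: pages "a" and "b" both link to "c", and "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
```

In the toy graph, "c" collects two votes and ends up with the highest score, while "b", which nobody links to, ends up with the lowest. Note that the score is attached to the page, not to whoever wrote it.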

PageRank assigns a reputation score to the URL where content is published. This makes it a great fit for content that stays put in one location. However, evolving content distribution via blogs, RSS, guest columns, and syndication is a challenge for PageRank. Tweets, retweets, micropublishing, ratings, and comments are even bigger problems.

The solution lies in associating reputation with the identity of the author - a PageRank for People.


Reputation is Personal
At issue is how Google attributes reputation.

If marketing guru Seth Godin publishes an article on NYT.com, marketing wonks want to read it. If he publishes it instead at PodunkMarketingBlog.net, they still want to read it, because hey - it's Seth Godin. Google would rank the article at NYT highly, but Seth's work would be next to invisible when published at Podunk.

We assign reputation to people: experts, advisors, consultants, coaches, gurus, friends, etc. Search engines to date have relied on some proxy for this real-world reputation.

Content Lives Everywhere
In the physical world, your reputation follows you. If you're the world's foremost expert on AJAX, your opinion on the topic will be respected wherever you go. Imagine if the same held true online. Publish an article on an obscure web dev blog, it ranks highly, because hey - you're an expert. Pen a guest column on "AJAX and You" for Woman's Day magazine and it ranks great, because you're the best in your field. Post a comment on the blog of an up-and-coming developer and that post gets a boost, because one of the luminaries in the industry judged it worth weighing in on. These are just a few of the possibilities; I'm sure there are plenty more.

Mapping reputation to people instead of URLs makes PageRank portable. It's PageRank for people.

PageRank for People
What we're talking about here is a fundamental shift in how we track and evaluate content online. One that mirrors how we assess information in the physical world: "Who's this guy talking? What's he done before? Do I like him? Do people I like, like him?"

Some of the criteria the PageRank for People algorithm would look at:
• Topics of expertise
• Number of links to your posts and the reputation of those doing the linking
• Number of comments and the commenters' reputations
• Who's in your community (those linking, commenting, reposting your content frequently)
• Ratings, weighted by the reputation of each rater
• Where you've been published
• How long you've been online
• Maybe even semantic analysis of your content (are you a troll?)
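
To make a few of these criteria concrete, here is a rough sketch of how links, comments, and ratings could each be weighted by the reputation of the person behind them. Everything in it is an assumption made for illustration: the `Endorsement` type, the `WEIGHTS` table, and the 0-to-1 reputation scale are invented, and a real system would need to handle topics, time online, and publication history as well.

```python
from dataclasses import dataclass


@dataclass
class Endorsement:
    kind: str                 # "link", "comment", or "rating"
    source_reputation: float  # reputation of the person endorsing, 0.0 to 1.0
    value: float = 1.0        # rating value 0.0 to 1.0; 1.0 for links and comments


# Hypothetical weights: a link counts more than a rating, a rating more than a comment.
WEIGHTS = {"link": 1.0, "rating": 0.75, "comment": 0.5}


def author_score(endorsements):
    """Sum each endorsement, scaled by its kind and by who it came from."""
    return sum(
        WEIGHTS[e.kind] * e.source_reputation * e.value
        for e in endorsements
    )


# One link from a respected author, one comment from a newcomer,
# and one lukewarm rating from a well-regarded reader.
example = [
    Endorsement("link", source_reputation=0.9),
    Endorsement("comment", source_reputation=0.4),
    Endorsement("rating", source_reputation=0.8, value=0.5),
]
score = author_score(example)  # roughly 1.4
```

The point of the design is that the same action counts for more or less depending on who performs it, which is how PageRank already treats links between pages.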

What About Privacy?
"Ok - so we've got a new search engine that knows who you are and which topics you're an expert on. What about privacy?" The key thing here is that the search engine doesn't need to actually know who you are, only that you're the same identity it has been tracking across the web. The New York Times recently published an article, "A Service to Prove You Are Really You", describing a new Equifax I-Card identity verification service that could provide verifiable, anonymous ID on the web. Imagine the ability to use one or more aliases online, all of which track back to an I-Card capable of verifying these aliases while keeping your real identity anonymous. OpenID with modifications could also provide the anonymity + verification needed to make the system work.

The Future is Social
In the physical world we've got a couple of millennia's worth of debugging invested in how we evaluate information and people. Moving these same systems online will deliver huge gains in the accuracy and trustworthiness of information and will help align Search with the increasingly social aspects of the Web. As associated technologies improve, I'm sure we'll see some of the ideas discussed here implemented in one form or another.

Marshall Clark


Comments (45)

I think this is fascinating, but implementing it will be terribly difficult. Some challenges include:

How to assign/measure 'expertise'
I don't think this could be self-reported, as that would make abuse easy. It could be reported by one's peers, but that seems to give advantage to popular people.

Defining 'reputation'
It would be important to distinguish reputation from 'popularity', otherwise the likes of Ashton Kutcher will have too much influence.

Also, the system would need to accommodate organizations as well as individuals, as much content is published via a collective effort. It would be interesting to see companies' reputations partly calculated by the collective reputation of their employees.

Hi Rworkman

I agree that this would be difficult to implement; however, many of the processes have already been worked out by Google and applied to web pages.

Expertise could be determined implicitly through semantic analysis of article topics, comments, and posts with additional data coming through tracking association with others in similar fields. If you talk a lot about 'Wii Hacking', interact with others interested in the Wii, and are frequently referenced in Wii communities - you'd likely be seen as an Expert on Wii.

Reputation could be worked out in a way similar to PageRank, by measuring the number of references you and your content generate. Ashton Kutcher-style omnipresence could be avoided by focusing only on references within your areas of Expertise.

Love the idea about organizations being evaluated by the sum of their contributors. After all, isn't that what we do in real-life? Great comments - thanks for posting.

/-Marshall

Great post. You raise a lot of interesting concepts. I shared some of your insights in a recent blog post.

http://www.reputationdefenderblog.com/2009/06/09/is-pagerank-for-people-the-future-of-google/

How would PageRank for people work for individuals who don't already have a robust Internet presence? Couldn't someone abuse this system by posing as you and spreading defamatory content?


Hi Rob,

The key to the system would be centralized, secure ID verification systems. Something similar to the Customer Identification Programs (CIPs) required for financial institutions maybe: (http://en.wikipedia.org/wiki/Customer_Identification_Program)

Credit bureaus like Experian, TransUnion, or Equifax might be a good fit. VeriSign and other online certification services might work out as well. The procedures for verifying identity already exist; they'd just need to be ported to the new system.

Thanks for your comments.

/-Marshall

This is a really good read, Marshall (and it's one of my favorite blogs - I recommend it often).

The one thing I wonder is how high a priority this should be. For instance, how often is Seth's work appearing in forums beyond his own blog and extremely well read, high ranking publications? What is the correlation between the Google rank of the pages and the theoretical reputation rank of the people? How often would this ranking impact search results anyway?

That being said, there are lots of great thought starters here.

Hi David,

From a pure publication standpoint I agree, much of Seth's content does live on his own site. However, he does do a lot of interviews. Those interviews are interesting to marketers because they feature Seth Godin. With PageRank the visibility of each interview would be related to the reputation of the site publishing it. With PageRank for People the ranking would reflect the fact that it's an interview featuring Seth and would focus less on where it was published.

I think it mirrors how we judge sources in the physical world. Think of your favorite book. Name the author. Then try to remember the publisher.

Great question - thanks for posting it.

/-Marshall

An interesting concept, but you should be aware of Calais, a technology that parses the content of your pages and automatically generates links to any people it finds there. I have known about this technology for a while but have only recently rolled an integration with it out into production, because it has matured to the point of being really useful. I hear what you are saying and agree that online reputation is big, but technologies like Calais are disruptive enough to make PageRank alone less attractive for determining reputation than originally hoped.

Very interesting idea, and one that definitely needs to be vetted more given the disaggregation of content online. The question I have, though, is how a search engine assigning PageRank to a person differentiates between multiple people with the same name. It works fine for someone like me with a unique name, but I'm not sure how it will work for people with common names.

Hi Gerald,

Great question. I see there being two issues here: first is how to differentiate between two people with the same name, second is how you link the established unique identity to syndicated content.

Distinguishing between multiple people named Bill Smith could be accomplished using established financial ID verification systems.

Establishing a link between this verified ID and syndicated content is where things could get complicated. If the ID system was integrated into the publishing platform we could use some type of certificate system to link ID to content. Non-integrated systems would be tougher, but I think some kind of hack using the VeriSign Identity Protection Security Token might be possible.

Great question that highlights one of the biggest challenges the system would face. Would love to have a security expert weigh in on this as they could definitely address your question better than I could.

/-Marshall

The really big problem here is that there is a world of expertise that hasn't yet bought into the web presence world. Take 99% of physicians. Many of them are so busy that Facebook and marketing their practice through online relationships isn't part of what they do. They are important and influential based on the skills in their head and with their hands, and that's what counts, not their fame. Likewise, many academics are experts in their fields, but because of how academic journals restrict access, they are wildly under-appreciated, despite their expertise, outside of their real world academic circles.

I don't know how we're going to bridge this gap, but I do find that "online reputation = level of importance in the real world" is a somewhat myopic view of authority. Just think of all the nonsensical ideas that become popular, like vaccines causing autism, that rank highly online but are pure hogwash when it comes to any objective truth or reality.

How do we handle that??

Michael:

I wonder about the huge amount of information that is not tied to individuals. For example, manuals, reference materials (dictionaries, encyclopaedias, etc), corporate reports, etc.

Most of my searching falls into this category. I don't think that people rank would be as useful in these cases. And while you can argue that for these types of searches one can just switch to another engine, I think the reality is people want to use one engine, have it set as their default in their browser, and not worry about whether they are using the right search engine based on the results they are looking for.

Toby:

This assumes that an individual's output is consistent, when in fact you may be widely discredited for this post, despite being an expert in some field. Simply changing person for document doesn't solve the problem. The issue is, and will remain, content. What merit does an argument, point of view or essay have? At what grain is it valid: the point, the topic, the sentence, the argument, the paragraph, the tweet?

"Social" leans too heavily into a cultist approach negating what is valuable about the long tail.

You'd need some sort of account that ties to all places the person posts, similar to your global avatar.

The websites implement the global identity (copy & paste code/widgets should get more implementers), which allows people to vote on the person's content anywhere online. The global account will get updates about the person from around the net, keeping her/his ranking closer to real time.

Marshall, the journals already use citation count as a ranking criterion and are themselves part of a prestige-loaded ranking system. All that has to be done is open the valve from that particular tube. ;-) I think the ranking network can be modeled on the existing academic system and just has to be housed somewhere. The problem is how to filter the inevitable gaming that will happen. Semantic analysis isn't necessarily a good thing because a lot of valid content is actually nonsense/bull@#%*t as you say, reminding us of the vaccines-cause-autism meme.

Your response to this would naturally include mentioning that world famous CS luminaries may indeed have wild and radical political views or even be certifiably nuts. These individuals would have high primary rankings in their subject and low rankings for everything outside of it. The rankings would have to be filtered through an ontological structure that would allow for multiple areas of knowledge to be full or empty.

Let's take marketing, for instance. A person's knowledge of marketing could be unparalleled but his knowledge of the economics of a certain industry could be way off. (This happens in insurance.) I have no idea what I would do in that case other than reject multiple claims of authority.

kawika:

This is probably what http://klout.net and a few hundred other services are trying to do on a smaller scale with Twitter and some other social sites. I wish them the best of luck because I'm tired of trying to simply track the comments *I've* made on various sites, let alone hunting for experts in various niches.

kawika:

I wouldn't doubt Google is working on this in some fashion. Their quote service during the 2008 presidential election tried to attribute statements in news articles to people. http://labs.google.com/inquotes/

Fascinating idea -- new to me, tho not maybe new to others.

But it begs some questions. For example, certain political pundits have a "reputation" among their peers, and they provide wonderful circular references for each other, quote each other, and validate each other's wacky ideas. Yet some of them wouldn't know how to check a fact if it walked up and introduced itself, and they spread lies and misinformation with great aplomb. Just because their peers say they are great guys, do they enjoy the same reputation with a personal rank as, say, Walter Cronkite would?

Like PageRank for content, people rank will be highly unfair to many.

I have worked with several identity providers over the years. Persistent portable credentials tied to reputation have proven difficult. The online dating space is ripe for this sort of feature. Not to pitch clients, but DatingHeadShots and Honesty Online are working together to offer verified photos and identity verification. Additional features could include reputation, but as commenters have said, what is reputation, how is it used and who will access/assess/assign it needs to be figured out before the concept can take off.

OpenID and CardSpace are two other platforms where reputation could be integrated as well.

Very interesting subject and insight.

I've been pondering about the subject for a long time and I believe that the only way to achieve such thing is to stop thinking in terms of rating and ranking.

Relevance of information is something relative which is based on your own knowledge, need for information, perception, etc.

Influence and reputation may result from someone's expertise, and yet there is no guarantee, as we do not know who helped create this reputation. Actually, I believe most experts are too busy to develop their online presence, and most existing reputation websites do not have access to documents such as theses, scientific papers, etc.

Instead of thinking about ranking, which is working top-down, we should create categories that offer a horizontal measurement.

There's a new start-up that is going to test such a model, have a look here: http://tiny.cc/KxnNG

Some great comments came in over the weekend - several addressing the suitability/unsuitability of PageRank for People (PRP) for varying tasks.

One thing to keep in mind is that, like PageRank, PRP would be just one sub-algorithm in a much larger ranking procedure. PRP would provide a valuable additional 'social' ranking criterion for a search engine, but we'd never expect it to become the only criterion by which we'd judge a piece of content.

As with the current ranking practices, Reputation lays the foundation for other, more content-focused sub-algorithms to do their work. PageRank for People would be complementary, not competitive to this process.

Great comments, please keep them coming.
/-Marshall

Hi Whitney,

You made some great points addressing what I consider the larger issues of how we transition authority from offline to online. Your example of medical doctors is a great one. MDs as a group tend to change technology behaviors more slowly than other groups. Obviously, we can't force people to participate online, but I don't think we should allow legacy behaviors such as these to hold us back either.

Developing a new environment that rewards participation by authoritative individuals is probably the best approach to encouraging transition by these groups. The Public Library of Science (PLOS) is a great example of an organization that is leveraging open, online-accessible content and community as a way to break down a legacy authority system (offline academic journals).

I agree with your statement that the idea of online reputation = level of importance in the real world is a somewhat myopic view of authority. It is a view slanted towards Technology and other groups already highly participatory online. However, I don't believe this will always be the case. We should start building now for a time when much of human knowledge and discussion occurs online.

/-Marshall

Hi Susanna,

You asked a great question about the impact circular references would make on a PRP ranking system. This is an interesting question because it has a close analogue in how PageRank deals with manipulative linking behaviors.

Back in the early 2000s, shortly after Google launched, spammers began efforts to subvert the PageRank algorithm. Their strategy of choice was developing large self-referring networks of websites, aka "link farms". For well over a year these false networks dominated Google rankings for some of the most competitive searches on the web.

Eventually Google was able to detect and remove the influence of these false networks. Techniques that were discussed at the time included detection of irregular linking patterns such as looking at how often the network links to itself versus the web at large.
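
As a rough illustration of the "links to itself versus the web at large" check, here is a small sketch. It is not Google's actual method: the function names, the toy graph, and the 0.9 threshold are all made up for the example.

```python
def internal_link_ratio(links, group):
    """Fraction of the outbound links from `group` that point back into `group`."""
    group = set(group)
    internal = 0
    external = 0
    for source in group:
        for target in links.get(source, []):
            if target in group:
                internal += 1
            else:
                external += 1
    total = internal + external
    return internal / total if total else 0.0


def looks_like_link_farm(links, group, threshold=0.9):
    """Flag a group whose links almost never leave the group (hypothetical threshold)."""
    return internal_link_ratio(links, group) >= threshold


# Toy graph: three farm sites that only link to each other, plus a normal blog.
toy_links = {
    "farm1": ["farm2", "farm3"],
    "farm2": ["farm1", "farm3"],
    "farm3": ["farm1", "farm2"],
    "blog": ["farm1", "news", "wiki"],
}
farm = ["farm1", "farm2", "farm3"]
suspicious = looks_like_link_farm(toy_links, farm)
```

Here every link from the three farm sites stays inside the group, so the group gets flagged, while the blog links outward and does not. The hard part in practice is deciding which groups to test in the first place.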

Similar techniques could be applied to reveal manipulation of PageRank for People through circular references and false networks. In fact, network pattern recognition systems may even provide a way to track the distribution of misinformation through online media and help minimize its impact.

Still lots of thinking needing to be done here, but I think there are already potential strategies to minimize undue influence for any one group.

/-Marshall

Cracking this one will be key for the next Google success. On the web it almost happens by default, don't you think? The new media stars are followed and linked wherever they may write to such an extent that an individual page rank may far exceed that of Podunk's average. The stars advertise themselves so that people know where they are writing next, and people, links and therefore Google will follow.

Inside the firewall I think this is a big issue to be solved. My few thoughts on corporate reputation start to consider this topic from that inward direction.

I'm surprised that so far no one else seems to have raised the issue of ghostwriters. We already see it on Twitter - folks like Guy Kawasaki are smart enough to see that frequent posting under their own name (rather than as some other collaborative entity) nets a lot more attention because of their personal reputation. How would a personal PageRank (PRP) system keep individuals from essentially building publishing empires under their own names and becoming the new 'silos' of information, similar to what traditional news outlets are today?

I think it's a really intriguing idea and if done correctly, with space to allow for growth of newcomers (because if you build someone like Michael Arrington or Jeremiah Owyang's reputations so high that no new contributor could ever catch up, what would be the point of a social system?), it could be very powerful. It would have to be undertaken very carefully, however.

Very nice article. Taking this one step ahead, could we have a scenario where much (not all) higher learning happens on the Internet (for a fraction of the cost of traditional universities), and a person's page rank (or learning trail) becomes their credentials rather than a university degree?

pudo:

Very nice overview! I was extremely fascinated by this idea a while back and even started to write some code to test it out (a very simplistic prototype is still alive at syndikarma.net).

I stopped this project when I realized that I had no really innovative solution to the core problem: both aggregating large amounts of personal identifying data about people's previous interactions (and their evaluations of their peers) and condensing this information into any kind of small index/rating. I feel like tagging and creating personal tagclouds is the best we can do to both reflect the diversity of opinions expressed in such a system and to condense that information. And tagclouds are fun, but they're not really a fantastic answer to many concrete IR problems.

This is, I think, not an algorithmic problem. Once you have some data, many algorithms - some of them a lot more interesting than PR - come to mind. The question really is which of those possible systems most reflects the human side of trust and reputation. While generating graphs and numbers is really easy, externalizing these feelings is a much harder process. I think trust lies at the very core of what makes us social beings and augmenting it will require the very thing it is meant to manage: huge amounts of trust. This is not (only) an IT security problem, of course. It's a UX thing that is very hard to grasp.

If anyone wants to start doing this in an open(-source) fashion, I'd be more than willing to join in and donate some code and ideas!

Correct me if I'm wrong but aren't Google already working on this? Their social graph api can be expanded on to make this happen...

this is not an entirely new idea. there are companies now dedicated to computing person rank. for example, linkedin, and for social media, viralogy.com

Straight-up visionary. PRP would greatly improve search and identity at once, perhaps solving the two biggest problems on the web.

anthony foxtrott:

Thank you, MTung! I actually took out markers, the skappaper and did some thinking to make sure my hunches were correct, and they were. All the pieces are out there for the internet, i.e. Google, to self-organise - Twitter for indexing and PRP ranking, LinkedIn for bios, and Facebook for accountability. Don't worry about these things - there might be a trillion tons of hardware out there, but the internet is still 95% human. It will organise. Don't waste your time on the internet - discover, read stuff, tweet, wallpost, and update your profiles!

Hi Michael,

You mentioned that this is not an entirely new idea, and you're right. As far as I know, however, this is the first iteration of the idea that's potentially actionable at the level of general search.

The examples you gave (LinkedIn and, for social media, Viralogy) both use very limited versions of a 'person rank' type system. Essentially, what they do is rerank a list based on measured popularity.

What we need is a flexible algorithm that can evaluate authors against both topic and reputation across the web as a whole.

Yep - it's not an entirely new idea, but in a similar vein:

We've already got this

But that doesn't mean we couldn't use this

Best,
/-Marshall

There has been a field interested in "people rank" since the 1930s -- it is called social network analysis [SNA]. Here is a quick intro with a brief description of the most popular people & group rank metrics in use...

http://orgnet.com/sna.html

Here are various examples of SNA methods applied to various networks both on-line and off...

http://thenetworkthinker.com

I visited this blog first time and found it very interesting and informative.. Keep up the good work thanks.

Thanks for the most impressive & useful post :)

Speaking of pagerank... how often are those even updated?
-Jack

I'm not overly convinced that this will work the way that it's intended to.

One thing to keep in mind is that this system could be made topic sensitive. We'd be looking at total contribution/reputation for each person within a specific social graph relating to a specific topic. In some ways this would be like mapping the Hilltop/HITS algorithms used in algorithmic text search to the social space. The similarity is interesting since the Hilltop model of using expert documents to train the system could be equally applicable to establishing user reputation on a given topic by mapping social graph distance from known topic authorities.
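
A minimal sketch of the "social graph distance from known topic authorities" idea, using a plain breadth-first search. This is neither Hilltop nor HITS, just an illustration of the distance mapping; the function names, the toy graph, and the `decay` factor of 0.5 per hop are assumptions.

```python
from collections import deque


def distance_from_authorities(graph, authorities):
    """Breadth-first search outward from every known authority at once.

    `graph` maps each person to the people they interact with on this topic.
    Returns the number of hops from the nearest authority for everyone reachable.
    """
    distance = {person: 0 for person in authorities}
    queue = deque(authorities)
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, []):
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return distance


def topic_reputation(graph, authorities, decay=0.5):
    """Turn hop counts into scores: 1.0 for an authority, halved with each hop."""
    hops = distance_from_authorities(graph, authorities)
    return {person: decay ** d for person, d in hops.items()}


# Toy topic graph: alice talks to a known expert, bob only talks to alice,
# and carol has no connection to the topic community at all.
toy_graph = {
    "expert": ["alice"],
    "alice": ["expert", "bob"],
    "bob": ["alice"],
    "carol": [],
}
scores = topic_reputation(toy_graph, ["expert"])
```

People with no path to a topic authority, like carol here, simply get no score for that topic, which fits the idea of keeping reputation within someone's area of expertise.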

Agree that we need a solution for this, but as Nikos touches upon, we need to agree on definitions of scope. Others make valid points also; some further discussion is needed.

Article is nicely written about the future of pagerank and I think we need to work hard on this issue to get more ideas. Thanks for sharing.

nice article

thank you

