The World Wide Web is awash in digital video, but too often we can't find the videos we want or browse for what we might like.
That's a loss, because if we could search for Internet videos, they might become the content of a global television station, just as the Web's hypertext, once it was organized and tamed by search, became the stuff of a universal library.
"What we need," says Suranga Chandratillake, a co-founder of Blinkx, a start-up in San Francisco, "is a remote control for the Web's videos, a kind of electronic TV Guide." He's got just the thing.
Videos have multiplied on social networks like YouTube and MySpace as well as on news and entertainment sites because of the emergence of video-sharing, user-generated video, free digital storage and broadband and Wi-Fi networks.
Today, owing to the proliferation of large video files, video accounts for more than 60 percent of the traffic on the Internet, according to CacheLogic, a company in Cambridge, England, that sells media delivery systems to Internet service providers. "I imagine that within two years it will be 98 percent," says Hui Zhang, a computer scientist at Carnegie Mellon University in Pittsburgh.
But search engines like Google that were developed during the first, text-based era of the Web do a poor job of searching through this rising sea of video. That's because they don't search the videos themselves, but rather things associated with them, including the text of a Web page, the "metadata" that computers use to display or understand pages (like keywords or the semantic tags that describe different content), video-file suffixes (like .mpeg or .avi), or captions or subtitles.
None of these methods are very satisfactory. Many Internet videos have little or obscure text, and clips often have no or misleading metadata. Modern video players do not reveal video-file suffixes, and captions and subtitles imperfectly capture the spoken words in a video.
The difficulties of knowing which videos are where challenge the growth of Internet video. "If there are going to be hundreds of millions of hours of video content online," Mr. Chandratillake said, "we need to have an efficient, scalable way to search through it."
Mr. Chandratillake's history is unusual for Silicon Valley. He was born in Sri Lanka in 1977 and divided his childhood among England and various countries in South Asia where his father, a professor of nuclear chemistry, worked. Then he studied distributed processing at King's College, Cambridge, before becoming the chief technology officer of Autonomy, a company that specializes in something called "meaning-based computing." This background may have suggested an original approach to search when he founded Blinkx in 2004.
Mr. Chandratillake's solution does not reject any existing video search methods, but supplements them by transcribing the words uttered in a video and searching them. This is an achievement: effective speech recognition is a "nontrivial problem," in the language of computer scientists.
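To make the difference concrete, here is a minimal sketch, assuming the transcripts have already been produced by a speech recognizer: a toy inverted index that maps each word to the clips containing it, built from a clip's metadata and, optionally, its transcript. The `VideoIndex` class and the sample clips are hypothetical; the article does not describe how Blinkx actually stores or ranks its index.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class VideoIndex:
    """Hypothetical inverted index: each word maps to the set of clips containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, video_id: str, metadata: str, transcript: str = "") -> None:
        # Index the page text/metadata and, if available, the spoken words.
        for token in tokenize(metadata) + tokenize(transcript):
            self.postings[token].add(video_id)

    def search(self, query: str) -> set[str]:
        # Return clips that contain every query word (boolean AND).
        tokens = tokenize(query)
        if not tokens:
            return set()
        results = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.postings.get(token, set())
        return results


if __name__ == "__main__":
    index = VideoIndex()
    index.add("clip-1", metadata="funny rap video")
    index.add("clip-2", metadata="SNL digital short",
              transcript="the Chronic WHAT cles of Narnia")
    print(index.search("chronic narnia"))  # {'clip-2'}
    print(index.search("rap video"))       # {'clip-1'}
```

Had `clip-2` been indexed from its metadata alone, the query "chronic narnia" would return an empty set: the words exist only in what is spoken.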
Blinkx's speech-recognition technology employs neural networks and machine learning using hidden Markov models, a method of statistical analysis in which the hidden characteristics of a thing are guessed from what is known.
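The article doesn't describe Blinkx's models, but the core of hidden-Markov-model decoding is standard. As a minimal sketch, assuming a toy two-state model (speech vs. silence) with invented probabilities, the Viterbi algorithm below guesses the most likely sequence of hidden states from a sequence of observed loudness readings. Real recognizers use far larger models, with states tied to phonemes and continuous acoustic features.

```python
import math

# Toy model with made-up numbers: the hidden state is whether someone is
# talking; all we observe is whether each audio frame is loud or quiet.
STATES = ("speech", "silence")
START_P = {"speech": 0.5, "silence": 0.5}
TRANS_P = {
    "speech": {"speech": 0.8, "silence": 0.2},
    "silence": {"speech": 0.3, "silence": 0.7},
}
EMIT_P = {
    "speech": {"loud": 0.7, "quiet": 0.3},
    "silence": {"loud": 0.1, "quiet": 0.9},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a non-empty observation list."""
    # best[t][s]: log-probability of the best path ending in state s at step t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Start from the most likely final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    frames = ["loud", "loud", "quiet", "quiet"]
    print(viterbi(frames, STATES, START_P, TRANS_P, EMIT_P))
    # ['speech', 'speech', 'silence', 'silence']
```

Working in log-probabilities rather than multiplying raw probabilities keeps the numbers from underflowing on long clips.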
Mr. Chandratillake calls this method "contextual search," and he says it works so well because the meanings of the sounds of speech are unclear when considered by themselves. "Consider the phrase 'recognize speech,'" he wrote in an e-mail message. "Its phonemes (rek-un-nise-peach) are incredibly similar to those contained in the phrase 'wreck a nice beach.' Our systems use our knowledge of which words typically appear in which contexts and everything we know about a given clip to improve our ability to guess what each phoneme actually means."
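One textbook way to use knowledge of "which words typically appear in which contexts" is a statistical language model. The sketch below is an illustration under stated assumptions, not Blinkx's method: it trains an add-one-smoothed bigram model on a small, invented sample of text about the clip, then picks whichever candidate transcription that model finds more probable. Scores are averaged per word so the four-word candidate is not penalized merely for being longer. The corpora and function names are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count word pairs in the context text; <s> marks the start of a sentence."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()
        vocab.update(words)
        histories.update(words[:-1])           # words that are followed by another word
        bigrams.update(zip(words, words[1:]))  # adjacent (previous, next) pairs
    return histories, bigrams, len(vocab) + 1  # +1 leaves room for unseen words


def avg_log_prob(candidate, histories, bigrams, vocab_size):
    """Add-one-smoothed bigram log-probability, averaged over the candidate's words."""
    words = ["<s>"] + candidate.lower().split()
    pairs = list(zip(words, words[1:]))
    total = sum(
        math.log((bigrams[(prev, nxt)] + 1) / (histories[prev] + vocab_size))
        for prev, nxt in pairs
    )
    return total / len(pairs)


def pick_transcription(candidates, context_corpus):
    """Return the candidate the context's language model considers most likely."""
    histories, bigrams, vocab_size = train_bigrams(context_corpus)
    return max(candidates,
               key=lambda c: avg_log_prob(c, histories, bigrams, vocab_size))


CANDIDATES = ["recognize speech", "wreck a nice beach"]
TECH_CLIP = ["software can recognize speech",
             "computers recognize speech in video",
             "speech recognition is hard"]
BEACH_CLIP = ["the storm will wreck a nice beach",
              "a nice beach near the pier",
              "waves wreck a nice beach"]

if __name__ == "__main__":
    print(pick_transcription(CANDIDATES, TECH_CLIP))   # recognize speech
    print(pick_transcription(CANDIDATES, BEACH_CLIP))  # wreck a nice beach
```

In a real recognizer this language-model score would be combined with an acoustic score for how well each candidate matches the sounds themselves; here the two candidates are assumed to sound equally plausible.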
While neural networks and machine learning are not new, their application to video search is unique to Blinkx, and very clever.
How good is Blinkx search? When you visit blinkx.com, the first thing you see is the "video wall," 25 small, shimmering tiles, each displaying a popular video clip indexed that hour. (The wall provides a powerful sense of the collective mind of our popular culture.)
To experiment, I typed in the phrase "Chronic-WHAT-cles of Narnia," the shout-out in the "Saturday Night Live" digital short called "Lazy Sunday," a rap parody of two New York slackers. I wanted a phrase that a Web surfer would know more readily than the real title of a video. I also knew that "Lazy Sunday," for all its cultish fame, would be hard to find: NBC Universal had freely released the rap parody on the Internet after broadcasting it in December 2005, but last month the company insisted that YouTube pull it.
Nonetheless, Blinkx found eight instances of "Lazy Sunday" when I tried it last week. By contrast, Google Video found none. Typing "Lazy Sunday" into the keyword search box on Google's home page produced hundreds of results, but many were commentaries about the video, and many had nothing to do with "Saturday Night Live."
Blinkx, which has raised more than $12.5 million from angel investors, earns money by licensing its technology to other sites. Although Blinkx has more than 80 such partners, including Microsoft, Playboy, Reuters and MTV, it rarely discloses the terms of its deals. Mr. Chandratillake said some licensees pay Blinkx directly, others share revenue, and some do both. Blinkx has revealed the details of one deal: ITN, a British news broadcaster, will share the revenue generated by advertising inserted in its videos.
For all of Blinkx's coolness, there are at least three obvious obstacles to the company's success.
First, just because Google Video is not much good now doesn't mean it won't get better: after all, when Blinkx was founded, it first applied machine learning to searching the desktops of personal computers, a project that was abandoned when Google and Microsoft released their own desktop search bars.
Second, even if Google improbably fails to develop effective video search, the field will still be crowded: TruVeo, Flurl, ClipBlast and other start-ups are all at work on different subsets of the market.
Finally, Blinkx might not go far enough in searching the content of videos: the company searches their sounds, but not their images.
This last objection is the most serious.
"Because Blinkx emphasizes speech recognition, there is a great amount of multimedia content that they cannot address, like photographs," said John R. Smith, a senior manager in the intelligent information management department of I.B.M.'s T. J. Watson Research Center in Hawthorne, N.Y. "But what's worse, speech is not a very good indicator of what's being shown in a video."
Mr. Smith says he has been working on an experimental video search engine called Marvel, which also uses machine learning but organizes visual information as well as speech.
Still, at least for now, Blinkx leads video search: it searches more than seven million hours of video and is the largest repository of digital video on the Web.
"Search is our navigation, our interface to the Internet," said John Battelle, chief of Federated Media Publishing and author of "The Search," an account of the rise of Google. With Blinkx, we may have such an interface for digital video, and be a little closer to Mr. Chandratillake's vision of a universal remote control.
Jason Pontin is the editor in chief and publisher of Technology Review, a magazine and Web site owned by M.I.T. E-mail: pontin@nytimes.com.
This problem is Firefox-specific. For whatever reason, IE does The Right Thing. Actually, the whole site just looks better in IE, so maybe this should be a request for better Firefox support :)
50 percent of their traffic is for their mail service at this point, according to Alexa. If they ever get beaten on that, it'll be lights out for them.
See also The Bootstrapper's Bible by Seth Godin. Seth has a much better explanation of the benefits of bootstrapping than anything else I've seen. The eBook version is only three bucks on Amazon too.
I find myself marking up comments of the same 2 or 3 users more often than others. They don't have ultra-high karma or anything; they're just interested in the same articles and discussions I am. It would be nice to learn more about them.
I pointed out in an earlier article (http://m4th.com/Articles/Article.php?Article-Title=Anatomy-of-a-Successful-Social-Network) that MySpace owes much of its success to the countless choices it offers to its users. Over the past couple of months, however, MySpace has turned greedy. Rupert Murdoch feels that online widgets are a zero-sum game; in other words, widget companies profit at the expense of MySpace. This couldn't be further from the truth; the fact is that widgets complement MySpace by giving its users the choice to decorate their pages any way they want. By restricting access to these widgets, MySpace will not only frustrate its users but also generate unprecedented negative publicity. - Jawad Shuaib
All I know is that it works. I tried out a few terms and got what I had in mind every time.
They heavily emphasize speech recognition, I think. For what this is, it's very cool. The technology is there and the product works. I think this is going places.
As far as I can see, there are two tar pits that Digg and now Reddit are stuck in:
1. A lack of focus and quality in the content.
2. No troll guards.
1. Lack of focus and quality
In my experience, users frequent a site because it has quality content, and they leave when the quality of the content declines. Digg and, more recently, Reddit are experiencing a loss of focus and quality and as a result are losing their initial users. Digg's quality is so bad it is now pointless to read, and much to my chagrin, Reddit seems to be following suit.
Reddit seems to be drowning in a rising tide of noobs. Apparently, there aren't enough old users around to down-vote the crap posted by the noobal horde. From a quick read of comments, it seems many long-time users are angry and feel disenfranchised. It's because of this that those users whose content made Digg and Reddit popular in the first place are now leaving those sites and taking their great ideas with them.
2. No troll guards
Nothing poisons an online community quicker than a few nasty trolls. Another one of the reasons that I'm pulling away from Reddit is because it is getting mean. Both the links that are posted and the article forums are being destroyed by trolls stomping around unchecked. I hope Reddit can fix this problem. If not, I'm going to stop spending my time there.
The impression that I get, Paul, is that your goal is to make YC News a start-up news site and a community of potential founders, not simply another social news site. The only way that I can see to maintain quality content and to filter out the trolls is to institute some form of moderation. Straight democracy leads to anarchy; that's why I think a news site needs to be a republic.
I don't think, by any stretch of the imagination, that Slashdot is perfect, but they do have a system where moderators are selected from heavy and moderate users on a rotating basis. The system filters out new and spam accounts and gives preference to high karma users. It seems to keep the trolls in check. It also encourages people to take more ownership and to participate in the community.
Slashdot's FAQ explains their moderation system here:
http://slashdot.org/faq/com-mod.shtml#cm520
There is also a brief discussion of their anti-troll rules here:
http://slashdot.org/faq/com-mod.shtml#cm2000
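For anyone who wants to play with the idea, here is a minimal sketch of the kind of selection described above: filter out new and spam accounts, rotate out whoever moderated last round, and favor high-karma users. It is not Slashdot's actual algorithm (see their FAQ for that), and the thresholds (`MIN_ACCOUNT_AGE_DAYS`, `MIN_KARMA`) and `User` fields are hypothetical. Karma weighting uses Efraimidis-Spirakis weighted random sampling without replacement, so high-karma users are more likely, but not guaranteed, to be picked.

```python
import random
from dataclasses import dataclass

MIN_ACCOUNT_AGE_DAYS = 30  # hypothetical: keeps brand-new accounts out
MIN_KARMA = 10             # hypothetical: minimum standing to moderate


@dataclass(frozen=True)
class User:
    name: str
    karma: int
    account_age_days: int
    flagged_as_spam: bool = False


def is_eligible(user: User) -> bool:
    """Filter out spam, new, and low-karma accounts."""
    return (not user.flagged_as_spam
            and user.account_age_days >= MIN_ACCOUNT_AGE_DAYS
            and user.karma >= MIN_KARMA)


def pick_moderators(users, k, previous_round, rng=random):
    """Pick up to k moderator names, weighted by karma, rotating out last round's picks.

    Efraimidis-Spirakis sampling: give each candidate the key u ** (1 / karma)
    with u uniform in [0, 1); the k largest keys form a weighted sample
    without replacement.
    """
    pool = [u for u in users if is_eligible(u) and u.name not in previous_round]
    ranked = sorted(pool, key=lambda u: rng.random() ** (1 / u.karma), reverse=True)
    return [u.name for u in ranked[:k]]


if __name__ == "__main__":
    users = [
        User("alice", karma=500, account_age_days=400),
        User("bob", karma=50, account_age_days=200),
        User("carol", karma=5, account_age_days=900),   # karma too low
        User("dave", karma=300, account_age_days=10),   # account too new
        User("erin", karma=200, account_age_days=365, flagged_as_spam=True),
        User("frank", karma=80, account_age_days=120),
    ]
    # alice sat out this round, so only bob and frank are left to choose from.
    print(pick_moderators(users, k=2, previous_round={"alice"}, rng=random.Random(42)))
```

Passing a seeded `random.Random` makes a draw reproducible, which is handy when testing; in production you'd use the default.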
Thanks for setting up the site. It scratches an itch that I've had for a while.
Of course it'd be ideal to create a company with a sustaining business model; there's no question about that.
Is that the only time a company should be formed? That may be akin to only purchasing shares in long-term growth companies. Sometimes it's in your best interest to just ride the short-term explosion and move on.
It's completely viable to get into something with only a foreseeable immediate market. After a certain point, perhaps the company would be better off taking advantage of the economies of scale. It isn't a problem to be true to yourself and realize early on that an acquisition is the best end-target for your new company.
Yeah, it's probably not so applicable to consumer-facing internet companies, since the approach really needs a business to buy into the idea (businesses are much more willing to preorder, and a single sale could fund the entire development).
Although maybe that depends on what you are creating. I could see something like Hotmail or Skype being bootstrapped with this method, because those services are useful to both individuals and businesses. Back then, email from any PC with the internet or free international voice calls would have been things that many businesses would have paid for (and still would, if there weren't so many free alternatives now). Even Reddit got its NYT deal soon after starting which pushed it into the black, did it not? (not rhetorical - I don't know much about their history so tell me if I'm wrong!)
There is the problem of loss of focus, though. If a company like Hotmail started in '95, realised it could make millions selling its product to companies, and focused on that (sales, turnkey servers for easy installation), it would probably only get a few years of that income at most before being dethroned by a company that just focused on making the best web-based email possible. So it would be a bit of a local maximum. Is this more dangerous than the loss of focus that comes from doing something totally unhelpful or unrelated to the main product to bring in some cash (contracting, searching for investors) while simultaneously working on the final product? I'd like to know what everyone thinks.
In any case, a very useful feature would be a way to track your comments in the different submissions and the stories that you voted up.
There are a lot of really great stories on here and sometimes I don't have the time to finish reading some. I'd like to be able to find the stories again quickly in my recent history.