May 30, 2004

Lushe Power Search

LUSHE -- Say you have hundreds of bookmarks that support an academic journey, and that by visiting those bookmarks you will reach a higher level of understanding about sophisticated concepts. You could read and revise and learn. Or you could ask yourself questions in Google and let trusted authorities help form the answers. For that, you need Lushe.net. It lets you build a list of your favourite sites and then search only those sites using Google, so your results come only from sites relevant to you and your interests.

So you could set up Lushe just to search the A-list bloggers. You populate your personal Lushe cache by pressing a button in your links bar, which either adds a site to your list or opens the search functionality. This is all done without leaving the page you are browsing, which keeps things as simple as possible.
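
A rough sketch of the idea, not Lushe's actual code: one way to restrict a Google search to a personal list of sites is to join `site:` filters with OR. The function names and the example.org / example.net domains are hypothetical placeholders.

```python
from urllib.parse import urlencode


def restricted_query(terms, sites):
    """Combine search terms with one site: filter per listed site."""
    if not sites:
        return terms
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({site_filter})"


def google_url(query):
    """Build a Google search URL for the given query string."""
    return "http://www.google.com/search?" + urlencode({"q": query})


# Hypothetical list of favourite sites.
favourites = ["example.org", "example.net"]
query = restricted_query("rss syndication", favourites)
url = google_url(query)
print(query)
print(url)
```

Whether Lushe builds its queries this way is an assumption; the sketch only shows that a short site list fits comfortably inside a single query.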


Glen Murphy -- Lushe and other projects
x_ref125ws

May 30, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

May 10, 2004

chongqing

CHONGQING -- I am doing my part in fighting the comment spammer who deposits lists of unwelcome links across my Typepad blog. The spammer wants to elevate a client's standing in Google. By linking this page to a spam fighter, I'm damaging the search engine standing of the spammer. Within a fortnight, this blog item should appear among the first 10 results for chongqing.

The irritating spammer, possibly from emmss dot com, spews his chongqing link all over his comments. The spammer is defacing the CSS-Discuss wiki and several other quality sites.

We fight this comment spammer by linking to the anti-spam chongqing sites. These links will demote pages that link to emmss.

Every little bit helps.


Joe -- "chongqing Googled"
more chongqing -- "spam chongqing"
Chongqed -- "All your page ranks are belong to us"
x_ref119

May 10, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

Where's the RSS

BLOGGER -- The GooglePlex team have brushed off a revised look for Blogger but an important bit is missing--it does not support RSS 2.0 and that's a very significant shortcoming. I depend on RSS for my major stream of information, even more than I depend upon Google to find arcane information.

Cue Dave Winer:

It was disappointing that the new Blogger interface, which looks quite nice, doesn't support RSS 2.0. I'm far from the only one who's commenting. It would be so easy to do, so not evil, so grown-up, so much appreciated if they would just do it.

I teach multimedia degree students how to leverage the knowledge they find on the Internet. RSS is part of this knowledge management challenge. Google has to know this and respond to consumer expectations. Are you listening, David Krane?


Blogger -- start here
Dave Winer -- "Contact with Google"
John Robb -- "a new Blogger interface is out."
Phil Ringnalda -- "Breaking the world of Syndication"
x_ref119

May 10, 2004 in Search Engines | Permalink | Comments (4) | TrackBack

May 05, 2004

Worldwide Buzz for VoIP

PULVER -- Around 90 days ago, Jeff Pulver checked for "VoIP Buzz" and found there were 2.3m hits on the keyword "VoIP" and 2,290 current news stories for VoIP. Today, there are 4.01m hits on the keyword and 2,880 current news stories. That is 1.71m more hits in about three months, which suggests interest in VoIP is growing at a rate of roughly 570,000 hits per month.
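
A quick check of the arithmetic, using only the figures quoted above. Treating "around 90 days" as three months is my assumption, and the variable names are just for illustration.

```python
# Figures quoted in the post.
hits_before, hits_now = 2_300_000, 4_010_000
stories_before, stories_now = 2_290, 2_880
months = 3  # assumption: "around 90 days" taken as three months

hits_per_month = (hits_now - hits_before) / months
stories_per_month = (stories_now - stories_before) / months

print(f"Hits gained per month: {hits_per_month:,.0f}")
print(f"News stories gained per month: {stories_per_month:,.0f}")
```

On those numbers the keyword gained about 570,000 hits a month, while current news stories grew by roughly 200 a month.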

As Pulver suggests, there is a college research paper in correlating the buzz to the market. "An interesting added analysis would be to track some of the public pure-play VoIP companies and their market valuations as effected by the relative VoIP Buzz." General Internet buzz can translate into value propositions.


Jeff Pulver -- "Worldwide VoIP buzz grows"
x_ref119 x_ref125ws x_ref26121

May 5, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 30, 2004

google.com/blog

GOOGLE -- "Test" is the single word that appears at Google.com/blog. This teaser appears on common derivatives of Google, such as www.googlr.com and www.466453.com (URLs we use to circumvent border management). But it does not appear in the SEC filing by Google.


Fintan Friel -- "I have a test blog and now Google has one too. Plus they have my mail."
John Battelle -- "Now that the other shoe has dropped"
Chris Gulker -- "Search Google for the world's leading web search engine"
Amy Harmon -- "Is a do-gooder company a good thing?"
x_ref119

April 30, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 29, 2004

Mason on Gmail

TAINT -- Justin Mason played around with Gmail and makes a very interesting observation: "GMail does not create HTML mail -- all mail composed through their composer is sent as text/plain only. This is very interesting, because it suits me just fine. HTML mail causes so many more problems than it solves, especially when full-featured web browser components are used to display it, IMO. I get to see the security exploits this enables, every day in my anti-spam work. But it's also very significant that nobody else has commented on it -- nobody misses it!"


Justin Mason -- "More thoughts on Gmail"
x_ref119

April 29, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 24, 2004

Chastity belt around Google

UNDERWAY -- I work with a team of censorware aficionados who are trying to develop a rule base that allows Internet access with tighter restrictions on academic grazing than exists in most seminaries. The problems arise in the definition of "acceptable usage." The definition appears to extend to all MP3 files, all translations rendered by Google, any cached page from Internet space, known gaming sites, and gay enclaves. My predicament is not unique, as Declan McCullagh reports.

PartsExpress.com proudly touts itself as the Net's No. 1 source for audio, video and speaker components--but online shoppers who rely on an optional feature in the Google search engine to block porn sites would never know it.

By an accident of spelling, the domain name of the Ohio electronics retailer includes an unfortunate string of letters, "sex," which is enough to block the Web site from Google's filtered results.

PartsExpress.com is not alone. A CNET News.com investigation shows that Google's SafeSearch filter technology incorrectly blocks many innocuous Web sites based solely on strings of letters such as "sex," "girls" or "porn" embedded in their domain names.

Google's SafeSearch flaws are more than academic--they can have serious consequences for innocent Web site operators blocked out by them. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.

Research company WebSideStory reported last month that Google claimed an all-time high in search referrals, 41 percent of the United States total, and the search giant's market share is steadily expanding.

"Traffic from Google can make or break a business," said Maria Medina, whose family-run clothing business at ALittleGirlsBoutique.com doesn't pass the SafeSearch censor. "Here I am, a mom of four children, creating an at-home business that sells little girl dresses and accessories, in order to spend more time with my children, and I have been filtered out as not being family friendly. Ridiculous."

Matt Cutts, the Google engineer who designed SafeSearch four years ago, said his algorithm looks for a "relatively small" number of trigger words in a Web page's address. If one of those words appears, the SafeSearch algorithm puts the address on a block list and does not take the next step of evaluating the content of the site. "We try to find the best trade-off of precision, recall and safety," Cutts said. "People who opt in to SafeSearch are mostly OK with us being on the conservative side."

Cutts would not disclose how many Web searches are done with SafeSearch enabled, saying only that it's a small percentage of the millions of queries handled by Google each day. But the sloppy filter stands out as a rare black eye for a company that prides itself on superior search technology and boasts on its payroll one of the world's highest concentrations of computer science doctoral degrees. Google claims SafeSearch "uses advanced proprietary technology that checks keywords and phrases" and filters out only Web pages "containing pornography and explicit sexual content."

"That's not very bright," said Karen Schneider, a librarian who runs the Librarians' Index to the Internet and has made a study of filtering software. SafeSearch is "certainly evocative of the very primitive CyberSitter-type tools of the mid-1990s--not a tool of fairly sophisticated development."

The Scunthorpe problem

For years, Web content filters have drawn criticism for inaccuracies. In a famously embarrassing incident in 1996, America Online's errant dirty-word filter prevented residents of the British town Scunthorpe from signing up as new customers. Google's SafeSearch makes the same mistake, blocking local news sites like ThisIsScunthorpe.co.uk and ScunthorpeDistrictCatsProtection.co.uk, a housecat-adoption site.

Other Web sites misidentified by SafeSearch because of "sex" in their domain names include ArkansasExtermination.com, which claims to offer the "best in termite and pest control." The owner of the business, who declined to give his name, said he was puzzled by Google's categorization: "My brother wrote the Web site. I don't know anything about that."

SafeSearch also marked as unsafe for children JewishSussex.com, a religious Web site; EssexCountyBeeKeepers.org of Topsfield, Mass.; BluesExcuse.SouthBurnett.com.au, an Australian blues band's site; BassExpert.com; and the Anglo-Saxon history site RomansInSussex.co.uk.

Gareth Roelofse, the Web designer of RomansInSussex.co.uk, said his filtering complaints are broader than just Google. "We also found many library Net stations, school networks and Internet cafes block sites with the word 'sex' in" the domain name, Roelofse said. "This was a challenge for RomansInSussex.co.uk because its target audience is school children."

"I think it would be nice if Google would have a 'white list' for sites like ours, but this would involve human man-hours, I guess," said Roelofse, who designed the site on behalf of the Sussex Archaeological Society and local museums.

Cutts, the Google software engineer, noted that the SafeSearch Web page permits visitors to contact the company with complaints. "In most cases it's a pretty unambiguous usage," Cutts said about the word "sex" in domain names and Web addresses. "No filter can be 100 percent accurate. We're always willing to take a fresh look at our filter and see how we can improve it."

Google is not alone in seeking to lure searchers worried about encountering online raunch and ribaldry: Yahoo offers a "mature Web content" search filter, and Ask Jeeves has set up a separate Web site for kid-friendly searches. But Yahoo's filter isn't as hypersensitive as Google's, and lists domains mentioning Sussex, Essex and Scunthorpe as acceptable.

The flaws in Google's filter have persisted despite research published about a year ago that highlighted overblocking in SafeSearch.

An April 2003 report from Harvard University's Berkman Center described similar but less extensive problems with SafeSearch. That report said some news articles and political Web sites were filtered.

David Drummond, Google's vice president for business development, said that at the time of its development, SafeSearch was designed to be overly cautious. "The thinking was that SafeSearch was an opt-in feature," Drummond said. "People who turn it on care a lot more about something sneaking through than they do about something getting filtered out."

"Plainly silly" blocking

CNET News.com evaluated SafeSearch by testing tens of thousands of random Web pages and identifying which ones were incorrectly listed as pornographic. The results showed that Google encountered many of the same problems that have plagued Internet filters for almost a decade. One 1996 analysis, for instance, showed that CyberPatrol blocked National Rifle Association and gay and lesbian Web sites, and CyberSitter cordoned off Usenet newsgroups such as alt.feminism and soc.support.fat-acceptance.

"None of that surprises me," said Barry Steinhardt, director of the American Civil Liberties Union's (ACLU) technology and liberty program. "The evidence that we put on in the library filtering case shows that it's very difficult to do filtering without being overinclusive, without blocking things that are just plainly silly. That's the reality of relying on blocking: You're going to block a lot of legitimate material."

The ACLU, which has warned against buggy filters since publishing a report on the topic in 1997, unsuccessfully sued to overturn a federal law compelling public libraries to install filtering products.

"In the end, the lists are proprietary," Steinhardt said. "Without access to the lists, you don't know precisely what's being blocked. You have to rely on the authors of the lists to have the right judgment."

The word "girls" also tends to lead SafeSearch astray. It incorrectly blocks the Web sites of the private school GirlsSchoolOfAustin.org; the bridesmaid dress shop DressyGirls.com; TatuGirls.com, a Russian band's site; and TheCalicoGirls.com, a Web site devoted to cat poetry.

"Porn" in a domain name can confuse SafeSearch just as thoroughly. It won't display Pornichet.org, devoted to improving tourism for the French seaside town of Pornichet; SpornGroup.com, a New York-based business consultancy; Sporn.com, which sells dog leashes; PornkRocks.com, a site devoted to the band Pornk; and Anti-Kinderporno.de, a German effort to oppose child pornography.

Aaron Wolfe, information systems director for SafeSearch-banned PartsExpress.com, said the company is planning to excise that unfortunate string of letters from its domain name. "We are going to modify our domain name to Parts-Express.com," Wolfe said, adding that the renaming will also help "get around spam filters on e-mail servers."
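
The overblocking McCullagh describes can be reproduced with a few lines of naive substring matching. This is a sketch of the failure mode, not Google's actual SafeSearch code: the trigger words and domain names come from the article, while the function name and matching logic are my assumptions.

```python
# Trigger words named in the CNET article.
TRIGGER_WORDS = ("sex", "girls", "porn")


def is_blocked(domain, triggers=TRIGGER_WORDS):
    """Flag a domain if any trigger word appears anywhere in its name."""
    name = domain.lower()
    return any(word in name for word in triggers)


domains = [
    "PartsExpress.com",
    "RomansInSussex.co.uk",
    "DressyGirls.com",
    "Pornichet.org",
    "Parts-Express.com",  # the renamed domain Wolfe mentions
]
results = {domain: is_blocked(domain) for domain in domains}
for domain, blocked in results.items():
    print(domain, "-> blocked" if blocked else "-> allowed")
```

The hyphen in Parts-Express.com splits the offending letters, which is why the rename slips past a filter of this kind while the content of the site stays exactly the same.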


Declan McCullagh -- "Google's chastity belt too tight" and another "Report criticizes Google's porn filters" with court action ("Supreme Court to hear filtering case"). All stories on McCullagh's Politechbot mailing list.
Lisa Bowman -- "Court overturns library filtering rules"
Dinesh Sharma -- "Google gets the glory in search engine referrals"
x_ref119

April 24, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 22, 2004

My first GMail

STAR BULLETIN -- Curt Brandao aka Digital Slob offers the world a sample of the first GMail. David Swann, editorial cartoonist, warns "his column is so spot-on funny, a few may think he's serious, which, he kinda is." Paul Boutin says "Curt's the kind of guy whose dream technology is a TiVo that knows when to order pizza 45 minutes in advance."

Hey Phil, thanks for the e-mail. Sorry it took me a while to reply -- I've been very, very busy (download computer solitaire @ MSNGames.com).

Sorry about your office's looming layoffs. I read your attached performance evaluation, but I wouldn't hazard a guess as to how it might affect your chances (You're Fired! "The Apprentice," Thursdays at 9/8 Central on NBC).

The fact that William hijacked that big account you worked on for months, taking all the credit and commission -- that's gotta hurt. But whistle blowing now probably won't help. However, if I were you, just to get closure, I'd take him aside soon and deal with the situation directly ("Kill Bill Vol. 2" opens April 16).

But don't let stress get to you. Call your sponsor before regressing into old addictions -- no matter what that inner demon tells you, answers don't lie at the bottom of a bottle (Obey Your Thirst -- Sprite.com). And don't think you can change your luck by cracking your nest egg over games of chance (Blackjack, Slots, Keno and a $777 free bonus!!! @ carnivalcasino.com).

You don't want to go back to the days when you were so tapped out, getting bus fare home meant selling a quart of blood (plasma TVs with up to $400 instant savings @ gateway.com).

Stay focused. Your daughters are almost grown, but still need your help molding and shaping their future (find a board-certified plastic surgeon @ lookingyourbest.com). And, as I've always said, your wife proves you've got a special eye for the ladies (LensCrafters -- save $75 on our strongest prescriptions).

Sure, every marriage has rough spots. So she's a tad controlling, interrupting you in the bathroom -- at the Sizzler. That's still no reason to stray (view photos of singles in your area NOW @ AmericanSingles.com).

As for floating you a loan, let me think about it. As a rule, friends and money don't mix, even though you say you'll pay me back this time ("Lies and the Lying Liars Who Tell Them," by Al Franken on sale @ amazon.com).

But hey, with a resume like yours, I'm sure you'll get another job at your skill level in no time (earn big money stuffing and mailing envelopes @ PostalMarketing.com).

Anyway, keep in touch. My inbox always has room for you (block e-mail from specific senders with EmailProtect @ contentwatch.com).

-- Curt


Curt Brandao -- "E-mail ads can read between the lines"
Read real Gmail tips in a dedicated blog.
x_ref101tt

April 22, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 21, 2004

Signs of crawling for product

KILKENNY -- Referrer logs entertain me, especially when I can deduce a trend from them. For the past two months, an electronic crawler has been visiting another of my blogs, looking for an unfulfilled project that occupied much of my attention last year. The crawler could be part of a due diligence process. That interests me.


x_ref101ds x_ref101tjk

April 21, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 19, 2004

It's a thong

KILKENNY -- I don't go shopping for thongs so when people stop by this blog looking for them, I wonder what I've said. For the past fortnight, I've accommodated a persistent trail of visitors looking at what I wrote about "T-back through T-rati" and for the first week, I thought they sought "trackback through Technorati." Nope. I knew something was suspicious when Daypop did not credit my commentary in its "citations" when it placed the original announcement of the Trackback mechanism in its Top 40. I figured it must have known I read the info first from JD Lasica and since he got the credit, my later info was subordinated. Nope. The light came on when I observed the search engine string attracted more people after midnight than during the extended lunch hour. I followed one query back to source and discovered T-back is a trading name for some bras and thongs.

For the speed surfers unable to click beyond here, you can get T-back attire in Baby-Blue, Baby-Pink, Black, Coral, Flesh, Hot-Pink, Orange, Purple, Red, Royal-Blue, White, Yellow-Floral, and Neon-Yellow. If you want lined T-backs, you need to check the "Jelly" pages of reputable online shops. It appears this is something the lads are doing when they land here--looking for well-illustrated product info.

I'm sure their girlfriends appreciate their diligence. Some spider thongs, suspender G-strings, and skimpy tops come with the warning "these aren't for everyone." This is the kind of product advisory that would compel thoughtful boyfriends to look for the most suitable item before committing to a purchase. The Internet is helpful in that kind of way.


Image lifted from Bikini Beach. Note: some T-back bottoms must be covered before entering fast food premises in Myrtle Beach, South Carolina.
Irish Typepad -- "T-back thru T-rati" with [boss alert!] sample.
David Sifry -- comments on Boing Boing adding Technorati Trackback
JD Lasica -- "BoingBoing, Technorati, and conversation in the blogosphere"
Cory Doctorow -- "Boing Boing add Technorati support"
x_ref119

April 19, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 17, 2004

Jew

WIKIPEDIA -- When I was in the military, the installation rabbi was normally the first religious visitor to call around to welcome me to the neighbourhood. He was keying off my name "Bernard Goldbach" ("That's a very good Jewish name, isn't it Marvin?") and ignoring his paper clues that identified me as Catholic-ish. Now there's an excellent online resource that explains everything Jewish.


Wiki -- Jew
Justin Mason -- "Googlebombing in a good cause"
x_ref119

April 17, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 15, 2004

A9 now live

AMAZON -- If you have an Amazon log-on, you should look at A9, the new search engine implemented by Amazon. Although it pulls results from Google's algorithm, it presents no more than half of the strings normally shown by Google. As Chris Gulker notes, "the URL syntax is nice and lightweight, and it remembers your searches" if you're registered through an Amazon login.


Chris Gulker -- "A9 went live"
A9 -- "Seven reasons to use A9"
x_ref119

April 15, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

April 12, 2004

Gmail feature

GOOGLE -- Several discussions about Gmail suggest that Google is perverting the Internet experience by offering a mail system that deviates from being a pure-play browser. When people demean "clever JavaScript tricks" that make Gmail a robust online email client, they often miss an important point. Google knows there is a pent-up market for an alternative webmail client and that market sits behind banks of firewalls and discriminating proxy servers. You have to fool those pieces of technology in order to install and run on desktops. Many of those desktops won't permit their users to install plug-ins to browsers. To get them on Gmail, you give them fast responsiveness, slick keyboard controls, and a clean interface. They don't need a fully-featured web app, functional browser buttons, or real links. They won't care if their pages are coded in JavaScript and not HTML. These millions of unsigned users want something that they can use with IT support desk help. It's grand if it appears to run inside their browser.


Google -- Gmail beta
Mark Pilgrim -- "Gmail accessibility"
Aaron Swartz -- "Gmail update"
EFF -- "Gmail: What's the deal?"
x_ref261519

April 12, 2004 in Search Engines | Permalink | Comments (1) | TrackBack

April 06, 2004

Google's Secret Sauce

DUBLIN -- As Google comes to market, analysts openly speculate how to crawl and index the Internet better and less expensively. Zenark have long speculated how to configure electronic robots to crawl more efficiently than the Google bot. Now some of the developers of the Open Directory Project have dropped interesting information into the mix. It makes interesting reading for anyone visiting Google's server farm in Citywest (Dublin, Ireland).

Stories about Gmail have got Topix.net thinking "about seemingly incremental features that are actually massively expensive for others to match." But is Google's platform actually cheaper to acquire and simpler to maintain than any other large-scale web service?

Topix.net bloggers have written before about "Google's snippet service, which required that they store the entire web in RAM. All so they could generate a slightly better page excerpt than other search engines."

Google's rise to dominance is a case study in itself.

Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. What is this platform that Google is building? It's a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.

Google has a minimum of 100 racks stacked in Dublin. Friends in Silicon Valley have mentioned that the Googleplex in total numbers more than 30,000 machines. Inside information puts 88 dual-CPU 2 GHz Intel Xeon servers in each rack, each server holding 2 GB of RAM and an 80 GB hard disk. Across the Googleplex, you're looking at more than 2000 terabytes of hard drive space and more than 63,000 GB of RAM. You can store all the Internet crawled by Google in this copious amount of RAM.
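
The totals above can be sanity-checked with a little arithmetic. The per-server figures are the ones quoted in this post; the function name and the use of decimal terabytes (1 TB = 1,000 GB) are my assumptions.

```python
# Per-server figures quoted in the post.
SERVERS_PER_RACK = 88
RAM_GB_PER_SERVER = 2
DISK_GB_PER_SERVER = 80


def fleet_capacity(servers):
    """Return total RAM (GB) and disk (TB, decimal) for a number of servers."""
    return {
        "ram_gb": servers * RAM_GB_PER_SERVER,
        "disk_tb": servers * DISK_GB_PER_SERVER / 1000,
    }


dublin = fleet_capacity(100 * SERVERS_PER_RACK)  # at least 100 racks in Dublin
total = fleet_capacity(30_000)                   # "more than 30,000 machines"
implied_servers = 63_000 // RAM_GB_PER_SERVER    # servers implied by 63,000 GB of RAM

print(dublin, total, implied_servers)
```

Thirty thousand machines give 60,000 GB of RAM and 2,400 TB of disk, so the quoted 63,000 GB of RAM implies about 31,500 servers, consistent with "more than 30,000."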


Rich Skrenta -- "The secret source of Google's power" with some very lucid weblog comments on Google architecture.
x_ref119

April 6, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

March 29, 2004

Expanding MSN search

SCOBLE -- Robert Scoble shares a few thoughts about emerging MSN search functionality. He talks about wanting to find patterns in people. He wants to find groups with shared interests. Scoble has searched Technorati for several themes, such as "all blogs that mention quilting, and they are in some sort of order based on how many inbound links they have and they bias toward webloggers who mentioned quilting in the past few days."

He wants "a new kind of search engine that combines the full-text approach that Feedster.com uses with the inbound-link analysis that Technorati does." With this kind of reach, he would have predictive results. He would probably find people most likely to talk about quilting in the future.

Imagine if this search functionality were combined with Orkut, so that the result set could map to Orkut profiles, including pictures. In Scoble's words, "such a system might get millions of people to give up Outlook's contact system." If Microsoft is doing such a thing, they would have the tools behind the most powerful electronic profiling on the planet.


Robert Scoble -- "a new kind of people search is needed"
x_ref119

March 29, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

March 13, 2004

Garry Trudeau on Google

WIRED -- Garry Trudeau, creator and cartoonist of Doonesbury, says "Google is my rapid-response research assistant. On the run-up to a deadline, I may use it to check the spelling of a foreign name, to acquire an image of a particular piece of military hardware, to find the exact quote of a public figure, check a stat, translate a phrase, or research the background of a particular corporation. It's the Swiss Army knife of information retrieval."


Michael Malone -- "Surviving IPO Fever" in Wired, March 2004.
x_ref119

March 13, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

February 20, 2004

Serving Google Adsense

GOOGLE -- The most common referrer string related to Google on this site is "Google Adsense fraud." Seeing the Adsense programme from the perspectives of a visitor, a client and a microcontent producer, I think Google has clever ways to control Adsense fraud. I need to tweak my Adsense displays to serve them more productively. Based on community response, that means elevating their placement--something I don't think I will do, even though I believe that step would increase my revenue three-fold.

However, the new code for towers, banners, inline rectangles, and single-ad buttons looks worthwhile. And there is a range of Alternate Ads to monetize pages that would otherwise show public service ads. Those Alternate Ads would tick over into an extra packet of revenue for me. I need to ensure AdSense continues to pay for Typepad hosting and for my mail2blog postings. Tweaking the displays should deliver me the right numbers.


Google -- AdSense Quick Tour
This blog is the only place on the Web that Google has indexed for "Adsense fraud."
x_ref119

February 20, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

February 17, 2004

Man's butt on top

CLONMEL HOTSPOT -- I have achieved the pinnacle of search engine placement, proudly planting this blog in the Number One Position for the search term "man's butt" on Google. In accomplishing this feat, I have trumped 275,000 other sites. Wouldn't you be chuffed too?


Google searches for man's butt.
x_ref119

February 17, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

Google's Robot exclusion file

GOOGLE -- Just for grins, here is Google's robots.txt file.

User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalog_list
Disallow: /news
Disallow: /pagead/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /wml
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
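
To see how a well-behaved crawler would read rules like these, here is a small sketch using Python's standard urllib.robotparser module. It parses only an excerpt of the file above; the "ExampleBot" user agent and the test paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules listed above.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /news
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Report whether the given agent may fetch a path on www.google.com."""
    return parser.can_fetch(agent, "http://www.google.com" + path)


for path in ("/search", "/images", "/newsalerts", "/intl/en/about.html"):
    print(path, "allowed" if allowed(path) else "disallowed")
```

Disallow rules match by prefix, so "Disallow: /news" also covers a path such as /newsalerts.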


Google -- robots.txt
x_ref119

February 17, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

February 04, 2004

Pinging Yahoo

ZAWODNY -- In Andrew Orlowski's and Paul Boutin's minds,¹ we need to constrain blogs from rampant placement in search engines before they damage the Internet itself. But before they lock down the automatic feature of adding URLs through mere pinging when publishing,² you can ping Yahoo by hitting api.my.yahoo.com/RPC2 with your "Save and Publish" button on any one of the high-end blogging programs.
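
For the curious, a ping of this kind is a small XML-RPC call. The sketch below assumes the endpoint accepts the conventional weblogUpdates.ping(name, url) method that blogging tools send on publish; the blog name and URL are hypothetical, and the code only builds the request body without sending anything.

```python
import xmlrpc.client

# Endpoint from the post; the http:// scheme is an assumption.
YAHOO_PING_ENDPOINT = "http://api.my.yahoo.com/RPC2"


def build_ping_request(blog_name, blog_url):
    """Serialise a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


# Hypothetical blog details, for illustration only.
payload = build_ping_request("Example Blog", "http://blog.example.com/")
print(payload)

# Sending it for real would look like this (not executed here):
# xmlrpc.client.ServerProxy(YAHOO_PING_ENDPOINT).weblogUpdates.ping(
#     "Example Blog", "http://blog.example.com/")
```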


¹Orlowski writes (whinges) about blogs for The Register and Boutin compiles A-List comments for essays like "101 ways to save the Internet" in Wired.
²Jeremy Zawodny -- "Configuring MovableType to Ping Yahoo: Three Easy Steps"
x_ref119

February 4, 2004 in Search Engines | Permalink | Comments (2) | TrackBack

January 29, 2004

SEO Trackback

TYPEPAD -- Sniffing around the Typepad Knowledge Base and a few unofficial Typepad discussion zones confirmed something I discovered last month--tracking back to yourself heightens search engine placement in most cases. At the very least, the junior site that receives the trackback will not be subordinated inside Google search engine results, even when the junior page is from the same host URL as the source of the trackback.


x_ref119

January 29, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 27, 2004

Hutton Report Adwords

GUARDIAN -- The BBC is buying up search terms for "Hutton Inquiry" and "Hutton Report" through Google's Adwords service. They would achieve better placement of their Hutton pages by embedding Adsense code on them.


Guardian -- "BBC buying Google Adwords"
x_ref119 x_ref125oj

January 27, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 13, 2004

Mediabot doing the numbers

CLONMEL -- If there's one absolute deliverable to Google Adsense, it's the tenacity of its mediabot. That little crawler finds everything that has an Adsense script attached to it. Then its indexing brothers put all the hoovered information inside the Google search engine. Once there, it seems like the page content is nudged towards the top of the Google search engine itself--pissing off writers like Andrew Orlowski in the process. No matter--I'm consistently getting 1000 page views each day on this blog and it's only five months old. When I bolt Adsense into my legacy content, I will quadruple those numbers in a week's time. And that results in me paying for my hosting charges without any other revenue stream. It's nice when microcontent pays its own way.


x_ref119

January 13, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 11, 2004

Photos and dreams

KILKENNY -- Sometimes you dream about problems that need solutions. I have a little problem where I attract visitors looking for pictures and they land on blog pages that do not contain images. It appears they don't look at the photo thumbnails on the left hand side of the page after they arrive. If the picture is not in view, they continue Googling. There must be something I can do to the photo album cover pages to attract more visitors directly to those pages.

My dreams can get very complex. In the one about the photo album and the search engines, I was visited by someone looking like David Filo and he told me some interesting things.

  • He suggested Google grabs up to four complete Web pages (2400 words) when indexing URLs placed in the top 10 of Google.
  • He said Touchgraph provides a realistic picture of the web neighbourhood used to calculate Page Rank.
  • He showed me ways to optimise word selection by looking at the value of specific Google Adwords.

I don't usually write down dreams, but these kinds of technical discoveries deserve follow-up attention, if only to determine whether they have any merit.


Sent mail2blog using Nokia Communicator Typepad service after breakfast scones in Kilkenny at the Castleyard restaurant.
x_ref119

January 11, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 10, 2004

Backpage Google

GUARDIAN -- Within hours of Jack Schofield publishing his article¹ about Google, several people an hour started arriving on my blog by using special characters in their search strings. I saw this on the backpages (referrer logs) of my websites. It appeared they were taking Schofield's perspective to heart because they had refined their searches for product reviews and several dozen each day were using strings of Google operators to focus their searches.

It might be a major leap to credit Schofield with affecting search patterns of thousands of viewers² around the world. Some of the syntax-aware traffic is probably coming from other people who don't read Schofield. Instead, they might read Tara Calishain's Google Hacks where they can see page-by-page steps of improving their search techniques.³

  • Use quotation marks (or full stops) to find exact terms.
  • Use the plus (+) sign before a particular word to ensure Google returns results with that word in it.
  • Use the minus (-) sign in front of a particular word to remove that term from search results.
  • Try the asterisk (*) wild card when searching for a phrase.
  • Use the site: command and Google will burrow into specific sites looking only there for the information. You can burrow into Irish Typepad for information written about Google.
  • Use the operators. Calishain dedicates individual exercises to each operator.
  • Try a different search engine: AltaVista (potent in 1997 but indexes less frequently than Google), All the Web (thinks like Google), Vivisimo (inherited Google's minimalism), and Teoma (fewer blogs on top of results).
  • Use metasearchers like Dogpile and Metacrawler to see how sites (authors, critics and reviewers) are related in their viewpoints, perspectives and ownership.
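The operators in the list above can be combined into a single search string. Here is a minimal sketch in Python, assuming the operators behave as the list describes. The helper names `build_query` and `search_url` and the site `example.com` are made up for illustration and do not come from Schofield or Calishain.

```python
from urllib.parse import quote_plus


def build_query(phrase=None, required=(), excluded=(), site=None):
    """Compose a search string from the operators described in the list."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')             # quotation marks: exact phrase
    parts.extend(f"+{word}" for word in required)   # plus sign: word must appear
    parts.extend(f"-{word}" for word in excluded)   # minus sign: drop pages with this word
    if site:
        parts.append(f"site:{site}")            # site: restrict results to one site
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query into the q parameter of a search URL."""
    return f"{base}?q={quote_plus(query)}"


# example.com is a placeholder site, not one named in the post.
query = build_query(
    phrase="product review",
    required=["Nokia"],
    excluded=["ringtones"],
    site="example.com",
)
print(query)
print(search_url(query))
```

The first line printed is the string you would type into the search box. The second is the same query as it would appear in a referrer log, with the quotation marks, plus sign and colon percent-encoded.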

¹Jack Schofield -- "Delivering the goods" in The Guardian, 8 Jan 04.
²The info on Irish Typepad attracts a minimum of 300 unique visitors every workday and more than 220 unique visitors every weekend.
³If you read the Internet like the stacks of the library, you need to read Tara Calishain's suggestions every week to keep yourself focused. She finds things you would enjoy first-hand.
Check out the Google Glossary, Google Calculator, Google's Freshest Pages, and other Google Tools, like my favourite Touchgraph.
x_ref119

January 10, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 07, 2004

Beyond Google

AP -- As Google's IPO creeps closer, some analysts are pointing out there are options beyond Google. It pays to choose your conclusions carefully as much of the highlighted technology actually plugs into the Google API.

  • Grokker -- downloadable program that sorts search results into categories and maps the results by showing each category as a colourful circle. Within each circle, subcategories appear as more circles that can be clicked on and zoomed in on.
  • Vivisimo -- Clickable directories
  • Touchgraph -- my fav when I'm on the TippInst E1 connection.

Wired -- "Beyond Google"
x_ref119

January 7, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 05, 2004

Searching for a multimedia degree

CLONMEL HOTSPOT -- I have discovered that my current cohort of multimedia students discovered their course of instruction through a variety of sources.

  • "I learned about Tipperary Institute from friend of mine who go here."
  • "I learned about TI from the prospectus."
  • "I saw it when scanning the CAO Handbook." Another student added, "It seemed like the thing to do."
  • "...in the local paper and I was impressed by the facilities."
  • "I can remember entering the contest to design the new logo for Tipperary Institute."
  • "...when reading The Clonmel Nationalist."
  • "I learned about it during a demonstration by a lady who was going from school to school."
  • "My friend had applied for the course."
  • "... in my second level school in Thurles..."
  • "I learned about it from the college website."
  • "I saw the signs."
  • "... when I was studying at the Central Technical Institute in Clonmel."
  • "I learned about the Tipperary Institute one night when working at Guidant." Another student also "heard about it from a friend" who had never taken any courses at Tipperary Institute.
  • "I went to the Careers Office and got the brochure from TI."
  • "TippFM had a catchy jingle about the Institute."

Question examined during the "Search Engine Strategies" module of the Writing Skills course.
x_ref119

January 5, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

January 03, 2004

The Google Factor

KILKENNY -- "Well, you could look that up on Google" said the voice in the High Street Mall--just like 60% of all the visitors to my blog properties do. More than 220 different visitors a day come to my blogs on the heels of search engine requests they made with Google. Most of them are attracted by titles of articles in my blogs. I cannot fathom why some blog authors turn off their titles. Failing to name a title reduces the rest of the content on Web pages to noise.


x_ref119

January 3, 2004 in Search Engines | Permalink | Comments (0) | TrackBack

December 23, 2003

Google Holidays

[Image: Google Winter Holidays 2003]

CLONMEL HOTSPOT -- During the Christmas 2003 holidays, Google is primping for an IPO, Andrew Orlowski is huffing and puffing to blow it down, John McCormac wonders if Google is "not using its Page Rank algorithms any more" and the Irish Open List offers measured comments concerning recent changes in the Google search engine rankings. The most recent Google Dance has helped the visibility of content I write for Irish Typepad and that translates nicely into small companies with focused product messages.

Google speculation is one way many people pass their time on the Web. Some observers have told the Irish Open Mailing List that Google is still using its underlying page rank algorithms. However, it has imposed some form of "over-optimisation penalty" on certain keywords. In my experience, watching the situation from the AdWords side of the house, this helps people who advertise with Google. It also helps bloggers who write clear content for indexing by Google.


Andrew Orlowski -- "A Quantum Theory of Internet Value"
Matthew Ingram -- "Why Google Sucks"
Patrick Delaney -- "The consensus at the moment"
x_ref119

December 23, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

December 19, 2003

Google Adsense Expanding

GOOGLE -- Most people who visit this Irish Typepad blog in search of Google Adsense items come looking for things written about Google Adsense fraud. Personally, I think it would be difficult to defraud Google Adsense. I'm looking at the system from the point of view of a microcontent publisher and an Adwords buyer. Google uses a host of clever mechanisms that will catch people trying to game the system. As discussions in several search engine forums reveal, even the most basic trick--manually clicking on "Ads from Google" at the bottom or sides of pages with clean cache--gets detected because Google listens to IP addresses and it seems to know browser identities.

Google AdSense has expanded language support for AdSense publishers. The ads won't change but the new facility gives dedicated email support and account pages in French, German, Italian, Japanese and Spanish.


x_ref119

December 19, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

December 06, 2003

Getting noticed

KILKENNY -- Every one of my 59 third level students shows up in the top three listings when searching for their name on Google. That feat can add credibility to a job application and put you on the radar scope of executive headhunters.


Fortune -- "Secrets of an Executive Recruiter"
Sent mail2blog using Nokia Communicator O2 Typepad service after breakfast in Kilkenny at Chez Pierre.
x_ref125ws

December 6, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

November 27, 2003

Referrer Intelligence

[Image: Google Turkey]

CLONMEL HOTSPOT -- On Thanksgiving Day--with me unable to see the Google Thanksgiving page--this weblog now gets 10 percent of its search engine referrer traffic from Yahoo. These results mean that the Google share of traffic has declined at least 5% since August. It seems to reflect Yahoo's more aggressive crawling, portal position, and relevant search engine features.


Turkey extracted from Google's 1999 Thanksgiving Web page.
x_ref119

November 27, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

November 24, 2003

Look at me Google Hack

John Handelaar points out an interesting (I think it's creative expression not a perversion of ranking) Google Hack that involves the use of odd characters in the TITLE element that comes into view when searching for Netflix.

Use of arrows

As some may know, you can gain standing by using TITLE inside the BODY element, as people discovered on the publication of the Google PhD dissertation. Google still does not penalise that hack. Informed commentators think Google will strip the character tags from its listings, meaning the demise of arrowheads in the search results.


John Handelaar -- "Odd characters in Title attribute"
Jason Fried -- "Look at me search results"
x_ref119

November 24, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

November 15, 2003

Spiders in IRC

KILKENNY -- Google's spiders visit IRC channels. This may unsettle those slating Mike Fagan in Ireland, thinking their comments were ether-thin. Now you can imagine them sweating if Mike cops onto power-searching the Web archive.


Aaron Swartz -- "Google into IRC"
Sent mail2blog using TypePad Vodafone service from The Ground Floor, Kilkenny, Ireland.
x_ref119

November 15, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

November 02, 2003

Spidering Hacks

[Image: Spidering Hacks]

BOING BOING -- The latest book in the O'Reilly Hacks series, "Spidering Hacks," (written by Kevin "Morbus Iff" Hemenway and Tara "ResearchBuzz" Calishain) is out. It's the site-scraper's bible, with 100 tips and tricks for sucking in data from the Web.

Spidering Hacks takes you to the next level in Internet data retrieval--beyond search engines--by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented--you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.


Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content.
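The book works in Perl and LWP. As a rough illustration of the same ideas (collect links, and check what is acceptable before fetching), here is a minimal sketch in Python using only the standard library. It is not taken from the book. The robots.txt rules, the page markup, the `example.com` URLs and the `example-spider` agent name are all hypothetical, and they are inlined so the sketch runs without touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being spidered.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page the spider has already downloaded.
PAGE = """
<html><body>
<a href="/albums/kilkenny.html">Photos</a>
<a href="/private/drafts.html">Drafts</a>
<a href="http://example.org/elsewhere">Elsewhere</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def allowed_links(page_html, base_url, robots_txt, agent="example-spider"):
    """Return same-host links that the site's robots.txt lets this agent fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    host = urlparse(base_url).netloc
    return [
        url
        for url in collector.links
        if urlparse(url).netloc == host and rules.can_fetch(agent, url)
    ]


links = allowed_links(PAGE, "http://example.com/index.html", ROBOTS_TXT)
print(links)
```

Only the photo album link survives. The drafts link is disallowed by the rules, and the off-site link is skipped because this robots.txt says nothing about another host.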


Cory Doctorow -- Spidering Hacks
Published by O'Reilly ISBN 0596005776
x_ref125mw

November 2, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

October 31, 2003

Halloween Facelift

[Image: Google does Halloween]

CLONMEL HOTSPOT -- I really enjoy how Google changes its face (its logo) for specific seasons and events. And creative writers like Karen Alexa have reskinned their blogs with an orange and black scary theme. The seasonal effects are good examples of intertextuality in Web space.

But the big Halloween news is this is the sixth anniversary of when I first took Ruth out on the town! And it's the first occurrence of Google giving the top result for "multimedia degree Ireland" to an article written on Irish TypePad. We were just approved to run a European-accredited degree in computing and multimedia and we're on top of 39,200 other pages found in a Google search for the term. More importantly, we're #12 of 1.3m pages when searching for "multimedia degree." With this kind of Google juice, we'll go a long way.


Karen Alexa -- She gives "Silver beetle" a "Halloween Face lift"
x_ref1256 x_ref125ms

October 31, 2003 in Search Engines | Permalink | Comments (1) | TrackBack

October 30, 2003

Know them by their searching

CLONMEL HOTSPOT -- I glance at interesting referrer strings that occasionally find their way into my referrer logs. Some tidbits from my referrer logs:

  • Russians are looking for Sugarbabe texts and they found my item on Sugarbabe ringtones.
  • Back yard engineering merits more coffee chat conversation from my colleagues than any other topic on Irish TypePad.
  • Last flight of Concorde draws more attention than anything I have ever blogged about. It draws a minimum of 80 people here every day, down from one every two minutes on Concorde's last mission.
  • Students, parents and guidance counsellors searching for "Tipperary Institute" come here looking for information about the newly accredited multimedia degree programme.
  • The most obvious ego surfer visiting this blog looks for Tinderbox.
  • Based on initial information gleaned from referrer logs on three sites, USB phones or VoIP phones are selling well.
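Tidbits like the ones above come from reading the search terms embedded in referrer URLs. Here is a minimal sketch of that step in Python, assuming the referrer strings have already been pulled out of the log. The sample referrers are invented, and the `QUERY_KEYS` mapping covers only two engines as an illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical referrer strings; real ones would come from the server logs.
REFERRERS = [
    "http://www.google.com/search?q=last+flight+of+concorde",
    "http://www.google.ie/search?q=Tipperary+Institute&hl=en",
    "http://search.yahoo.com/search?p=usb+phone",
    "http://www.google.com/search?q=Last+flight+of+Concorde",
    "http://example.org/blogroll.html",
]

# Query-string key that carries the search terms for each engine.
QUERY_KEYS = {"google": "q", "yahoo": "p"}


def search_terms(referrer):
    """Return (engine, terms) for a search-engine referrer, else None."""
    parsed = urlparse(referrer)
    host_parts = (parsed.hostname or "").split(".")
    params = parse_qs(parsed.query)
    for engine, key in QUERY_KEYS.items():
        if engine in host_parts and key in params:
            return engine, params[key][0].lower()
    return None


hits = [hit for hit in map(search_terms, REFERRERS) if hit]
by_terms = Counter(terms for _, terms in hits)
by_engine = Counter(engine for engine, _ in hits)

print(by_terms.most_common(3))
print(by_engine)
```

Counting by engine gives the kind of share-of-traffic figure quoted elsewhere on this page, and counting by terms shows which posts are doing the attracting.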

Gathered from TypePad, Atomz, Analog and Radio Userland.
x_ref119

October 30, 2003 in Search Engines | Permalink | Comments (1) | TrackBack

October 25, 2003

Google's IPO Auction

FT -- Even local radio knows Google--which means the brand is a service and its IPO will do the numbers. Teenagers in my housing estate ask each other "How do you know for sure? Did you Google that?" Librarians fret because students use Google as their first stop, not the card catalogue. I think I will set aside EUR 2000 for Google's massive online auction of shares next quarter. Investment bankers believe Google will take in more than $15bn. This is worth the deferral of Christmas presents.


Richard Waters -- "Google considers online IPO auction"
Joi Ito -- "Holy cow." with seven comments and counting, including insights about DPO.
x_ref119

October 25, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

October 04, 2003

Google Does ALT tags

[Image: Padraig Culbert and Seamus Puirseil]

CLONMEL HOTSPOT -- I knew that Google grabbed and indexed ALT tags on images but I didn't know the indexing was done within a week of Crawler13 visiting. The fact came home to me because someone visited here to find Seamus Puirseil. The Google results page directed the visitor to my blog item on "History Flow." There's nothing in the History Flow item that relates to Puirseil. However, there is an alternate tag with his name in it from my Concord eyeQ photo album. And that's what Crawler13 took away in its bag. I need to implement a search feature on Irish Typepad soon and it has to be extensive enough to grab the photo album details.
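To see how a name can be findable on a page whose visible text never mentions it, here is a minimal sketch in Python that separates a page's body text from the alt text on its images. The markup is a hypothetical stand-in for the blog page, and nothing here claims to reproduce how Google's crawler actually works.

```python
from html.parser import HTMLParser

# Hypothetical markup: a post body plus a photo album thumbnail in the sidebar.
PAGE = """
<h3>History Flow</h3>
<p>Body text that never mentions the name.</p>
<img src="thumb01.jpg" alt="Padraig Culbert and Seamus Puirseil">
<img src="spacer.gif" alt="">
"""


class TextAndAlt(HTMLParser):
    """Keep visible text and image alt text in separate lists."""

    def __init__(self):
        super().__init__()
        self.body_text = []
        self.alt_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images with empty or missing alt text
                self.alt_text.append(alt)

    def handle_data(self, data):
        if data.strip():
            self.body_text.append(data.strip())


parser = TextAndAlt()
parser.feed(PAGE)

name = "Puirseil"
in_body = any(name in text for text in parser.body_text)
in_alt = any(name in text for text in parser.alt_text)
print(in_body, in_alt)
```

An indexer that stores both lists against the same URL would return this page for the name even though only the thumbnail carries it. An on-site search would need the same treatment to pick up photo album details.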


x_ref119

October 4, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

October 03, 2003

Temperatures Rising Over Google AdSense

[Image: Google Green Dot]

GOOGLE -- Count me as a satisfied customer of Google AdSense. It works just like Google said in its terms and conditions and its monthly cheques pay three weeks of medication that my mom needs for thinning out her varicose veins. I can understand some of the current discontent over Google red-flagging microcontent sites that have too many click-throughs from the same IP block because that's how a spider would crawl a site and click the links. But I can't understand how some AdSense users fail to read the terms and conditions of the programme before they sign up. As much as we would like transparency in all parts of the business world, clever crawlers can pervert click-through systems. I have seen it, lost money to it and can empathise with Google's technology team as they try to combat it.
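The "same IP block" idea can be sketched in a few lines of Python. This is only an illustration of grouping clicks by network block. It is not Google's method, which is not public. The /24 block size, the threshold of three and the sample addresses (taken from the ranges reserved for documentation) are all assumptions.

```python
import ipaddress
from collections import Counter

# Hypothetical click log: one source address per ad click.
CLICKS = [
    "192.0.2.10",
    "192.0.2.11",
    "192.0.2.200",
    "198.51.100.7",
    "203.0.113.5",
    "192.0.2.44",
]


def block_of(ip, prefix=24):
    """Return the network block (a /24 by default) that contains the address."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def suspicious_blocks(clicks, threshold=3, prefix=24):
    """Map each block with at least `threshold` clicks to its click count."""
    counts = Counter(block_of(ip, prefix) for ip in clicks)
    return {str(net): n for net, n in counts.items() if n >= threshold}


flagged = suspicious_blocks(CLICKS)
print(flagged)
```

With the sample data, four of the six clicks fall in one block and that block is the only one flagged. A real system would also have to weigh time windows and legitimate shared addresses such as a college network.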

That said, there are several clever ways of working around even the Google anti-fraud algo and even though the AdSense Terms and Conditions forbid talking about the work-arounds, they're openly discussed as effective techniques on two Search Engine Optimisation fora where I lurk. The guys who are whining the loudest would make better use of their time inside the quiet space used by professional content creatives. But if they did that, what would they blog about? Once you know how to reverse-engineer the algos and program clever bots that emulate Googlebot, Scooter, and Slurp, you're not going to blab about it. The web intelligence professionals certainly don't let the world get inside their part of the Web spider kingdom.


Google AdSense
Comments to Russell Beattie's observations
Andrew Orlowski -- "Google shafts blogger"
Cory Doctorow -- "AdSense Terms of Service gag critics"
Aaron Swartz -- "Shut up and serve ads"
Joi Ito -- "New Google AdSense Terms of Service Suck"
x_ref119

October 3, 2003 in Search Engines | Permalink | Comments (2) | TrackBack

September 10, 2003

Five word Google identity

KILKENNY -- After a long reading of the comments section at Mitch Kapor's site, I realised that I can lay claim to writing the pages people find by using five-word search sequences. It happens every day, seen in snapshots of my referrer logs. Today I wonder if I should lock down some of the phrases, possibly by registering them as unique marks. After Rice Krispies with our Pomeranian (he laps milk like a cat), I concluded I have to:

  1. Revise a few straplines.
  2. Monitor them as they position themselves in Google, AltaVista, FAST, MSN and DMOZ.
  3. Buy a complementary URL.
  4. Convince key sites to construct exact-phrase pointer links.
  5. Put the phrase on the business card as part of daily trading.

Within two months, I'll share my notes on the specifics of what I'm doing. It's part of the Public Relations course that I teach this term.

Thomas Phelps and Robert Wilensky -- "Robust hyperlinks cost just five words each"
IETF -- "Common Name Resolution Protocol"
See Doggie photos with Holly, the Samoyed-Spaniel animal.
Sent by mail2blog Nokia Communicator TypePad services in McDonagh Station, Kilkenny.
x_ref119 x_ref125pr x_ref26121

September 10, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

September 09, 2003

Google turned five

[Image: Google]

CLONMEL HOTSPOT -- Google turned five on Sunday but it feels like it has been around for an Internet generation. Last year, a pre-teen in my housing estate asked me, "Do you Google?" I told her that my Google is my tour guide and that I often start searches from a Google URL hosted in another country because that occasionally shows results that vary across Google's 80 languages, largely as a result of the monthly Google dances.


x_ref119

  • Google started with four employees, its search system handling little more than 10,000 queries per day. Now it handles more than 200 million and Google has become a phenomenon that has transcended its online origins.
  • I use Google for browser blogging, comparison shopping, news, and pop-up blocking.
  • People go to the end of their marketing budgets to ensure high placement in Google.

September 9, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

September 04, 2003

Searching Wayback

RESOURCE SHELF -- On a day that I have got sample code running for my personal search machine, the Wayback Machine now has "keyword search with recall." I have always wanted the ability to search way back with key words. Now a beta tool offers keyword searching of a portion (11bn pages) of the Wayback database. Anna Patterson from the Internet Archive is developing this technology.

Results pages offer several features including a graph of "returned pages by date" and the ability to narrow and focus your search using various categories and topics. You can also limit your search by the date the page was captured by the Wayback Machine spider.


Gary Price -- "The Internet Archive"
Search the Internet Archive
More about searching the Internet Archive
Anna Patterson -- "Cob Web Search" Powerpoint Presentation featuring deep trawls related to the Battle of the Boyne and King Billy.
x_ref125mw x_ref119

September 4, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

August 23, 2003

Sun up on enterprise search management

KILKENNY -- I watched the sun rise on Francis Tansey's larger-than-life artwork in the lobby of the Kilkenny Ormonde Hotel and thought about the people visiting here (my blog) with "enterprise service management" on their minds. When the first "service management" visitor landed and read material that I had written, I had to dig around to find what they were trying to find. In the case of a facilities manager, "services management" might include the identification of Francis Tansey's art works for a premises. In the case of a web manager, it normally means portal or web services.

In my Media Writing course, we teach a module that relates to this discipline, including coverage of

  • search engine marketing
  • PPC management
  • trusted direct feeds
  • link building
  • traffic intelligence
  • position reporting

Taken together, these skills provide companies with competitive advantage. We consider them part of the core skills of our Media Studies students after one year of academic and practical work.

Park PR in Dublin understands the concept of "accelerated search management" because they were the first Irish firm to come looking for it here.
Sent mail2blog by Nokia 9210i O2 Typepad services from the lobby of the Kilkenny Ormonde Hotel, Ireland.
x_ref119

August 23, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

August 20, 2003

Give them what they want

CLONMEL HOTSPOT -- The first fortnight of running my Irish Typepad site resulted in me giving people images in the form of photo albums because I received more direct e-mail bearing that request than any other. Right behind "show us your pics" requests came comments like "it's easier to work around this site." Sure it is. But I need a site search facility as well, because Atomz tells me people want to know they've seen all there is to see about topics and if I had a site search facility in place, it would reveal more about what people want. Last week, the most common on-site queries were for things related to communications (spam and e-mail), ethics issues (think Dylan Creaven), innovative technology (intellisign, Prismiq and the Nokia 9210) and PayPal.


x_ref119

August 20, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

August 19, 2003

Google glimpses

DIDDLY -- The Random Personal Picture Finder uses Google Images to search for digital camera photos, providing an odd glimpse into the lives of everyday people.


Dave Mattson -- "Random Personal Picture Finder"
x_ref119

August 19, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

Crawler 13 does not visit here

CLONMEL HOTSPOT -- I wonder if I could entice Google's Crawler 13 to visit here? That crawler harvests Underway in Ireland and Topgold, sometimes visiting four times a day. I reckon it's a little early to expect Crawler 13 attention but would hope it starts showing up in mid-September.


Update: by 2 Sep 03 (one month after this URL went live) Crawler13 had started visiting Irish Typepad twice a day. That means my content hits the Google index within one week of its posting on the blog. Cool!
x_ref119

August 19, 2003 in Search Engines | Permalink | Comments (0) | TrackBack

August 13, 2003

More Google Resources

CLONMEL HOTSPOT -- One can never get too many Google resources. Some new ones I'm enjoying:


x_ref119

August 13, 2003 in Search Engines | Permalink | Comments (0) | TrackBack