Scaling the Fence: Achilles’ Thesaurus

Posted in Geek Stuff, Random thoughts by dave on August 31, 2009 4 Comments

Scaling the Fence is a series of posts on daverea.com exploring people’s aspirations, apprehensions and encounters with switching from proprietary to open-source software. This post is the first in the series.

Most geeks will tell you that they’re the go-to guy (or gal) when it comes to computer questions from friends and family – and if my experience is any indication, I’m no exception. On a recent car trip with my wife and one of our friends, the topic came ’round to computers, and how this particular friend was in the market for a new one. Of course, me being me, I had to get a plug in for Linux and open source software.

In this case, our brief discussion centered on office suites, and before I could even recommend it, our friend informed me that she’d tried OpenOffice.org, and didn’t like it. As someone who writes for a living, she needs a robust thesaurus – and her experience with the one built into OpenOffice.org (circa 2006) didn’t meet her needs. Unsure of what version she used, and clueless on where OpenOffice.org’s thesaurus is today, I couldn’t offer much in the way of advocacy outside of the possibility that someone may have written a plug-in to improve the thesaurus.

After we returned home, I decided to put the thesaurus to the test. The candidates? Microsoft Office 2003 and 2007 (tested on PCs at work – during my lunch break of course!), Google’s top result for “thesaurus”, Thesaurus.com, and of course my copy of OpenOffice.org 2.4.1 (as packaged with Kubuntu 8.10). For good measure, I also threw in results from Princeton’s WordNet project, on which OpenOffice.org’s thesaurus has reportedly been based since version 2. Sadly, I no longer own a paper thesaurus, so unless someone would like to add some datapoints in the comments, I can’t include synonym counts for the dead-tree option…

As a language enthusiast and aspiring (albeit admittedly and unapologetically amateur) writer myself, I tried to choose words that I felt would have enough synonyms for a valid comparison. Granted, this is subject to the limitations and biases of my vocabulary, but I think I came up with a reasonable list:

  • Noun: Boss
  • Verb: Work
  • Adjective: Simple
  • Adverb: Extremely
  • Preposition: Beneath

From there, it was just a matter of punching everything into each of our candidates’ respective thesauri and tallying up the results:

[Figure: Thesaurus Comparison Results]

As you can see (and also quite understandably) the online thesaurus goes home with the trophy, easily trouncing its nearest competitor by almost 5x (and quite creatively, in many instances, however questionable the usefulness of the results may be). The MS Office suites produced an average of 5.6 and 5.8 synonyms-per-word, respectively, and OpenOffice.org produced a healthy average of 6.6, beating both editions of Office and, interestingly, the WordNet database upon which its thesaurus is based! Of course, looking closer reveals that MS Office trumps OpenOffice on adverbs and prepositions, while OpenOffice.org noses ahead on verbs.
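For anyone who wants to repeat the tally with their own word list, the synonyms-per-word figure is just the mean of the per-word counts for one thesaurus. Here is a minimal Python sketch of that arithmetic. The function name and the counts in it are placeholders I made up for illustration; they are not the numbers from the chart.

```python
from statistics import mean


def synonyms_per_word(counts):
    """Return the mean synonym count across all looked-up words, to one decimal."""
    return round(mean(counts.values()), 1)


# Placeholder counts for illustration only; NOT the results from the comparison chart.
example_counts = {
    "boss": 4,
    "work": 8,
    "simple": 6,
    "extremely": 3,
    "beneath": 2,
}

print(synonyms_per_word(example_counts))
```

Swap in one dictionary of real counts per thesaurus and the averages can be compared side by side.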

What does all this tell us? For starters, we can probably conclude that while OpenOffice.org’s thesaurus keeps pace with that offered by Microsoft Office 2003/2007, the relative usefulness of each will ultimately hang on what words (and types of words) a given user chooses to look up. This, in turn, will be determined by that writer’s style, vocabulary and preferences. It’s also pretty clear that going online (when there’s an option to do so, which is not always the case) will net the widest selection of superior synonyms for the scrupulous scribe.

Language tools like the thesaurus present an opportunity for the open source community. Just as our friend was quickly dissuaded from using OpenOffice.org because she perceived the thesaurus to be inferior, she might have been quickly won over by a toolset that performed head-and-shoulders above those she was used to. Between WordNet, the OpenRogets project, the Big Huge Thesaurus, the New York Times’ thesaurus and the Moby Project (hey, it’s only the largest thesaurus in the English language!), we have the opportunity to package an offline thesaurus (or offer an optional download supplement, if binary size is a concern) for OpenOffice.org that could run circles around proprietary offerings.

Of course, the thesaurus is only one tiny facet of one program, which is itself only one facet of a larger suite of tools, which is itself only a minute fraction of the open-source world. It’s easy to discount as unworthy-of-effort in the face of the many other challenges that FOSS faces in achieving widespread adoption. If market share is any indication, OpenOffice.org’s thesaurus isn’t keeping it out of the hands of millions of users worldwide. That said, this strikes me as one small instance where we’ve found the enemy asleep at the gate – so why not take the opportunity to capitalize on it?

Irony … or outright Hypocrisy?

Posted in Random thoughts by dave on August 11, 2009 No Comments yet

During a recent IceRocket search for news related to the up-and-coming Google Wave, I happened across a very interesting blog post from John Obeto. His claim is nothing new; essentially, that he’s concerned about potential privacy implications of the cloud-hosted Google Docs productivity suite.

I certainly don’t take issue with Mr. Obeto’s view there. Cloud apps, hosted storage, SaaS and other products that necessitate user trust will always have the potential to compromise privacy. Obeto isn’t the first blogger to contrast Google’s corporate motto (“Don’t be evil”) with their products and policies. And I wouldn’t be the first person to defend them, claiming that their contributions to technology and information systems go a long way toward enhancing our society, despite their relatively few privacy concerns.

What I do find honestly laughable, though, is the context in which John chooses to decry Google’s supposed invasion into every corner of his computing life… a web site called AbsoluteVista.com! Looking at his other posts, it certainly appears that Mr. Obeto falls well within the Windows fanboy camp … so I think I’m safe in assuming he doesn’t see the humor of this situation. But seriously? You’re complaining about privacy issues … on a web site dedicated to an operating system that sets the standard for invading the privacy of its users, on top of carrying the most draconian EULA in the industry.

Readers can insert their adage of choice here – perhaps “People in glass houses shouldn’t throw stones” or “What’s good for the goose is good for the gander” – but the point is, Obeto’s post is tantamount to bad-mouthing Coke for putting corn syrup in their cola, while sitting back and enjoying an ice-cold Pepsi…