Sunday, November 30, 2008

Ancestry.com review

(See update at bottom) For the past three months, I've been using physical records (paper documents and audio files containing interviews with relatives), Reunion for Mac, and some free online genealogy services to build out my family tree. I did pretty well with what's available, and managed to create a tree with more than 200 names and lots of very specific information relating to dates and places of birth. However, I eventually reached a point where I could go no further.

So, this weekend I decided to give the subscription-based Ancestry.com a more thorough workout. I knew quite a bit about the service from my interview with its CEO earlier this year, and was particularly impressed by the amount of federal census data that the company and its partners have scanned and indexed. It includes all of the results from 1790 to 1930, which amounts to hundreds of millions of names. The site was offering a two-week trial for free, so I went for it.

The amount of information available through the service really is incredible. In the course of about six hours of online research, I was able to find census returns for two or three branches of my tree, obtain the names of about ten siblings of direct ancestors, and verify the states and countries of birth for many others. I even found the WWI draft card for my paternal great-grandfather, a person who had previously been just a name with some rough biographical information. The digital image of the card on ancestry.com included his exact date of birth and his occupation in 1918 -- a railroad freight handler.

But Ancestry.com has limitations, too, some relating to the tool itself, and others relating to the nature of the records.

I'll start with the search interface. It is not for the faint of heart; a search for a single name can generate hundreds or even thousands of results. The options for refining them do not include an easy way to exclude certain types of records, such as federal censuses from after a certain date. Fortunately, the interface includes a useful "save to shoebox" function that lets users quickly save a record to a front-page location for more thorough investigation later. This is handy for plowing through multiple pages of search results without getting hung up on transcribing or saving image files. You can also search individual data sets, such as the census from a certain year, through the "card catalog" section of the website.

One other interface issue involves uploading GEDCOM files (family tree data files that use the LDS Church's industry-standard data interchange format). You have to start a tree before you can upload a GEDCOM, which is frustrating. I was also a little disturbed by the way ancestry.com treats the file. Once you upload a GEDCOM file, the information in it is made public by default. This is not made clear during the upload process unless you click through to the EULA; I only found out after I checked the "Manage my tree" link.

In addition, there's no easy way of telling what sorts of notes are included in the uploaded GEDCOM data that's made public -- is it just name, date of birth, place of birth, and family connections? Or does it include all of the personal notes that I added in Reunion for Mac, including sensitive pieces of data such as cause of death?
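One way to answer that question before uploading is to look inside the file yourself. GEDCOM is a plain-text format in which each line carries a level number, an optional cross-reference ID such as @I1@, a tag, and a value. Here's a minimal Python sketch (my own, not anything Ancestry.com provides) that counts the tags in a file and flags the ones likely to hold free text or sensitive details. Treating NOTE, CAUS (cause of death), and TEXT as "sensitive" is my own assumption, and the sketch says nothing about which fields Ancestry.com actually publishes.

```python
from collections import Counter

# Assumed list of tags that tend to carry free text or sensitive details.
SENSITIVE_TAGS = {"NOTE", "CAUS", "TEXT"}


def gedcom_tag_counts(lines):
    """Count tags in GEDCOM lines of the form 'LEVEL [@XREF@] TAG [VALUE]'."""
    counts = Counter()
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        tag = parts[1]
        if tag.startswith("@") and len(parts) == 3:
            # Record line with a cross-reference ID, e.g. '0 @I1@ INDI'
            tag = parts[2].split(" ", 1)[0]
        counts[tag] += 1
    return counts


def sensitive_fields(lines):
    """Return only the counts for tags in SENSITIVE_TAGS."""
    counts = gedcom_tag_counts(lines)
    return {tag: n for tag, n in counts.items() if tag in SENSITIVE_TAGS}
```

To run it against a real export, something like `with open("family.ged", encoding="utf-8-sig") as f: print(sensitive_fields(f))` should work, where family.ged is a hypothetical file name and utf-8-sig strips a byte-order mark if one is present. Long notes can spill onto CONC/CONT continuation lines, which this sketch counts as separate tags.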

I made my tree "private," but I am thinking of pulling it down. This is not just because of privacy concerns but also because I would rather maintain the "master" tree with attached photos and audio files on my home computer.

The records themselves have some flaws as well. For two branches of my family, I have encountered instances of surnames being incorrectly transcribed and indexed. It's easy to spot, if you look, because the indexed text doesn't match up with the scanned image of the original government document. However, it's a significant problem in that it has a major impact on how search results are presented, and also on how users rank them in terms of usefulness -- if I'm looking for "Walsh," I am more likely to skip over the results that are listed as "Welsh" or "Walch" because I can't be bothered to check the original. Ancestry.com is sensitive to this; there is a link next to every record in the database that lets users notify the company of typos and other mistakes.
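Spelling variants like these are what Soundex was designed for. It's a phonetic code long used to index U.S. census records: keep the first letter, map the remaining consonants to digits, drop vowels, collapse repeated codes, and pad to four characters. Here's a minimal Python sketch of the standard American Soundex rules. I don't know how Ancestry.com's own search handles variants, so treat this as an illustration of the idea rather than a description of the site.

```python
# Digit codes for consonants under American Soundex; vowels, y, h, w have none.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")  # the first letter's code still blocks repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":
            # Vowels reset prev (so repeats on either side both count);
            # h and w are transparent and leave prev unchanged.
            prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Walsh"), soundex("Welsh"), soundex("Walch"))  # W420 W420 W420
```

All three spellings collapse to the same code, which is why a Soundex-style search can surface the mis-indexed "Welsh" and "Walch" entries alongside "Walsh."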

Still, I found the error rate to be high. Part of this relates to the fact that Ancestry.com outsources its data-entry tasks to Chinese companies, but I believe another issue is American handwriting standards -- I have observed that the quality of the cursive used by census enumerators and clerks seemed to decline after 1900. Add to the mix damaged paper records (such as water stains in the attached image, below) and darkened and blurred scans, and it's no wonder that the transcriptions have a high error rate.

The original records sometimes have other errors. I've noticed that the census records often contradict the information that I've obtained via interviews and other documents. Sometimes, census returns even contradict each other. One example I saw tonight involved the 1870 and 1880 federal census returns for my maternal great-grandmother. In the earlier version, all of her siblings are listed as being born in Rhode Island. In the 1880 scan, the older children are listed as being born in New Jersey.

There are many reasons why this could have happened -- a simple miscommunication, a deliberate attempt to mislead the enumerator, a lazy or harried enumerator, or something else. I'll never find out why one of them was wrong, but it forces me to find another record to determine which one is right. It also raises the specter of many errors never even being detected, because there is no second source to check them against.

And this brings me to my final point: Ancestry.com and other online sources bring a lot of new information to the genealogical treasure hunt, but there are many holes in the online records. Using ancestry.com's supposedly complete database of federal census returns from 1790 to 1930, I have been unable to find any records of certain people who I know existed. Others show up for a specific census, but cannot be found in any other census or online document. An example is my great-grandfather, the railroad freight handler -- I found his military record, and he showed up for the 1920 census along with his family, but there is no record of him in the 1910 or 1930 censuses, even though I know he was alive and living in the same state. I have a possible hit on the 1900 census, but it's also possible it was someone else with the same name and roughly the same age living in the same area. Most of the federal 1890 census was destroyed in a fire, and the online data for the 1880 census doesn't have any results either. So, while Ancestry.com was the source for his date of birth and occupation, I still don't know the names of his father or mother -- only that they were born in the same state (a fact which I gleaned from the 1920 census results).

This points to a very basic fact about genealogical research in the 21st century -- despite the availability of wonderful online research tools and mechanisms to share findings with others, some of the most useful primary sources and data remain in paper format ... or in people's heads. This is the way it's been for as long as people have been doing genealogical research. I've done a lot of work in terms of interviewing relatives and gathering documents from family members, and I've gotten a lot of new information from the World Wide Web, but it's still not enough. One of these days I am going to have to make a trip out to my great-grandfather's hometown and spend a few hours in the county clerk's office, tracking down the all-important vital records that can really fill in the blanks on that branch of my family tree. It may seem like an old-fashioned way of conducting research, but it will probably bring some of the most rewarding results.

Updates:


Image: An online scan of a page from the United States census of 1880. It's legible, but the original document had some water damage.



Friday, November 28, 2008

Summer remembered ...

I'm going through my 2008 photos, and stumbled upon a collection of nature photos I made around our yard and on Prospect Hill in Waltham in July and August, using my S7000 and a tripod. It seems so long ago ...

Rotted stump:

rotted stump

Mossy log:

Mossy log

Rusted telecommunications tower:

Rusted telecommunications tower

Flower, fence, and tower:

Flower, fence, and tower

A small purple flower:

purple flower

Pansy:

Pansy

Blue bouquet:

Blue bouquet

Creepers:

creepers

Droplets:

droplets

I am licensing these photos under Creative Commons 3.0, which basically means you are free to copy them, place them on your own website, use them for commercial purposes, and adapt them, as long as you attribute them to Ian Lamont and link back to this post on ilamont.com or ilamont.blogspot.com.

Wednesday, November 26, 2008

Holiday gadget guide

Over at the Standard, we just wrapped up our first holiday gift guide. It's all about gadgets, electronics, and computers, but we themed it for shopping in a recession -- no MacBooks, iPhones, or other high-end items, but lots of other great stuff that won't break the bank:

"Holiday gift guide: 10 winning gifts in a down economy"

Chris Tompkins was the main author, but I also wrote and researched part of this, and I have to say that the cost of some consumer electronics is getting unbelievably cheap. A full-function portable media player (including video) for only $100? Chris even found a very powerful (dual core, 4 GB RAM, decent video card) HP laptop for about $550.

We're in the market for an HDTV, and while the model I profiled in the gift guide is too small (we're looking for a 47" screen) I was very reassured by the fact that prices have dropped a lot since the last time I looked in the summer.

Friday, November 21, 2008

The interview with Linden Lab's Ginsu Yoon

Last night at 11 pm, I finally published the epic interview with Ginsu Yoon, which I conducted last week when I visited Linden Lab HQ. You can read it here:

Interview with Linden Lab's Ginsu Yoon

I say "epic" because it was (as noted by Virtual Worlds News) a "long, ranging interview" that took 40 minutes to complete and many hours to transcribe. Gene (which is how he introduced himself) had a lot to say about the company's enterprise strategy in Second Life, the financial health of the company, and the Immersionist vs. Augmentationist perspectives of virtual world development. I also saw a demo of an in-world tool for company meetings, which I am going to try out as a user next week.

And, I found out that the chances of being able to use Second Life in a Web browser or on a netbook are slim. I had to ask ...

Thursday, November 20, 2008

Source Blocks get a boost ...

Jason Preston over at Eat Sleep Publish interviewed me about my "Source Blocks" idea. He and a few other people have expressed their support for this experiment in journalistic transparency, and now I am wondering how to improve and expand the system. Some questions to consider:

Are source blocks useful?

How can they be improved?

How can we evangelize Source Blocks to the journalism community?

Source blocks are short paragraphs at the end of an article or blog post that summarize the sources of information that were cited or consulted. Here's the format:

Sources cited, referenced, or consulted: Thestandard.com, Eatsleeppublish.com, an email from Jason Preston.
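Because the format is so simple, it's easy to generate automatically at the end of a draft. Here's a minimal Python sketch; the function name source_block is hypothetical, and dropping duplicate entries while keeping the original order is my own choice rather than part of the format.

```python
def source_block(sources):
    """Build a one-line source block from a list of source names.

    Blank entries are skipped, and duplicates are dropped while the
    order in which sources were first listed is preserved.
    """
    unique = list(dict.fromkeys(s.strip() for s in sources if s.strip()))
    if not unique:
        return ""
    return "Sources cited, referenced, or consulted: " + ", ".join(unique) + "."


print(source_block(["Thestandard.com", "Eatsleeppublish.com",
                    "an email from Jason Preston"]))
```

Run on the sources for this post, it reproduces the example line above.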

Monday, November 17, 2008

Obama's CTO

We did a fun little thing over on The Industry Standard -- a slideshow featuring the top 10 rumored contenders for Obama's CTO.

There are a few very famous names, but the dark horses interest me the most. Who would be your pick?

Sunday, November 16, 2008

Ubuntu installation problems

My Ubuntu Linux project has not been going well. I spent about five or six hours on it yesterday (while simultaneously doing other things) but I was not able to get it up and running on my ThinkPad i Series type 2611-472.

The first problem came when I used the standard graphical installer for Ubuntu 8.10 (Intrepid Ibex). I was able to get to the welcome screen and run some of the basic utilities (check disc, check memory), but when I attempted to run Linux from the boot CD or install it, it froze about 5 minutes into the process with a blank screen.

I then figured out that the problem was insufficient memory -- the machine I am using was state-of-the-art in early 1999, but 64 MB RAM doesn't even come close to meeting the minimum requirements for Ubuntu.

I then tried the alternate install CD, which can be used for low-memory systems. I also defragged the C and D drives on the target machine and ran a disk check using the Windows 98 utilities. The alternate Ubuntu boot disc definitely went more smoothly, and was even able to (apparently) set up a working network connection using the automatic configuration utility. But then it got hung up on the "Starting up the partitioner" step. I tried it three times, including one time waiting about 45 minutes, but it seemed frozen at the 50% mark. On the Ubuntu help forums, I found someone with a similar problem, and the apparent solution is this:
Manual partitioning isn't that hard to do,but it can be a little intimidating the first time you use this.
If your hdd is empty and there is no need for a dual boot hdd,try the next steps.
When the partitioner starts up,choose manual,create a partition 10GB and set it to format as ext3,turn bootflag on and mount it as /
Create a partition 2GB,set it to format as swap,and mount it as swap.
Create a partition as big as you want [/home] set it to format as ext3,mount it as /home.
If there's space left on your hdd you can create another partition for data or what ever,but this can be done later if you like.
If your done creating partitions,scroll down, and choose to write the changes to the hdd.
Continue the install.
"Intimidating"? How about terrifying? I don't want to screw up the machine, so I really hope this works. I didn't even realize there was an option to select "manual" when the partitioner starts up, but I'll apparently need to try that.

Or try a different distro. I've also downloaded Mandriva and Kubuntu; maybe I'll give each a spin before playing doctor on my old ThinkPad.
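For my own reference, here's the forum recipe reduced to arithmetic in a short Python sketch. It only plans the layout and never touches a disk, and the Partition class and forum_layout function are made-up names for illustration. Two assumptions: all leftover space goes to /home (the forum post leaves that size up to you), and the disk size is a parameter because I haven't confirmed the size of the ThinkPad's drive. Note that the recipe assumes an empty disk with no need for dual boot, so following it would wipe the existing Windows 98 C and D partitions. And since root and swap alone take 12 GB, an older laptop drive may simply be too small for the scheme as written.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    mount: str          # mount point, or "swap"
    fs: str             # filesystem type to format as
    size_gb: float
    bootable: bool = False


def forum_layout(disk_gb: float, root_gb: float = 10, swap_gb: float = 2):
    """Plan the forum's scheme: bootable ext3 /, then swap, then ext3 /home.

    Assumption: /home gets everything left over after / and swap.
    Raises ValueError if the disk can't hold root, swap, and some /home.
    """
    home_gb = disk_gb - root_gb - swap_gb
    if home_gb <= 0:
        raise ValueError(f"disk too small: need more than {root_gb + swap_gb} GB")
    return [
        Partition("/", "ext3", root_gb, bootable=True),
        Partition("swap", "swap", swap_gb),
        Partition("/home", "ext3", home_gb),
    ]


for p in forum_layout(20):
    print(p)
```

Shrinking root_gb (for example, forum_layout(6, root_gb=3, swap_gb=0.5)) is one way to see whether a smaller drive could still hold all three partitions.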

Saturday, November 15, 2008

Linden Lab HQ

Earlier this week I visited San Francisco to meet with colleagues and also to conduct some Second Life-related interviews at Linden Lab HQ. The building is located on Battery St., in a quiet neighborhood about a block or two away from the Embarcadero. The outside of the building was surprisingly old and low-key -- I was expecting something more modern, kind of like one of the corporate sims you sometimes see in SL. The inside was contemporary, though. I saw that whiteboard that I've heard other people talk about, but I didn't recognize any of the names.

I'll write up the interviews next week on thestandard.com, but in the meantime, here are the photos:

Linden Lab HQ

Linden Lab HQ



I am licensing these photos under Creative Commons 3.0, which basically means you are free to copy them, place them on your own website, use them for commercial purposes, and adapt them, as long as you attribute them to Ian Lamont and link back to this post on ilamont.com.

Source blocks: My new journalism experiment

I've started a new practice for almost every single piece of content that I write for the Industry Standard -- adding a paragraph at the end of each article explaining who I talked with or where I got information in the process of conducting research. It's not the same as academic endnotes, but it gives readers a much more complete idea of the sources that went into each article or essay. It looks like this:
Sources cited, referenced, or consulted: Blog.basturea.com, American Journalism Review (ajr.org), Editorsweblog.org, Glasshouse.waggeneredstrom.com, Techmeme.com, thelongtail.com, Washingtonpost.com, Wired
The basic reason for doing this is to add transparency, but there are other factors that come into play as well -- I fully describe the reasoning behind my new journalism experiment over on the Standard. I've never seen this done in any newspaper or magazine before, and one of my colleagues says "people don't care," but it's not hard, and it increases transparency.

What do you think?

Friday, November 14, 2008

Burleson Consulting's dress code for Oracle DBAs

Spotted this on an old Joel on Software thread: the dress code for Burleson Consulting:
If you have never worked in a professional environment and you are not sure how professionals look, watch the lawyers on an episode of Law & Order on television.
That's just the beginning. Be sure to follow the link to the dress code and read the whole thing -- nose hair and all.

Saturday, November 08, 2008

Linux for beginners

My daughter has been bugging me for her own computer for a few months. I finally decided to rehabilitate an old ThinkPad i Series that's been in my basement for her use.

Unfortunately, it's running Windows 98 and the PCMCIA Ethernet card won't recognize the connection from our broadband router. When I bought the ThinkPad in 1999 for $3000 (!) dial-up was the norm, and I never had it hooked up to a broadband connection in the four or five years I used it. I added the Ethernet jack so I could network it with my iMac and have it function as the iMac's print server (using an old printer that only had serial ports), but now I need it to work as a standalone Internet terminal. Windows 98 TCP/IP troubleshooting is a mess. I could try getting a wireless card, but instead I am going to take a stab at installing Linux. This may not solve the Ethernet connection problem, but it's worth a shot, and will give me a chance to see Linux up close.

One minor obstacle: Online Linux resources are not optimized for beginners. Try searching for "Linux for beginners" and you'll see what I mean. Many of the top resources are many years old, and/or are actually meant for non-beginners -- people who are comfortable with partitioning hard drives and working from a command line. I can deal with this, but many true beginners will give up when they start to look at these sites. It's a PR issue that the Linux community should really address if it wants the operating system to gain more traction.

Anyway, I am leaning toward Ubuntu, which some former colleagues have recommended. I don't have a blank CD at home, so I'll have to get one and give it a go. I'll let readers know how it goes ...

Tuesday, November 04, 2008

Wordle text visualization: Cool, but not killer


A Wordle tag cloud, based on this blog as of Nov. 4, 2008. Common verbs, articles, and other words are stripped out, and the most frequently mentioned word gets the largest font. I wrote about Apple mice earlier this week, and the word "Mouse" is mentioned on the home page eight times.

You can randomize the appearance to try different fonts and colors, which is fun ... for about two minutes. Like many data visualizations, it's really cool but hardly a killer app that you return to every day.
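The core idea behind the cloud is easy to sketch: count the words, drop the common ones, and scale font size by frequency. This minimal Python version uses a small stopword list and a linear mapping from word count to point size; both are my own assumptions, since I don't know how Wordle itself chooses its stopwords or scales its fonts.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; Wordle's real list is not documented here.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "that", "with", "i", "was", "are", "be", "this"}


def word_sizes(text, top_n=25, min_pt=10, max_pt=60):
    """Map the top_n most frequent non-stopwords to font sizes in points.

    The most frequent word gets max_pt, the least frequent of the top_n
    gets min_pt, and the rest are scaled linearly in between.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    common = counts.most_common(top_n)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts tie
    return {w: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
            for w, n in common}


print(word_sizes("mouse mouse mouse apple apple the the the the linux"))
```

Randomizing fonts, colors, and placement, the part that's fun for two minutes, happens entirely after this counting step.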

Images are licensed under a Creative Commons 3.0 license.

Monday, November 03, 2008

Comparing the Tab's online and print advertising

This morning, I wrote an essay about newspapers' struggle to attract local advertisers to their websites.

One example I used was the Tab. I counted 10 online ads (in three sizes) on the front page. This compares with hundreds of ads in the print edition (including auto dealers and real estate).

That's a disparity that the Tab, the News-Tribune, the Boston Globe and the Boston Herald will have to solve in coming years if they want to support their current editorial staffs.

Bad UI: The Apple mouse

Apple is known for products that not only have beautiful designs, but also have innovative and easy-to-use UIs.

So why do Apple mice suck so bad?

My mom and dad have a Mac Mini with an early-generation "Mighty Mouse" that looks like the one in the inset photo. My mom has been complaining about emails and news articles mysteriously opening up in dozens of individual tabs without warning. On my last visit to their house, I discovered why: She was inadvertently pressing the little scroll button on the top of the mouse. It's not hard to do, considering downward pressure is required to click the entire mouse and activate a link. But by pressing the little button at the same time, she was also opening all of her unread emails in tabs (in Yahoo Mail) or all of the articles in an RSS feed in tabs (in Firefox).

It's not the only poorly designed mouse. The mouse I got with my 2003-era iMac didn't have any right-click functionality, unless you simultaneously hit ctrl. And then there were the dumb "hockey puck" mice from the late 1990s that were hard to hold and click.

It's gotten so bad that I actually use a Microsoft mouse for my iMac. Heresy? No, just practicality.

Sorry, Apple. Pretty design /= good UI.

Sunday, November 02, 2008

Is Santa Real?

A scene in the car this evening:

Daughter: Is Santa real?

A long pause.

Me: Yes

Daughter: Then why do we never see him?

I thought I had a few years before I started hearing questions like this.