So, this weekend I decided to give the subscription-based Ancestry.com a more thorough workout. I knew quite a bit about the service from my interview with its CEO earlier this year, and was particularly impressed by the amount of federal census data that the company and its partners have scanned and indexed. It includes all of the results from 1790 to 1930, which amounts to hundreds of millions of names. The site was offering a two-week trial for free, so I went for it.
The amount of information available through the service really is incredible. In the course of about six hours of online research, I was able to find census returns for two or three branches of my tree, obtain the names of about ten siblings of direct ancestors, and verify the states and countries of birth for many others. I even found the WWI draft card for my paternal great-grandfather, a person who had previously been just a name with some rough biographical information. The digital image of the card on Ancestry.com included his exact date of birth and his occupation in 1918 -- a railroad freight handler.
But Ancestry.com has limitations, too, some relating to the tool itself, and others relating to the nature of the records.
I'll start with the search interface. It is not for the faint of heart; a search for a single name can generate hundreds or even thousands of results. The options for refining them do not include an easy way to exclude certain types of records, such as federal censuses from after a certain date. Fortunately, the interface includes a "save to shoebox" function that lets users quickly save a record to a front-page location for more thorough investigation later. It is handy for plowing through multiple pages of search results without getting hung up on transcribing or saving image files. You can also search individual data sets, such as the census from a certain year, through the "card catalog" section of the website.
One other interface issue involves uploading GEDCOM files (family tree data files that use the LDS Church's industry-standard data interchange format). You have to start a tree before you can upload a GEDCOM, which is frustrating. I was also a little disturbed by the way Ancestry.com treats the file. Once you upload a GEDCOM file, the information in it is made public by default. This is not made clear during the upload process unless you click off to the EULA; I only found out after I checked through the "Manage my tree" link.
In addition, there's no easy way of telling what sorts of notes are included in the uploaded GEDCOM data that's made public -- is it just name, date of birth, place of birth, and family connections? Or does it include all of the personal notes that I added in Reunion for Mac, including sensitive pieces of data such as cause of death?
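One way to answer that question for your own file, before uploading it anywhere, is to count the tags it contains. The following is a minimal Python sketch, assuming a GEDCOM 5.5-style file in which each line reads "level, optional @xref@, tag, optional value", and in which free-text notes use the NOTE tag and cause of death uses the CAUS tag. The sample lines and the name count_tags are made up for illustration; the sketch says nothing about what Ancestry.com actually does with these fields.

```python
from collections import Counter

# Tags treated as sensitive here; this choice is an assumption for illustration.
SENSITIVE_TAGS = {"NOTE", "CAUS"}

def count_tags(gedcom_lines):
    """Count how often each GEDCOM tag appears in an iterable of lines."""
    counts = Counter()
    for line in gedcom_lines:
        # Drop a possible byte-order mark and surrounding whitespace.
        parts = line.lstrip("\ufeff").strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a "level tag ..." line
        tag = parts[1]
        # Record lines look like "0 @I1@ INDI": the tag follows the xref id.
        if tag.startswith("@") and len(parts) > 2:
            tag = parts[2].split(" ", 1)[0]
        counts[tag] += 1
    return counts

# Hypothetical sample data, not taken from any real tree.
sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME Jane /Example/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
    "1 DEAT",
    "2 CAUS Pneumonia",
    "1 NOTE Family story told by a cousin",
    "0 @N1@ NOTE A shared note record",
    "0 TRLR",
]

counts = count_tags(sample)
flagged = {tag: counts[tag] for tag in sorted(SENSITIVE_TAGS) if counts[tag]}
print(flagged)
```

To run it against a real export, pass an open file instead of the sample list, for example open("family.ged", encoding="utf-8", errors="replace"), where family.ged is a placeholder file name.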
I made my tree "private," but I am thinking of pulling it down. This is not just because of privacy concerns but also because I would rather maintain the "master" tree with attached photos and audio files on my home computer.
The records themselves have some flaws as well. For two branches of my family, I have encountered instances of surnames being incorrectly transcribed and indexed. It's easy to spot, if you look, because the indexed text doesn't match up with the scanned image of the original government document. However, it's a significant problem in that it has a major impact on how search results are presented, and also how users rank them in terms of usefulness -- if I'm looking for "Walsh," I am more likely to skip over the results that are listed as "Welsh" or "Walch" because I can't be bothered to check the original. Ancestry.com is sensitive to this; there is a link next to every record in the database which allows users to notify the company of typos and other mistakes.
Still, I found the error rate to be high. Part of this relates to the fact that Ancestry.com outsources its data-entry tasks to Chinese companies, but I believe another issue is American handwriting standards -- I have observed that the quality of the cursive used by census enumerators and clerks seemed to decline after 1900. Add to the mix damaged paper records (such as the water stains in the attached image, below) and darkened and blurred scans, and it's no wonder that the transcriptions have a high error rate.
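For variants like "Walsh," "Welsh," and "Walch," a phonetic code can help group results that deserve a second look at the original image. The sketch below implements the standard American Soundex rules in Python as an illustration only; it is not a description of how Ancestry.com indexes or ranks names, and the function name soundex is simply a label chosen here.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the four-character Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to four characters

for surname in ("Walsh", "Welsh", "Walch"):
    print(surname, soundex(surname))
```

All three spellings reduce to the same code, W420, so a Soundex-style comparison would flag them as candidates for the same family. It will not catch every transcription error, since a misread first letter changes the code entirely.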
The original records sometimes have other errors. I've noticed that the census records often contradict the information that I've obtained via interviews and other documents. Sometimes, census returns even contradict each other. One example I saw tonight involved the 1870 and 1880 federal census returns for my maternal great-grandmother. In the 1870 return, all of her siblings are listed as being born in Rhode Island. In the 1880 scan, the older children are listed as being born in New Jersey.
There are many reasons why this could have happened -- a simple miscommunication, a deliberate attempt to mislead the enumerator, a lazy or harried enumerator, or something else. I'll never find out why one of them was wrong, but it forces me to find another record to determine which one is right. It also raises the specter of many errors never even being detected, because there is no second source to check them against.
And this brings me to my final point: Ancestry.com and other online sources bring a lot of new information to the genealogical treasure hunt, but there are many holes in the online records. Using Ancestry.com's supposedly complete database of federal census returns from 1790 to 1930, I have been unable to find any records of certain people who I know existed. Others show up for a specific census, but cannot be found in any other census or online document. An example is my great-grandfather, the railroad freight handler -- I found his military record, and he showed up for the 1920 census along with his family, but there is no record of him in the 1910 or 1930 censuses, even though I know he was alive and living in the same state. I have a possible hit on the 1900 census, but it's also possible it was someone else with the same name and roughly the same age living in the same area. Most of the federal 1890 census was destroyed in a fire, and the online data for the 1880 census doesn't have any results either. So, while Ancestry.com was the source for his date of birth and occupation, I still don't know the names of his father or mother -- only that they were born in the same state (a fact which I gleaned from the 1920 census results).
This points to a very basic fact about genealogical research in the 21st century -- despite the availability of wonderful online research tools and mechanisms to share findings with others, some of the most useful primary sources remain in paper format ... or in people's heads. This is the way it's been for centuries, ever since people started doing genealogical research. I've done a lot of work in terms of interviewing relatives and gathering documents from family members, and I've gotten a lot of new information from the World Wide Web, but it's still not enough. One of these days I am going to have to make a trip out to my great-grandfather's hometown and spend a few hours in the county clerk's office, tracking down the all-important vital records that can really fill in the blanks on that branch of my family tree. It may seem like an old-fashioned way of conducting research, but it will probably bring some of the most rewarding results.
Updates:
- Google/Ancestry.com followup: Using outsourced Chinese labor to overcome OCR limits
- Why Ancestry.com is not enough
Image: An online scan of a page from the United States census of 1880. It's legible, but the original document had some water damage.

