Citations and the GEDCOM

January 16, 2013

The #1 complaint I get about Evidentia genealogy software is “I already entered my data into my other genealogy software, why do I have to enter it again?”

It’s a valid complaint – no one likes data entry. Entering citations (and that’s the work we are really talking about) is the least fun part of the documenting process.

So why doesn’t Evidentia just let you upload a GEDCOM file from other genealogy software and eliminate the extra data entry?

The short answer is, we are working on it. A GEDCOM Import and Export will be included in a 1.3 release this spring. The upgrade will be free to all current users. But the results will not be completely satisfactory.

In this article, I want to take a look at how four popular genealogy software programs handle exporting citations to a GEDCOM file. We will look at Roots Magic 6, Family Tree Maker 2012, Legacy 7.5, and PAF 5. What we will see is that the GEDCOM Standard is not really standard anymore, as software companies have tried to address modern citation expectations with a file format from the 1990s.

First let’s take a look at the GEDCOM.

GEDCOM 5.5

The current GEDCOM standard, version 5.5, was published in 1996. It was developed by the LDS “to provide a flexible, uniform format for exchanging computerized genealogical data”1. There was an update (5.5.1) in 1999, which has become a de facto standard, but 5.5.1 has not been adopted by all vendors.

Citations in the GED are supported by the SOURCE_CITATION record, identified with the SOUR tag. (We will just look at the relevant tags.)

SOUR – a reference pointer to SOURCE_RECORD provided later in the file (e.g. “3 SOUR @S001@” )
PAGE – a 248 character field indicating where in the source to find a reference
NOTE – a multi line freeform text field

The SOURCE_RECORD referenced by the SOURCE_CITATION includes the following tags:

SOUR – the source record identifier (e.g. “0 @S001@ SOUR” )
AUTH – a multi line text field indicating the person or agency who created the data (like a book author or editor)
ABBR – a 60 character field providing a short title used for sorting and retrieval of this record
TITL – a multi line text field indicating the title of the work
PUBL – a multi line text field indicating when and where the source was created
TEXT – a multi line text field

Hopefully you can see that there is no concept of a citation, just a collection of fields that can be used in lieu of a citation to “find” the original source!
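To make the pointer mechanism concrete, here is a minimal Python sketch that splits a GEDCOM line into its level, optional record identifier, tag, and value. The names `GedLine` and `parse_line` are illustrative only, and the regular expression is an assumption that covers just the simple line shapes shown in this article, not the full grammar in the standard.

```python
import re
from typing import NamedTuple, Optional

# level, optional @XREF@ identifier, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]  # set on a record header such as "0 @S001@ SOUR"
    tag: str
    value: str


def parse_line(text: str) -> GedLine:
    """Split one GEDCOM line into its parts (simplified sketch)."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedLine(int(level), xref, tag, value or "")


citation = parse_line("3 SOUR @S001@")  # SOURCE_CITATION: value is a pointer
record = parse_line("0 @S001@ SOUR")    # SOURCE_RECORD: xref is the identifier

# The citation points at the record when the pointer equals the identifier.
print(citation.value == record.xref)
```

Everything a reader would call "the citation" has to be assembled from whatever tags hang off these two lines.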

How I Tested

For the purposes of this article, I started with an empty database in each program and created an entry for my grandfather, Raymond Brooks Thompson. I created two facts based on an entry in the 1930 U.S. Census, one for Residence and one for Occupation.

[Image: 1930 census snapshot for Raymond Brooks Thompson]

[Image: Ancestry citation snapshot]

I started with a “reference” citation using the citation methodology referenced in Elizabeth Shown Mills’ “Evidence Explained” 2.

Source List Entry

Virginia. Warwick County. 1930 U.S. Census, population schedule. Digital Images. Ancestry.com. https://www.ancestry.com : 2013.

Full Reference Note

1930 U.S. Census, Warwick County, Virginia, population schedule, Newport News, Enumeration District 111-1, p. 2B (written), dwelling 28, family 36, Raymond B. and Ellen C. Thompson; digital image, Ancestry.com (https://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm publication T626, roll 2469.

I cited the source separately for each event. For the Residence fact I created a “free form” citation using the “reference” citation. For the Occupation fact I cited the same source, but used each program’s “template” system for creating the citation. This will allow us to see differences in how the GEDCOM data is represented. I then exported a GEDCOM from each program, the results of which you will see below.

I will confess up front I am not an expert in every program listed. I have tried my best to massage the citation entry in each program so that a report created by that program would show a citation as close to the reference entry as possible. This is NOT an evaluation of the various programs, just an attempt to reveal the challenges faced by software vendors in sharing citation data using the GEDCOM.

Roots Magic 6

Roots Magic provides excellent functionality for creating free form or templated citations. The images below show the entries created within the program.
[Images: Roots Magic free form citation entry; Roots Magic template citation entry]

Notice that the previews show the expected results when compared to the “reference” citation. Reports generated within Roots Magic give us exactly what we want for both citations…

[Image: Roots Magic footnotes report]

…and Bibliography

[Image: Roots Magic bibliography report]

Family Tree Maker 2012

It took me a bit to figure out how to enter a free form citation in FTM – it kept trying to get me to use its template system.

[Image: FTM free form citation entry]

The template was pretty straightforward.

[Image: FTM template citation entry]

I was able to get a decent citation listing…

[Image: FTM citation listing]

…but a true bibliographic entry was lacking.

[Image: FTM bibliography]

PAF 5

PAF seems to use a GEDCOM format to store its data. There was not a concept of a template, so free form was the only option. You are basically building a GEDCOM. So I tried to enter the data in a way that seemed to make sense…

[Image: PAF Edit Source dialog]

…except PAF truncated my data. It didn’t seem to support the multi-line option for long data strings, truncating the data to 248 characters.

Eventually I had to drop PAF from the test, since I couldn’t enter a compliant citation. Since “what you see is what you get” with the GEDCOM we can be pretty sure of what the GED output would be anyway.

Legacy 7.5

Free form citations were straightforward, though I had to make a judgement call about where the bibliography and first reference “belonged”. (Remember, the GEDCOM really doesn’t support those concepts directly.)

[Image: Legacy free form citation entry]

The template system made a lot of dated assumptions, however.

[Image: Legacy template citation entry]

The resulting first reference for free form was a mixed bag, but the template results were compliant with current expectations.

[Image: Legacy citations report]

The reverse was true of the bibliography, where the free form listing was spot on, but the templated listing was verbose.

[Image: Legacy bibliography report]

GEDCOM Differences

So how did each vendor use the GEDCOM fields?

Roots Magic 6

Because I input the free form citation with nothing specific to the event (page information), the SOURCE_CITATION was simple:

1 RESI
2 DATE 3 APR 1930
2 PLAC Newport News, Warwick, Virginia, United States
2 SOUR @S1@

When we look at the SOURCE_RECORD we find it is pretty straightforward too:

0 @S1@ SOUR
1 ABBR 1930 Census – RBT – Freeform
1 TITL 1930 U.S. Census, Warwick County, Virginia, population schedule, Newpor
2 CONC t News, Enumeration District 111-1, p. 2B (written), dwelling 28, famil
2 CONC y 36, Raymond B. and Ellen C. Thompson; digital image, Ancestry.com (ht
2 CONC tp://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm pub
2 CONC lication T626, roll 2469.

In this sample, the TITL tag holds the entire first reference, as input.
Roots Magic also included RM-specific tags.

1 _BIBL Virginia. Warwick County. 1930 U.S. Census, population schedule. Digita
2 CONC l Images. Ancestry.com. https://www.ancestry.com : 2013.
1 _TMPLT
2 TID 0
2 FIELD
3 NAME Footnote
3 VALUE 1930 U.S. Census, Warwick County, Virginia, population schedule, Newpor
4 CONC t News, Enumeration District 111-1, p. 2B (written), dwelling 28, famil
4 CONC y 36, Raymond B. and Ellen C. Thompson; digital image, Ancestry.com (ht
4 CONC tp://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm pub
4 CONC lication T626, roll 2469.
2 FIELD
3 NAME ShortFootnote
2 FIELD
3 NAME Bibliography
3 VALUE Virginia. Warwick County. 1930 U.S. Census, population schedule. Digita
4 CONC l Images. Ancestry.com. https://www.ancestry.com : 2013.

_BIBL is handy since the GEDCOM really doesn’t spec out a bibliographic tag.
Under the _TMPLT tag we also find values for a complete footnote and again, the bibliography.
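As a rough illustration of how an importer might read this record, the Python sketch below joins each level-1 tag with its CONC continuation lines (CONC appends text with no added space; CONT starts a new line) and then picks out TITL and _BIBL. The function name `read_top_level_fields` is hypothetical, and the sketch deliberately ignores everything nested under _TMPLT.

```python
def read_top_level_fields(record_lines):
    """Collect level-1 tag values of one record, joining CONC/CONT lines."""
    fields = {}
    current = None  # the level-1 tag that a level-2 CONC/CONT continues
    for line in record_lines:
        level, _, rest = line.partition(" ")
        tag, _, value = rest.partition(" ")
        if level == "1":
            current = tag
            fields[tag] = value
        elif level == "2" and current is not None and tag == "CONC":
            fields[current] += value          # no separator is added
        elif level == "2" and current is not None and tag == "CONT":
            fields[current] += "\n" + value   # continuation on a new line
    return fields


record = [
    "0 @S1@ SOUR",
    "1 TITL 1930 U.S. Census, Warwick County, Virginia, population schedule, Newpor",
    "2 CONC t News, Enumeration District 111-1, p. 2B (written), dwelling 28, famil",
    "2 CONC y 36, Raymond B. and Ellen C. Thompson; digital image, Ancestry.com (ht",
    "2 CONC tp://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm pub",
    "2 CONC lication T626, roll 2469.",
    "1 _BIBL Virginia. Warwick County. 1930 U.S. Census, population schedule. Digita",
    "2 CONC l Images. Ancestry.com. https://www.ancestry.com : 2013.",
]

fields = read_top_level_fields(record)
print(fields["TITL"])   # the full first reference, as entered
print(fields["_BIBL"])  # the bibliography, available only via the custom tag
```

An importer that does not know about _BIBL would see only TITL and would have no bibliography entry at all.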

So what about the templated entry?

1 OCCU Bookkeeper
2 DATE 3 APR 1930
2 PLAC Newport News, Warwick, Virginia, United States
2 SOUR @S2@
3 PAGE Newport News; 111-1; p. 2B (written); dwelling 28, family 36; Raymond B. and Ellen C. Thompson; accessed; 15 January 2013

In this case we have a PAGE tag that includes part of our citation. There are also some RM-specific fields that capture the form fields used by the Roots Magic program, and their data:

3 _TMPLT
4 FIELD
5 NAME CivilDivision
5 VALUE Newport News
4 FIELD
5 NAME ED
5 VALUE 111-1
4 FIELD
5 NAME PageID
5 VALUE p. 2B (written)
4 FIELD
5 NAME HouseholdID
5 VALUE dwelling 28, family 36
4 FIELD
5 NAME Person
5 VALUE Raymond B. and Ellen C. Thompson
4 FIELD
5 NAME AccessType
5 VALUE accessed
4 FIELD
5 NAME AccessDate
5 VALUE 15 January 2013

You could presumably use this information in conjunction with the SOURCE_RECORD to reconstruct the citation, IF you know the pattern:

0 @S2@ SOUR
1 ABBR 1930 Census – RBT – template
1 TITL 1930 U.S. Census, Warwick County, Virginia, population schedule, , ; di
2 CONC gital images, Ancestry.com (https://www.ancestry.com : accessed ); citin
2 CONC g NARA microfilm publication T626, roll 2469.
1 _SUBQ 1930 U.S. Census, Warwick County, Virginia, population schedule, , , .
1 _BIBL Virginia. Warwick County. 1930 U.S. Census, population schedule. Digita
2 CONC l images. Ancestry.com. https://www.ancestry.com : .
1 _TMPLT
2 TID 43
2 FIELD
3 NAME Country
2 FIELD
3 NAME CensusID
3 VALUE 1930 U.S. Census
2 FIELD
3 NAME Jurisdiction
3 VALUE Warwick County, Virginia
2 FIELD
3 NAME Schedule
3 VALUE population schedule
2 FIELD
3 NAME ItemType
3 VALUE digital images
2 FIELD
3 NAME WebSite
3 VALUE Ancestry.com
2 FIELD
3 NAME URL
3 VALUE https://www.ancestry.com
2 FIELD
3 NAME CreditLine
3 VALUE citing NARA microfilm publication T626, roll 2469

Again, the _BIBL tag has our bibliography. However, the citation in the TITL field is missing the PAGE data. That is appropriate, but the point here is that you have to know the pattern to reconstruct the citation.
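To show what “knowing the pattern” means in practice, the Python sketch below collects the FIELD/NAME/VALUE lines from both _TMPLT blocks into one dictionary and fills a pattern string. The `PATTERN` here is a guess written for this example so that the output resembles the reference citation; it is not Roots Magic’s actual template definition, and `template_fields` is a hypothetical helper.

```python
SOURCE_TMPLT = """\
2 FIELD
3 NAME Country
2 FIELD
3 NAME CensusID
3 VALUE 1930 U.S. Census
2 FIELD
3 NAME Jurisdiction
3 VALUE Warwick County, Virginia
2 FIELD
3 NAME Schedule
3 VALUE population schedule
2 FIELD
3 NAME ItemType
3 VALUE digital images
2 FIELD
3 NAME WebSite
3 VALUE Ancestry.com
2 FIELD
3 NAME URL
3 VALUE https://www.ancestry.com
2 FIELD
3 NAME CreditLine
3 VALUE citing NARA microfilm publication T626, roll 2469
"""

CITATION_TMPLT = """\
4 FIELD
5 NAME CivilDivision
5 VALUE Newport News
4 FIELD
5 NAME ED
5 VALUE 111-1
4 FIELD
5 NAME PageID
5 VALUE p. 2B (written)
4 FIELD
5 NAME HouseholdID
5 VALUE dwelling 28, family 36
4 FIELD
5 NAME Person
5 VALUE Raymond B. and Ellen C. Thompson
4 FIELD
5 NAME AccessType
5 VALUE accessed
4 FIELD
5 NAME AccessDate
5 VALUE 15 January 2013
"""

# Assumed pattern: an illustration only, not the vendor's real template.
PATTERN = (
    "{CensusID}, {Jurisdiction}, {Schedule}, {CivilDivision}, "
    "Enumeration District {ED}, {PageID}, {HouseholdID}, {Person}; "
    "{ItemType}, {WebSite} ({URL} : {AccessType} {AccessDate}); {CreditLine}."
)


def template_fields(lines):
    """Map each FIELD's NAME to its VALUE ('' when no VALUE line follows)."""
    fields, name = {}, None
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag == "FIELD":
            name = None                 # a new field starts
        elif tag == "NAME":
            name = value
            fields[name] = ""
        elif name is not None and tag == "VALUE":
            fields[name] = value
        elif name is not None and tag == "CONC":
            fields[name] += value       # long values continue on CONC lines
    return fields


values = template_fields(SOURCE_TMPLT.splitlines())
values.update(template_fields(CITATION_TMPLT.splitlines()))
print(PATTERN.format(**values))
```

Without the pattern, the importing program has the pieces but no instructions for assembling them, which is exactly the problem with vendor-specific tags.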

Family Tree Maker 2012

I had to break up the free form data to get a proper citation generated in an FTM report. I show the SOURCE_CITATION and SOURCE_RECORD together to show that for free form, the basic formula is TITL + PAGE = your original citation.

1 RESI
2 DATE 03 APR 1930
2 PLAC Newport News, Independent Cities, Virginia, USA
2 SOUR @S4@
3 PAGE Warwick County, Virginia, population schedule, Newport News,
4 CONC Enumeration District 111-1, p. 2B (written), dwelling 28, family 36,
4 CONC Raymond B. and Ellen C. Thompson; digital image, Ancestry.com
4 CONC (https://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm
4 CONC publication T626, roll 2469.

0 @S4@ SOUR
1 TITL 1930 U.S. Census

There really is no way to extract a bibliographic entry from the GED.
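The TITL + PAGE formula for the free form export can be sketched in a few lines of Python. One assumption is labeled in the code: the CONC lines in this listing begin at word boundaries, so the sketch puts a single space back at each break. That is a guess about this particular listing; the standard itself joins CONC text with no separator. The helper name `join_continuations` is hypothetical.

```python
def join_continuations(first, continuations, sep=""):
    """Join a tag value with the values of its CONC lines."""
    return sep.join([first, *continuations])


titl = "1930 U.S. Census"

# Assumption: the spaces at the line breaks were lost in this listing,
# so sep=" " restores them. A strict reading of CONC would use sep="".
page = join_continuations(
    "Warwick County, Virginia, population schedule, Newport News,",
    [
        "Enumeration District 111-1, p. 2B (written), dwelling 28, family 36,",
        "Raymond B. and Ellen C. Thompson; digital image, Ancestry.com",
        "(https://www.ancestry.com : accessed 5 Jan 2013); citing NARA microfilm",
        "publication T626, roll 2469.",
    ],
    sep=" ",
)

# TITL + PAGE = the original free form citation
citation = f"{titl}, {page}"
print(citation)
```

The comma between the two parts is also something the importer has to supply, since neither tag carries it.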

The templated citation is similar:

1 OCCU Bookkeeper
2 DATE 03 APR 1930
2 PLAC Newport News, Independent Cities, Virginia, USA
2 SOUR @S3@
3 PAGE population schedule, Newport News, Enumeration District 111-1, p. 2B
4 CONC (written), dwelling 28, family 36, Raymond B. and Ellen C. Thompson;
4 CONC digital image, Ancestry.com (https://www.ancestry.com : accessed 5 Jan
4 CONC 2013); citing

0 @S3@ SOUR
1 TITL 1930 U.S. Census, Virginia, Warwick County, Population Schedule, NARA
2 CONC Microfilm Publication T626
1 NOTE
2 CONC 1930 U.S. census, population schedule. Virginia. Warwick County.
2 CONC NARA microfilm publication T626, roll 2469. Washington, D.C.:
2 CONC National Archives and Records Administration, n.d. Digital images.
2 CONC Ancestry.com (https://www.ancestry.com).

Again, TITL + PAGE seems to give us our citation, AND we have a bibliographic listing added as a NOTE. However the bibliographic entry isn’t quite up to spec, reversing the jurisdiction and the title.

Legacy 7.5

Free form citations in Legacy are driven mostly by how the user chooses to input the data, massaging it to get the desired results.

1 RESI
2 DATE 3 Apr 1930
2 PLAC Newport News, Warwick County, Virginia
2 SOUR @S2@

0 @S2@ SOUR
1 MEDI Census/Tax
1 ABBR 1930 U.S. Census
1 TITL Virginia. Warwick County. 1930 U.S. Census, population sche
2 CONC dule. Digital Images. Ancestry.com. https://www.ancestry.co
2 CONC m : 2013.
1 PUBL 1930 U.S. Census, Warwick County, Virginia, population sche
2 CONC dule, Newport News, Enumeration District 111-1, p. 2B (writ
2 CONC ten), dwelling 28, family 36, Raymond B. and Ellen C. Thomp
2 CONC son; digital image, Ancestry.com (https://www.ancestry.co
2 CONC m : accessed 5 Jan 2013); citing NARA microfilm publicatio
2 CONC n T626, roll 2469.

In this case there is no PAGE tag, the bibliographic entry is in the TITL tag, and the citation is in the PUBL tag. This was very dependent on how I as the user chose to enter the data.

The templated entry is different…

1 OCCU BookKeeper
2 DATE 3 Apr 1930
2 PLAC Newport News, Warwick County, Virginia
2 SOUR @S4@
3 PAGE T626, roll 2649, Newport News, enumeration district (ED) 11
4 CONC 1-1, p. 2B, dwelling 28, family 36, Raymond B. and Ellen C
4 CONC . Thompson, accessed 15 Jan 2013

0 @S4@ SOUR
1 ABBR 1930 U.S. census
1 TITL 1930 U.S. census, Ancestry.com, Digital images
1 AUTH Virginia, Warwick County
1 PUBL https://www.ancestry.com: National Archives and Records Admi
2 CONC nistration, 2013

Now the citation data is split between the PAGE, TITL, AUTH and PUBL tags, and there is no sense of a bibliographic entry. Insider knowledge is required to know how to recreate the original first reference.
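One way to picture that insider knowledge is as a lookup table an importer would have to carry for every exporting program. The Python sketch below records only what the free form exports in this article showed. The table name `FREE_FORM_LAYOUT` and the function `first_reference` are hypothetical, and the Legacy row reflects how the data happened to be entered for this test rather than any fixed rule.

```python
# Where each free form export in this article put the first reference and
# the bibliography. None means the export carried no such entry.
FREE_FORM_LAYOUT = {
    "Roots Magic 6": {"reference": ["TITL"], "bibliography": "_BIBL"},
    "Family Tree Maker 2012": {"reference": ["TITL", "PAGE"], "bibliography": None},
    "Legacy 7.5": {"reference": ["PUBL"], "bibliography": "TITL"},
}


def first_reference(vendor, tags):
    """Rebuild the first reference from already-joined tag values."""
    layout = FREE_FORM_LAYOUT.get(vendor)
    if layout is None:
        raise KeyError(f"no known layout for {vendor!r}")
    return ", ".join(tags[tag] for tag in layout["reference"])


ftm_tags = {
    "TITL": "1930 U.S. Census",
    "PAGE": "Warwick County, Virginia, population schedule, Newport News",
}
legacy_tags = {
    "TITL": "Virginia. Warwick County. 1930 U.S. Census, population schedule.",
    "PUBL": "1930 U.S. Census, Warwick County, Virginia, population schedule",
}

print(first_reference("Family Tree Maker 2012", ftm_tags))
print(first_reference("Legacy 7.5", legacy_tags))
```

The templated exports would each need their own, more complicated entry, and a user who enters free form data differently would break even this simple table.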

Summary

Because the GEDCOM standard does NOT address modern citation expectations, vendors have had to decide how best to export citation data from their programs. This introduces several variables into the process, including:

  • How a vendor interprets the standard tags
  • How a vendor chooses to use custom tags
  • Whether the user uses templates or free form citations
  • Variations in how a user decides to enter the free form or templated citations

Some programs are not able to support the current expectations in citation data. PAF seems to be an example of this. Those programs that can are challenged when importing GEDCOM data created from a different system – the data may or may not be available to create what we would consider a valid citation. This means that even in data coming from a program that supports the expected citation formats, citations may be mixed in their compliance if they have been imported from other systems.

Now add to that mix a program like Evidentia, which is trying to enforce a certain level of compliance to current citation expectations, and you can easily see how allowing importation of data using the GEDCOM could corrupt the quality of citation data.

So yes, in the near future Evidentia will support some level of GEDCOM import (and export) – but there will still be effort required on the part of the user to protect the integrity of their citations.

Is it worth it?

1 The GEDCOM Standard Release 5.5. Family and Church History Department, The Church of Jesus Christ of Latter-day Saints (1996), p. 3.
2 Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, 2nd ed. (Baltimore: Genealogical Publishing Co., 2009), p. 240.