Digital Resources in the Humanities Conference - Sept. 1997

GENUKI - The UK genealogical information service

Phil Stringer
Manchester Computing, University of Manchester

E-mail: P.Stringer[at]mcc.ac[dot]uk

Introduction

GENUKI is a World Wide Web based information service that was set up to provide information related to the study of UK genealogy and family history. The URL required to access the service is http://www.genuki.org.uk/. Development started in 1995 with a small group of people providing pages that were primarily hosted by the MIDAS service at Manchester (Austen et al., 1995). Since then, an ever-increasing number of people have become involved in providing information, with the result that a significant amount of material is now held on separate machines, mainly in the UK, but with a common look and feel. GENUKI has become the primary source of UK genealogical information on the WWW, receiving approximately 1000 accesses daily from throughout the world.

The information collected and used by genealogists is of interest to a wider audience, as family historians are not just interested in individuals, but also in local history and the lifestyle of their ancestors. We will now describe the types of information available in GENUKI, how it is organised, and how it is planned to increase the information in both quantity and depth of coverage in the future.

The Information providers

The GENUKI information is provided by over fifty volunteers, the majority of whom are private individuals sharing a common interest in family history. Many of the pages are held on university machines, but the service did not start out as one aimed at academics; it was aimed purely at those with an interest in local and family history. Since its inception, its close relation to the types of service offered by MIDAS (Manchester Information Datasets and Associated Services), together with its popularity, has led to it being adopted as part of the services offered by MIDAS. There are now over 7000 pages of GENUKI information, with links to approximately 1250 pages of information provided by other information providers. The number of accesses to the home page, approximately 1000 per day, is the best available indication of how many individuals are using the information on a daily basis.

How the service is organised

When constructing a service that consists of a large number of pages, it is important to define at the outset how it is to be organised, so that inexperienced users can easily find the information that they want, and so that the service as a whole presents a common "look and feel". This has been achieved by defining a set of standards that all information providers have agreed to follow. The structure that has been adopted is a geographical hierarchy, with subject headings based on those used by the Church of Jesus Christ of Latter-day Saints for their Family History Library catalogue. As the Church has the largest genealogical library in the world, its scheme was an appropriate one to use.

The geographic hierarchy consists, in the main, of four levels, the top level containing information relating to the whole of the UK and Ireland. Below this it is arranged according to country (e.g. England, Scotland, Wales), then by county, and at the lowest level according to town or parish. The standard subject headings used at each level include Census, Civil Registration, Church Records, History, and Probate.
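The four-level hierarchy and shared subject headings can be pictured as a simple tree. The following Python sketch is purely illustrative and is not how GENUKI pages are actually stored: the class name Place, its methods, and the example entries are assumptions introduced here, and only the five headings named above are included.

```python
from dataclasses import dataclass, field

# The four levels of the geographic hierarchy, from top to bottom.
LEVELS = ["UK and Ireland", "country", "county", "town or parish"]

# Only the headings named in the text; the full standard list is longer.
SUBJECT_HEADINGS = ["Census", "Civil Registration", "Church Records",
                    "History", "Probate"]


@dataclass
class Place:
    """One node in the hierarchy (hypothetical structure, for illustration)."""
    name: str
    level: int                                     # index into LEVELS
    children: dict = field(default_factory=dict)   # child name -> Place
    topics: dict = field(default_factory=dict)     # subject heading -> text

    def add_child(self, name):
        """Return the named child one level down, creating it if needed."""
        if self.level + 1 >= len(LEVELS):
            raise ValueError("a town or parish is the lowest level")
        if name not in self.children:
            self.children[name] = Place(name, self.level + 1)
        return self.children[name]

    def add_topic(self, heading, text):
        """Attach information under one of the standard subject headings."""
        if heading not in SUBJECT_HEADINGS:
            raise ValueError(f"not a standard heading: {heading}")
        self.topics[heading] = text

    def find(self, *path):
        """Walk down the tree, e.g. find("England", "Yorkshire")."""
        node = self
        for name in path:
            node = node.children[name]
        return node
```

For example, Place("UK and Ireland", 0).add_child("England").add_child("Yorkshire") yields a county-level node, and every node at every level accepts the same set of subject headings.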

The standards adopted also cover presentation and layout with the aim being to provide information quickly and in a form that most WWW browsers can display. Users are sited throughout the world, and they do not all have fast network connections. The information is therefore presented mainly as text with just a few images. This provides the fastest delivery of information and avoids delays caused by the transmission of images which usually affect the appearance but not the content. This has actually resulted in a number of comments about presentation, as many WWW sites tend to concentrate on looks rather than content, and in comparison with these it is not as visually attractive. The majority of the GENUKI information providers think that the correct balance has been achieved between style and the speed of information delivery.

Information provided by GENUKI

The primary aim of GENUKI was to provide information about what records were available, usually on paper or microfilm, and where to find them, and to provide an overview of the contents of common types of records and how to use them. Since the start of the service, further types of information, such as record transcriptions, have been offered by volunteers and added to it. Many of the basic records used by family historians require a significant amount of work to use unless there is an index available. A large number of the indexes available have been produced by volunteers at Family History Societies, the best example of which is the 1881 census, which has been transcribed and indexed for genealogical research. Now that the work has been done, it is also a useful source of data for statistical analysis.

A number of indexes are now being made available via GENUKI, and also other transcribed material, such as the Northowram Register (Horsfall Turner 1881) and a number of other old books. Care has been taken to ensure that all such material is out of copyright and/or is provided with the permission of the authors/transcribers. Links are included at the appropriate points in the GENUKI pages to any other relevant source.

The number of record transcripts and indexes that are currently available on GENUKI is still quite small, as the compilers often use them as a source of income to offset the cost of their production, and don't as yet want to make them freely available. It is hoped that as time progresses other people will follow the example of existing contributors and increase the amount of data available. Some of the transcripts and indexes currently available include:

  • Parish register transcripts particularly for Durham, but also including some for Cumberland, Derbyshire, Hereford, Kent, Northumberland, Somerset, Suffolk and Yorkshire.
  • A number of census transcripts for Gloucestershire and Warwickshire and some indexes for Suffolk. 
  • Surname indexes to trade directories for Devon, Gloucestershire, Yorkshire and Glamorgan. 
  • Listings of monumental inscriptions for Oxfordshire, Glamorgan and a number of other Welsh counties. 
  • An index to all the pre-1858 wills held at the Cheshire Record Office.
  • A complete transcription of the "St. Catherine's House" Civil Registration marriage index for the 1st quarter of 1849, and partial transcripts for 1856. 

There is as yet no central coordination of the collection and publication of the material described above, as the whole project has been undertaken by individuals working in their own free time. GENUKI has therefore made no checks on the accuracy of the contents. Experience has shown that the majority of family historians are quite conscientious about accuracy, and the need for checking is well known. It is also accepted that however much checking is performed, errors occasionally occur, and so whenever possible the original records are checked after finding an entry in a transcription or index.

A number of record repositories, such as the county record offices, are now starting to have their own home pages on the WWW. These are linked into GENUKI at the appropriate points in the geographical hierarchy, and so it can be used to help find out where particular classes of records are held. For example, the Public Record Office (PRO) publishes a series of leaflets about its records, which can be collected on a visit to the PRO. The PRO has its own WWW site, but at the moment this contains just some of the information in the leaflets. GENUKI, however, has copies of them all online and accessible via the WWW, with the permission of the PRO. It is expected that in time the PRO will be able to provide all such information itself.

The record transcripts and indexes are presented either as plain text or as HTML-formatted files, and not as searchable databases. This is because of the extra overhead required to provide such facilities, and because of differences in data formats and layout. Only one set of data has been made available as an online searchable database: the geographic locations of approximately 15,000 churches in England, Scotland and Wales. Currently the majority of these are Church of England churches that existed in the early 19th century, each with an approximate National Grid reference. Work is being undertaken to find volunteers with local knowledge to expand this to include churches of all denominations up to the present day. The additional data being collected are denomination, an accurate grid reference, and founding and closing dates. The search facility allows users to specify a location and date, and it reports all churches found within a distance also specified by the user.
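A search of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the GENUKI database: the Church record, its field names, and the function names are hypothetical, and locations are assumed to be National Grid eastings and northings in metres, so that straight-line distance can be computed directly.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Church:
    """Hypothetical record holding the fields described in the text."""
    name: str
    denomination: str
    easting: int                    # National Grid easting, metres (assumed)
    northing: int                   # National Grid northing, metres (assumed)
    founded: Optional[int] = None   # year, if known
    closed: Optional[int] = None    # year, if known


def existed_in(church, year):
    """True unless known founding/closing dates rule the year out."""
    if church.founded is not None and year < church.founded:
        return False
    if church.closed is not None and year > church.closed:
        return False
    return True


def churches_near(churches, easting, northing, year, radius_km):
    """Return (distance_km, church) pairs within radius_km, nearest first."""
    hits = []
    for church in churches:
        if not existed_in(church, year):
            continue
        distance_km = hypot(church.easting - easting,
                            church.northing - northing) / 1000.0
        if distance_km <= radius_km:
            hits.append((distance_km, church))
    hits.sort(key=lambda pair: pair[0])
    return hits
```

Treating an unknown founding or closing date as "no restriction" is a design choice made for this sketch; it suits data where many dates have not yet been collected, at the cost of reporting some churches that did not exist at the requested date.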

There is one type of data which we specifically do not include within GENUKI, which is personal family trees. Genealogists tend to be very keen to make this sort of information available to a wide audience, but it does take up a large amount of storage space and is of interest to just a very small audience who may have common ancestors. There are other sites which cater for this need so GENUKI does not contain such information, concentrating on providing information which may be of interest to a wider audience.

Indexing

The structure defined for the pages as a whole makes it possible to determine exactly where in the hierarchy to look for the information required. But many users do not bother finding out how the information is structured, and even when they reach the page that should contain the link, there is often so much on it that the information required is not easy to find quickly.

So an alternative means has been provided for navigating through the pages. This is done by providing a hierarchy of contents pages which is maintained in parallel to the data pages. It was originally presented as a single page, but as the number of pages increased, download times increased and the management of them needed to be devolved to a number of people. Consideration was given to producing these contents pages by a piece of software, but the need to produce short but meaningful text against each link, and an easily readable layout, meant that a manual approach had to be used.

This does not mean that WWW search engines cannot provide another means of finding information, but the contents pages make it very easy to see what information is available for a particular geographic area.

Managing the information provided

The service now consists of a large number of pages of information held on quite a number of machines, the majority of which are in the UK. With so many pages, errors inevitably occur in the HTML syntax, and these show up as failing links. There is also the problem of links to external information being lost when that information moves or is no longer available. There are tools available to help detect such problems when the information is all held on the same machine, but there was nothing readily available to regularly validate an ever-changing set of pages held on diverse machines. To remedy this, a Perl program has been written which regularly examines all the GENUKI pages and reports any errors found so that they can be corrected. The task of coding the program was considerably eased by a decision made early in the life of GENUKI to include the character string "genuki" in all URLs, so that programs could be written to search all the pages regardless of the server holding them.

The program performs a number of functions whilst browsing the tree of WWW pages, the main ones being:

  • Detect and report failed links within the GENUKI pages.
  • Detect and report failed links to pages provided by other services.
  • Collect statistics on the number of pages, and size.
  • Produce a list of all the pages found with last modified date, etc. to assist in producing contents pages, and a "What's new" page.
  • It is also intended to develop this program to periodically take a copy of all the pages which can be used for demonstrations in locations where the WWW is not available, and as a backup in case problems occur at an individual site.
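The first three functions listed above can be sketched as follows. This is not the Perl program itself, which is not reproduced here, but a minimal Python illustration of the same idea: a page is treated as part of GENUKI if its URL contains the string "genuki", GENUKI pages are followed, and external links are only tested. The function names and the shape of the report are assumptions, and page retrieval is passed in as a fetch function so that the sketch does not depend on any particular network library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs linked from a page, fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    urls = []
    for href in parser.links:
        url = urldefrag(urljoin(base_url, href)).url
        if urlparse(url).scheme in ("http", "https"):
            urls.append(url)
    return urls


def is_genuki(url):
    """The convention from the text: every GENUKI URL contains 'genuki'."""
    return "genuki" in url.lower()


def check_site(start_url, fetch):
    """Walk all GENUKI pages reachable from start_url.

    fetch(url) must return the page text, or None if it cannot be retrieved.
    """
    cache = {}

    def get(url):
        if url not in cache:          # retrieve each URL once only
            cache[url] = fetch(url)
        return cache[url]

    report = {"pages": {}, "failed_internal": [], "failed_external": []}
    queue = deque([start_url])
    queued = {start_url}
    while queue:
        page = queue.popleft()
        html = get(page)
        if html is None:
            continue                  # reported by the page that linked here
        report["pages"][page] = len(html)   # size statistic, in characters
        for link in extract_links(page, html):
            internal = is_genuki(link)
            if get(link) is None:
                key = "failed_internal" if internal else "failed_external"
                report[key].append((page, link))
            elif internal and link not in queued:
                queued.add(link)      # only GENUKI pages are followed
                queue.append(link)
    return report
```

In practice fetch would wrap an HTTP client and would also need to return the last-modified date for the page listing; those parts, along with the planned offline copy, are left out of this sketch.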

Developing the service

There are now 40-50 people providing information for GENUKI and developing its pages. There are individuals maintaining most of the English counties, with rather fewer with direct responsibility for the Welsh and Scottish pages. Basic information is provided centrally for those counties without a maintainer who has specific local knowledge. There is a very close historical relationship between Irish records and those for the rest of the geographic area of the British Isles and GENUKI does link to a similar service providing information about Ireland. The structure they have adopted though is rather different and the data is currently not being developed. We have tried to encourage its developers to work with us to develop a common approach, but nothing has resulted from this.

We are having success in developing links with Family History Societies, who are the bodies with the specialised knowledge of their particular areas, and who have the resources to develop historical data in a form that can be processed by a computer. Most of the English and Welsh societies are members of the Federation of Family History Societies (FFHS), and its Computer Advisory Group liaises with GENUKI, helping to make the specialist information provided by the member societies known to a wider audience (Randell and Stringer 1997).

The Family History Societies have been transcribing records for many years now, and publish them for members and for the wider public, provided that the public knows they are available. Some information, such as monumental inscriptions from graveyards that have been closed, is sometimes now only available from the societies. Other information has only come to light through their work. For example, some sections of the 1851 Manchester census have become difficult to read over time due to water damage, and these have never been filmed and published. As much information as possible from these records is now being transcribed at the PRO by members of the Manchester & Lancashire Family History Society and published on microfiche, but this is only available from the society.

Following on from the success of transcribing and publishing the whole of the 1881 census (of which there is a machine-readable copy), the FFHS is undertaking other national projects to transcribe other relevant information. The main one is the National Death & Burial index, covering records of deaths for the early part of the 19th century. One of the difficulties of running such a project, which collects a large amount of data, is in finding the resources to combine the individual parts into a single dataset.

This is an area where we think GENUKI can help. Sections of it run on large systems which primarily exist to make similar information available to the academic community. The genealogists are now quite used to their information also being made available on these systems and it should be possible to reach agreement with them whereby resources are made available for the large tasks of combining individual sections of the data and in return the information could be made available for academic research. The GENUKI team will be investigating ways in which this can be progressed.

References

Austen M., Dunstan V., Randell B., Stanier A., Stringer P., Woodgate J. (1995) An Information Service for United Kingdom & Ireland Genealogy based on the Internet's World Wide Web. Computers in Genealogy.

Heywood O., Dickenson T., Edited by Horsfall Turner, J. (1881) The Nonconformist Register.

Randell B., Stringer P. (1997) GENUKI - the Internet-Based UK & Ireland Genealogical Information Service. Family History News & Digest Vol. 11, No. 1.