
README for CheckPedigree.py

Information about the program:

CheckPedigree.py performs a series of checks on your pedigree data to ensure that the data are valid. If no errors are detected, a zip file is created. This zip file contains your checked pedigree file, ready to upload to the Interbull Centre IDEA database. For technical reasons, the program rejects files containing more than one million records.

The checks relate to:

  • Check the international identification numbers (animal, sire and dam)
    • Correct three-letter country code as defined in the ISO 3166 standard (missing countries are not allowed)
    • Correct three-letter breed code according to the Interbull breed codes
    • Correct construction of the alphanumeric part of the ID (registration number, right-justified, with leading blanks replaced by zeros; all characters are allowed except ; and ~)
    • Missing sires and dams shall be coded as UUUUUUUUUUUUUUUUUUU (i.e. with 19 U)
  • Check the animal's birth date
    • Has to be reported in the format YYYYMMDD
    • If you know only the year of birth then enter it as YYYY0000
    • If you know year and month of birth then enter them as YYYYMM00
    • Missing birth dates are coded as 00000000 (or blanks or a single 0)
  • Check that a male (or female) animal, if it appears as a parent, appears only as a sire (or dam)
  • Check for inconsistent duplicate records (different sire, dam or birthdate)
  • Check that an animal is always younger than its parents and grandparents
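The per-record checks on IDs and birth dates listed above can be illustrated with a short sketch. This is not the code inside CheckPedigree.py. It assumes the usual 19-character international ID layout, breed (3) + country (3) + sex (1) + registration number (12), and the breed and country sets are small illustrative subsets, not the official lists the program uses.

```python
import datetime

# Illustrative subsets only (assumption): CheckPedigree.py itself checks
# against the full ISO 3166 country list and the Interbull breed codes.
KNOWN_BREEDS = {"BSW", "GUE", "HOL", "JER", "RDC", "SIM"}
KNOWN_COUNTRIES = {"CAN", "DEU", "FRA", "ITA", "NLD", "SWE", "USA"}

MISSING_PARENT = "U" * 19  # coding for an unknown sire or dam
FORBIDDEN_CHARS = set(";~")


def check_id(intl_id):
    """Return a list of problems found in one international ID.

    Assumes the 19-character layout: breed (3) + country (3) + sex (1)
    + registration number (12).
    """
    if len(intl_id) != 19:
        return ["wrong length (%d instead of 19)" % len(intl_id)]
    breed, country, sex, reg = intl_id[:3], intl_id[3:6], intl_id[6], intl_id[7:]
    problems = []
    if breed not in KNOWN_BREEDS:
        problems.append("unknown breed code %r" % breed)
    if country not in KNOWN_COUNTRIES:
        problems.append("unknown country code %r" % country)
    if sex not in ("M", "F"):
        problems.append("sex code %r is neither M nor F" % sex)
    if " " in reg:
        problems.append("blank in registration number (use leading zeros)")
    if FORBIDDEN_CHARS & set(reg):
        problems.append("registration number contains ; or ~")
    return problems


def check_parent_id(parent_id):
    """Like check_id, but accept the 19-U code for a missing parent."""
    return [] if parent_id == MISSING_PARENT else check_id(parent_id)


def check_birth_date(raw):
    """Return None if raw is an acceptable birth date, else an error message."""
    value = raw.strip()
    if value in ("", "0", "00000000"):
        return None  # missing birth date
    if len(value) != 8 or not value.isdigit():
        return "malformed birth date %r" % raw
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:])
    if year == 0:
        return "year missing in %r" % raw
    if month == 0:
        # YYYY0000: only the year is known; a day without a month is invalid
        return None if day == 0 else "day given without month in %r" % raw
    if month > 12:
        return "invalid month in %r" % raw
    if day == 0:
        return None  # YYYYMM00: year and month are known
    try:
        datetime.date(year, month, day)
    except ValueError:
        return "invalid calendar date %r" % raw
    return None


print(check_id("HOLUSAM000012345678"))  # []
print(check_birth_date("20150231"))     # invalid calendar date '20150231'
```

Partial dates (YYYY0000, YYYYMM00) are accepted before the full calendar check, so only complete dates go through `datetime.date`.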

Before Running the Program:

  1. Install Python: Python 3 (version 3.6 or later; preferred, because Python 2 is no longer supported by its developers) or Python 2 (version 2.5 to 2.7)

  2. Create a working directory/folder
  3. Download the CheckPedigree.py program from https://idea.interbull.org/software and copy it to your new directory

  4. Copy your pedigree file to the working directory
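To confirm that your interpreter meets the version requirement in step 1, you can run `python --version`, or use a small check like the one below. This helper is a convenience sketch and is not part of CheckPedigree.py.

```python
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True for the versions listed above: 2.5 to 2.7, or 3.6 and later."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return 5 <= minor <= 7
    return (major, minor) >= (3, 6)


if __name__ == "__main__":
    print("Python %d.%d supported: %s"
          % (sys.version_info[0], sys.version_info[1], python_version_supported()))
```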

Running the Program:

  • Ensure there is a working network connection
  • Use the command: python CheckPedigree.py -m <ORGCODE> -f <filename>

  • Use your organization code (ORGCODE) in uppercase. It is shown in the upper right-hand corner of the IDEA page, in brackets beside the "Logged in as" information.

  • The program compares its internal version number with the one stored on the Interbull server. If they do not match, you must download the most recent version.
  • To inspect a pedigree file containing more than one million records, add "-t" at the end of the command line. Your pedigree is then only tested for correctness, but no output is created. The command line is therefore: python CheckPedigree.py -m <ORGCODE> -f <filename> -t
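A file with more than one million records can be tested with "-t" but not turned into an uploadable zip file, so it has to be split first (see "Too many animals detected" in the error table). The sketch below shows one way to do this in plain Python. It assumes one pedigree record per line and no header line; the output names <filename>.part1, <filename>.part2 and so on are a hypothetical convention, not something CheckPedigree.py requires.

```python
import itertools

MAX_RECORDS = 1000000  # per-file limit stated in this README


def split_pedigree(path, max_records=MAX_RECORDS):
    """Split a pedigree file into parts of at most max_records lines each.

    Assumes one record per line and no header line. Lines keep their
    original order. Returns the list of files written.
    """
    written = []
    with open(path) as src:
        for part in itertools.count(1):
            chunk = list(itertools.islice(src, max_records))
            if not chunk:
                break  # input exhausted
            out_path = "%s.part%d" % (path, part)  # hypothetical naming scheme
            with open(out_path, "w") as dst:
                dst.writelines(chunk)
            written.append(out_path)
    return written
```

Each part can then be checked on its own, for example: python CheckPedigree.py -m <ORGCODE> -f pedigree.txt.part1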

After Running the Program:

If no errors are detected, the pedigree file is written to a zip file named IB-ORGCODE-yyyymmddThhmmss.zip. Upload this zip file to Interbull's data exchange site at https://idea.interbull.org/
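The zip file name follows a fixed pattern, which makes it easy to locate after a run. The sketch below builds and parses names of the form IB-ORGCODE-yyyymmddThhmmss.zip. It assumes ORGCODE contains only uppercase letters and digits, and "ABC" is a hypothetical code; whether the program stamps local time or UTC is not stated here.

```python
import re
from datetime import datetime

# Pattern from this README: IB-ORGCODE-yyyymmddThhmmss.zip
# Assumption: ORGCODE consists of uppercase letters and digits only.
ZIP_NAME = re.compile(r"^IB-([A-Z0-9]+)-(\d{8}T\d{6})\.zip$")


def expected_zip_name(orgcode, when):
    """Build a file name following the IB-ORGCODE-yyyymmddThhmmss.zip pattern."""
    return "IB-%s-%s.zip" % (orgcode, when.strftime("%Y%m%dT%H%M%S"))


def parse_zip_name(name):
    """Return (orgcode, datetime) for a matching name, or None otherwise."""
    match = ZIP_NAME.match(name)
    if match is None:
        return None
    return match.group(1), datetime.strptime(match.group(2), "%Y%m%dT%H%M%S")


print(expected_zip_name("ABC", datetime(2020, 8, 3, 9, 0, 6)))
# IB-ABC-20200803T090006.zip
```

Because the timestamp sorts lexically in chronological order, the newest zip for one ORGCODE is simply the largest matching name.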

In case of errors, no zip file will be created. Please correct your data and re-run the program until the data successfully pass all required checks.

Specific information about your pedigree data, descriptive statistics and a summary of errors are written to the file CheckPedigreeLog.txt.

All errors are listed in detail in the file CheckPedigreeErrors.txt. The following table describes the brief error messages more fully:

| Error message | Description |
|---|---|
| Inconsistent duplicates | An animal appears more than once with a different sire, dam or birth date |
| Warning duplicates | An animal appears more than once with the same sire, dam and birth date |
| Illegal character errors | The alphanumeric part (registration number) of the international ID is not valid |
| Breed-country error | The breed-country combination is not recognized; see the file CheckPedigreeAuth.txt (created by the program) |
| Sex coding error | The sex code is neither M nor F |
| Parent sex error | A male (or female) animal appears in the dam (or sire) column |
| Birth date errors | Malformed birth date entry |
| Ancestor check | An animal appears older than its parents or grandparents; if a parent's birth date is unknown, the grandparents are checked |
| Pedigree loops | An animal appears further back in its own pedigree as one of its ancestors |
| Too many animals detected | For technical reasons, the number of pedigree lines that can be submitted in each file is limited to one million. If your file exceeds this limit, split its content into two or more files and test each of them with CheckPedigree.py |
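Two of the checks in the table compare records with each other rather than looking at one line: duplicates and pedigree loops. The sketch below illustrates both. It is not the code used by CheckPedigree.py, and it assumes the file has already been parsed into (animal, sire, dam, birth_date) tuples, a hypothetical in-memory layout.

```python
from collections import defaultdict

MISSING_PARENT = "U" * 19  # coding for an unknown sire or dam


def classify_duplicates(records):
    """Return (warning_duplicates, inconsistent_duplicates) as sorted ID lists."""
    versions = defaultdict(list)
    for animal, sire, dam, birth in records:
        versions[animal].append((sire, dam, birth))
    warnings, inconsistent = [], []
    for animal, rows in versions.items():
        if len(rows) < 2:
            continue  # not a duplicate
        if len(set(rows)) == 1:
            warnings.append(animal)      # identical repeats
        else:
            inconsistent.append(animal)  # sire, dam or birth date differ
    return sorted(warnings), sorted(inconsistent)


def find_loop_animals(records):
    """Return the sorted IDs of animals that appear among their own ancestors."""
    parents = {}
    for animal, sire, dam, _birth in records:
        # Keep the first occurrence; missing parents are not followed.
        parents.setdefault(animal, [p for p in (sire, dam) if p != MISSING_PARENT])
    in_loop = set()
    for start in parents:
        # Iterative depth-first walk over the ancestors of `start`.
        stack = list(parents[start])
        visited = set()
        while stack:
            node = stack.pop()
            if node == start:
                in_loop.add(start)
                break
            if node in visited:
                continue
            visited.add(node)
            stack.extend(parents.get(node, []))
    return sorted(in_loop)
```

`find_loop_animals` walks each animal's ancestors separately, which keeps the logic short but can be slow on files near the one-million-record limit; a strongly-connected-components algorithm such as Tarjan's finds every loop in a single pass.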

Note

Please do not modify the program to circumvent any checks. Doing so would be pointless, because IDEA runs the same checking routine again on the pedigree file uploaded in the zip file.


If you need assistance, please do not hesitate to contact us at interbull@slu.se.

public/checkpedigree_python_instructions (last edited 2020-08-03 09:00:06 by Valentina)
