README for CheckPedigree.py
Information about the program:
CheckPedigree.py performs a series of checks on your pedigree data to ensure that the data are correct. If no errors are detected, a zip file is created. The zip file contains your checked pedigree file, ready to upload to the Interbull Centre IDEA database. For technical reasons the program rejects files containing more than one million records.
The checks relate to:
- Check the international identification numbers (animal, sire and dam)
  - Correct three-letter country code as in the ISO 3166 standard (no missing countries allowed)
  - Correct three-character breed code according to the Interbull breed codes
  - Correct construction of the alphanumerical part of the ID (registration numbers, right justified, leading blanks as zeros, all types of characters allowed except ; and ~ )
  - Missing sires and dams shall be coded as UUUUUUUUUUUUUUUUUUU (i.e. with 19 U)
- Check the animal's birth date
  - Has to be reported in the format YYYYMMDD
  - If you know only the year of birth then enter it as YYYY0000
  - If you know year and month of birth then enter them as YYYYMM00
  - Missing birth dates are coded as 00000000 (or blanks or a single 0)
- Check that a male animal appears as a parent only in the sire column, and a female animal only in the dam column
- Check for inconsistent duplicate records (different sire, dam or birthdate)
- Check that an animal is always younger than its parents and grandparents
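The format rules in the list above can be illustrated with a minimal sketch. This is not the code used inside CheckPedigree.py, and all function names are hypothetical. The layout of the 19-character ID (breed, country, sex, then a 12-character registration number) is an assumption, as is the rejection of a date with a known day but an unknown month. The lists of valid breed and country codes are not reproduced here, so only the shape of the fields is tested.

```python
import datetime
import re

MISSING_PARENT = "U" * 19  # missing sires and dams: 19 times the letter U


def format_registration(number, width=12):
    """Right-justify a registration number, filling leading blanks with zeros.

    The width of 12 is an assumption (19 characters minus breed, country, sex).
    """
    number = number.strip()
    if len(number) > width:
        raise ValueError("registration number longer than %d characters" % width)
    return number.rjust(width, "0")


def looks_like_valid_id(animal_id, allow_missing=False):
    """Check only the structure of an ID, not the code lists themselves."""
    if allow_missing and animal_id == MISSING_PARENT:
        return True
    if len(animal_id) != 19:
        return False
    # Assumed field order: breed (3), country (3), sex (1), registration (12)
    breed, country, sex = animal_id[:3], animal_id[3:6], animal_id[6]
    registration = animal_id[7:]
    if not (breed.isalnum() and country.isalpha()):
        return False
    if sex not in ("M", "F"):
        return False
    # Every character is allowed in the registration part except ; and ~
    return not any(ch in ";~" for ch in registration)


def is_valid_birth_date(value):
    """Accept YYYYMMDD, YYYYMM00, YYYY0000 and the codes for a missing date."""
    value = value.strip()
    if value in ("", "0", "00000000"):
        return True  # missing birth date
    if not re.match(r"[0-9]{8}\Z", value):
        return False
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:])
    if year == 0:
        return False
    if month == 0:
        return day == 0  # only the year is known
    if day == 0:
        return 1 <= month <= 12  # year and month are known
    try:
        datetime.date(year, month, day)  # rejects dates such as 20150230
    except ValueError:
        return False
    return True
```

For example, `format_registration("12345")` pads the number to twelve characters with leading zeros, and `is_valid_birth_date("20150300")` accepts a date for which only year and month are known.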
Before Running the Program:
- Install Python: either Python 2 (version 2.5 to 2.7) or Python 3 (version 3.6 or later). Python 3 is preferable because Python 2 is no longer supported by its developers.
- Create a working directory/folder
- Download the CheckPedigree.py program from https://idea.interbull.org/software and copy it to your new directory
- Copy your pedigree file to the working directory
Running the Program:
- Ensure there is a working network connection
- Use the command: python CheckPedigree.py -m <ORGCODE> -f <filename>
- Use your uppercase ORGCODE as shown on the upper right-hand side of the IDEA page. Your organization code is reported within brackets beside the "Logged in as" information.
- The program checks its internal version with the value stored on the Interbull server. You will have to download the most recent version if there is a mismatch.
- If you want to inspect a pedigree file containing more than one million records, add "-t" at the end of the command line. In this mode your pedigree is only tested for correctness and no output is created. The command line is then: python CheckPedigree.py -m <ORGCODE> -f <filename> -t
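If the check is launched from another Python script, the command line described above can be assembled as in the following minimal sketch. The helper names `build_command` and `run_check` are hypothetical; only the options -m, -f and -t come from this README. The meaning of the program's exit status is not documented here, so the sketch returns it without interpreting it.

```python
import subprocess
import sys


def build_command(orgcode, filename, test_only=False, script="CheckPedigree.py"):
    """Return the argument list for one run of CheckPedigree.py."""
    # The organization code has to be given in uppercase
    command = [sys.executable, script, "-m", orgcode.upper(), "-f", filename]
    if test_only:
        command.append("-t")  # test only, no zip file is created
    return command


def run_check(orgcode, filename, test_only=False):
    """Run the program in the current working directory and return its exit status."""
    return subprocess.call(build_command(orgcode, filename, test_only))
```

A call such as `run_check("ABC", "pedigree.txt", test_only=True)` corresponds to the "-t" command line, where ABC and pedigree.txt are placeholders for your own organization code and file name.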
After Running the Program:
If no errors are detected, the pedigree file will be written into a zip file called IB-ORGCODE-yyyymmddThhmmss.zip. Upload the zip file to Interbull's data exchange site: https://idea.interbull.org/.
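To locate the zip file after a run, the file name can be matched against the pattern given above. The sketch below assumes that ORGCODE in the name is replaced by your own organization code and that the timestamp consists of digits only; the function names are hypothetical.

```python
import re


def zip_name_pattern(orgcode):
    """Pattern for names of the form IB-ORGCODE-yyyymmddThhmmss.zip."""
    return re.compile(r"^IB-" + re.escape(orgcode) + r"-\d{8}T\d{6}\.zip$")


def find_checked_zips(filenames, orgcode):
    """Return the matching names, oldest first.

    Sorting the names also sorts them by time, because the timestamp
    is written from the largest unit (year) to the smallest (second).
    """
    pattern = zip_name_pattern(orgcode)
    return sorted(name for name in filenames if pattern.match(name))
```

With `os.listdir(".")` as the list of file names, the last element of the result is the most recently created zip file.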
In case of errors, no zip file will be created. Please correct your data and re-run the program until the data successfully pass all required checks.
Specific information about your pedigree data, descriptive statistics and a summary of errors are written to the file CheckPedigreeLog.txt.
All errors are listed in detail in the file called CheckPedigreeErrors.txt. The following table describes the brief error messages more fully:
Error message | Description
Inconsistent duplicates | An animal appears twice with different sire, dam or birth date
Warning duplicates | An animal appears twice but with same sire, dam and birth date
Illegal character errors | The numerical part of the international ID is not valid
Breed-country error | The breed-country combination is not recognized
Sex coding error | The sex code is neither M nor F
Parent sex error | A male animal (or a female) appears in the dam (or sire) column
Birth date errors | Malformed entry for birth date
Ancestor check | Animal appears older than its parents or grandparents
Pedigree loops | Animal appears further back in its pedigree tree as an ancestor of itself
Too many animals detected | For technical reasons the number of pedigree lines that can be submitted in each file has been limited to 1 million. If your file exceeds this limit you need to split its content into two (or more) files and test each of them with the CheckPedigree.py program
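One possible way to split a file that triggers the "Too many animals detected" error is sketched below. The function name and the naming of the part files are hypothetical. The README does not say whether related animals have to stay in the same file, so this sketch splits purely by line count.

```python
import os

MAX_LINES = 1000000  # limit on pedigree lines per file, as stated above


def split_pedigree(path, max_lines=MAX_LINES):
    """Copy the lines of path into part files of at most max_lines lines each.

    Files are handled in binary mode so that every line is copied unchanged.
    Returns the list of part file names in order.
    """
    base, ext = os.path.splitext(path)
    part_paths = []
    out = None
    try:
        with open(path, "rb") as source:
            for index, line in enumerate(source):
                if index % max_lines == 0:
                    # Start a new part file: pedigree_part1.txt, pedigree_part2.txt, ...
                    if out is not None:
                        out.close()
                    part_path = "%s_part%d%s" % (base, len(part_paths) + 1, ext)
                    out = open(part_path, "wb")
                    part_paths.append(part_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return part_paths
```

Each part file can then be tested separately with CheckPedigree.py, as described in the table.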
Note
Please do not modify the program to circumvent any checks. Doing so would be pointless because the same checking routine is used again inside IDEA to double-check the pedigree file uploaded in the zip file.
If you need assistance, please do not hesitate to contact us at interbull@slu.se.