The Human Fertility Database
Frequently Asked Questions
Registration, access, passwords
- How do I comment or contribute to the Human Fertility Database or to this FAQ section?
- Why do I have to register?
- How can I delete my name from your registration database?
- Do I have to pay to access the data?
- I registered but never received a password.
- My password does not work!
- I cannot remember my password.
- How can I change my password to something other than a random number?
- Can I let someone else use my password?
- What are the rules concerning the copying, republication, and use of the HFD data?
- How can I copy HFD data into an Excel spreadsheet?
- How can I copy HFD data to a statistical package?
- Data downloaded into Excel appear as text, not as numbers. What should I do?
- Data downloaded into Excel are all clustered in one cell for each line. What should I do?
- How do I reformat the data from "long" to "wide" format?
- I want to replicate my research. How can I get previous versions of country-specific datasets?
- How can I get HFD data for many countries and years in one file?
- I suspect there are errors in the HFD data. Where should I report them?
Understanding and interpreting the data
- Why are birth counts shown in decimal places when in reality births can only be counted in whole numbers?
- What do the pictograms (such as small squares, parallelograms, and triangles) mean?
- Why do the HFD indicators occasionally differ from the statistics reported by national statistical agencies or other statistical publications?
- Why does the cross-national (Excel) table featuring the completed cohort fertility and childlessness for all the HFD countries provide more recent time series of this indicator than the country-specific files?
- Why are the data on cohort parity distributions and completed fertility by birth order unstable for some countries?
- What is the difference between cohort and period fertility indicators?
- What is the difference between PATFR and TFR?
- Why are ages and cohorts duplicated in data files providing data by year, age, and cohort?
- Why are period data and fertility rates in the HFD shown for two different dimensions (year + age and year + birth cohort)?
- What age format is preferable for period fertility analysis, and what age format is preferable for cohort fertility analysis?
- Why does the HFD additionally present some summary indicators (i.e. TFR) by age 40?
- Does the age format of the period data (year + age vs. year + birth cohort) affect the computation of the mean age at childbearing?
- How can I use a cohort fertility table?
- How can I use a period fertility table?
- What is the benefit of a census-based fertility table?
- Is migration taken into account in the calculation of the population exposures?
- What is the difference between birth order and parity?
- What is a "golden" census?
- What is the difference between age-specific fertility rates and conditional fertility rates?
Do you have...?
Registration, access, passwords
How do I comment or contribute to the Human Fertility Database or to this FAQ section?
We are always happy to hear from users. Contact us at email@example.com.
Why do I have to register?
We require you to register before accessing the data in order to obtain your basic contact information (i.e., your name and e-mail), which we may occasionally use to contact you with important messages about the database (including updates and other important information). We will NOT give your e-mail address to anyone else for any reason, and we will not ask for any personal information other than the items listed above.
How can I delete my name from your registration database?
Please send a delete request to firstname.lastname@example.org. It is important that the request be sent from the e-mail address that was registered. If that is not possible (e.g., canceled account), then the delete request should include your name and the e-mail address used to register. We need this information in order to verify the user address to be deleted.
Do I have to pay to access the data?
No, but you must register before you can access the data files.
I registered but never received a password.
You must provide a valid and complete e-mail address in order to obtain a password. If we determine that the provided e-mail address is not valid (i.e., the e-mail we send to you is rejected), you will be dropped from the user registry and you will have to start over by registering as a "new user." After registering, you should receive an acknowledgement via e-mail within a few minutes. If no message arrives, then the registration was not successful.
My password does not work!
Please make sure that you are using the correct user ID; your "user ID" is your COMPLETE e-mail address. Also, make sure you enter the user ID using lowercase characters (even if you originally typed in your e-mail address using upper-case characters). In addition, your password is case sensitive (e.g., if you changed your password to "AbCd," then entering "abcd" or "ABCD" will not work). Make sure you type in the password EXACTLY as you entered it (have you accidentally put the CAPS lock on?). If all else fails, you can reset your password (see the answer to the next question).
I cannot remember my password.
Go to Login on the Main Menu, where you will have the option of resetting your password. You will be asked to enter your e-mail address (i.e., the one that you registered with originally), and the password will be sent to you via e-mail. Note that if you have not already registered with a valid e-mail address, you must start over by registering as a "new user."
How can I change my password to something other than a random number?
Go to Change Password on the Main Menu where you will have the option of changing your password. You will be asked to enter your e-mail address and your current password, and then you can change your password to whatever you would like.
Can I let someone else use my password?
No. Each HFD user must register separately. If there is some important change to the database, we need a complete list of HFD users so that we can contact everyone who may be using the data.
What are the rules concerning the copying, republication, and use of the HFD data?
You can use HFD data without any limitation for your work, professional, and other activities; and you may keep a copy of the data on your personal or work computer and data drives.
Any publication, published data output, table, graph, figure, or other illustration based on or derived from the HFD data should acknowledge the HFD as a source of the published data. The preferred reference to HFD is as follows:
Human Fertility Database. Max Planck Institute for Demographic Research (Germany) and Vienna Institute of Demography (Austria). Available at www.humanfertility.org (data downloaded on [date]).
If you would like to republish parts of HFD data, you should first obtain written permission from us. Please contact Sigrid Gellers-Barkmann.
How can I copy HFD data into an Excel spreadsheet?
Downloading Text Data Files
The easiest way to download the text (ASCII) data files is:
- Right click on the link to the data file you want,
- Left click on Save Target As (Internet Explorer) or Save Link Target As (Netscape, Mozilla),
- Specify the directory (folder) where you want to save the text file, and
- Left click on Save.
You may now open the file in a text editor (e.g., Notepad, Wordpad, Emacs), import the data into Excel (see instructions below), or read the data into a statistical package (e.g., SAS, Stata, SPSS, S-plus).
Opening Text Data Files in Excel
To open the data file in Excel:
- Left click File (Excel 2003) or Office button (Excel 2007) on the menu bar,
- Left click Open,
- Specify files of type text file (*.prn, *.txt, *.csv),
- Left click on the data file to select it, and
- Left click on Open.
- The Text Import Wizard should automatically appear. Step 1: Choose "delimited" as the file type if it doesn't appear by default. Step 2: Choose the appropriate delimiter (i.e., "comma" if you are importing an HFD input file, or "tab" and "space" if it is an HFD output file). Step 3: You may change the data format for each column or use the default, and left click on Finish to import the data.
Note: These instructions are for Excel 2003; they may differ for earlier or later versions of Excel (please consult your manual or online help menu under "import data").
How can I copy HFD data to a statistical package?
Please consult your statistical package manual or online help menu for instructions on importing data in a tab-delimited ASCII (text) format. Some suggestions are provided below.
Importing Data into STATA
For example, to import the HFD file with the US age-specific birth counts, named "USAbirthsRR.txt", into STATA, first use a text editor (e.g., MS Notepad) to delete the first three header lines and the "+" and "-" signs in the file. The file can then be imported into STATA by executing the following command line:
infile Year Age Total using C:\USAbirthsRR.txt, clear
Importing Data into R
For R, read.table may be used with the options skip = 2, na.strings = ".", header = TRUE for all HFD output tables, e.g.:
b <- read.table("USAbirthsRR.txt", skip = 2, na.strings = ".", header = TRUE)
You should be aware that when ages include suffixes (e.g., 12-, 55+), the entire column is read as text rather than as numbers, and in R versions before 4.0 it is by default converted to a factor. To prevent this conversion, leave the column as character data by adding the option as.is = TRUE to the read.table command. In either case, a command such as the following will drop the suffixes and convert the column to numeric values:
transform(b, Age = as.numeric(sub("[+-]", "", as.character(Age))))
Importing Data into Matlab
For importing HFD data into Matlab, the following functions can be used:
text=fileread(fname) returns the contents of the file as a string vector.
t=strrep(text,'+',' ') removes pluses.
t=strrep(t,'-',' ') removes minuses.
t=strrep(t,' .','NaN') recodes missing values.
fid=fopen('temp','w') opens a temporary file named "temp".
fprintf(fid,'%s',t) writes data with the above changes to the "temp" file.
fclose(fid) closes the "temp" file.
a=importdata('temp',' ',3) reads the data into Matlab.
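The same import steps (skip the leading header lines, treat "." as missing, and strip the "+" and "-" age suffixes) can also be sketched in Python using only the standard library. This is an illustrative sketch rather than an official HFD tool: the function name read_hfd_table, the sample header text, and all sample values are made up for the example, and the file layout is assumed to match the output-table layout described above.

```python
import io

def read_hfd_table(source, skip=2):
    """Parse a whitespace-delimited HFD-style output table into a list of dicts.

    `source` is any open text file object. `skip` is the number of lines
    before the column-header line (assumed to be 2, as in the R example).
    """
    lines = [ln for ln in source.read().splitlines()[skip:] if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = {}
        for name, value in zip(header, line.split()):
            if value == ".":
                row[name] = None                      # "." marks a missing value
            elif name in ("Year", "Age", "Cohort"):
                row[name] = int(value.rstrip("+-"))   # "12-" -> 12, "55+" -> 55
            else:
                row[name] = float(value)
        rows.append(row)
    return rows

# Hypothetical file content; the header text and numbers are invented.
sample = io.StringIO(
    "Example country, births by year and age (hypothetical header)\n"
    "Last modified: (hypothetical)\n"
    "Year   Age   Total\n"
    "1933   12-   10.00\n"
    "1933   13    85.50\n"
    "1933   55+   .\n"
)
births = read_hfd_table(sample)
print(births[0])
```

To read a downloaded file instead of the in-memory sample, pass an open file object, e.g. read_hfd_table(open("USAbirthsRR.txt")).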
Data downloaded into Excel appear as text, not as numbers. What should I do?
To convert the data that appear as text to numbers in Excel:
- Select (by left clicking on) the first column of the spreadsheet into which the data were downloaded,
- Left click Data on the menu bar, and
- Left click Text to Columns.
- The Convert Text to Columns Wizard should appear. Step 1: Choose "delimited" as the file type. Step 2: Choose the appropriate delimiter (i.e., "comma" if you are importing an HFD input file or "tab" and "space" if it is an HFD output file). Step 3: You may change the data format for each column or use the default, and left click on Finish to convert the data.
Note: These instructions are for Excel 2003; they may differ for earlier or later versions of Excel.
Data downloaded into Excel are all clustered in one cell for each line. What should I do?
Please see the answer to the previous question.
How do I reformat the data from "long" to "wide" format?
STATA contains the command reshape, which allows you to convert data from wide to long form, and vice versa. For more details about this command, please consult the STATA manual or the online help menu.
Let us consider an example of data reshaping. After importing the file with the US age-specific birth counts (HFD file USAbirthsRR.txt) into STATA (see the FAQ "How can I copy HFD data to a statistical package?"), the corresponding STATA data set has three columns Year Age Total in a long format. Column Year contains calendar years of observation varying from 1933 to 2006, column Age contains ages varying from 12 to 55, and column Total contains birth counts as real values. If you want to obtain the same data in a wide format with birth counts for the years 1933 to 2006 running from the left to the right as columns Total1933, Total1934, ..., Total2005, Total2006, execute the command line
reshape wide Total, i(Age) j(Year)
There is also a reshape function in R, which is included in the package stats. Entering ?reshape at the R prompt will bring up the corresponding online documentation. The same US birth count data in the data frame b is in the long format, with one year and one age per row. The following reshapes the data to one age per row, with births by year at that age appearing in columns:
reshape(b, timevar = "Year", idvar = "Age", direction = "wide")
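For readers who want to see what the long-to-wide reshaping does step by step, here is a minimal Python sketch using only the standard library. It is not part of the HFD tooling: the function name long_to_wide and the sample birth counts are hypothetical, and the wide column names (Total1933, Total1934, ...) follow the STATA naming shown above.

```python
def long_to_wide(rows, id_key="Age", time_key="Year", value_key="Total"):
    """Reshape long rows (one year and one age per row) to one row per age."""
    wide = {}
    for row in rows:
        # Create the wide row for this age on first sight, then add a column
        # named after the value and the year, e.g. "Total1933".
        entry = wide.setdefault(row[id_key], {id_key: row[id_key]})
        entry[f"{value_key}{row[time_key]}"] = row[value_key]
    return [wide[key] for key in sorted(wide)]

# Hypothetical birth counts, for illustration only.
long_rows = [
    {"Year": 1933, "Age": 12, "Total": 10.0},
    {"Year": 1933, "Age": 13, "Total": 85.5},
    {"Year": 1934, "Age": 12, "Total": 12.0},
    {"Year": 1934, "Age": 13, "Total": 90.25},
]
wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```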
To reformat the data from "long" to "wide" format in Excel, please consult the MPIDR Technical Report "Reshaping of Human Fertility Database data from long to wide format in Excel" by V. Shkolnikov and D. Jdanov. The report and an accompanying VBA/Excel program can also be accessed through "Technical Reports" on the Main Menu of the HFD website.
I want to replicate my research. How can I get previous versions of country-specific datasets?
Each country-specific dataset has a unique and permanent URL provided at the bottom of the country page. The link remains unchanged and effective even after the dataset has been replaced with an updated version. The users are encouraged to store it for later reference to the dataset they have downloaded. Note that pooled data files are not versioned; this function will be implemented in the HFD in the future.
How can I get HFD data for many countries and years in one file?
To facilitate rapid downloads of large amounts of data, the Human Fertility Database offers two series of zipped files. For users who only want information of a given data type (e.g., birth counts) for all the countries and years available, the zipped files "By data type" are recommended. Please go to Zipped Data Files on the HFD Main Menu.
I suspect there are errors in the HFD data. Where should I report them?
Please report any suspected errors in an e-mail to email@example.com. Any comments, remarks, or suggestions would be greatly appreciated by the HFD team.
Understanding and interpreting the data
Why are birth counts shown in decimal places when in reality births can only be counted in whole numbers?
The detailed data provided on the HFD country pages are in many cases obtained from much less detailed raw data. Before being displayed on the HFD country pages, the raw data are adjusted and are additionally split into finer cells by methods specified in the Methods protocol. By displaying the estimated birth counts with a precision of two decimal places, the HFD is alerting users that these are somewhat artificial data. In all cases, the user must take responsibility for understanding the sources and limitations of all data provided in the HFD, which are documented in detail in the country "Background and Documentation" files, and in specific explanatory notes in the input data files.
What do the pictograms (such as small squares, parallelograms, and triangles) mean?
There are four conventional data configurations as they appear on the Lexis diagram: square (or rectangle), vertical parallelogram, horizontal parallelogram, and triangle. A square or rectangle classifies births by calendar year and mother's age in completed years (ACY). In vertical parallelograms, births are defined by calendar year and the mother's year of birth (cohort), or equivalently by age reached during the year (ARDY). Horizontal parallelograms group together births by the mother's age in completed years (ACY) and her year of birth (birth cohort). In Lexis triangles, births are classified by all three possible dimensions: calendar year, the mother's age, and the year of birth (birth cohort). The pictograms on the HFD country pages show the Lexis shape by which the presented data are classified. For more details about the Lexis diagram and data configurations, see the Methods protocol.
Why do the HFD indicators occasionally differ from the statistics reported by national statistical agencies or other statistical publications?
The occasional differences between fertility indicators reported by the official statistical agencies and those provided in the HFD may arise from differences in the underlying data or in methodology. In the former case, the female population data used in the HFD (mostly taken from the Human Mortality Database) may differ from data reported by the official statistical bodies, as the HFD and HMD often apply additional population adjustments, such as retrospective intercensal adjustments for unreported migration. With respect to methodology, minor differences in the estimated indicators may be attributable to different definitions (e.g., the period Total Fertility Rates in the HFD are computed from age-specific fertility rates across the widest possible range of childbearing ages, 12- to 55+, whereas some statistical agencies apply a more restricted age range of 15 to 49, or even 15 to 44), and to additional data manipulations in the HFD. These may include data smoothing and splitting, which occur when the initial input data were obtained for five-year age groups, as well as the redistribution of births for which the age of the mother is unknown across the whole range of childbearing ages.
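To illustrate how the choice of age range alone can change a period TFR, the sketch below sums the same set of age-specific fertility rates over three ranges. The rates are invented for illustration and do not describe any country; the function name total_fertility_rate is hypothetical, and the open-ended groups 12- and 55+ are simply treated as ages 12 and 55.

```python
def total_fertility_rate(asfr, min_age=12, max_age=55):
    """Sum single-year age-specific fertility rates over [min_age, max_age]."""
    return sum(rate for age, rate in asfr.items() if min_age <= age <= max_age)

# Hypothetical rates, held constant within each age group for simplicity.
group_rates = {
    (12, 14): 0.001, (15, 19): 0.02, (20, 24): 0.08,
    (25, 29): 0.11, (30, 34): 0.09, (35, 39): 0.04,
    (40, 44): 0.01, (45, 49): 0.001, (50, 55): 0.0001,
}
asfr = {age: rate
        for (low, high), rate in group_rates.items()
        for age in range(low, high + 1)}

tfr_full = total_fertility_rate(asfr)           # ages 12 to 55
tfr_15_49 = total_fertility_rate(asfr, 15, 49)  # restricted range
tfr_15_44 = total_fertility_rate(asfr, 15, 44)  # more restricted range
print(round(tfr_full, 4), round(tfr_15_49, 4), round(tfr_15_44, 4))
```

With these made-up rates, the three totals differ only in the third decimal place, which is the kind of minor discrepancy described above.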
Why does the cross-national (Excel) table featuring the completed cohort fertility and childlessness for all the HFD countries provide more recent time series of this indicator than the country-specific files?
The country-specific data tables display two indicators of completed fertility. One shows the fully completed CCF at age 50 or older, and the other gives a "snapshot" of fertility at reaching age 40 (CCF40). The cross-national summary tables were designed with the aim of providing more up-to-date information on cohort fertility. Therefore, the CCF is shown already at age 44 rather than at age 50. Because very few births occur after age 44, this has a very limited impact on the resulting indicators, while the time series is extended by another six cohorts. (Based on Eurostat period data, in 2014 fertility at ages 44-50 amounted to 0.46% of the EU TFR, i.e. 0.006 in absolute terms.) The "ReadMe" sheet includes a comment "Completed cohort fertility is calculated for the highest age for which the indicator is available, but not lower than 44."
Why are the data on cohort parity distributions and completed fertility by birth order unstable for some countries?
Whereas the census data show that changes in the parity distribution are usually smooth and relatively small from one cohort to another, the HFD data show considerable instability for some countries, especially for the cohorts born before 1950. This instability is mostly attributable to a substantial sensitivity of parity distribution estimates to data quality. As the HFD cohort fertility indicators are mostly obtained by cumulating cohort fertility experienced over long periods of time, even a very minor error, when repeatedly cumulated, may produce a distortion in the estimated parity distribution, especially in the data on childlessness.
Such small errors may have multiple causes, including data estimations and manipulations applied in the HFD (e.g., splitting of five-year age group data into single years of age, redistributing births with unknown birth order, or splitting birth data originally provided by single years of age into "Lexis triangles" (distinguishing both age and birth cohort dimensions)). Distortions in the estimated parity distribution may also arise from errors in the original data on births and female population (e.g., when the birth order reporting is of substandard quality, as in the case of Spain in some periods), or from a huge difference in the size of two neighboring cohorts. Whenever possible, the HFD tries to eliminate errors due to the rapidly changing size of neighboring birth cohorts by adjusting the female population data, using the original statistics on live births by month of birth. In addition, when the HFD data checking procedures detect considerable instability in the estimated parity distributions for some cohorts and countries, the users are warned of this in the country documentation files, or even on the country data page.
What is the difference between cohort and period fertility indicators?
Cohort fertility data depict the fertility behavior of women born in the same period of time, usually in the same calendar year. In contrast, period fertility data reflect fertility rates across all cohorts of childbearing ages during a given period of time, usually one calendar year.
Cohort fertility data are "real" in the sense that they reflect the past fertility experiences of women born in, for example, 1960. Their main disadvantage is that they cannot be reliably derived before these cohorts of women approach the end of their reproductive period, i.e., until they are past age 40. Moreover, cohort measures reflect the "real" experience of a cohort in a closed population. When populations are subject to in- and out-migration, the cohort measures reflect the experience of a woman who would have been exposed to the age-specific rates observed on average during the relevant period.
Period fertility data are based on a concept of hypothetical or synthetic cohorts. They are hypothetical indicators based on an assumption that fertility rates observed in a given period remain indefinitely "frozen" across the whole set of childbearing ages (for a more detailed description, see Preston et al., 2000).
What is the difference between PATFR and TFR?
Both measures are summary indicators of period fertility, and lend themselves to the same interpretation. They represent an estimate of the mean number of births per woman over her entire reproductive life. There is, however, an important difference between these two measures. While the TFR is based on simple summation of unconditional age-specific fertility rates, the PATFR is computed from the parity-specific fertility table. This table is derived from conditional age- and parity-specific fertility rates.
The TFR is influenced not only by the intensity of childbearing, but also by the changing parity structure of the female population. The PATFR, in contrast, should not be affected by the changing distribution of women by parity.
Why are ages and cohorts duplicated in data files providing data by year, age, and cohort?
Births classified by calendar year, age, and year of birth (birth cohort) constitute the finest element on the Lexis diagram, called the Lexis triangle. There are two types of triangles: the lower Lexis triangle and the upper Lexis triangle. The lower Lexis triangle groups births that occur in year t to the birth cohort t-x at age x, and the upper Lexis triangle contains births that occur in year t to birth cohort t-x-1 at age x.
Having births classified by Lexis triangles makes it possible to reconstruct any of the other three Lexis shapes: squares, vertical or horizontal parallelograms.
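The triangle definitions above can be turned into a short sketch that regroups triangle counts into the other three Lexis shapes. This is only an illustration under those definitions, not the HFD's own code: the function names, the "lower"/"upper" labels, and the birth counts are hypothetical.

```python
from collections import defaultdict

def triangle_cohort(year, age, triangle):
    """Mother's birth cohort for a Lexis triangle: t-x (lower) or t-x-1 (upper)."""
    if triangle == "lower":
        return year - age
    if triangle == "upper":
        return year - age - 1
    raise ValueError(f"unknown triangle type: {triangle}")

def aggregate(triangles, shape):
    """Sum triangle counts into squares, vertical or horizontal parallelograms.

    `triangles` maps (year, age, "lower" | "upper") to a birth count.
    """
    totals = defaultdict(float)
    for (year, age, triangle), births in triangles.items():
        cohort = triangle_cohort(year, age, triangle)
        if shape == "square":          # calendar year + age in completed years
            key = (year, age)
        elif shape == "vertical":      # calendar year + birth cohort
            key = (year, cohort)
        elif shape == "horizontal":    # birth cohort + age in completed years
            key = (cohort, age)
        else:
            raise ValueError(f"unknown shape: {shape}")
        totals[key] += births
    return dict(totals)

# Hypothetical birth counts by Lexis triangle.
triangles = {
    (2000, 25, "lower"): 100.0,  # cohort 1975
    (2000, 25, "upper"): 120.0,  # cohort 1974
    (2000, 24, "upper"): 90.0,   # cohort 1975
    (2001, 25, "upper"): 110.0,  # cohort 1975
}
squares = aggregate(triangles, "square")
vertical = aggregate(triangles, "vertical")
horizontal = aggregate(triangles, "horizontal")
print(squares[(2000, 25)], vertical[(2000, 1975)], horizontal[(1975, 25)])
```

In this example, cohort 1975 at age 25 (a horizontal parallelogram) combines the lower triangle of 2000 with the upper triangle of 2001, which is why two calendar years of data are needed for that shape.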
Why are period data and fertility rates in the HFD shown for two different dimensions (year + age and year + birth cohort)?
Countries differ in their methods of collecting and publishing birth data. In some countries, data are collected by year and age, and thus the definition of age is the age in completed years. Meanwhile, in other countries, data are organized by year and birth cohort, and thus the definition of age is the age reached during the year. The HFD is open to a variety of practices, and seeks to meet the needs of all users. It also allows users to choose the data format that best suits their individual research interests.
What age format is preferable for period fertility analysis, and what age format is preferable for cohort fertility analysis?
There is no simple answer to this question. Users of period data may prefer using age in completed years (Lexis squares), which allows them to compute the cumulated fertility rate at the time that an "exact age" has been reached (i.e., when attaining a given birthday), or to compute age-specific fertility rates for an interval between two exact ages. When there is no pertinent reason to focus on the "exact age" format, cohort fertility analysts may prefer to use data specified by the age reached during the year (vertical parallelograms in Lexis diagram), which allows them to make full use of the data for the most recent year of observation. In contrast, cohort fertility rates specified by age in completed years (horizontal parallelograms) require using data for two subsequent calendar years to reconstruct the age-specific fertility rate for any age-cohort combination.
Why does the HFD additionally present some summary indicators (i.e. TFR) by age 40?
This is done in an effort to get comparable basic indicators about cohort fertility of women with almost complete fertility histories. The major advantage of providing such data lies in obtaining observations for 10 additional "younger" cohorts with relatively recent fertility. These data may be complemented by an estimate of fertility rates expected to be realized after age 40, which typically constitute a few percentage points of the total completed fertility.
Does the age format of the period data (year + age vs. year + birth cohort) affect the computation of the mean age at childbearing?
The age format of the period data affects the computation of the mean age at childbearing. Births organized by "year + age" (i.e., age in completed years) occupy squares or rectangles on the Lexis diagram, while births classified by "year + cohort" (or, in other words, "year + age reached during the year") correspond to a vertical parallelogram. A woman of age x in completed years is, on average, half a year older than a woman of age x reached during the year. Therefore, in the calculation of the mean age at childbearing, the average share of the age interval [x,x+1) lived before giving birth to a child is assumed to be 0.5 for any completed age x, and to be 0 for any age x reached during the year. For more details, see the Methods protocol.
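The half-year difference described above can be made concrete with a small sketch. It computes a rate-weighted mean age, adding 0.5 to each age for data in completed years (ACY) and 0 for data by age reached during the year (ARDY). The function name and the rates are hypothetical, and the same numbers are reused for both formats purely to isolate the effect of the offset; the exact formula used by the HFD is given in the Methods protocol.

```python
def mean_age_at_childbearing(rates, age_format):
    """Rate-weighted mean age at childbearing.

    `rates` maps age x to a fertility rate; `age_format` is "ACY" (age in
    completed years, offset 0.5) or "ARDY" (age reached during the year,
    offset 0), following the assumption stated in the text.
    """
    offset = {"ACY": 0.5, "ARDY": 0.0}[age_format]
    total = sum(rates.values())
    return sum((age + offset) * rate for age, rate in rates.items()) / total

# Hypothetical fertility rates at three ages.
rates = {24: 0.08, 25: 0.10, 26: 0.08}
mab_acy = mean_age_at_childbearing(rates, "ACY")
mab_ardy = mean_age_at_childbearing(rates, "ARDY")
print(round(mab_acy, 2), round(mab_ardy, 2))
```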
How can I use a cohort fertility table?
A cohort fertility table provides multifaceted and detailed information about the fertility patterns and trajectories of female birth cohorts, with a particular focus on parity-specific data. With the help of a cohort fertility table, the analyst can explore different layers of fertility behavior that go far beyond the conventional aggregate indicators of fertility. In particular, these fertility tables facilitate the analysis of age- and parity-specific patterns of family building, inter-cohort changes in parity-specific fertility behavior, trends in fertility "postponement" and "recovery," as well as cross-country comparisons.
How can I use a period fertility table?
Period fertility tables provide a wide array of indicators that give a fine-grained account of parity-specific changes in fertility quantum and tempo. These tables facilitate a decomposition of aggregate trends into their age- and parity-specific components, which are often hidden in the aggregate indicators. They yield an aggregate index of fertility controlling for age and parity (PATFR), which is, for first births, considerably less affected by changes in the timing of childbearing than the conventional period TFR. Period fertility tables also enable researchers to analyze shifts in parity-specific fertility rates, draw detailed cross-country comparisons, identify important trend reversals, and explore parity-specific fertility reactions to changes in socioeconomic or cultural conditions.
What is the benefit of a census-based fertility table?
Period fertility tables in the HFD are usually based on the female exposure population by age and parity obtained by cumulating fertility rates of given cohorts over long periods of time. This approach is based on the rather strong assumption that migration and mortality are not selective with respect to fertility, i.e., that those who die or migrate have, at any given reproductive age, the same parity distribution and completed fertility as those who survive and stay in a country. Many developed countries have experienced large waves of immigration in recent decades, and these female migrants tend to have different parity distributions and fertility behavior than the "native women," which violates the statistical assumption of "no effect of migration" on fertility. In addition, the permanent or temporary out-migration of younger women may violate this assumption. In contrast, census- or register-based period fertility tables use records on female population composition by age and parity for the total resident population in a country, and thus control for the selective effects of migration and mortality on completed fertility among the resident population. Thus, census- or register-based tables better reflect past fertility histories among the actual population, including births realized before migrating into a country. However, in some cases missing or incomplete data may make census- or register-based data less reliable than the parity distribution estimated for the main annual series of fertility tables in the HFD.
Is migration taken into account in the calculation of the population exposures?
Migration is taken into account in the estimation of female population exposures (i.e., the number of women by age, cohort, and calendar year; see the Human Mortality Database Methods Protocol for more details). However, the HFD estimates of the parity distribution of the female population by age are based on the assumption that the parity structure of migrants does not differ from that of the general population.
What is the difference between birth order and parity?
"Birth order" refers to the birth order of the child, whereas "parity" refers to the number of children a woman has given birth to at the time of observation. In the past, when marriage was assumed to be the only "legitimate" status for childbearing, marital birth order was registered by many statistical agencies. Today, due to the prevalence of non-marital childbearing, most statistical agencies, as well as the HFD, collect data for biological birth order only. Note that in the data used for the HFD calculations, only live-born children are considered in counting birth order.
What is a "golden" census?
One of the most common methods for obtaining the age-parity distribution of women, which is necessary for the computation of period fertility tables, is the reconstruction of the lifetime fertility of cohorts from the time series of fertility rates by age and birth order. In some cases, however, especially when the age- and order-specific birth data are available for a short period only, the female population distribution by age and parity from a population census or register can be used to build left-censored cohort fertility histories. This approach enables us to extend the time series of data on the period age-parity distribution of women and thus of the period fertility tables. The population census or register used for this purpose is called the "golden" census. For more details about the use of the "golden" census, see the Methods protocol.
What is the difference between age-specific fertility rates and conditional fertility rates?
The essential difference between the two types of rates lies in the denominator. Unconditional age-specific fertility rates measure births specified by the age of the mother and the birth order (when available) to all women of a given age, whereas conditional age- and parity-specific fertility rates measure childbearing intensity among women of specific ages and parities (e.g., second births are related to women of parity one only).
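A small numerical sketch may help to show how much the choice of denominator matters. The figures below are invented for a single age and year, the function name is hypothetical, and simple head counts stand in for the person-years of exposure that a full calculation would use.

```python
def fertility_rates(births_by_order, women_by_parity, order):
    """Return (unconditional, conditional) rates for births of a given order.

    Unconditional: births of that order divided by all women of the given age.
    Conditional: the same births divided only by women of parity order - 1,
    i.e. the women who are actually at risk of a birth of that order.
    """
    births = births_by_order[order]
    all_women = sum(women_by_parity.values())
    women_at_risk = women_by_parity[order - 1]
    return births / all_women, births / women_at_risk

# Hypothetical women aged 30 by parity, and hypothetical second births to them.
women_by_parity = {0: 4000, 1: 3000, 2: 2500, 3: 500}
births_by_order = {2: 450}

unconditional, conditional = fertility_rates(births_by_order, women_by_parity, order=2)
print(round(unconditional, 3), round(conditional, 3))
```

Here the same 450 second births give a rate of 0.045 when related to all 10,000 women, but 0.15 when related only to the 3,000 women of parity one.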
Do you have...?
The HFD will be expanding to include all countries with complete or nearly complete birth registration, and with birth and population data meeting the HFD data quality requirements. This implies that most of the countries listed in the HFD will be highly developed countries with complete vital statistics registration and reporting.
The Human Mortality Database (HMD) provides detailed mortality and population data. At present the database contains detailed population and mortality data for many countries or areas. See http://www.mortality.org/.