The objective of statistical matching can be macro or micro. In the first case, the interest is in one or more parameters that summarize the relationship between Y and Z (correlation coefficient, regression coefficient, contingency table, etc.); in the second case, the result of the integration is a synthetic data set in which all the variables of interest, X, Y and Z, are present.

These objectives can be achieved by means of a parametric approach, a nonparametric approach, or a mixture of the two (mixed methods). The parametric approach requires the specification of a model and the estimation of its parameters. In the absence of auxiliary information, it is generally assumed that Y and Z are conditionally independent given the common variables X. This assumption is rather strong and, unfortunately, in the typical statistical matching setting it cannot be tested, because Y and Z are never observed on the same units.

Nonparametric methods are usually applied when the objective is micro, and in this case hot-deck imputation methods are frequently used. They aim at imputing the missing variables in the data set chosen as recipient (for instance A) by using the values observed in the data set chosen as donor (B). The donor for a given unit in A is the most similar observation in B in terms of the values of the common variables X.

The mixed approach is composed of two steps: (1) a parametric model is assumed, its parameters are estimated, and imputation is carried out; (2) a hot-deck imputation procedure is applied that uses the values imputed in the first step to choose the donor observation.

It is worth noting that an alternative approach, based on quantifying the uncertainty inherent in the estimation of a particular parameter, can be used. This approach requires neither the conditional independence assumption nor auxiliary information on the non-identifiable parameters, i.e., those related to the relationship between Y and Z. The study of uncertainty does not lead to a point estimate but to a set of plausible estimates.
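The contrast between the conditional independence assumption and the study of uncertainty can be made concrete in the simplest continuous case. The sketch below assumes a single common variable X and linear (for example, multivariate normal) relationships; the function names and the correlation values are hypothetical. Under conditional independence, the partial correlation of Y and Z given X is zero, so corr(Y, Z) = corr(X, Y) * corr(X, Z). Without that assumption, corr(Y, Z) is not identifiable, and every value that keeps the correlation matrix of (X, Y, Z) positive semi-definite is plausible. This gives the interval corr(X, Y) * corr(X, Z) ± sqrt((1 - corr(X, Y)^2) * (1 - corr(X, Z)^2)), whose midpoint is the conditional-independence estimate.

```python
import math


def cia_correlation(rho_xy: float, rho_xz: float) -> float:
    """Point estimate of corr(Y, Z) under conditional independence given X.

    Assumes one common variable X and linear relations: a zero partial
    correlation of Y and Z given X implies rho_yz = rho_xy * rho_xz.
    """
    return rho_xy * rho_xz


def correlation_bounds(rho_xy: float, rho_xz: float) -> tuple[float, float]:
    """Interval of corr(Y, Z) values compatible with rho_xy (from A) and rho_xz (from B).

    rho_yz is not identifiable from A and B alone: any value that keeps the
    3x3 correlation matrix of (X, Y, Z) positive semi-definite is admissible.
    """
    for r in (rho_xy, rho_xz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    centre = cia_correlation(rho_xy, rho_xz)
    # Requiring a non-negative determinant of the correlation matrix gives this half-width.
    half_width = math.sqrt((1.0 - rho_xy**2) * (1.0 - rho_xz**2))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical correlations, as if estimated from A (X and Y) and from B (X and Z).
    low, high = correlation_bounds(0.8, 0.5)
    print(f"CIA estimate: {cia_correlation(0.8, 0.5):.2f}")  # prints "CIA estimate: 0.40"
    print(f"plausible range: {low:.2f} to {high:.2f}")  # prints "plausible range: -0.12 to 0.92"
```

With corr(X, Y) = 0.8 and corr(X, Z) = 0.6, for example, conditional independence yields 0.48, while the data alone only restrict corr(Y, Z) to the interval [0, 0.96].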
The goal of statistical matching (sometimes called data fusion) is the integration of two or more data sources referring to the same population, with the aim of exploring the relationships between variables that are not jointly observed in the same data source. The sources to be integrated are composed of different, non-overlapping units, as usually happens when data from several sample surveys are integrated. The typical situation is the one in which there are two data sources, A and B: variables X and Y are available in A, variables X and Z are observed in B, and the objective is to study the relationship between Y and Z by exploiting the common information in X.

Record linkage is another important process for integrating data coming from different sources. Its purpose is to identify the same real-world entity, which can be represented differently in different data sources, even when unique identifiers are not available or are affected by errors. In official statistics, record linkage is needed for several applications, for instance: to enrich the information stored in different data sets; to create, update and de-duplicate a frame; to improve the data quality of a source; to measure a population size by the capture-recapture method; and to check the confidentiality of public-use microdata.

The complexity of the whole linking process depends on several aspects: the lack of unique identifiers requires sophisticated statistical procedures, the huge amount of data to process involves complex IT solutions, and constraints related to a specific application may require the solution of difficult linear programming problems.
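The text does not specify which linkage method is used. A standard choice when unique identifiers are missing is the probabilistic framework of Fellegi and Sunter. In this framework, each compared field contributes a weight log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees. Here m is the probability of agreement among pairs referring to the same entity, and u is the probability of agreement among pairs referring to different entities. The sketch below assumes that framework. The field names, the m/u values and the two thresholds are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldModel:
    """Hypothetical m/u probabilities for one comparison field."""

    m: float  # P(field agrees | the pair is the same entity)
    u: float  # P(field agrees | the pair is two different entities)

    def __post_init__(self) -> None:
        if not (0.0 < self.m < 1.0 and 0.0 < self.u < 1.0):
            raise ValueError("m and u must lie strictly between 0 and 1")

    def weight(self, agrees: bool) -> float:
        """Log2 likelihood ratio contributed by this field."""
        if agrees:
            return math.log2(self.m / self.u)
        return math.log2((1.0 - self.m) / (1.0 - self.u))


# Hypothetical fields and probabilities, for illustration only.
FIELD_MODELS = {
    "surname": FieldModel(m=0.95, u=0.01),
    "birth_year": FieldModel(m=0.98, u=0.05),
    "municipality": FieldModel(m=0.90, u=0.10),
}


def normalise(value: object) -> str:
    """Minimal standardisation so that 'Rossi' and ' ROSSI' compare equal."""
    return str(value).strip().lower()


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Total weight of a candidate pair: higher means more likely the same entity."""
    return sum(
        model.weight(normalise(rec_a[field]) == normalise(rec_b[field]))
        for field, model in FIELD_MODELS.items()
    )


def classify(weight: float, upper: float, lower: float) -> str:
    """Decision rule with two thresholds (lower < upper)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    a = {"surname": "Rossi", "birth_year": 1970, "municipality": "Roma"}
    b = {"surname": " ROSSI", "birth_year": 1970, "municipality": "Milano"}
    w = match_weight(a, b)
    print(f"{w:.2f}", classify(w, upper=10.0, lower=0.0))  # prints "7.69 possible link"
```

Pairs falling between the two thresholds are typically sent to clerical review. In practice, m and u are estimated from the data (often with the EM algorithm) rather than fixed by hand. Candidate pairs are usually restricted by blocking to keep the number of comparisons manageable.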