Fun with Statistics


Research Seminaria




Abstract
In this lecture we will argue that the distinction between randomized experiments and quasi-experiments is a false dichotomy, because each design falls on a continuum of control for confounding that can lead to spurious observed relationships. To define the continuum, we quantify the extent of statistical control in terms of Frank's (2000) impact of a confounding variable. Then we evaluate the degree of statistical control by comparing the impact of covariates on estimates of interest against the expected impacts under randomization. That is, we use theoretical randomization as a baseline for evaluating the effective control of any study instead of using a single randomized empirical study as a gold standard against which to compare others. Ultimately, we will report that the quasi-experiment in Hong and Raudenbush (2005) crosses the threshold for equivalence with a theoretical randomized study. We then develop a general formula and guidelines for equating quasi-experiments and randomized experiments based on the degree of statistical control achieved. In the discussion we emphasize the validity of causal inferences from quasi-experiments and the false dichotomy between quasi- and randomized experiments.
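As a rough illustration of the quantities involved: the impact of a confounding variable is, as we read Frank (2000), the product of the confounder's correlation with the predictor and its correlation with the outcome. The sketch below combines this with the standard partial-correlation formula. The function names and the numbers are hypothetical and are not taken from the lecture.

```python
import math

def impact(r_x_cv, r_y_cv):
    """Impact of a confounder: product of its correlations with x and with y."""
    return r_x_cv * r_y_cv

def partial_corr(r_xy, r_x_cv, r_y_cv):
    """Correlation between x and y after controlling for the confounder."""
    denom = math.sqrt((1 - r_x_cv ** 2) * (1 - r_y_cv ** 2))
    return (r_xy - impact(r_x_cv, r_y_cv)) / denom

def impact_threshold(r_xy, r_crit):
    """Impact needed to reduce r_xy to r_crit.

    Assumes r_xy > r_crit > 0 and a confounder that correlates equally
    with x and y, in which case partial_corr = (r_xy - k) / (1 - k).
    """
    return (r_xy - r_crit) / (1 - r_crit)

# Hypothetical numbers: observed correlation 0.3, critical value 0.1.
k = impact_threshold(0.3, 0.1)
r = math.sqrt(k)  # equal correlations with x and y
controlled = partial_corr(0.3, r, r)
```

With these numbers a confounder whose impact equals `k` brings the controlled correlation down to the critical value; a confounder with a smaller impact leaves it above that value.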




Abstract
When the goal of inference is estimating causal effects, we usually have to face problems related to how data are observed. In observational studies the most relevant of such problems is the fact that assignment to treatment is not under the control of the investigator; in addition, some studies, both observational and experimental, may be affected by different sorts of post-treatment selection of observations due to, e.g., non-response, truncation or censoring 'due to death'. All such complications must somehow be controlled for, but standard statistical conditioning is improper for this purpose. A relatively recent approach for dealing with post-treatment complications is Principal Stratification, as first defined by Frangakis and Rubin (2002) within the framework of the Rubin Causal Model and applied mainly in experimental studies. The framework is very general and can be applied in various contexts. We consider a specific post-treatment complication that may arise in both randomized and observational studies, namely the problem of nonignorable nonresponse on an outcome variable. By exploiting Principal Stratification, we analyze and propose identification strategies with and without the availability of an instrumental variable for nonresponse.
    As a motivating example we consider a simplified evaluation study in the field of financial aid to firms, where outcome variables, such as those related to firms' performance, can rarely be assumed to be missing at random.



  • Previous meeting
    Wednesday, June 27, 2007

  • Lecturer and subject
    Dr. L.R. Arends, Department of Epidemiology & Biostatistics (Erasmus MC, Rotterdam) and Institute of Psychology (Erasmus University Rotterdam)


    Multivariate Meta-Analysis: Modelling Heterogeneity


    Abstract
    Meta-analysis may be broadly defined as the quantitative review and synthesis of the results of related but independent studies. In a meta-analysis the interest does not always concern only one specific outcome measure. Sometimes the focus is on the combination of several outcome measures that are presented in the individual studies, for instance, when there are more treatment groups or more outcome variables. When the summary data per study are multi-dimensional, the data analysis is usually restricted to a number of separate univariate analyses, i.e., one analysis per outcome variable. However, such univariate analyses neglect the relationships between the multiple outcome measures. In a multivariate meta-analysis all outcome measures are analysed jointly, thereby also revealing information about the correlations between the multiple outcome variables.
        In this presentation I will show some clinical applications of multivariate meta-analysis.




    Abstract
    Rank data typically arise in two different settings. The first is the ranking of a small number of items, produced independently by a larger number of subjects; an example is patient preferences. In the second setting, continuous measurements on a group of subjects are converted into a rank ordering before the data are analyzed, for example in order to apply the Wilcoxon rank-sum test. In both situations an explicit probability model for ranks would be desirable as a starting point for statistical inference. An important example of such a model is the proportional hazards regression model of Cox. It is much used for the analysis of survival data but rarely as a general method for nonparametric regression. This is, at least partially, due to the lack of invariance of the model under changing the sign of all observations (reversing the ranking). In the context of the first setting the Cox model is also known as a forward-selection model.
        In the presentation I will briefly review properties and relative merits of existing models for rank data and then introduce a new model and illustrate its use.
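The forward-selection idea mentioned in the abstract can be sketched as follows: the best item is picked first with probability proportional to its weight, then the next best from the remaining items, and so on. The item names and weights are hypothetical. The last lines illustrate the lack of invariance under reversal with one example only: building the ranking worst-first with reciprocal weights does not reproduce the forward probability.

```python
from itertools import permutations

def forward_selection_prob(ranking, weights):
    """Probability of `ranking` when items are picked one at a time,
    each with probability proportional to its weight among those left."""
    remaining = sum(weights[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining
        remaining -= weights[item]
    return prob

# Hypothetical weights for three items.
weights = {"A": 3.0, "B": 2.0, "C": 1.0}
probs = {r: forward_selection_prob(r, weights) for r in permutations(weights)}

# Same ranking built best-first versus worst-first (reciprocal weights).
reciprocal = {item: 1.0 / w for item, w in weights.items()}
p_forward = forward_selection_prob(("A", "B", "C"), weights)
p_backward = forward_selection_prob(("C", "B", "A"), reciprocal)
```

Here `p_forward` and `p_backward` describe the same ordering of A, B and C but take different values, so the choice of direction matters for the model.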

    
    


    Abstract
    Studying short-term dynamic processes and change mechanisms in interaction yields important knowledge that contributes to understanding long-term social development of children. In order to get a grip on these short-term dynamics of interaction processes, we made a simulation model of dyadic interaction of children during one play session, which is inspired by dynamic systems principles (Thelen & Smith, 1994; Van Geert, 1994). The central aim of this model is to generate patterns of interaction that correspond with the interaction patterns observed in our empirical study in children of different sociometric statuses. In this study, three types of dyads were formed, comparable with the dyads distinguished in the model. The dyads consisted of two same-sex grade 1 pupils, of whom one child had either a rejected, a popular, or an average status, and in which the play partner always had an average status.
         The theoretical components of the model comprise children’s goal-directedness of actions, concerns, emotional appraisals, social power, and social effectiveness. The model’s output consists of simulations of children’s emotional expressions and actions over every second of a play session, for three groups of dyads of different sociometric statuses. In this presentation, I will go into the empirical validation of the model and the methods needed for such validation. The validation focuses on the model’s predictions of averages and distributions of the major variables and on the model’s sensitivity. Overall, the model fits the empirical data well. An exception is the poorer fit for the ‘popular’ group of dyads, which is explained by the limited use of social effectiveness in the model. In the discussion, I will reflect, among other things, on the implications of our findings for using this type of simulation model in research on social development.


    
    


    Abstract
    When parametric and semiparametric methods fail to capture the shape of the conditional hazard rate, nonparametric methods are a useful alternative. This paper proposes a new nonparametric estimator for the conditional hazard rate, which is defined as the ratio of local linear estimators for the conditional density and survivor function. We show that the resulting hazard rate estimator is pointwise consistent and asymptotically normally distributed under appropriate conditions. Furthermore, we derive plug-in bandwidths based on normal and uniform reference distributions, which minimize the asymptotic mean squared error. The new estimator performs competitively in terms of mean squared error in comparison to existing estimators for the hazard rate. Moreover, its smoothing parameters are relatively robust to misspecification of the reference distributions, which facilitates bandwidth selection. Additionally, the new hazard rate estimator is conveniently calculated using standard software for local linear regression. We illustrate the use of the local linear hazard rate in an application to kidney transplant data.
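The ratio construction (density divided by survivor function) can be illustrated with a much simpler estimator than the one in the paper. The sketch below is a deliberate simplification: it has no covariates, assumes no censoring, and uses plain Gaussian kernel smoothing instead of local linear estimators. The bandwidth, sample size and simulated data are hypothetical.

```python
import math
import random

def gaussian_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def kernel_hazard(t, times, h):
    """Hazard at t as smoothed density divided by smoothed survivor function."""
    n = len(times)
    density = sum(gaussian_pdf((t - ti) / h) for ti in times) / (n * h)
    survivor = sum(1.0 - gaussian_cdf((t - ti) / h) for ti in times) / n
    return density / survivor

# Simulated exponential lifetimes with rate 1, whose true hazard is constant at 1.
random.seed(0)
sample = [random.expovariate(1.0) for _ in range(5000)]
hazard_at_1 = kernel_hazard(1.0, sample, h=0.2)
```

For this simulated sample the estimate at t = 1 should lie close to the true hazard of 1, up to smoothing bias and sampling noise.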


    
    
    
    
    
    


    Abstract
    We study the nonparametric maximum likelihood estimator (MLE) for current status data with competing risks. Current status censoring occurs when the variable of interest is not observed directly, but only known to lie before or after a certain time. Current status data with competing risks arise naturally in cross-sectional survival studies with several failure causes, and generalizations arise in HIV vaccine clinical trials.
         Until now, the large sample properties of the MLE have been mostly unknown. We resolve this issue by proving consistency, the rate of convergence, and the limiting distribution of the MLE. These asymptotic properties are nonstandard, due to the censoring. The rate of convergence is slower than usual, and the limiting distribution involves a new self-induced process. I will illustrate this process in an example.
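For orientation, in the simpler setting without competing risks the nonparametric MLE of the distribution function from current status data is known to be the isotonic regression of the event indicators, ordered by observation time. The sketch below computes it with the pool-adjacent-violators algorithm. It does not cover the competing-risks case studied in the talk, and the small data set is hypothetical.

```python
def current_status_npmle(obs_times, event_before):
    """Isotonic (nondecreasing) fit of 0/1 indicators ordered by observation time.

    obs_times[i]    : time at which subject i was inspected
    event_before[i] : 1 if the event had already happened by then, else 0
    Returns the sorted inspection times and the fitted values at those times.
    """
    pairs = sorted(zip(obs_times, event_before))
    blocks = []  # each block holds [sum of indicators, number of observations]
    for _, indicator in pairs:
        blocks.append([float(indicator), 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [t for t, _ in pairs], fitted

# Hypothetical data: five subjects inspected at times 1 to 5.
times, f_hat = current_status_npmle([3, 1, 5, 2, 4], [0, 0, 1, 1, 1])
```

The fitted values form a step function; the pooled blocks are where the raw indicators would otherwise decrease with time.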


    
    
    
    
    
    
    


    Abstract
    Inference from data to a population traditionally proceeds by null hypothesis testing. This tells whether the data support the hypothesis that in the population some quantity is zero (a difference between means, a correlation, an indicator of growth over time). However, this is seldom the focus of interest for the researcher who collected the data. Researchers tend to be more interested in the alternative situation in which the null hypothesis fails to hold. Usually the alternative hypothesis is uninformative, e.g. ‘the means at our four time points are not all equal’, although researchers often possess informative and competing hypotheses, e.g. ‘the means are increasing from time point 1 to 4’ or ‘the means at time points 1 and 4 are smaller than at time points 2 and 3’. This presentation addresses researchers who want to evaluate informative hypotheses.
        The point of departure is that adequate statistical tools should be available to researchers who have informative hypotheses (prior knowledge) in the form of hypothesized order relations between statistical parameters. Such knowledge may come from theories, earlier research, expertise, or differences of opinion with colleagues. Bayesian model selection applied to order-restricted alternatives has recently become feasible, now that enough computing power is available for the required algorithms.
         Two examples will be used to illustrate the proposed approach: order-restricted analysis of variance, and order-restricted models for the analysis of contingency tables. A potential drawback of the Bayesian approach is its sensitivity to the prior distributions that have to be specified. Supported by theoretical derivations, both examples will be used to discuss the prior sensitivity of the proposed Bayesian approach.
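One way such order-restricted hypotheses are evaluated in the Bayesian literature is the encompassing-prior idea: the Bayes factor of the constrained model against the unconstrained model is the ratio of the posterior to the prior probability that the order constraint holds. Whether the talk uses exactly this construction is an assumption. The sketch also assumes independent normal posteriors for the means and an exchangeable prior, under which every ordering of k means has prior probability 1/k!. All numbers are hypothetical.

```python
import math
import random

def bf_increasing_means(post_means, post_sds, draws=20000, seed=1):
    """Bayes factor for 'mu_1 < mu_2 < ... < mu_k' against the unconstrained model."""
    rng = random.Random(seed)
    k = len(post_means)
    hits = 0
    for _ in range(draws):
        sample = [rng.gauss(m, s) for m, s in zip(post_means, post_sds)]
        if all(sample[i] < sample[i + 1] for i in range(k - 1)):
            hits += 1
    posterior_prop = hits / draws          # share of posterior draws in agreement
    prior_prop = 1.0 / math.factorial(k)   # exchangeable prior: each order equally likely
    return posterior_prop / prior_prop

# Hypothetical posteriors at four time points.
bf_supported = bf_increasing_means([0.0, 1.0, 2.0, 3.0], [0.1] * 4)
bf_uninformative = bf_increasing_means([0.0, 0.0, 0.0, 0.0], [0.1] * 4)
```

With clearly increasing means the Bayes factor approaches its maximum of 4! = 24, while identical posteriors give a value near 1, meaning the data do not favour the ordering.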


    
    
    


    Abstract
    A relatively large amount of money is spent on setting up and carrying out promotions. Research to date has mainly examined the effects of promotions on sales of a brand or product. It has been clearly and frequently established that promotions have a strong positive effect on sales in the short term. Relatively little is known, however, about the mechanisms behind the effectiveness of promotions, or the lack of it. In particular, little has been published about underlying matters such as consumers' evaluation and knowledge of promotions.
        So the key question is: Why are sales promotions effective? The focus of the present study is on sales promotion, i.e., any temporary offer that is not available for the normal product, in the normal quantity, for the normal price and/or in the normal distribution. More specifically, the main research question of this study is: Which variables explain why consumers participate in promotions?
        To answer this question a multilevel model will be estimated on panel data from the Trendbox company. Because a multilevel model is used, two secondary questions are considered:

    1. How do the outcome variables, participation in and attitudes towards promotions, change over twelve subcharacteristics of promotions (i.e., the twelve within effects)?
    2. Can the differences in individual changes be explained by age and/or sex (i.e., by these two between effects)?


    
    
    
    


    Abstract
    A standard assumption in statistical causal inference is that the response of an outcome variable Y to an unconfounded cause X should not depend on how X is set to a particular value x. Unfortunately, in many realistic cases this assumption fails. For example, since Total Cholesterol is the sum of low density lipoproteins (LDL), which are bad for you, and high density lipoproteins (HDL), which are protective, a manipulation in which Total Cholesterol was raised solely by raising LDL levels would not have the same effect as a manipulation in which Total Cholesterol was raised solely by raising HDL levels. In some cases this ambiguity in a "manipulation" is harmless, in other cases it is not. In this talk I discuss the problem this presents for causal discovery and causal inference, and a foundational approach to how we might deal with it.
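The cholesterol example can be made concrete with a toy calculation. The linear outcome model and its coefficients below are purely hypothetical; they only encode the qualitative statement that LDL is harmful and HDL is protective.

```python
def risk_score(ldl, hdl, beta_ldl=0.02, beta_hdl=-0.03):
    """Hypothetical linear outcome model: LDL raises risk, HDL lowers it."""
    return beta_ldl * ldl + beta_hdl * hdl

baseline = risk_score(ldl=130, hdl=50)   # total cholesterol 180
via_ldl = risk_score(ldl=150, hdl=50)    # total raised to 200 through LDL only
via_hdl = risk_score(ldl=130, hdl=70)    # total raised to 200 through HDL only

effect_via_ldl = via_ldl - baseline
effect_via_hdl = via_hdl - baseline
```

Both manipulations set Total Cholesterol to the same value, yet the two effects have opposite signs, so "the effect of raising Total Cholesterol by 20" is not well defined in this toy model.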



    
    
    
    


    Abstract
    Multilevel multivariate data occur in many forms. Examples are scores on a number of items collected from inhabitants within different countries, or NMR spectra of urine samples collected at a number of measurement occasions from different monkeys. To explore the relationships between the variables in this kind of data, the general framework of Multilevel Component Analysis can be used. The method is a hybrid between analysis of variance and principal component analysis (PCA). That is, the observed data are split into orthogonal parts, and they are analyzed separately by PCA, or variants thereof. In the case of two-level multivariate data, the results of an MLCA thus offer insight into both the between and within variability. MLCA can easily be adapted for modelling data resulting from an experimental design. Examples from various fields will be shown to illustrate the usefulness of the approach. Furthermore, some relationships between MLCA and alternative models, mainly in the structural equation modelling field, will be discussed.
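The first step described above, splitting two-level data into orthogonal between and within parts, can be sketched as follows. Only the split is shown; the subsequent PCA of each part is omitted. The group labels and scores are hypothetical.

```python
def split_between_within(groups):
    """Split two-level data into an offset, a between part and a within part.

    groups maps a group label to a list of rows (one row per observation).
    Each centered row equals its between part plus its within part.
    """
    rows = [row for g_rows in groups.values() for row in g_rows]
    n_vars = len(rows[0])
    grand = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]
    between, within = [], []
    for g_rows in groups.values():
        g_mean = [sum(r[j] for r in g_rows) / len(g_rows) for j in range(n_vars)]
        for r in g_rows:
            between.append([g_mean[j] - grand[j] for j in range(n_vars)])
            within.append([r[j] - g_mean[j] for j in range(n_vars)])
    return grand, between, within

def sum_sq(matrix):
    return sum(v * v for row in matrix for v in row)

# Hypothetical item scores for inhabitants of two countries.
groups = {"NL": [[1.0, 2.0], [3.0, 4.0]], "BE": [[5.0, 6.0], [7.0, 10.0]]}
grand, between, within = split_between_within(groups)
```

Because the two parts are orthogonal, the total sum of squares around the grand mean equals `sum_sq(between) + sum_sq(within)`, which is what allows the parts to be analyzed separately.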

    
    


    
    
    
    


    Abstract
    The Hunt for the Last Respondent has been inspired by concerns about the possibly detrimental effect of nonresponse on the accuracy of survey outcomes, as response rates are generally considered to be the most important criterion of survey quality, and the Netherlands is notorious for its low response rates. The study addresses a number of general questions, such as: Why are high nonresponse rates a reason for concern?; Who is less likely to respond, either because they are more difficult to contact or because they are more reluctant to cooperate?; How can response rates be enhanced?
        Analyses of nonresponse on two surveys in which the SCP is involved, namely the Dutch Amenities and Services Utilisation Survey 1999 and its follow-up survey among persistent refusers, and the European Social Survey 2002/2003, aim at answering additional questions, such as: How to study nonresponse?; Do enhanced response rates improve the accuracy of survey outcomes?; How to combat nonresponse error and allocate funds effectively? The study shows that specific groups in society may be hard to contact and less willing to cooperate in surveys. This can result in bias when the determining factors of survey participation are directly related to the subject of a survey. Response rates can be enhanced substantially, but enhancing response rates does not always improve the accuracy of survey outcomes. The study recommends spending part of the funds for data collection on obtaining information about final nonrespondents, as this is more effective than raising the response rate by a few percent.

    
    


    
    
    
    
    


    Abstract
    In many applications, social networks are not static but evolve over time. This can be due to purely structural, network-endogenous mechanisms (like reciprocity or transitivity), but also due to individual characteristics of the actors in the network (what is a relevant choice of actor characteristics will depend on the type of relation – one may think of gender, age, habits, substance use and other health behaviors, political preferences, etc.). Changeable individual characteristics, in turn, are often mediated by social networks. Processes of social influence, contagion, or group differentiation, all depend on the social network as their ‘structural substrate’. Examples are smoking initiation among adolescents – and other substance use and abuse –, the formation of attitudes and norms, the dynamics of fads, collaboration in organisations, etc. This mutual interference between network dynamics and the dynamics of changeable actor characteristics, together with the already complex interdependence structure that characterizes social networks in general, poses a statistical challenge. In principle, the collection of longitudinal (panel) data on networks and individual characteristics allows for separating effects in both causal directions on empirical grounds.
        Recently, statistical methods have been developed to analyze the dynamics of social networks, and also the simultaneous and interrelated dynamics of social networks and the behavior of the actors in the network. These methods are based on stochastic microsimulation models representing the dynamics of a relational network in a set of actors. Associated with these models are procedures for parameter estimation and testing, using Markov chain Monte Carlo methods. These procedures are implemented in the program SIENA. Researchers who have been active in developing these statistical methods include, in addition to the author, Marijtje van Duijn, Mark Huisman, Johan Koskinen, Michael Schweinberger, and Christian Steglich. A review will be given of the work on this methodology and some applications.

    
    


    
    
    
    
    


    Abstract
    A simple psychometric measurement model without latent variables is presented. It requires that, after possible rescaling, measures for the same construct have the same associations with variables in the nomological net around the construct and are measured on the same scale. This exchangeability model is equivalent to factor analysis and Rasch models without the latent variable assumptions. The model does not assume local independence and is completely void of causal connotations.

    
    


    
    
    
    
    


    Abstract
    Science infers general statements and predictions from limited bodies of empirical evidence, and it therefore faces the problem of induction. Statistics plays an important role in how science solves this problem. In my talk I first make precise what role it plays, and then investigate the extent to which, in this role, it can support the realist ambitions of science.
         The first task involves a critical analysis of the logical empiricist views of Carnap, and a reformulation of inductive inferences as Bayesian logical arguments. The second involves a reversed application of De Finetti's representation theorem, and a rather delicate mix of his strict subjectivism with the frequentist theory. However, these reform measures do not yet go far enough. In the last part of the talk I will argue that scientists have good reasons for employing underdetermined statistical models.

    
    


    
    
    
    
    
    
    


    Abstract
    The delay in language and arithmetic achievement of school children in Frisian education can be ascribed in part to the lower quality of primary schooling in Fryslân. An example is the finding that Frisian schooling lags behind as far as pupil counselling is concerned. Moreover, it has been shown that Frisian teachers lag behind in didactic skills and devote less teaching time to arithmetic than teachers in primary schools in Limburg (Van Ruijven, 2003; 2004). These are the most prominent conclusions drawn in the study into Frisian education. The study initially focusses on the educational level of the school children in Frisian primary education. In this respect, educational achievements of pupils in grade 7 in Frisian primary education have been compared with the national mean and with educational achievements in comparable provinces. In addition to these analyses, the explanatory question has been addressed. To explain the lower educational achievements in Fryslân, the results of the Frisian pupils have been analysed in relation to features at individual and school level. In these explanatory analyses, the data of the pupils and schools in the comparable provinces have been used as a reference point as well.
         In this presentation the design of the study and its analyses will be central points of interest. Firstly, I will pay attention to the selection of the reference provinces. Which criteria have been applied in selecting the reference provinces and how has the correspondence between the provinces been determined?
         In the second part I will concentrate on the design of the analyses concerning the explanatory research question. In view of the nested structure of the data, the multi-level technique has been applied. A special characteristic of the design is that a second technique, cluster analysis, has been integrated into the multi-level design. Which opportunities do both techniques offer to educational research? And did this approach succeed in explaining the interprovincial achievement differences?

    
    


    
    
    
    
    
    

    Probability and the Logic of Science

    
    


    Abstract
    A critical treatment of some fundamentals of a book by Jaynes (2003) is given. By 'inference' Jaynes simply means: deductive reasoning whenever enough information is at hand to permit it; inductive or plausible reasoning when, as is almost invariably the case in real problems, the necessary information is not available. But if a problem can be solved by deductive reasoning, probability theory is not needed for it.
         Jaynes' topic is the optimal processing of incomplete information. If degrees of plausibility are represented by real numbers, then there is a uniquely determined set of quantitative rules for conducting inference. That is, any other rules whose results conflict with them will necessarily violate an elementary and nearly inescapable desideratum of rationality or consistency.
         This gives a new perspective to the foundations of probability theory.
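For reference, the quantitative rules in question are the familiar product and sum rules of probability theory, written here in conventional notation (the symbols A, B and C for propositions are our choice, not necessarily those used in the talk):

```latex
\begin{align}
  p(AB \mid C) &= p(A \mid BC)\, p(B \mid C) = p(B \mid AC)\, p(A \mid C), \\
  p(A \mid C) + p(\bar{A} \mid C) &= 1 .
\end{align}
```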



    
    
    
    
    
    
    


    Abstract
    In this lecture we present an exploratory model-based clustering approach for the analysis of large data sets. Basically, a model-based cluster analysis (another name that is often used is latent class analysis) searches for homogeneous groups of persons, that is, groups of persons who give similar responses to a set of items.
         The approach addresses a number of practical problems that often arise in exploratory model-based cluster analysis of large data sets:

    1. The application at hand concerns a large data set (4000 persons and 250 items). Obtaining clusters for large data sets is not a well-developed area. Based on two conjectures, an MCMC clustering algorithm that renders non-degenerate clusters will be proposed.

    2. For large data sets the usefulness of the likelihood ratio goodness-of-fit test is questionable because the number of possible response vectors far exceeds the number of observed response vectors. An alternative that does not suffer from this limitation is proposed.

    3. In the application at hand data are missing by design: each person receives a specific subset of the 245 items in the test. It will be shown how to deal with the missing data both for obtaining clusters and goodness-of-fit evaluation.
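To make the role of data that are missing by design concrete, the sketch below computes posterior class membership probabilities in a latent class model in which items that were not administered are simply skipped. It is not the MCMC algorithm proposed in the lecture. It assumes binary items, conditional independence of items within a class, and ignorable missingness; the class weights and item probabilities are hypothetical.

```python
import math

def class_posteriors(responses, class_weights, item_probs):
    """Posterior probability of each latent class given one person's responses.

    responses        : list with 1, 0, or None (item not administered)
    class_weights    : prior class proportions
    item_probs[c][j] : probability of a positive response to item j in class c
    """
    log_joint = []
    for weight, probs in zip(class_weights, item_probs):
        log_lik = math.log(weight)
        for x, p in zip(responses, probs):
            if x is None:
                continue  # item missing by design: contributes no information
            log_lik += math.log(p if x == 1 else 1.0 - p)
        log_joint.append(log_lik)
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - top) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical two-class model with four items; the second item was not given.
posteriors = class_posteriors(
    responses=[1, None, 1, 0],
    class_weights=[0.5, 0.5],
    item_probs=[[0.9, 0.8, 0.7, 0.2], [0.2, 0.3, 0.1, 0.6]],
)
```

Working on the log scale matters in practice: with hundreds of items the raw likelihoods would underflow.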



    
    


    Abstract
    It is well known that maximum likelihood estimators are, under some regularity conditions, Fisher-efficient. Fisher-efficiency is an asymptotic property only and, despite its asymptotic appeal, is of little value in practice, where sample size is not always large, where it is uncertain what "large" actually means, and where the sample size surely does not go to infinity; in fact, "in practice, the sample size doesn't go anywhere" (Geyer, 2004).
         Less well known is that if a probability density admits a sufficient statistic (which is only the case in some families of probability densities), the maximum likelihood estimator is a function of it, providing a finite-sample justification of maximum likelihood estimators which, in practice, is much more relevant than Fisher-efficiency. Other finite-sample justifications of maximum likelihood estimation exist.
         The present talk reviews another, more general, finite-sample justification of maximum likelihood estimators, which, though published in the 1960s in the Annals of Mathematical Statistics, has received little attention to date. If time permits, the talk will go on to touch briefly on issues such as what properties (efficiency versus robustness, et cetera) of statistical estimators may be desirable in practice, and discuss the importance of the score function in testing.
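The statement about sufficient statistics follows from the factorization theorem. In generic notation (the symbols are ours), if T is sufficient for the parameter, the density factorizes and the factor that does not involve the parameter drops out of the maximization:

```latex
\begin{align}
  f(x \mid \theta) &= g\bigl(T(x), \theta\bigr)\, h(x), \\
  \hat{\theta}(x) &= \arg\max_{\theta} f(x \mid \theta)
                   = \arg\max_{\theta} g\bigl(T(x), \theta\bigr),
\end{align}
```

so that, when the maximizer is unique, the estimate depends on the data only through T(x).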




    
    


    Abstract
    Over the years, many studies have focused on the question whether smaller classes result in higher pupil achievement. Although these studies have not provided unequivocal support for the hypothesis that smaller classes result in higher achievement, their conclusion has been positive. A question that logically follows this conclusion concerns why smaller classes result in higher achievement. Few studies have focused on classroom processes that may explain the existence of such relationships. This study has not only focused on the basic relationship between class size and achievement, but also on other variables, including adaptive education, pupil-teacher interactions, and the amount of time pupils spend on task.
         For various reasons, it was decided to consider the concepts of pupil-teacher ratio (PTR) and pupil-adult ratio (PAR) in addition to class size. This was done with the goal of determining the effects of adding extra teachers or teaching assistants to existing classes without reducing the actual class size.
         The main research questions of this study were:

    Data were collected within 46 primary schools. Because the data had a nested structure (pupils were nested within classes and teachers were nested within schools), multilevel analysis was required. These analyses were conducted with the help of the MLwiN software package. The relationship between class size (or PTR or PAR) and language achievement, by way of pupil-teacher interactions and subsequently time on task, was examined by means of path analysis. Path analysis is a statistical technique for multivariate analysis, through which the best fitting path of mutual relationships among variables is identified. These analyses used a conceptual causal model as a starting point and were conducted with the help of the Mplus software package (Muthén & Muthén, 2001).




    
    


    Abstract
    The regression trunk approach (RTA) integrates two existing analysis methods: regression trees (Breiman, Friedman, Olshen & Stone, 1984) and multiple linear regression analysis. RTA assesses interaction effects between predictors, in the regression of one continuous response variable on multiple predictor variables (Dusseldorp & Meulman, 2001). A core element of RTA is a different representation of interaction, that is, as a threshold interaction, as opposed to the commonly used cross-product representation. The method has been especially developed to detect interaction effects between a categorical predictor variable and one or more continuous predictor variables. Instead of the sequential approach proposed by Dusseldorp and Meulman (in press), we recently developed a simultaneous approach. The sequential approach consisted of three steps of analysis. In a first step, a linear main effects model was estimated. In a second step, a small regression tree (called a "regression trunk") was fitted on the residuals of the first step. And in a final step, the tree was converted into contrast variables, which were added to the main effects model. In the simultaneous approach, a regression tree is fitted with a modified partitioning criterion. The commonly used criterion is to split on a variable that maximally reduces the within node sum of squares. The modified criterion is to split on a variable that induces the highest F-value of a model comparison test. I will explain the details of this test in the presentation. Advantages of this latter approach are that possible correlations between main and interaction effects are taken into account, and that the parameters of the total model can be estimated directly.
         The method will be demonstrated using data from patients with panic disorder. Our objective was to explore whether and how pre-test locus of control variables (continuous variables) interacted with the treatment variable (a categorical/nominal variable). Several RTA models were estimated, differing in the number of interaction effects included. The best RTA model was selected based on the cross-validated error. This model included two significant interaction effects. We compared the RTA solution to a regression analysis model with cross-products to represent interaction effects. The results indicated that none of the cross-products were significant. Finally, we investigated whether the RTA solution could be confirmed on nine outcome variables of interest. We conclude that a threshold representation of interaction, as estimated by RTA, is a promising alternative representation of interaction effects.
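The commonly used splitting criterion described above (choose the split that maximally reduces the within-node sum of squares) can be sketched for a single continuous predictor. This shows only one split on the residuals of a main effects model, as in the second step of the sequential approach; the modified F-value criterion of the simultaneous approach is not shown. The data are hypothetical.

```python
def best_split(x, residuals):
    """Return (threshold, within_node_ss) for the best binary split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]

    def within_ss(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for cut in range(1, len(xs)):
        if xs[cut] == xs[cut - 1]:
            continue  # no threshold can separate tied predictor values
        total = within_ss(rs[:cut]) + within_ss(rs[cut:])
        if best is None or total < best[1]:
            best = ((xs[cut - 1] + xs[cut]) / 2.0, total)
    return best

# Hypothetical residuals that jump once the predictor exceeds 4.
x = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [-1.1, -0.9, -1.0, -1.0, 1.0, 0.9, 1.1, 1.0]
threshold, ss_after_split = best_split(x, residuals)
```

An indicator for "x above the threshold" is the kind of contrast variable that the sequential approach would then add to the main effects model.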





    
    


    Abstract
    The achievement level is a variable measured with error. An estimate of the achievement level can be obtained on the basis of tests administered to students. The results of these tests can be analysed with the Rasch model, which provides a measure of the abilities and an estimate of the measurement error variance. Teachers' marks are also a measure of the achievement level, but they are expressed on a different scale, which may be continuous or ordinal and varies across teachers. The aim of this study is to obtain a synthetic measure of the achievement level from the combination of these two scores on the basis of a latent trait model. The new measure is comparable across subjects and has a smaller measurement error variance than either of the two single scores. The procedure also provides a tool for assessing the proximity of the two sets of scores, as large differences between the Rasch achievement estimate and teachers' marks are taken to point to situations deserving further investigation.
        


    
    


    Abstract
    A recent issue of the Journal of Educational and Behavioral Statistics (2004, Volume 29, Number 1) is entirely dedicated to what are now called Value-Added Models (VAM). These models are used to assess the performance of schools and teachers. The models presented in this issue range from very simple fixed-effect path analytic models to very complex mixed models that allow for fading effects of teachers' influence.
         Since the results obtained with VAM are sometimes the basis of a reward system, these models have become a subject of intense critique. In the Fun with Statistics meeting we would like to discuss the comments by Rubin, Stuart & Zanutto, and those by Raudenbush, published in the same JEBS issue. To different degrees, they express their concerns about the use of VAM for causal inference.


    References


    
    


    Abstract
    New specifications for exponential random graph models were introduced with the aim of modelling social networks exhibiting high transitivity. Earlier models (Markov random graphs) were often degenerate and could not be estimated. The new specifications are estimated for a number of well-known data sets, showing much improved performance compared to the earlier versions of the models. In addition, simulation studies with fixed parameter values but increasing network size give insight into the behaviour of the models.


    
    


    Abstract
    A popular approach to modelling digraph panel data is to embed the discrete observations of the digraph into an unobserved continuous-time Markov process. A large and appealing family of Markovian probability distributions was considered by Snijders (2001). The infinitesimal generator of the Markov process is parameterized by multinomial logit modeling as advocated by McFadden (1974). The digraph evolution is modeled as vertex-driven: vertices make choices regarding arcs to other vertices, and the choice probabilities depend on an objective function. The objective function is modeled as linear in an unknown parameter θ, which captures relational tendencies (patterns) such as reciprocity or transitivity. Convenient distributional assumptions lead to a multinomial logit form of the choice probabilities.
         The present lecture builds upon and extends this family of probability distributions. The objective function lying at the heart of the multinomial logit formulation is modeled as a linear function in which some coordinates of the parameter θ are treated as random coefficients that can be predicted by covariates. The value of the approach lies in the fact that vertex heterogeneity can be modeled parsimoniously, that is, the number of parameter coordinates need not increase with the number of vertices. This implies that (1) the dangers of vertex-dependent parameter spaces (bad estimators and numerical problems) are reduced, and (2) covariates can be used to explain vertex-dependent predilections regarding, for instance, reciprocity or transitivity.
         The maximum likelihood method is proposed to estimate the unknown parameter η ("fixed coefficient") and the unknown covariance matrix of the random coefficients from digraph panel data.
         Because the continuous-time Markov process is unobserved, the likelihood function cannot be written in closed form, but maximum likelihood estimates can be obtained by augmenting the observed data with latent data using MCMC methods.
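
    The multinomial logit form of the choice probabilities described in this abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name, the parameter name theta (standing for the unknown parameter of the linear objective function), and all statistic and parameter values are assumptions invented for the example. Random coefficients and estimation are not covered.

```python
import math

def choice_probabilities(statistics, theta):
    """Multinomial logit choice probabilities for one vertex.

    statistics: one row per candidate choice; each row holds the values of
                the (hypothetical) effects, e.g. reciprocity and transitivity.
    theta:      parameter vector; the objective function is linear in theta.
    """
    # Objective function value of each candidate: f_j = sum_k theta_k * s_jk
    objective = [sum(t * s for t, s in zip(theta, row)) for row in statistics]
    # Subtract the maximum before exponentiating for numerical stability
    top = max(objective)
    weights = [math.exp(f - top) for f in objective]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: three candidate choices, two effects
stats = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
theta = [0.8, 0.3]
probs = choice_probabilities(stats, theta)
print(probs)
```

    Candidates with a higher objective function value receive a higher choice probability, and the probabilities sum to one.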


    References


    
    


    Abstract
    Interest in parametric item response theory (PIRT) models for unfolding has increased steadily during the last two decades. Unfolding PIRT models are typically appropriate for measuring attitudes using responses to traditional (Likert or Thurstone) attitude questionnaires. They can also be used to measure individual preferences for alternative stimuli and individual status in certain developmental processes that occur in stages. Like other PIRT models, unfolding PIRT models provide the potential for sample-invariant interpretations of item parameters, item-invariant interpretations of person parameters, and indices of measurement precision at the individual level. These features make applications such as item banking, equating, and computerized adaptive testing possible in an unfolding PIRT framework.
        This presentation will review the development of a family of unidimensional PIRT models for unfolding polytomous responses. The most general model in this family is known as the generalized graded unfolding model (GGUM). A review of several methodological issues inherent in GGUM applications will be conducted. These issues will include parameter estimation and item fit. The application of the GGUM to computerized adaptive testing and mixture models will also be explored.


    References


    
    


    Abstract
    Abilities, attitudes, and personality traits are usually considered to be continuous latent traits. However, a discrete interpretation is sometimes preferable. For example, pupils may be assigned to several levels of training, which corresponds to a discrete interpretation of scholastic aptitude. Specific abilities may be mastered or not, dividing the examinees into masters and non-masters. Therapy clients may be ordered from not depressed via slightly depressed to severely depressed. Such a discrete ordering of subjects occurs in ordered latent class analysis.
        In this presentation, the ordered latent class model will be formulated, and the possible ways of estimating it will be discussed (the ins). Further, the relation to nonparametric item response theory (NIRT) (the outs) will be discussed by showing how the assumptions of NIRT can be tested with an ordered latent class model, and how the sampling distribution of scaling coefficient H can be obtained.


    Reference


    
    


    Abstract
    A frequently heard criticism of the Rasch model is that it is not suited to multiple-choice questions, because the model predicts a probability of a correct answer that goes to zero for very low levels of ability, whereas for a multiple-choice item with four alternatives pure guessing gives a probability of 25%. Treating a multiple-choice item as a choice situation, one may study the behavior of students not as an expression of cognitive ability but as mere choice behavior. In the presentation, Luce's (1959) choice theory is used as a starting point. A slight generalization of the theory leads to a choice model that is formally identical to the Rasch model, and in which the concept of individual ability has a clear meaning as an odds ratio.
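
    The formal link between a Luce-type choice model and the Rasch model can be checked numerically. The sketch below is one standard way to see the correspondence and is not necessarily the speaker's derivation: it assumes positive scale values exp(theta) for a person and exp(beta) for an item, and all function names and numbers are invented for the example.

```python
import math

def luce_choice(v_person, v_item):
    """Luce-type choice probability with positive scale values."""
    return v_person / (v_person + v_item)

def rasch(theta, beta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical abilities of two persons and difficulties of two items
theta_a, theta_b = 1.2, 0.4
betas = [-0.5, 0.9]

# With scale values exp(theta) and exp(beta), both models give the same probability
p_luce = luce_choice(math.exp(theta_a), math.exp(betas[0]))
p_rasch = rasch(theta_a, betas[0])

# The odds ratio between the two persons is the same for every item:
# exp(theta_a - theta_b), so ability differences read as (log) odds ratios
odds_ratios = [odds(rasch(theta_a, b)) / odds(rasch(theta_b, b)) for b in betas]
print(p_luce, p_rasch, odds_ratios)
```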


    Reference


    
    


    Abstract
    By assessment I mean the attribution of a quality or qualification Q to a person or object P by judges or assessors B. Assessments constitute an important and consequential part of human discourse. Moreover, much of the data in the social and behavioral sciences consists of assessments, rather than measures in any strict sense. The question here is how to conceive of such qualifications, and how to handle them. Traditionally, qualifications are conceived as measurements on a relative metric scale, and handled through classical psychometrics and statistics. I question the appropriateness of that approach and the assumptions that it implies. I develop a more elementary or primitive conception that I argue to be more adequate in the typical case.


    References

    
    
    
    


    Abstract
    Although psychologists and methodologists agree that test validity is the main problem in psychological measurement, it has proven difficult to state exactly what validity is. Approaches based on sampling from a content domain (content validity), on the correlation between a test score and another test score (criterion validity), and on correlations between a test score and a lot of other test scores (construct validity), have not been able to produce a consistent picture of the validity concept. In this presentation, I argue that this failure results from a fundamental mistake in the conceptualization of psychological measurement. Namely, measurement is generally seen as involving the correlation between the test score and something else (for instance, the psychological attribute to be measured). However, it should be based on the causal effect of the attribute on the test scores. Reconceptualizing validity in causal terms solves many of the philosophical troubles of traditional validity concepts, and in addition suggests that different research directions are required to establish test validity. For instance, traditional validation studies have focussed on purely external relations (e.g., relations between test scores and other test scores). However, I suggest that research must rather be directed at internal processes (e.g., the item response processes that lead to the test scores), because the central question is whether these processes constitute, or transmit, the causal effect of the attribute. Theoretical and practical implications of the proposed theory will be discussed.


    References

    
    
    
    


    Abstract
    How 'should' the scientist, when asked, form and express an opinion, on the basis of statistical data x, about something unknown, called y, so that this opinion can be taken into account in the discussion about choosing some decision a? How does he express his opinion about a hypothesis H stating that y lies in a particular subset of the class of all possibilities for y? How do we form a distributional statement about y? How do we choose a from the set A of all possible decisions?
         There are three main schools. The first is that of the data theorists. They are not much concerned with the restriction that the data be statistical, that is, collected according to prescriptions such that statistical assumptions of randomness, independence, and the like are guaranteed. The second is that of the classical statisticians, who translate the question of how to process concrete data x into the question of which procedure one should use to process abstract data x when one reflects a priori on all the outcomes x that are possible in the experiment performed or to be performed. The third main school is that of the Bayesians. By postulating, in one way or another, a single joint distribution P from which (x, y) is a draw, they can establish in an elegant, coherent manner what the probability is that the hypothesis H is true, what a good distributional statement about y is, and what a correct decision is, given the data x and, of course, the postulated probability distribution.
        The speaker is of the second kind. He shows that the results of all those procedures are not as nicely 'coherent' as in the Bayesian approach. He defends this as 'natural'.
        Simplicity and concreteness are ensured (by considering Student's t).


    References

    
    

    The Circumplex Model

    
    


    Abstract
    The circumplex model is an Item Response Theory model for the measurement of subjects and items on a circular latent trait. Examples in psychology can be found in the study of personality (the Big Five) and emotions, and in sociology in the study of human values and vocational interests. The circumplex model can be regarded as an additional tool within Facet Theory. In the circumplex model subjects and items are located on the circumference of a circle. Each item is represented by two item step parameters, indicating the beginning and end of the arc along the circle that represents its latitude of acceptance. Subjects are represented by one location parameter. Subjects who are located within the latitude of acceptance of an item will respond positively to the item in the deterministic model, or have a high probability of responding positively to the item in the probabilistic model. The model has been developed by Leeferink and Van Schuur, based on ideas from the nonparametric unidimensional unfolding model (MUDFOLD) of Van Schuur and the cumulative models of Mokken. These ideas include the use of a coefficient of homogeneity, a bottom-up search procedure for finding a scale, and several diagnostic goodness-of-fit matrices. A paper, based on Leeferink's master's thesis, that explains the model for dichotomous data is available (Mokken, Van Schuur and Leeferink, 2002). Software for the model for dichotomous data will be demonstrated (program CIRCUS: circumplex scale). The generalization to polytomous data will be discussed.
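
    The deterministic response rule described in this abstract can be sketched in a few lines. This is an illustration only and is unrelated to the CIRCUS program: the function name, the use of degrees, and the convention that an item's arc runs in the direction of increasing angle from its first to its second step parameter are all assumptions made for the example.

```python
def responds_positively(subject_deg, start_deg, end_deg):
    """Deterministic circumplex rule: positive response if the subject's
    location lies on the item's arc (its latitude of acceptance).

    Angles are in degrees; the arc is assumed to run from start_deg to
    end_deg in the direction of increasing angle and may pass through 0.
    """
    s = subject_deg % 360
    a = start_deg % 360
    b = end_deg % 360
    if a <= b:
        # Arc does not cross the 0/360 boundary
        return a <= s <= b
    # Arc wraps around the 0/360 boundary
    return s >= a or s <= b

# Hypothetical items and subjects
print(responds_positively(45, 30, 90))    # subject inside a plain arc
print(responds_positively(10, 350, 30))   # subject inside an arc that wraps around 0
print(responds_positively(180, 350, 30))  # subject outside that arc
```

    A probabilistic version would replace the hard boundary by a response probability that is high inside the arc and low outside it; its functional form is not given in the abstract and is therefore not sketched here.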


    
    
    Update:  May 2, 2006
    Copyright © 2006 Anne Boomsma, University of Groningen, The Netherlands
    All rights reserved.