Please note that the descriptions and authors for the individual workshops may be subject to minor changes.
All workshops are on July 2nd, 2018. You can combine two half-day workshops for the price of one full-day workshop.
Full-day workshops (9am-5pm, with a lunch break at 12pm): $400 (early bird), $425 (regular price)

| Workshop | Presenters |
|---|---|
| FD1. Assessment of Collaborative Problem Solving Skills: An Overview | Alina von Davier, Kristin Stoeffler |
| FD2. Ethics, Test Standards, and Test Interpretation: Measurement Matters | Gary L. Canivez |
| FD3. Introduction to Automatic Item Generation using CAFA AIG | Jaehwa Choi |
| FD4. Cognitive interviewing for interpreting DIF from a mixed methods perspective | Jose-Luis Padilla, Isabel Benítez |
| FD5. Generalizability theory: application and optimization | Lisa A. Keller, Robert J. Cook, Frank V. Padellaro |
| FD6. Applying the Standards for Educational and Psychological Testing in International Contexts | Linda Cook, Wayne Camara, Joan Herman, Kadriye Ercikan |

Morning half-day workshops (9am-12:30pm, including a coffee break): $225 (early bird), $250 (regular price)

| Workshop | Presenters |
|---|---|
| AM1. Tools for Equating | Won-Chan Lee, Kyung (Chris) T. Han, Hyung Jin Kim |
| AM2. Quality Control Procedures for the Scoring and Rating of Tests | Avi Allalouf |
| AM3. Test and Score Report Design Principles for Culturally and Linguistically Diverse Populations | Maria Elena Oliveri, April Zenisky |

Afternoon half-day workshops (2pm-5pm): $225 (early bird), $250 (regular price)

| Workshop | Presenters |
|---|---|
| PM1. Applying Test Equating methods using R | Marie Wiberg, Jorge González |
| PM2. Crafting adapted tests with a focus on a-priori methods | Dragos Iliescu |
FD1. Assessment of Collaborative Problem Solving Skills: An Overview
The goal of this tutorial is for participants to learn about test development considerations and computational psychometrics methods for collaborative problem solving (CPS) assessments.
Collaborative problem solving (CPS) skills are hard-to-measure competencies that are considered among the 21st century skills necessary for academic and professional success. The challenges associated with measuring CPS skills are multifaceted, ranging from a lack of consensus on the construct definition to finding appropriate models for interdependent data. Only in the past seven years has the measurement community taken an interest in developing assessments of CPS skills. This evolution goes hand in hand with the technological advances that have allowed test development, administration, and data collection to be integrated in a computerized environment. Recently, the community welcomed several major publications on CPS assessments: the PISA Technical Report (2017), von Davier (2017), von Davier, Zhu, and Kyllonen (2017), the NCES White Paper (2017), and Griffin and Care (2015).
In this tutorial, we present the process of building CPS assessments in a computational psychometrics framework (von Davier, 2017b). The focus will be on the measurement challenges, and several empirical examples will be used to illustrate the methodology. Specifically, we will first discuss the construct and introduce several frameworks that have been used for CPS tests in recent years (PISA, ATC21S, the ACT Holistic Framework, and ETS’s framework). Next, we will present the design space for CPS assessments in light of Evidence Centered Design (Mislevy et al., 2003); we will also discuss the design of the data collection and of the log files associated with CPS assessments. We will then present computational psychometrics methods that are promising for analyzing data from CPS tests: stochastic processes to model the temporal structure of the dynamic interaction, a multidimensional IRT (MIRT) model to estimate the propensity for collaborative behaviors, and machine learning approaches to investigate the relationship between CPS subskills and team performance. Empirical examples are drawn from ACT’s CPS game, The Circuit Runner (Stoeffler et al., 2017), and from ETS’s CPS Science Assessment Prototype.
The target population for this tutorial consists of psychometricians, test developers, researchers and students interested in CPS.
Intermediate knowledge of statistics for the social sciences and of IRT is needed. Participants should have WEKA installed on their laptops.
FD2. Ethics, Test Standards, and Test Interpretation: Measurement Matters
Participants will learn the ethical principles and test standards governing test interpretation, as well as the psychometric principles and procedures needed to assess the viability of test scores and score comparisons.
As Weiner (1989) cogently noted, psychologists must “(a) know what their tests can do and (b) act accordingly. … Acting accordingly—that is, expressing only opinions that are consonant with the current status of validity data—is the measure of his or her ethicality” (p. 829).
To follow Weiner’s advice, psychologists must possess and apply fundamental competencies in psychological measurement; the importance of these competencies for ethical assessment and clinical practice cannot be overstated (Dawes, 2005; McFall, 2000). Interpretation of tests and procedures must be informed by strong empirical evidence from different types of reliability, validity, and diagnostic utility studies, each of which addresses a different interpretation issue. Unfortunately, most test technical manuals and popular interpretation guides and textbooks neglect to report or address some critically important psychometric research methods and results needed to judge the adequacy of the test scores and comparisons used in interpretation. So that psychologists may ethically interpret test scores and procedures, this workshop delineates the psychometric research methods psychologists must consider to adequately assess the viability of the scores and comparisons advocated. Specific research examples with popular tests and procedures are provided as exemplars. The methods presented include internal consistency, short- and long-term temporal stability, interrater agreement, concurrent validity, predictive validity, incremental predictive validity, age/developmental changes, distinct group differences, theory-consistent intervention effects, convergent and divergent validity, internal structure (EFA and CFA), and diagnostic efficiency/utility; each answers a different but relevant question about the interpretation of test scores and comparisons. After this workshop, participants will be better able to critically evaluate the psychometric information provided in test manuals, textbooks, interpretation guidebooks, the Mental Measurements Yearbook, and the extant literature.
Graduate Students, Practitioners/Service Providers, Teachers/Professors, Researchers, Consultants, Administrators
It would be useful for attendees to have completed basic statistics and research methods courses and an introductory testing course.
FD3. Introduction to Automatic Item Generation using CAFA AIG
This workshop aims to provide both a theoretical and a practical introduction to Automatic Item Generation (AIG), an emerging research area and innovative assessment approach that uses state-of-the-art technology to generate assessment items. It is designed for those who wish to learn the background, benefits, innovations, and practical applications of the item template and test development process of AIG.
Modern researchers, psychometricians, item writers, and assessment service providers increasingly face a new paradigm in which assessment items are no longer produced only by hand but can be mass-produced automatically with technology, that is, through Automatic Item Generation (AIG). In AIG, cognitive and psychometric theories are integrated into a comprehensive assessment development framework for generating assessment items with state-of-the-art technology, especially in Information and Communication Technology (ICT) environments. The number of content areas and applications of AIG is growing rapidly, and this new reality raises important issues for effective item development.
This full-day course covers the background, benefits, innovations, and practical applications of AIG item templates and the AIG test development process. It integrates hands-on training in AIG item template development so that participants gain both theoretical knowledge and practical experience of the process.
This course is intended for anyone interested in researching and developing items, tests, and related services using AIG techniques:
1. Students in graduate-level courses in psychological and/or educational measurement may find this course helpful for better understanding several important phases of test and item development: AIG template development, AIG item delivery, AIG item validation, and test development with AIG items. Participants will be introduced to the theoretical and psychometric implications of AIG.
2. Professionals directly involved in developing test items and managing tests may find this course a useful way to extend their current understanding of item and test development toward AIG, especially for technology-enhanced and innovative item types. Participants will also be exposed to the practical and policy implications of AIG for assessment services.
3. Engineers who design and develop technology-enhanced assessment services may find this course useful for identifying costs and benefits, strategies for the sustainable development and management of AIG services, and tips on developing assessment applications using AIG.
It is assumed that participants have a sound understanding of basic concepts of educational or psychological measurement, such as reliability, validity, test security, and the item development and/or item validation process. Although not required, prior coursework or experience with other modeling techniques, such as factor analysis, item response theory, structural equation modeling, and/or multilevel modeling, will enhance a participant’s experience in this course. A laptop with a recent version of an internet browser (e.g., Chrome) is highly recommended.
FD4. Cognitive interviewing for interpreting DIF from a mixed methods perspective
The main aim is for attendees to learn how to design and conduct mixed-methods research that uses cognitive interviewing to interpret DIF/bias in cross-lingual and cross-cultural testing projects.
The expansion of international testing projects in the fields of education, health, and quality of life makes it necessary to address how Differential Item Functioning (DIF) can undermine the validity of comparative interpretations based on psychological assessments, tests, and survey data. There is broad consensus that DIF has proven elusive to interpret and prevent. New approaches to this difficult problem, such as Cognitive Interviewing (CI) integrated with DIF techniques, can help improve our understanding of DIF in international testing projects. CI is even mentioned in the most recent release of the ITC Guidelines for Adapting Tests. The main aim of the workshop is to present a practical, comprehensive approach to conducting CI for interpreting DIF in multinational tests and scales. Within a mixed-methods research framework, the course will address how to conduct a CI study in an international research context. Attendees will learn how to plan an international CI study: designs, materials (multilingual interview protocols, templates for analyses, etc.), interviewer training, cooperative data analysis, and so on. We will also teach how to integrate and report qualitative findings from CI together with quantitative results obtained by DIF techniques. Practical examples of mixed-methods DIF studies will be analyzed using databases from international tests and surveys such as the Programme for International Student Assessment (PISA) and the European Social Survey (ESS). Finally, the workshop will teach a general structure for building validity arguments about the level of equivalence reached, and about the implications of that level for comparative interpretations based on DIF results.
The workshop will be useful for junior and senior researchers interested in designing mixed-methods studies that integrate qualitative and quantitative methods to interpret and avoid sources of DIF. It will be especially relevant for researchers who want to acquire skills and knowledge in cognitive interviewing, mixed-methods research, and DIF methods.
There are no prerequisites. Participants will need to bring their own laptops with SPSS installed.
FD5. Generalizability theory: application and optimization
The goal of this workshop is to make Generalizability Theory more approachable through an overview of the fundamentals of G Theory and training in a new, free, and easy-to-use software application, G Wiz.
Important steps in developing any test include determining how well the test measures what it purports to measure and reducing unwanted sources of variance (error) in the measurement. Generalizability (G) theory is a simple but powerful method for understanding and optimizing sources of variance within tests, but it is perhaps underutilized because of a lack of accessible tools. This workshop will provide training in the fundamentals of G Theory, as a refresher for the experienced and as preparation for the less experienced, and then provide hands-on training with the G Wiz software for conducting G studies to evaluate wanted and unwanted sources of variance, and Decision (D) studies to optimize measurement designs so that they maximize desired sources of variance and minimize error.
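To make the G study/D study logic concrete, here is a minimal Python sketch for the simplest case: a one-facet crossed persons × raters (p × r) design with one score per cell and no missing data (both assumptions). It estimates variance components from two-way ANOVA mean squares, then projects the generalizability coefficient (relative decisions) and the dependability coefficient Phi (absolute decisions) for different numbers of raters. It illustrates the standard textbook formulas only; it is not G Wiz, and the function names and ratings are hypothetical.

```python
def g_study(scores):
    """G study: variance components for a crossed p x r design.

    scores[p][r] is the score given to person p by rater r
    (assumed: one observation per cell, no missing data).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for a two-way ANOVA without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_total - ss_p - ss_r  # interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates set to zero.
    return {
        "p": max((ms_p - ms_pr) / n_r, 0.0),
        "r": max((ms_r - ms_pr) / n_p, 0.0),
        "pr,e": ms_pr,
    }


def d_study(components, n_raters):
    """D study: G coefficient (relative) and Phi (absolute) for n_raters."""
    rel_error = components["pr,e"] / n_raters
    abs_error = (components["r"] + components["pr,e"]) / n_raters
    g_coef = components["p"] / (components["p"] + rel_error)
    phi = components["p"] / (components["p"] + abs_error)
    return g_coef, phi


# Hypothetical ratings: 3 persons scored by 2 raters.
ratings = [[1, 3], [5, 5], [6, 10]]
components = g_study(ratings)
for n in (1, 2, 4):
    g_coef, phi = d_study(components, n)
    print(f"{n} rater(s): G = {g_coef:.3f}, Phi = {phi:.3f}")
```

Increasing `n_raters` in the D study divides the rater-related error terms by a larger number, so both coefficients rise; this is the kind of design trade-off a D study is meant to expose.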
The goal of this workshop is to make G Theory easy to use. Familiarity with G Theory will be helpful but is not required. Knowledge of Classical Test Theory and ANOVA is assumed.
Participants should bring a Windows-compatible laptop. Software will be supplied to participants.
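FD6. Applying the Standards for Educational and Psychological Testing in International Contexts
Participants will learn the ethical principles and test standards governing test interpretation, as well as the psychometric principles and procedures needed to assess the viability of test scores and score comparisons.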
In 2014, the sixth edition of the Standards for Educational and Psychological Testing was published by AERA, APA, and NCME. The Standards have been cited extensively in the United States and have also served as a model for the ITC and other professional assessment organizations concerned with improving the quality of testing across educational and psychological contexts. The workshop will focus on the 2014 Standards but will also address other relevant standards and guidelines (e.g., the ITC Guidelines on Adapting Tests, Test Use, CBT and Internet Testing, and Quality Control; the SIOP Principles) in specific areas of practice, including:
For each of the above topics, the presenters will provide an overview of the relevant foundational issues and then identify special issues that apply to the use of tests internationally. For example, in addressing validity, the focus will be on reviewing the general foundational issues and requirements that apply across contexts (schools, organizations, credentialing, psychological assessment) and on identifying unique issues that arise when assessments are used globally (across international environments, cultures, languages, etc.).
The target audience for the course is testing professionals, graduate students, researchers, and practitioners who are interested in strengthening their understanding of how the Standards can be used to improve the quality of their work. The only prerequisite is a general familiarity with the ideas and vocabulary of basic measurement concepts.
While not mandatory, participants are encouraged to purchase a copy of the Standards for Educational and Psychological Testing (2014) in advance and bring it to the workshop (print and digital copies are available for purchase at http://www.aera.net/Publications/Books/Standards-for-Educational-Psychological-Testing-2014-Edition).
AM1. Tools for Equating
The goal of this workshop is to provide an opportunity for participants to acquire knowledge of equating and hands-on experience using equating programs for their operational and research purposes.
When a test is administered on multiple dates per year and test security is crucial, alternate forms of the test are often required each year. Such alternate forms should be built to the same content and statistical specifications so that reported scores can be used interchangeably across forms. However, even under optimal circumstances, alternate forms almost always differ somewhat in difficulty, and a psychometric procedure called equating is used to adjust for these differences. This workshop will demonstrate how equating can be done using two computer programs that are freely available to the public: Equating Recipes (ER) and IRTEQ.
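As a concrete illustration of what an equating adjustment does, the sketch below implements linear equating under a random groups design, one of the classical methods covered by Kolen and Brennan (2014): a Form X raw score is mapped to the Form Y score that has the same standardized (z) score. It is a minimal Python sketch assuming two samples from randomly equivalent groups; it is not code from Equating Recipes or IRTEQ, and the function name and scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x_scores, y_scores):
    """Build a linear equating function from Form X to the Form Y scale.

    Assumed random groups design: x_scores and y_scores come from
    randomly equivalent groups. The returned function solves
        (x - mu_X) / sigma_X = (y - mu_Y) / sigma_Y   for y.
    """
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    slope = sd_y / sd_x

    def to_y_scale(x):
        return slope * (x - mu_x) + mu_y

    return to_y_scale


# Hypothetical raw scores: Form X turned out harder than Form Y.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]
equate = linear_equate(form_x, form_y)
for x in (14, 16):
    print(x, "->", round(equate(x), 2))
```

Mean equating is the special case in which the slope is fixed at 1, so only the difference in means is removed.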
ER is a set of open-source functions, written in C, that perform all of the equating methods discussed by Kolen and Brennan (2014), as well as some other methods. ER also provides a way to estimate bootstrap standard errors for most equating methods. In this workshop, the ER functions will be accessed through an R interface. The second program, IRTEQ, provides a user-friendly interface for IRT scaling and equating. IRTEQ offers five options for scaling and supports various unidimensional models for dichotomously and polytomously scored items. IRTEQ also implements equating for scale scores if a raw-to-scale conversion table is provided.
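To illustrate what IRT scaling involves, the sketch below computes mean/sigma linking constants, one of the standard scale-transformation methods described by Kolen and Brennan (2014), from the difficulty estimates of common (anchor) items calibrated separately on two forms, and then places new-form 3PL item parameters on the base scale. It is a minimal Python sketch under those assumptions; it does not reproduce IRTEQ's interface or its specific options, and the function names and parameter values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_base):
    """Mean/sigma linking constants for theta_base = A * theta_new + B.

    b_new:  anchor-item difficulties estimated on the new form's scale
    b_base: the same items' difficulties on the base (old) scale
    """
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def to_base_scale(items, A, B):
    """Transform (a, b, c) 3PL parameters: a / A, A * b + B; c is unchanged."""
    return [(a / A, A * b + B, c) for a, b, c in items]


# Hypothetical anchor-item difficulties from two separate calibrations.
A, B = mean_sigma(b_new=[-1.0, 0.0, 1.0], b_base=[-1.5, 0.5, 2.5])
print(A, B)
print(to_base_scale([(1.2, 0.5, 0.2)], A, B))
```

Because ability and difficulty share one metric, the same A and B also rescale new-form ability estimates; characteristic-curve methods would instead choose A and B by minimizing differences between item or test characteristic curves.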
This workshop will provide an excellent opportunity for conference participants to acquire knowledge of equating as well as to learn about programs for conducting equating research studies. The workshop will include a brief introduction to equating and illustrate using ER and IRTEQ with real examples, followed by hands-on experience in using those programs. At the end of the workshop, attendees should have a deeper understanding of equating and be reasonably knowledgeable about using ER and IRTEQ.
Graduate students who want to learn about the fundamentals of equating should attend the workshop. Measurement professionals (e.g., psychometricians) who are responsible for operational equating could also benefit. For testing companies with their own internal equating systems, the workshop will present an alternative way to validate their equating results using Equating Recipes (ER) and/or IRTEQ. Moreover, the workshop will benefit researchers who wish to conduct research studies and expand their knowledge of equating.
It will be helpful if participants have some knowledge of classical test theory, item response theory, and intermediate statistics.
For hands-on experience, participants are required to bring their laptops to the workshop. Prior to the workshop, attendees should download and install R (https://www.r-project.org/) as well as IRTEQ (http://www.umass.edu/remp/software/simcata/irteq/). Moreover, in order to conduct equating using R, participants should download and install an R package, which will be available by mid-May 2018 through the website for CASMA (https://education.uiowa.edu/centers/center-advanced-studies-measurement-and-assessment/computer-programs).
AM2. Quality Control Procedures for the Scoring and Rating of Tests
Attendees will learn how to apply Quality Control (QC) procedures for large- and small-scale assessments, methods for monitoring the quality of performance of assessment raters, and ways to prevent and detect various kinds of cheating.
Accuracy is essential in all stages of testing, beginning with test development and administration, through to scoring, test analysis, and score reporting. QC procedures are required in all these stages, especially in the short time-frame between test completion and score reporting. Failure to establish and implement such procedures can lead to inaccurate score calculation with potentially serious consequences, such as a qualified candidate not being accepted to a university or place of employment or a person lacking required qualifications being granted a professional license. It may also result in misguided educational intervention. According to ITC (2014), "Anyone involved in scoring, test analysis, and score reporting has a responsibility to maintain professional standards that can be justified to relevant stakeholders." Quality control procedures are extensively used in other professions, such as engineering, aviation, medicine and software development. Learning from the experience of those working in other fields can help assessment professionals design QC procedures for their specific purposes.
The workshop will deal with theoretical aspects of QC, providing examples of errors from real-life contexts. The main topics include: (1) QC procedures for large-scale assessments with large and stable cohorts, usually in paper-and-pencil mode; (2) QC procedures for scores on tests administered to small population groups on multiple administration dates (Continuous Administration Mode), usually computer- and Internet-based; (3) methods for monitoring the quality of performance of assessment raters who conduct offline and online scoring; and (4) procedures to prevent and detect various kinds of cheating. Finally, studies on several aspects of QC will be presented. The workshop is based on two NCME instructional modules (Allalouf, 2007; Allalouf, Gutentag & Baumer, 2017) and on the ITC Guidelines published in the International Journal of Testing (2014).
The workshop is directed primarily at those who deal with test scoring and equating, item and test analysis, the training and supervision of performance assessment raters, test security, and policy making.
AM3. Test and Score Report Design Principles for Culturally and Linguistically Diverse Populations
Participants will gain familiarity with innovative approaches to designing items, task sets, and score reports in support of the fair and valid assessment of culturally and linguistically diverse populations.
The workshop will involve hands-on exercises, real-world examples, and the interactive use of templates and models to build familiarity with the ITC Guidelines for the Large-Scale Assessment of Linguistically Diverse Populations. The examples will target K-12, postsecondary, and workforce applications. Participants will be presented with examples of designs, design decisions, artifacts, and templates to consider when designing, developing, and interpreting scores for diverse test-taker populations. The examples will focus on item types and task sets for 21st century skills such as collaborative problem solving, interactive communication, and intercultural competence. These skills are emphasized because administering such assessments to diverse populations adds challenges to conceptualization and to design choices about how to fairly and meaningfully capture and communicate information about test-takers’ dispositions and proficiencies, challenges that become even more important when students come from diverse backgrounds.
An overarching theoretical framework will be presented, informed both by the ITC Guidelines and by advances to the evidence-centered design approach (Mislevy, Steinberg, & Almond, 2003) that take a sociocognitive perspective on assessment conceptualization, design, analysis, and interpretation (Mislevy, 2015; Oliveri, Lawless, & Mislevy, under review). Participants will be guided to consider ways of integrating templates into their own situations, thereby accomplishing the aim of the workshop: for participants to gain strategies for designing, developing, and interpreting data from tests and score reports intended for diverse audiences.
At a time when assessed student populations are increasingly diverse (e.g., students who speak a different home language, are immigrants or refugees, or do not speak the language of the test), the question of how to design assessments and score reports that fairly capture and communicate students’ proficiencies and dispositions in meaningful ways is an important area of investigation for researchers and practitioners alike.
Researchers, test developers, assessment directors, students
Familiarity with the ECD model. Attendees are expected to bring their own laptops.
PM1. Applying Test Equating methods using R
The main objective is to introduce equating and to give attendees a conceptual and practical understanding of various equating methods by working with different data sets in R.
The aim of test equating is to adjust test scores on different test forms so that they are comparable and can be used interchangeably. Equating has a central role in large testing programs and is an important step in the process of collecting, analyzing, and reporting test scores. Equating is important because it helps ensure a fair assessment regardless of when or where test takers are tested and what background they have. This workshop has two main goals. The first goal is to introduce equating: through a number of examples and practical exercises, attendees will gain both a conceptual and a practical understanding of various equating methods conducted under different data collection designs. R will be used throughout the session, with a special focus on the packages equate, kequate, and SNSequate. The second goal is to provide the tools needed to perform different equating methods in practice using the available R packages for equating. The training session follows selected chapters of the book Applying Test Equating Methods Using R, written by the instructors and published by Springer in 2017. The workshop will start by introducing traditional equating methods and different data collection designs, and attendees will learn how to perform these methods using the R package equate. Next, attendees will be guided through the five steps of kernel equating (presmoothing, estimating score probabilities, continuization, equating, and calculating the standard error of equating) using the R packages kequate and SNSequate. Finally, attendees will be introduced to item response theory equating and will receive practical guidance on how to perform these methods using R.
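The kernel equating steps listed above can be sketched in plain Python for the simplest case, an equivalent groups design. The sketch makes simplifying assumptions that the R packages kequate and SNSequate do not: presmoothing is skipped (observed relative frequencies are used directly as score probabilities), the Gaussian kernel bandwidths are fixed by hand rather than selected by an optimization criterion, and the standard error of equating is omitted. Function names and defaults are hypothetical; this shows the idea, not the packages' implementation.

```python
import math


def score_probs(freqs):
    """Estimate score probabilities as relative frequencies.

    Presmoothing (e.g., with log-linear models) is deliberately skipped.
    """
    total = sum(freqs)
    return [f / total for f in freqs]


def continuize(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    F_h(x) = sum_j r_j * Phi((x - a*x_j - (1 - a)*mu) / (a*h)),
    with a = sqrt(var / (var + h**2)) so the mean and variance are preserved.
    """
    mu = sum(s * r for s, r in zip(scores, probs))
    var = sum(r * (s - mu) ** 2 for s, r in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))

    def cdf(x):
        total = 0.0
        for s, r in zip(scores, probs):
            z = (x - a * s - (1 - a) * mu) / (a * h)
            total += r * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        return total

    return cdf


def inverse(cdf, p, lo, hi, tol=1e-10):
    """Invert an increasing, continuous CDF by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def kernel_equate(x_scores, r, y_scores, s, h_x=0.6, h_y=0.6):
    """Equate each Form X score: e_Y(x) = G_hY^{-1}(F_hX(x)).

    h_x and h_y are hypothetical fixed bandwidths; no standard errors.
    """
    f_x = continuize(x_scores, r, h_x)
    g_y = continuize(y_scores, s, h_y)
    lo, hi = min(y_scores) - 10.0, max(y_scores) + 10.0
    return [inverse(g_y, f_x(x), lo, hi) for x in x_scores]


# Hypothetical example: Form Y is one point easier everywhere.
x = [0, 1, 2, 3, 4]
r = score_probs([10, 20, 40, 20, 10])
y = [1, 2, 3, 4, 5]
print([round(e, 3) for e in kernel_equate(x, r, y, r)])
```

Because Form Y's distribution here is Form X's shifted by one point and both use the same bandwidth, the equated scores come out at x + 1, which makes a convenient sanity check for the continuization and inversion steps.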
Researchers in educational measurement and psychometrics, graduate students, practitioners, and others with an interest in how to conduct equating in practice.
An introductory statistical background and experience with R are recommended but not required. Attendees are expected to bring their own laptop with R installed, together with the latest versions of the R packages equate, kequate, and SNSequate. Electronic training materials will be provided.
PM2. Crafting adapted tests with a focus on a-priori methods
The workshop serves as an introduction to the latest standards in the cross-cultural adaptation of psychological and educational tests. It focuses heavily on the a-priori (mainly qualitative) methods employed to develop appropriate adaptations, and discusses only in passing the various a-posteriori (mainly quantitative) methods, such as statistical approaches used to provide evidence for the equivalence of the adapted and original forms of a test. Accordingly, the "craft" of test adaptation is discussed in terms of adaptation designs, translation procedures (including decentering), piloting, and the identification of sources of non-equivalence in preliminary data (e.g., statistical analyses suitable for small samples, and qualitative approaches such as cognitive interviews). The workshop uses a number of case studies with which participants are required to interact.
The workshop will be useful for researchers, students, and practitioners who translate and adapt tests for research projects or for their work with clients. It will offer important insights both to those less experienced in test adaptation (e.g., researchers starting their first adaptation project) and to more experienced practitioners who have struggled with the mainly qualitative and judgmental approach needed in the a-priori phases of adaptation (before confirmatory statistics) to generate an adequate target-language or target-culture form of the test.