Workshops

Workshop schedule

The workshop list is confirmed. Please note that the descriptions and presenters of the individual workshops may be subject to minor changes.

All workshops take place on July 2nd, 2018. You can combine two half-day workshops for the price of a full-day workshop.

Full-day workshops (9am - 5pm, with a lunch break at 12pm)
$400 (early bird) | $425 (regular price)
FD1. Assessment of Collaborative Problem Solving Skills: An Overview (Alina von Davier, Kristin Stoeffler)
FD2. Ethics, Test Standards, and Test Interpretation: Measurement Matters (Gary L. Canivez)
FD3. Introduction to Automatic Item Generation using CAFA AIG (Jaehwa Choi)
FD4. Cognitive interviewing for interpreting DIF from a mixed methods perspective (Jose-Luis Padilla, Isabel Benítez)
FD5. Generalizability theory: application and optimization (Lisa A. Keller, Robert J. Cook, Frank V. Padellaro)
FD6. Applying the Standards for Educational and Psychological Testing in International Contexts (Linda Cook, Wayne Camara, Joan Herman, Kadriye Ercikan)
Morning half-day workshops (9am - 12:30pm, including a coffee break)
$225 (early bird) | $250 (regular price)
AM1. Tools for Equating (Won-Chan Lee, Kyung (Chris) T. Han, Hyung Jin Kim)
AM2. Quality Control Procedures for the Scoring and Rating of Tests (Avi Allalouf)
AM3. Test and Score Report Design Principles for Culturally and Linguistically Diverse Populations (Maria Elena Oliveri, April Zenisky)
Afternoon half-day workshops (2pm - 5pm)
$225 (early bird) | $250 (regular price)
PM1. Applying Test Equating methods using R (Marie Wiberg, Jorge González)
PM2. Crafting adapted tests with a focus on a-priori methods (Dragos Iliescu)


Full-day workshops

FD1. Assessment of Collaborative Problem Solving Skills: An Overview

Presenters

Alina von Davier is Vice President of ACTNext by ACT, a research, development, and business innovation division. Von Davier pioneers the development and application of computational psychometrics for learning and assessment systems.

Kristin Stoeffler is an Associate Assessment Designer/Editor/Test Development Editorial Associate at ACT and a PhD student in Educational Sciences at the University of Luxembourg.

Workshop Objectives

The goal of the tutorial is for participants to learn about the considerations for test development and the computational psychometrics methods used for collaborative problem solving (CPS) assessments.

Summary

Collaborative problem solving (CPS) skills are hard-to-measure competencies that are considered among the 21st Century skills necessary for academic and professional success. The challenges associated with measuring CPS skills are multifaceted, ranging from a lack of consensus around the construct definition to finding appropriate models for the interdependent data. Only in the past seven years has the measurement community taken an interest in developing assessments of CPS skills. This evolution goes hand in hand with the technological advances that have allowed test development, administration, and data collection to be conducted in a computerized environment. Recently, the community welcomed several major publications on CPS assessments: the PISA 2015 Technical Report (OECD, 2017), von Davier (2017a), von Davier, Zhu, and Kyllonen (2017), the NCES White Paper (2017), and Griffin and Care (2015).

In this tutorial, we present the process of building CPS assessments in a computational psychometrics framework (von Davier, 2017b). The focus will be on the measurement challenges, and several empirical examples will be used to illustrate the methodology. Specifically, we will first discuss the construct and introduce several frameworks that have been used for CPS tests in recent years (PISA, ATC21S, the ACT Holistic Framework, ETS’ framework). Next, we will present the design space for CPS assessments in light of Evidence-Centered Design (Mislevy et al., 2003); we will also discuss the design of the data collection and of the log files associated with CPS assessments. We will then present the computational psychometrics methodology that is promising for analyzing data from CPS tests. We will discuss the use of stochastic processes to model the temporal structure of the dynamic interaction, the use of MIRT models to estimate the propensity for collaborative behaviors, and the use of machine learning approaches to investigate the relationship between CPS subskills and team performance. Empirical examples are provided from ACT’s CPS game, The Circuit Runner (Stoeffler et al., 2017), and from ETS’ CPS Science Assessment Prototype.
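
For readers unfamiliar with the modelling step mentioned above, the short R sketch below illustrates how a two-dimensional MIRT model might be fit to dichotomous behavioural indicators. It uses the CRAN package mirt, which is not named by the presenters, and the data, item structure, and interpretation of the two dimensions are invented for illustration; it is not the analysis used for The Circuit Runner or the ETS prototype.

  # Hypothetical sketch: fitting a two-dimensional MIRT model to simulated
  # binary indicators of collaborative behaviour (not the presenters' analysis).
  library(mirt)

  set.seed(1)
  a <- matrix(runif(20, 0.8, 2.0), 10, 2)   # item discriminations on two dimensions
  d <- rnorm(10)                            # item intercepts
  resp <- simdata(a, d, N = 500, itemtype = "dich")

  # Exploratory two-dimensional 2PL model; one dimension could be read as task
  # skill and the other as propensity to collaborate (labels are illustrative).
  fit <- mirt(resp, model = 2, itemtype = "2PL")
  summary(fit)            # rotated factor loadings
  head(fscores(fit))      # person estimates on both dimensions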

Who should attend?

The target population for this tutorial consists of psychometricians, test developers, researchers and students interested in CPS.

Pre-requisites and logistical requirements

An intermediate level of knowledge of statistics for social sciences and IRT is needed. The participants should have WEKA installed on their laptops.

References

  • Griffin, P.E. & Care, E. (2015). Assessment and Teaching of 21st Century Skills. Amsterdam: Springer Verlag.
  • Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Research Report Series, 2003(1), i–29. doi:10.1002/j.2333-8504.2003.tb01908.
  • NCES (2017). Collaborative Problem Solving: Considerations for the National Assessment of Educational Progress. Washington, DC: NCES Retrieved from https://nces.ed.gov/nationsreportcard/pdf/researchcenter/collaborative_problem_solving.pdf
  • OECD (2017). PISA 2015 collaborative problem solving framework. Paris, France: OECD. Retrieved from https://www.oecd.org/pisa/pisaproducts/Draft%20PISA%202015%20Collaborative%20Problem%20Solving%20Framework%20.pdf
  • OECD (2017). PISA 2015 Technical Report. Paris, France: OECD. Retrieved from http://www.oecd.org/pisa/data/2015-technical-report/
  • Stoeffler, K., Rosen, Y., & von Davier, A. A. (2017). Exploring the measurement of collaborative problem solving using a human-agent educational game. Proceedings of the Seventh International Learning Analytics & Knowledge Conference (LAK’17), 570-571, Vancouver, BC. http://dl.acm.org/citation.cfm?id=3029464
  • von Davier, A. A. (2017a). Measurement issues in collaborative learning and assessment [Special issue]. Journal of Educational Measurement.
  • von Davier, A. A. (2017b). Computational psychometrics in support of collaborative assessments. In A. A. von Davier (Ed.), Measurement issues in collaborative learning and assessment [Special issue]. Journal of Educational Measurement.
  • von Davier, A. A., Zhu, M., & Kyllonen, P. C. (2017). Innovative assessment of collaboration. Cham, Switzerland: Springer Verlag.
  • WEKA 3: Data mining software in JAVA. Retrieved from http://www.cs.waikato.ac.nz/ml/weka/

Back to top of page


FD2. Ethics, Test Standards, and Test Interpretation: Measurement Matters

Presenter

Gary L. Canivez, Ph.D., is Professor of Psychology at Eastern Illinois University and Associate Editor for Archives of Scientific Psychology. The author of over 85 peer-reviewed research and professional publications and over 200 professional presentations and continuing education/professional development workshops, Dr. Canivez has research interests in applied psychometrics in evaluating psychological and educational tests, including international applications. http://www.ux1.eiu.edu/~glcanivez

Workshop Objectives

Participants will learn the ethical principles and test standards governing test interpretation and the psychometric principles and procedures necessary to assess the viability of test scores and comparisons.

Summary

As Weiner (1989) cogently noted, psychologists must “(a) know what their tests can do and (b) act accordingly. … Acting accordingly—that is, expressing only opinions that are consonant with the current status of validity data—is the measure of his or her ethicality” (p. 829).

To follow Weiner’s advice, psychologists must possess and apply fundamental competencies in psychological measurement; the importance of these competencies for ethical assessment and clinical practice cannot be overstated (Dawes, 2005; McFall, 2000). Interpretation of tests and procedures must be informed by strong empirical evidence from different types of reliability, validity, and diagnostic utility studies, each of which addresses a different interpretation issue. Unfortunately, most test technical manuals and popular interpretation guides and textbooks neglect to report and address some critically important psychometric research methods and results necessary to judge the adequacy of the different available test scores and comparisons used in interpretation. So that psychologists may ethically interpret test scores or procedures, this workshop delineates and highlights the varied psychometric research methods psychologists must consider to adequately assess the viability of the different scores and comparisons advocated. Specific research examples with popular tests and procedures are provided as exemplars. Internal consistency, short- and long-term temporal stability, interrater agreement, concurrent validity, predictive validity, incremental predictive validity, age/developmental changes, distinct group differences, theory-consistent intervention effects, convergent and divergent validity, internal structure (EFA and CFA), and diagnostic efficiency/utility methods are among those presented; each answers different but relevant questions regarding the interpretation of test scores and comparisons. Following this workshop, participants will be better able to critically evaluate psychometric information provided in test manuals, textbooks, interpretation guidebooks, the Mental Measurements Yearbook, and the extant literature.
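
As a concrete illustration of how a couple of the methods listed above can be checked in practice, here is a small R sketch using the psych package; the package choice and the simulated data are assumptions for illustration and are not part of the workshop materials.

  # Illustrative sketch: internal consistency and internal structure (EFA) for
  # simulated item scores, using the psych package.
  library(psych)

  set.seed(2)
  f <- rnorm(300)                                            # common factor
  items <- sapply(1:8, function(i) 0.7 * f + rnorm(300, sd = 0.7))
  colnames(items) <- paste0("item", 1:8)

  alpha(as.data.frame(items))   # coefficient alpha and item-total statistics

  # Exploratory factor analysis with one and two factors, to check whether the
  # data actually support more than a single dimension.
  fa(items, nfactors = 1)
  fa(items, nfactors = 2)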

Who should attend?

Graduate Students, Practitioners/Service Providers, Teachers/Professors, Researchers, Consultants, Administrators

Pre-requisites

It would be useful for attendees to have completed basic statistics and research methods courses and an introductory testing course.

Back to top of page


FD3. Introduction to Automatic Item Generation using CAFA AIG

Presenter

The instructor, Dr. Jaehwa Choi, is Associate Professor and Program Director of the Assessment, Testing, and Measurement Program in the Department of Educational Leadership at the George Washington University. He has taught dozens of assessment and quantitative methods workshops (e.g., Automatic Item Generation, Assessment Engineering, Multilevel Analysis, SPSS Syntax) in the United States and abroad.

Workshop objectives

This workshop aims to provide both a theoretical and a practical introduction to Automatic Item Generation (AIG), an emerging research area and an innovative approach to generating assessment items using state-of-the-art technology. It is designed for those who wish to learn the background, benefits, innovations, and practical applications of the AIG item template and test development process.

Summary

Modern researchers, psychometricians, item writers, and assessment service providers increasingly find themselves facing a new paradigm in which the assessment item production process is no longer manual but can instead be automated at scale by technology, that is, Automatic Item Generation (AIG). AIG is an emerging research area and an innovative assessment tool in which cognitive and psychometric theories are integrated into a comprehensive assessment development framework for the purpose of generating assessment items using state-of-the-art technology, especially in Information and Communication Technology (ICT) environments. The number of content areas and applications of AIG is growing rapidly. As such, this new reality raises important issues in effective item development.

This full-day course is intended as both a theoretical and a practical introduction to AIG. It is designed for those who wish to learn the background, benefits, innovations, and practical applications of the AIG item template and test development process, and it integrates hands-on training in AIG item template development so that participants gain both theoretical knowledge and practical experience with the process.
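
To make the item-template idea concrete, the base-R sketch below generates item instances from a single parent item with variable elements. It only illustrates the general principle; the template, element pools, and function name are invented and do not reflect how CAFA AIG itself is implemented.

  # Minimal sketch of the AIG item-template idea (not CAFA AIG itself):
  # expand a parent item with variable elements into many item instances.
  generate_items <- function(template, elements) {
    grid <- expand.grid(elements, stringsAsFactors = FALSE)   # all element combinations
    stems <- apply(grid, 1, function(row) {
      stem <- template
      for (name in names(row)) {
        stem <- gsub(paste0("{", name, "}"), row[[name]], stem, fixed = TRUE)
      }
      stem
    })
    data.frame(grid, stem = stems, stringsAsFactors = FALSE)
  }

  # Hypothetical parent item: a simple rate problem with three variable elements.
  template <- "A {vehicle} travels at {speed} km/h for {hours} hours. How far does it travel?"
  elements <- list(vehicle = c("car", "train"),
                   speed   = c("60", "80", "100"),
                   hours   = c("2", "3"))

  bank <- generate_items(template, elements)
  bank$key <- as.numeric(bank$speed) * as.numeric(bank$hours)   # scoring key
  nrow(bank)         # 12 generated instances from one template
  head(bank$stem)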

Who should attend?

This course is intended for anyone interested in researching and developing items, tests, and related services using AIG techniques:

1. Students in graduate-level courses in psychological and/or educational measurement may find this course helpful for better understanding several important phases in test and item development: AIG template development, AIG item delivery, AIG item validation, and test development with AIG items. Participants will be introduced to the theoretical and psychometric implications of AIG.

2. Professionals directly involved in developing test items and managing tests may find this course useful as a source for expanding their present understanding of item and test development toward AIG, especially with technology-enhanced and innovative item types. Participants will also be exposed to the practical and policy implications of AIG for assessment services.

3. Engineers who are designing and developing technology-enhanced assessment-related services may find this course useful as a source for identifying costs and benefits, strategies for the sustainable development and management of AIG services, and tips on developing assessment applications using AIG.

Pre-requisites and logistical requirements

It is assumed that participants have a sound understanding of basic concepts of educational or psychological measurement, such as reliability, validity, test security, and the item development and/or item validation process. Although not required, a participant’s experience in this course will be enhanced by additional prior coursework or experience with other modeling techniques such as factor analysis, item response theory, structural equation modeling, and/or multilevel modeling. A laptop computer with a recent version of an internet browser (e.g., Chrome) is highly recommended.

Back to top of page


FD4. Cognitive Interviewing for Interpreting DIF from a Mixed Methods Perspective

Presenter

Jose-Luis Padilla is an Associate Professor at the Department of Methodology of Behavioral Sciences at the University of Granada (Spain). His current research focuses on psychometrics, validity, and cross-cultural research within a mixed methods framework combining quantitative and qualitative methods.

Isabel Benítez is a researcher at Universidad Loyola Andalucía. In recent years, mixed methods research has been one of her main interests, and her current projects focus on applying this framework to resolving research questions in different fields.

Workshop Objectives

The main aim is for attendees to learn how to design and conduct mixed-methods research using cognitive interviewing to interpret DIF/bias in cross-lingual and cross-cultural testing projects.

Summary

The expansion of international testing projects in the education, health, and quality-of-life fields makes it necessary to address how Differential Item Functioning (DIF) can undermine the validity of comparative interpretations based on psychological assessments, tests, and survey data. There is wide consensus about how elusive DIF has become to interpret and prevent. New approaches to this difficult problem, such as Cognitive Interviewing (CI) integrated with DIF techniques, can contribute to improving our understanding of DIF in international testing projects. CI is even mentioned in the most recent release of the ITC Guidelines for Adapting Tests. The main aim of the workshop is to present a practical, comprehensive approach to conducting CI for interpreting DIF in multi-national tests and scales. Within a mixed-methods research framework, the course will address how to conduct a CI study in an international research context. Course attendees will learn how to plan an international CI study: designs, materials (multi-lingual interview protocols, templates for analyses, etc.), interviewer training, cooperative data analysis, and so on. We will also teach how to integrate and report qualitative findings from CI alongside quantitative results obtained with DIF techniques. Practical examples of mixed-methods DIF studies will be analyzed using databases from international testing programs and surveys such as the Programme for International Student Assessment (PISA) and the European Social Survey (ESS). Finally, we will present the general structure for building validity arguments about the level of equivalence reached and its implications for comparative interpretations using DIF results.
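
For readers who want a sense of the quantitative side that the cognitive interviews are paired with, the sketch below flags DIF with the Mantel-Haenszel procedure in the R package difR; the package choice and the simulated data are assumptions for illustration, not part of the workshop materials.

  # Sketch of a Mantel-Haenszel DIF screen on simulated data; in the workshop,
  # results like these would then be interpreted with cognitive interviewing.
  library(difR)

  set.seed(3)
  n <- 400
  group <- rep(c("reference", "focal"), each = n / 2)
  theta <- rnorm(n)
  b <- seq(-1.5, 1.5, length.out = 10)       # item difficulties
  resp <- sapply(1:10, function(j) {
    shift <- ifelse(j == 5 & group == "focal", 0.8, 0)   # item 5 has simulated DIF
    rbinom(n, 1, plogis(theta - b[j] - shift))
  })
  colnames(resp) <- paste0("item", 1:10)

  difMH(Data = resp, group = group, focal.name = "focal")   # item 5 should be flagged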

Who should attend?

The workshop will be useful for junior and senior researchers interested in designing mixed-methods studies aimed at integrating qualitative and quantitative methods to interpret and avoid sources of DIF. It will be especially relevant for researchers who want to acquire skills and knowledge in cognitive interviewing, mixed methods research, and DIF methods.

Pre-requisites and logistical requirements

There are no pre-requisites. Participants will need to bring their own laptops with SPSS software installed.

Back to top of page


FD5. Generalizability theory: application and optimization

Presenters

Lisa A. Keller: Lisa, an associate professor at the University of Massachusetts in the Research, Evaluation Methods, and Psychometrics (REMP) program, is an accomplished instructor. She teaches many psychometric and statistical courses, including one in Generalizability Theory.

Robert J. Cook: Rob, a psychometrician at ACT, has a background in software development and has built and trained staff to use psychometric software tools at multiple testing organizations. He is the developer of G Wiz, the generalizability software demonstrated during this workshop.

Frank V. Padellaro: Frank, a current doctoral candidate at the University of Massachusetts in the REMP program, has written multiple papers on optimization of variance components in G Theory designs. He is a principal designer of the G Wiz application.

Workshop objectives

The goal of this workshop is to make Generalizability Theory more approachable with an overview of the fundamentals of G Theory and training in a free and easy-to-use new software application, G Wiz.

Summary

Important steps in the development of any test include determining how well the test measures what it purports to measure and taking steps to reduce unwanted sources of variance (error) in the measurement. Generalizability (G) theory is a simple but powerful method for understanding and optimizing sources of variance within tests, but it is perhaps underutilized due to a lack of accessible tools. This workshop will provide training in the fundamentals of G Theory, as a refresher for the experienced and as preparation for the less experienced, and then provide hands-on training using the G Wiz software: conducting G studies to evaluate wanted and unwanted sources of variance, and conducting Decision (D) studies to optimize measurement designs that maximize desired sources of variance and minimize error.
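
As a rough illustration of what a G study and a D study involve, the R sketch below estimates variance components for a simple persons-by-raters design with the lme4 package and then computes generalizability and dependability coefficients for different numbers of raters. This is only a stand-in sketch with invented data, not the workshop's own tool (G Wiz).

  # Illustrative G study (persons x raters) with lme4, followed by a small D study.
  library(lme4)

  set.seed(4)
  n_p <- 100; n_r <- 4
  person_eff <- rnorm(n_p, sd = 1.0)          # person (universe-score) effects
  rater_eff  <- rnorm(n_r, sd = 0.3)          # rater severity effects
  d <- expand.grid(person = factor(1:n_p), rater = factor(1:n_r))
  d$score <- 3 + person_eff[d$person] + rater_eff[d$rater] + rnorm(nrow(d), sd = 0.6)

  m <- lmer(score ~ 1 + (1 | person) + (1 | rater), data = d)
  vc <- as.data.frame(VarCorr(m))
  var_p  <- vc$vcov[vc$grp == "person"]       # person variance
  var_r  <- vc$vcov[vc$grp == "rater"]        # rater variance
  var_pr <- vc$vcov[vc$grp == "Residual"]     # person x rater interaction + error

  # D study: coefficients as a function of the number of raters used operationally.
  n_raters <- 1:6
  g_coef   <- var_p / (var_p + var_pr / n_raters)                    # relative coefficient
  phi_coef <- var_p / (var_p + var_r / n_raters + var_pr / n_raters) # absolute coefficient
  round(data.frame(n_raters, g_coef, phi_coef), 3)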

Prerequisites and Logistical requirements

The goal of this workshop is to make G Theory easy to use. Familiarity with G Theory will be helpful but is not required for participation. Knowledge of Classical Test Theory and ANOVA is assumed.

Participants should bring a Windows-compatible laptop. Software will be supplied to participants.

Back to top of page


FD6. Applying the Standards for Educational and Psychological Testing in International Contexts

Presenters

Linda Cook, formerly VP of Assessment at ETS, President of NCME, VP of AERA Division D, and member of the Committee to Revise the 1999 Testing Standards. Recipient of the 2017 NCME Award for Career Contributions. Member of the management committee to revise the 2014 Testing Standards.

Wayne J. Camara, Horace Mann Research Chair at ACT, former SVP of Research at ACT and the College Board, Chair of the Testing Standards Management Committee (2014), and Staff Director of the 1999 Testing Standards. Fellow of AERA, APA, APS, the New York Academy of Sciences, and SIOP. Past President of NCME, ATP, and APA Division 5; past VP of AERA Division D.

Joan Herman, Director Emerita of the National Center for Research on Evaluation, Standards and Student Testing. Former President of the California Educational Research Association, Member-at-Large of the American Educational Research Association (AERA), Chair of Knowledge Alliance, and member of the Committee to Revise the 1999 Testing Standards. A fellow of AERA and member of the National Academy of Education, Joan Herman currently chairs the Standards Management Committee.

Kadriye Ercikan, Vice President of Psychometrics, Statistics and Data Sciences at ETS and Professor of Education at the University of British Columbia. She is the current Vice President of the American Educational Research Association, a member of the AERA Executive Board of Directors, and a member of the ITC Executive Council, and she has served on the NCME Board of Directors. She is the recipient of the Significant Contributions Award from AERA Division D.

Workshop Objectives

Participants will learn how the Standards for Educational and Psychological Testing and related international guidelines can be applied to improve the quality of testing practice in international contexts.

Summary

In 2014, the sixth edition of the Standards for Educational and Psychological Testing was published by AERA, APA, and NCME. The Standards have been cited extensively in the United States, but they have also served as a model for the ITC and other professional organizations in assessment concerned with improving the quality of testing across educational and psychological contexts. The workshop will focus on the 2014 Standards but will also address other relevant standards and guidelines (e.g., the ITC Guidelines on adapting tests, test use, CBT and internet testing, and quality control; the SIOP Principles) in specific areas of practice, including:

  1. Validity
  2. Reliability and score precision
  3. Fairness
  4. Design and Development
  5. Test Translation
  6. Administration and Reporting
  7. Test User Responsibilities

For each of the above topics, the presenters will provide an overview of the relevant foundational issues and then identify special issues that apply to the use of tests internationally. For example, in addressing validity, the focus will be to review the general foundational issues and requirements that apply across contexts (schools, organizations, credentialing, psychological assessments) and to identify unique issues that arise in using assessments globally (across international environments: cultures, languages, etc.).

Who should attend?

The target audience for the course is testing professionals, graduate students, researchers, and practitioners who are interested in strengthening their understanding of how the Standards can be used to improve the quality of their work. The prerequisite skills required by the course are a general familiarity with the ideas and vocabulary associated with basic measurement concepts.

Pre-requisites

While not mandatory, participants are encouraged to purchase a copy of the Standards for Educational and Psychological Testing (2014) in advance and bring it to the workshop (print and digital copies are available for purchase at http://www.aera.net/Publications/Books/Standards-for-Educational-Psychological-Testing-2014-Edition).

Back to top of page


Half-day workshops (AM)

AM1. Tools for equating

Presenters

Won-Chan Lee. Dr. Lee is Director of the Center for Advanced Studies in Measurement and Assessment (CASMA) and Professor in the Educational Measurement and Statistics Program at the University of Iowa. He specializes in equating, scaling, linking, and test theories, including classical test theory, item response theory, and generalizability theory.

Kyung (Chris) T. Han. Dr. Han is Senior Director of Psychometric Research at GMAC®. His research expertise involves developing innovative approaches to improve measurement efficiency, designing CAT systems, and investigating the validity of score interpretations and decisions.

Hyung Jin Kim. Dr. Kim is Associate Research Scientist at CASMA. Her research interests include equating, generalizability theory, and multistage tests. She has extensive experience with a large-scale operational equating system.

Workshop Objectives

The goal of this workshop is to provide an opportunity for participants to acquire knowledge of equating and hands-on experience using equating programs for their operational and research purposes.

Summary

When a test is administered on multiple dates each year and maintaining test security is crucial, alternate forms of the test are often required. Such alternate forms should be built to the same content and statistical specifications so that reported scores can be used interchangeably across forms. However, even under optimal circumstances, alternate forms almost always differ somewhat in difficulty, and a psychometric procedure called equating is used to adjust for such differences. This workshop will demonstrate how equating can be done using two computer programs, Equating Recipes (ER) and IRTEQ, which are freely available to the public.
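
For readers new to the idea, the tiny base-R sketch below shows the simplest case: linear (mean-sigma) equating of Form X scores to the Form Y scale under a random-groups design. The data are simulated, and the sketch is only meant to convey the underlying adjustment; the workshop itself uses Equating Recipes and IRTEQ.

  # Linear equating of Form X to the Form Y scale (random-groups design).
  set.seed(5)
  scores_x <- rbinom(2000, size = 40, prob = 0.55)   # Form X group (harder form)
  scores_y <- rbinom(2000, size = 40, prob = 0.60)   # Form Y group (easier form)

  linear_equate <- function(x, x_scores, y_scores) {
    # Match the mean and standard deviation of the Form Y distribution.
    sd(y_scores) / sd(x_scores) * (x - mean(x_scores)) + mean(y_scores)
  }

  linear_equate(22, scores_x, scores_y)   # Form Y equivalent of a raw 22 on Form X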

Equating Recipes (ER) is a set of open-source functions, written in C, that perform all of the equating methods discussed by Kolen and Brennan (2014), as well as some other methods. Furthermore, ER provides a way to estimate bootstrap standard errors for most equating methods. For this workshop, the R interface will be used to access the ER functions.

IRTEQ, the second program, provides a user-friendly interface for performing IRT scaling and equating. It offers five options for scaling and supports various unidimensional models for dichotomously and polytomously scored items. IRTEQ also implements equating of scale scores if a raw-to-scale conversion table is provided.

This workshop will provide an excellent opportunity for conference participants to acquire knowledge of equating as well as to learn about programs for conducting equating research studies. The workshop will include a brief introduction to equating and illustrate the use of ER and IRTEQ with real examples, followed by hands-on experience with those programs. At the end of the workshop, attendees should have a deeper understanding of equating and be reasonably knowledgeable about using ER and IRTEQ.

Who should attend?

Graduate students who want to learn the fundamentals of equating should attend the workshop. Measurement professionals (e.g., psychometricians) who are responsible for operational equating could also benefit. For testing companies with their own internal equating systems, the workshop will present an alternative route for validating their equating results using Equating Recipes (ER) and/or IRTEQ. Moreover, the workshop will benefit researchers who wish to conduct research studies and expand their knowledge about equating.

Pre-requisites and logistical requirements

It will be helpful if participants have some knowledge of classical test theory, item response theory, and intermediate statistics.

For hands-on experience, participants are required to bring their laptops to the workshop. Prior to the workshop, attendees should download and install R (https://www.r-project.org/) as well as IRTEQ (http://www.umass.edu/remp/software/simcata/irteq/). Moreover, in order to conduct equating using R, participants should download and install an R package, which will be available by mid-May 2018 through the website for CASMA (https://education.uiowa.edu/centers/center-advanced-studies-measurement-and-assessment/computer-programs).

Back to top of page


AM2. Quality Control Procedures for the Scoring and Rating of Tests

Presenter

Dr. Avi Allalouf is the Director of Scoring & Equating at NITE. His areas of research are test adaptation and translation, differential item functioning (DIF), test scoring and equating, establishing procedures for essay scoring by professional raters, and the effect of testing on society.

Workshop Objectives

Attendees will learn how to apply Quality Control (QC) procedures for large- and small-scale assessments, methods for monitoring the quality of performance of assessment raters, and ways to prevent and detect various kinds of cheating.

Summary

Accuracy is essential in all stages of testing, beginning with test development and administration, through to scoring, test analysis, and score reporting. QC procedures are required in all these stages, especially in the short time-frame between test completion and score reporting. Failure to establish and implement such procedures can lead to inaccurate score calculation with potentially serious consequences, such as a qualified candidate not being accepted to a university or place of employment or a person lacking required qualifications being granted a professional license. It may also result in misguided educational intervention. According to ITC (2014), "Anyone involved in scoring, test analysis, and score reporting has a responsibility to maintain professional standards that can be justified to relevant stakeholders." Quality control procedures are extensively used in other professions, such as engineering, aviation, medicine and software development. Learning from the experience of those working in other fields can help assessment professionals design QC procedures for their specific purposes.

The workshop will deal with theoretical aspects of QC, providing examples of errors from real-life contexts. The main topics include: (1) QC procedures for large-scale assessments with large and stable cohorts – usually in paper & pencil mode; (2) QC procedures for scores on tests administered to small population groups on multiple administration dates (Continuous Administration Mode) – usually computer- and Internet-based; (3) methods for monitoring the quality of performance of assessment raters who conduct offline and online scoring; and (4) procedures to prevent and detect various kinds of cheating. Finally, studies on several aspects of QC will be presented. The workshop is based on two NCME instructional modules (Allalouf, 2007; Allalouf, Gutentag & Baumer, 2017) and on ITC Guidelines published in International Journal of Testing (2014).
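
As a small, hypothetical example of the kind of check discussed under topic (2) above, the base-R sketch below flags administrations whose mean scale score drifts too far from a historical baseline; the baseline values, data, and |z| > 3 threshold are invented for illustration and are not taken from the workshop materials.

  # Flag administrations whose mean scale score deviates from the baseline.
  set.seed(6)
  baseline_mean <- 500; baseline_sd <- 100

  administrations <- data.frame(
    admin_id = paste0("2018-", sprintf("%02d", 1:8)),
    n        = sample(150:400, 8),
    mean_scl = c(498, 503, 497, 531, 502, 499, 505, 500)   # administration 4 is anomalous
  )

  # z statistic for each administration mean under the baseline distribution.
  administrations$z <- (administrations$mean_scl - baseline_mean) /
    (baseline_sd / sqrt(administrations$n))
  administrations$flag <- abs(administrations$z) > 3

  administrations[administrations$flag, ]   # administrations to review before reporting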

Who should attend?

The workshop is directed primarily at those who deal with: test scoring and equating, item and test analysis, training and supervision of performance assessment raters, test security and policy making.

Pre-requisites

None

Back to top of page


AM3. Test and Score Report Design Principles for Culturally and Linguistically Diverse Populations

Presenters

Maria Elena Oliveri is an ETS Research Scientist focusing on assessment design and score interpretation in academic-to-career readiness. She is Associate Editor of the International Journal of Testing and co-author of the ITC Guidelines for the Large-Scale Assessment of Linguistically Diverse Populations.

April Zenisky is Research Associate Professor of Education and Director of Computer-Based Testing Initiatives in the Center for Educational Assessment at UMass-Amherst. Her primary research interests are score reporting, technology-based item types, and computerized testing.

Workshop Objectives

Participants will gain familiarity with innovative approaches to designing items, task sets, and score reports in support of the fair and valid assessment of culturally and linguistically diverse populations.

Summary

The workshop will involve hands-on exercises, real-world examples, and the interactive use of templates and models to gain familiarity with the ITC Guidelines for the Large-Scale Assessment of Linguistically Diverse Populations. The examples will be targeted to K-12, postsecondary, and workforce applications. Participants will be presented with examples of designs, design decisions, artifacts, and templates to consider when designing, developing, and interpreting scores for diverse test-taker populations.

The examples will target item types and task sets for 21st century skills such as collaborative problem solving, interactive communication, and intercultural competence. Assessments of 21st century skills are highlighted because their administration to diverse populations presents added challenges for conceptualization and design choices related to fairly and meaningfully capturing and communicating information about test-takers’ dispositions and proficiencies, issues that take on additional importance when students come from diverse backgrounds.

An overarching theoretical framework will be presented, informed by both the ITC Guidelines and advances in the evidence-centered design approach (Mislevy, Steinberg, & Almond, 2003) that take a sociocognitive perspective on assessment conceptualization, design, analysis, and interpretation (Mislevy, 2015; Oliveri, Lawless, & Mislevy, under review). Participants will be guided to consider ways to integrate the templates into their own situations, thereby accomplishing the aim of the workshop: for participants to gain strategies for designing, developing, and interpreting data from tests and score reports intended for diverse audiences.

At a time when assessed student populations are increasingly diverse (e.g., students who speak a different home language, are immigrants or refugees, or do not speak the language of the test), how to design assessments and score reports that fairly capture and communicate students’ proficiencies and dispositions in meaningful ways is an important area of investigation for researchers and practitioners alike.

Who should attend?

Researchers, test developers, assessment directors, students

Pre-requisites

Familiarity with the ECD model. Attendees are expected to bring their own laptops.

Back to top of page


Half-day workshops (PM)

PM1. Applying Test Equating methods using R

Presenters

Marie Wiberg is a professor at the Department of Statistics, USBE, Umeå University, Sweden. She has worked with a number of different achievement tests. Wiberg has published many innovative papers on test equating, and she initiated the development of the R package kequate.

Jorge González is an associate professor at the Department of Statistics, Pontificia Universidad Católica de Chile, and a permanent consultant at a measurement center in Chile. González has published many papers on test equating, and he is the developer of the R package SNSequate.

Workshop Objectives

The main objective is to introduce equating and to give attendees a conceptual and practical understanding of various equating methods by working with different data sets in the R software.

Summary

The aim of test equating is to adjust scores on different test forms so that they are comparable and can be used interchangeably. Equating has a central role in large testing programs, and it constitutes an important step in the process of collecting, analyzing, and reporting test scores. Equating is important because it ensures a fair assessment regardless of the time, place, or background of the test takers. This workshop has two main goals. The first goal is to introduce equating. Through a number of examples and practical exercises, attendees will gain both a conceptual and a practical understanding of various equating methods conducted under different data collection designs. The R software will be used throughout the session, with special focus on the packages equate, kequate, and SNSequate. The second goal is to provide the tools necessary to perform different equating methods in practice using the available R packages for equating. The training session follows some of the chapters in the book Applying Test Equating Methods Using R, written by the instructors and published by Springer in 2017. The workshop will start by introducing traditional equating methods and different data collection designs, and attendees will learn how to perform these methods using the R package equate. Next, attendees will be guided through the five steps of kernel equating (presmoothing, estimating score probabilities, continuization, equating, and calculating the standard error of equating) using the R packages kequate and SNSequate. Attendees will then be introduced to item response theory equating and will receive practical guidance on how to perform these methods using R.
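
As a brief taste of the first part of the session, the sketch below runs an equipercentile equating under a random-groups design with the equate package; the score distributions are simulated, and the kequate, SNSequate, and IRT steps are left to the workshop materials.

  # Equipercentile equating of Form X to Form Y (random-groups design) with equate.
  library(equate)

  set.seed(7)
  form_x <- rbinom(1500, size = 40, prob = 0.55)   # Form X raw scores
  form_y <- rbinom(1500, size = 40, prob = 0.60)   # Form Y raw scores

  fx <- freqtab(form_x, scales = 0:40)             # score frequency tables
  fy <- freqtab(form_y, scales = 0:40)

  eq <- equate(fx, fy, type = "equipercentile")
  eq                                               # prints the Form X to Form Y concordance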

Who should attend?

Researchers in educational measurement and psychometrics, graduate students, practitioners, and others interested in how to conduct equating in practice.

Pre-requisites and logistical requirements

An introductory statistical background and experience with R are recommended but not required. Attendees are expected to bring their own laptops with R installed, together with the latest versions of the R packages equate, kequate, and SNSequate. Electronic training materials will be provided to the attendees.

Back to top of page


PM2. Crafting adapted tests with a focus on a-priori methods

Presenter

Dragoș Iliescu is a Professor of Psychology at the University of Bucharest. His research interests cluster around two domains: psychological and educational assessment, tests, and testing (with an important cross-cultural component), and applied psychology. He is the current President (2016-2018) of the International Test Commission (ITC) and the author of “Adapting Tests in Linguistic and Cultural Situations” (Cambridge University Press, 2017).

Workshop Objectives

  • Understand the principles behind test adaptation and interpret them in practical terms relevant to psychological practice and research;
  • Understand how judgmental and empirical procedures contribute to test adaptation, and be able to apply some of these techniques in their own practice;
  • Appreciate the complexities of test adaptation in the widest sense and start formulating their own models for best practice.

Summary

The workshop serves as an introduction to the latest standards in the cross-cultural adaptation of psychological and educational tests. The workshop focuses heavily on the a-priori (mainly qualitative) methods that are employed to develop appropriate adaptations and discusses only in passing the various a-posteriori (mainly quantitative) methods, such as the statistical approaches that may be employed to provide evidence for the equivalence of the adapted and original forms of the test. The "craft" of test adaptation is therefore discussed in terms of adaptation designs, translation procedures (including decentering), piloting, and the identification of sources of non-equivalence in preliminary data (e.g., statistical analyses suited to small samples and qualitative approaches such as cognitive interviews). The workshop uses a number of case studies with which participants are required to interact.

Who should attend?

The workshop will be useful for researchers, students and practitioners who translate and adapt tests either for their various research projects, or for their work with clients. The workshop will bring important insights both to those less experienced in the domain of test adaptation (e.g., researchers starting their first adaptation project) and to those more experienced, but who have struggled with the mainly qualitative and judgmental approach needed in the a-priori phases (before confirmatory statistics) of the adaptation in order to generate an adequate target-language or target-culture form of the test.

Pre-requisites and logistical requirements

None.

Back to top of page