Expanding the Notion of Assessment Shell: From Task Development Tool to Instrument for Guiding the Process of Science Assessment Development

Authors

  • Guillermo Solano-Flores School of Education University of Colorado, Boulder
  • Richard J. Shavelson Stanford University
  • Steven A. Schneider West Ed

Keywords:

Science education, test contruction, performance assessment.

Abstract

We discuss the limitations and possibilities of shells (blueprints with directions for test developers intended to reduce test development costs and time). Although shells cannot be expected to generate statistically exchangeable exercises, they can generate exercises with similar structures and appearances when they are highly specific and test developers are properly trained to use them. Based on our research and experience developing a wide variety of assessments, we discuss the advantages of conceiving shells as: (a) tools for effective development of constructed-response items, (b) formal specifications of the structural properties of items; (c) task-authoring environments that help test developers standardize and simplify user (examinee) interfaces; and (d) conceptual tools that guide the process of assessment development by enabling test developers to work systematically. We also caution against possible misuses of shells.

Downloads

Download data is not yet available.

References

Aschbacher, P. R. (1991). Performance assessment: State activity, interest, and concerns. Applied Measurement in Education, 4 (4), 275-288.

Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for hands-on science assessment. Journal of Educational Measurement, 29 (1), 1-17.

Bejar, I.I. (1995). From adaptive testing to automated scoring of architectural simulations. In E. L. Mancall & P.G. Bashook (Eds.), Assessing clinical reasoning: The oral examination and alternative methods (pp. 115-127). Evanston: American Board of Medical Specialties.

Bormuth, J.R. (1970). On the theory of achievement test items. Chicago: University of Chicago Press. Crocker, L. (1997). Assessing content representativeness of performance assessment exercises. Applied Measurement in Education, 10 (1), 83-95.

Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational and Psychological Measurement, 57, 373-399.

Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163. Downing, S. M. & Haladyna, T. M. (1997). Test item development: Validity evidence form quality assurance procedures. Applied Measurement in Education, 10 (1), 61-82.

Fitzpatrick, R. & Morrison, E.J. (1971). Performance and product evaluation. In R. L. Thorndike (Ed.), Educational Measurement (pp. 237-270). Washington: American Council on Education.

Gelman, R. & Greeno, J. G. (1989). On the nature of competence: Principles for understanding in a domain. In L. B. Resnick (Ed.), Knowing, learning, and instruction. Essays in honor of Robert Glaser (pp. 125-186). Hillsdale: Lawrence Erlbaum Associates, Publishers.

General Accounting Office (1993). Student testing: Current extent and expenditures, with cost estimates for a national examination. Washington: Author.

Guttman, L. (1969). Integration of test design and analysis. In Proceedings of the 1969 invitational conference on testing problems (pp. 53-65). Princeton: Educational Testing Service.

Haertel, E. H. & Linn, R. L. (1996). Comparability. In Phillips, G. W. (Ed.), Technical issues in large-scale performance assessment (pp. 59-78). Washington: National Center for Education Statistics, U. S. Department of Education, Office of Educational Research and Improvement.

Haladyna, T. M. & Shindoll, R. R. (1989). Shells: A method for writing effective multiple-choice test items. Evaluation and the Health Professions, 12, 97-104.

Hively, W., Patterson, H. L., & Page, S. H. (1968). A "universe-defined" system of arithmetic achievement tests. Journal of Educational Measurement, 5 (4), 275-290.

Hodson, D. (1992). Assessment of practical work: Some considerations in philosophy of science. Science & Education, 1, 115-144.

Katz, I. R. (1998). A software tool for rapidly prototyping new forms of computer-based assessments (GRE Board Professional Report No. 91-06aP). Princeton: Educational Testing Service.

Klein, S. P., Shavelson, R. J., Stecher, B. M., McCaffrey, D., & Haertel, E. (1997). The effect of using shells to develop hands-on tasks. Unpublished manuscript. Santa Monica: The RAND Corporation.

Lindquist, E. F. (Ed.). (1951). Educational measurement. Washington: American Council of Education.

LOGALä (1996). Science Explorer 3.0 [Computer program]. Cambridge, MA: LOGAL Software, Inc.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23 (2), 13-23.

National Board for Professional Teaching Standards (1996). Adolescence and Young Adulthood/Science Standards for National Board Certification. Detroit: Author.

National Research Council (1996). National Science Education Standards. Washington: National Academy Press.

Nuttall, D. L. (1992). Performance assessment: The message from England. Educational Leadership, 49 (8), 54-57.

O'Neil, J. (1992). Putting performance assessment to the test. Educational Leadership, 49 (8), 14-19.

Roid, G. H., & Haladyna, T. M. (1982). A technology for test-item writing. New York: Academic Press.

Schneider, S., Daehler, K. R., Hershbell, K., McCarthy, J., Shaw, J., & Solano-Flores, G. (en prensa). Developing a national science assessment for teacher certification: Practical lessons learned. In L. Ingvarson (Ed.), Assessing teachers for professional certification: The first ten years of the National Board for Professional Teaching Standards. Greenewich: JAI Press, Inc.

Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30 (3), 215-232.

Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21 (4), 22-27.

Shavelson, R. J., Ruiz-Primo, M. A. & Wiley, E. (1999). Notes on sources of sampling variability in science performance assessments. Journal of Educational Measurement, 36 (1), 61-71.

Solano-Flores, G. (1993). Item structural properties as predictors of item difficulty and item association. Educational and Psychological Measurement, 53 (1), 19-31.

Solano-Flores, G., Jovanovic, J, Shavelson, R. J., & Bachman, M. (1994, April). Development of an item shell for the generation of performance assessments in physics. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans.

Solano-Flores, G., Jovanovic, J., Shavelson, R. J., & Bachman, M. (1999) On the development and evaluation of a shell for generating science performance assessments. International Journal of Science Education, 21 (3), 293-315.

Solano-Flores, G., Raymond, B., Schneider, S. A., & Timms, M. (1999). Management of scoring sessions in alternative assessment: The computer-assisted scoring approach. Computers & Education, 33, 47-63.

Solano-Flores, G., & Shavelson, R. J. (1997). Development of performance assessments in science: Conceptual, practical and logistical issues. Educational Measurement: Issues and Practice, 16 (3), 16-25.

Stecher, B. M., & Klein, S. P. (1997). The cost of science performance assessments in large-scale testing programs. Educational Evaluation and Policy Analysis, 11 (1), 1-14.

Stecher, B. M., Klein, S. P., Solano-Flores, G., McCaffrey, D., Robbyn, A., Shavelson, R. J., & Haertel, E. (en prensa). The effects of content, format, and inquiry level on performance on science performance assessment scores. Educational Evaluation and Policy Analysis.

Thorndike, R. L. (Ed.). (1971). Educational Measurement (2nd. Ed.). Washington: American Council on Education.

Wigdor, A. K., & Green, B. F., Jr. (Eds.). (1991). Performance assessment for the workplace (Vol. 1). Washington: National Academy Press.

Downloads

Article abstract page views: 1394

Published

2001-05-01

Similar Articles