A Comparison of the Extreme Groups and Correlational Methods for Computing Item Discrimination in Light of Variation in Sample Size and Test Length

Bashaer bint Abdullah bin Mohammed ALGhadeer

doi:10.55074/meb39374

Authors

Bashaer Abdullah Mohammed ALGhadeer, Assistant Professor of Educational Psychology, Measurement, and Evaluation, Mustaqbal University, Kingdom of Saudi Arabia

Keywords:

item discrimination, classical test theory, discrimination coefficient, extreme groups method, item-total correlation method, sample size, test length, reliability, validity

Abstract

This study presents a theoretical comparison of two common methods for computing item discrimination within classical test theory: the extreme groups method and the correlational method (item-total correlation). The research problem is that discrimination estimates can differ depending on the method used and are sensitive to variations in sample size and test length. This sensitivity can affect decisions to retain or exclude items and, in turn, the quality of the test. The study aims to develop a conceptual framework that clarifies the fundamental differences between the two methods, analyzes how sample size and test length affect the stability, accuracy, and potential bias of the estimates, and provides theoretical guidelines for selecting the more appropriate index for a given measurement context. It takes a theoretical, analytical approach based on a review of the psychometric literature and a discussion of the assumptions underlying discrimination indices in light of sample and test characteristics. The study addresses three areas: the concept of item discrimination and its relationship to difficulty, reliability, and validity; a description of the two methods and their characteristics; and a comparison of how sample size (n) and test length affect each method. The expected conclusions are that the extreme groups method is more directly interpretable but less stable in small samples, because it reduces the effective sample and ignores the middle group of examinees, whereas the correlational method, especially in its corrected form, tends to agree more closely with internal consistency and is notably influenced by test length, sample homogeneity, and dimensional structure.
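To make the two indices concrete, the following is a minimal Python sketch of how each might be computed for dichotomously scored (0/1) items. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the `responses` layout (one row per examinee, one column per item), and the default 27% upper and lower groups are choices made here, and the abstract does not specify a group fraction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def extreme_groups_d(responses, item, fraction=0.27):
    """Extreme groups index: proportion correct in the upper group minus the lower group.

    `fraction` is an assumed group size (27% here); examinees between the two
    groups are not used at all, which is the sample reduction noted above.
    Ties in total score are broken by original row order.
    """
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    k = max(1, round(fraction * len(responses)))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k
    return p_upper - p_lower


def item_total_r(responses, item, corrected=True):
    """Item-total correlation; if `corrected`, the item is removed from the total first."""
    item_scores = [row[item] for row in responses]
    if corrected:
        criterion = [sum(row) - row[item] for row in responses]
    else:
        criterion = [sum(row) for row in responses]
    return pearson(item_scores, criterion)


# Hypothetical data: 10 examinees, 4 items (invented for illustration only).
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0],
]
for j in range(4):
    print(j, extreme_groups_d(responses, j), item_total_r(responses, j))
```

With only ten examinees, each extreme group here contains three people, so a single changed response moves D by a third. The uncorrected item-total correlation also includes the item in its own criterion, which inflates it more on short tests; the `corrected` flag removes that overlap.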

