Adaptive Testing
Frequently Asked Questions About How Online HSAs Adapt to Each Student Across Opportunities:
1) If we are measuring ability to achieve specific benchmarks, why is there a difference in difficulty attached to the items that are given to some students and not others?
The adaptive HSA Online system selects items for each student that most accurately align with his or her performance on the test to that point. In general, students who are doing well on the test will see more difficult items, and students who are struggling will see easier items. Regardless of the difficulty of the items, all students are tested on the breadth of the grade-level content, and all students get an opportunity to demonstrate their higher-order thinking skills.
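The selection idea described above can be illustrated with a minimal sketch. It assumes a deliberately simplified rule: pick the unused item whose difficulty is closest to the student's current ability estimate. The function name, item identifiers, and difficulty values are hypothetical, and the sketch ignores the content-coverage requirements that the actual HSA Online system also applies.

```python
def select_next_item(ability_estimate, item_pool, administered):
    """Pick the unused item whose difficulty is closest to the ability estimate.

    item_pool maps a (hypothetical) item id to its difficulty on the same
    scale as the ability estimate. This is an illustrative rule only, not
    the selection algorithm used by the HSA Online system.
    """
    candidates = {
        item_id: difficulty
        for item_id, difficulty in item_pool.items()
        if item_id not in administered
    }
    return min(candidates, key=lambda item_id: abs(candidates[item_id] - ability_estimate))


# Hypothetical item pool: item id -> difficulty.
item_pool = {"A": -1.0, "B": -0.3, "C": 0.4, "D": 1.1, "E": 1.9}

# A struggling student is routed to an easier item, a strong student to a harder one.
next_for_struggling = select_next_item(-0.8, item_pool, administered={"B"})
next_for_strong = select_next_item(1.3, item_pool, administered={"B"})
print(next_for_struggling, next_for_strong)
```

In this sketch the two students receive different items from the same pool, which is the behavior the answer describes.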
2) How is a student's achievement measured from one test to another if there is also the factor of difference in difficulty of items given to students at different times?
Each item has a measured difficulty, so the items can be arranged along a scale. Student scores lie along that same scale. Imagine two students, one getting difficult items and the other receiving easier items. Suppose they both get half of their items correct. The student with the more difficult items will get a higher score. This is made possible through a statistical process known as equating, and it is used on virtually all contemporary tests.
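The two-student example can be sketched in code. The sketch assumes a Rasch (one-parameter logistic) model and a maximum-likelihood ability estimate. This document does not state which model the HSA uses, so both are assumptions, as are the item difficulties. Values are on an arbitrary logit scale, not the HSA scale-score metric.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a correct answer under the (assumed) Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(difficulties, responses, iterations=50):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses holds 1 for correct and 0 for incorrect; it must contain
    at least one of each, otherwise the estimate is not finite.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Both students answer half of their six items correctly.
responses = [1, 0, 1, 0, 1, 0]
easy_items = [-1.5, -1.2, -1.0, -0.8, -0.5, -0.2]  # hypothetical difficulties
hard_items = [0.2, 0.5, 0.8, 1.0, 1.2, 1.5]        # hypothetical difficulties

easy_theta = estimate_ability(easy_items, responses)
hard_theta = estimate_ability(hard_items, responses)
print(f"easier items: {easy_theta:.2f}, harder items: {hard_theta:.2f}")
```

Under these assumptions, the student who answered half of the harder items correctly receives the higher ability estimate, even though both students have the same number of correct answers.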
3) Since the HSA is given at 3 different times to students, those students who score high the first time are then given hard items the next time they take the test. How do we answer the angst and tears when the students drop by many points? They have learned more, studied hard, and motivated themselves. This seems to be detrimental to these students' wellbeing and can cause even more anxiety and self-doubt about test taking in the future.
At each opportunity, students see items aligned to their performance on that test. The initial item selection is based on performance by the student on earlier tests, but item difficulty quickly adjusts to current performance. As explained above, students receiving more difficult items get higher scores when they answer the same number of items correctly.
Some student scores drop because of distractions, a bad testing day, or other reasons. The adaptive nature of the test does not cause these drops and, in fact, generally reduces them.
The Standard Error of Measurement (SEM) also needs to be considered when reviewing a student’s scores for each opportunity administered for the Reading, Mathematics, or Science Assessment.
The observed score on any test is an estimate of the true score. If a student took a similar test several times, the resulting scale score would vary across administrations, sometimes being a little higher, a little lower, or the same. The standard error of measurement (SEM) represents the precision of the scale score, or the range in which the student would likely score if a similar test were administered several times. The ± value next to the student’s scale score provides information about the certainty, or confidence, of the score’s interpretation. The boundaries of the score band are one standard error of measurement above and below the student’s observed score, representing a range of score values that is likely to contain the true score. For example, 310 ± 10 indicates a score band of 300 to 320; a band constructed this way contains the student’s true score about two times out of three.
Because students are administered different sets of items of varying item difficulties in each computer-adaptive content area assessment, the standard error of measurement can be different for the same scale score, depending on how closely the administered items match the student’s ability.
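A short sketch can show why item match affects the SEM. It assumes a Rasch model, under which each item contributes information p(1 - p) and the standard error is one over the square root of the total information. The model choice, function names, and difficulty values are assumptions for illustration, and the results are on a logit scale rather than the HSA scale.

```python
import math


def item_information(theta, difficulty):
    """Information one item provides at ability theta (assumed Rasch model)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def standard_error(theta, difficulties):
    """Standard error of the ability estimate: 1 / sqrt(total information)."""
    total_information = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(total_information)


theta = 0.5  # hypothetical student ability
well_matched = [0.3, 0.4, 0.5, 0.6, 0.7]        # difficulties near the ability
poorly_matched = [-2.5, -2.0, 2.5, 3.0, 3.5]    # far too easy or far too hard

sem_matched = standard_error(theta, well_matched)
sem_mismatched = standard_error(theta, poorly_matched)
print(f"matched: {sem_matched:.2f}, mismatched: {sem_mismatched:.2f}")
```

With the same number of items, the set that sits close to the student's ability yields the smaller standard error in this sketch.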
Appropriate Uses
A student's scale score should be evaluated after the standard error of measurement is added to and subtracted from the scale score. This provides a score range that includes the student’s true score with 68 percent certainty (i.e., across repeated administrations, the student’s test score would fall in this range about 68 percent of the time).
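As a minimal sketch, the score band for the 310 ± 10 example can be computed as follows. The function name is hypothetical. The 68 percent figure is checked under the assumption that measurement error is normally distributed, which this document does not state explicitly.

```python
import math


def score_band(scale_score, sem):
    """Return the (low, high) band one SEM below and above the scale score."""
    return (scale_score - sem, scale_score + sem)


low, high = score_band(310, 10)
print(low, high)

# Assuming normally distributed error, the share of the distribution
# within one standard error of the mean is erf(1 / sqrt(2)).
coverage = math.erf(1.0 / math.sqrt(2.0))
print(f"{coverage:.1%}")
```

The coverage value works out to roughly 68 percent, consistent with the "two times out of three" wording used for the score band.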
Inappropriate Uses
A small difference between scale scores (e.g., within 1 SEM) should not be interpreted as a significant difference. The measurement error should be taken into account when users are comparing scores. For example, students with scores of 301 and 303 are not reliably different because those scores differ by less than 1 SEM. The score band contains a student’s true score with 68 percent certainty; therefore, the student’s true score can lie outside the score band.
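The comparison rule can be expressed as a small check. The function name is hypothetical, and the SEM of 10 is an assumption borrowed from the 310 ± 10 example, since no SEM is given here for the scores of 301 and 303. Exceeding one SEM is only the threshold named in this section, not a formal significance test.

```python
def exceeds_one_sem(score_a, score_b, sem):
    """Return True when two scale scores differ by more than one SEM."""
    return abs(score_a - score_b) > sem


assumed_sem = 10  # hypothetical value, taken from the 310 +/- 10 example

close_scores = exceeds_one_sem(301, 303, assumed_sem)
distant_scores = exceeds_one_sem(301, 325, assumed_sem)
print(close_scores, distant_scores)
```

Under this assumed SEM, the 2-point gap between 301 and 303 falls well inside one SEM and should not be read as a real difference.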
4) How are the DOE and testing office informing teachers, parents, students, and the public about the way students are being tested, e.g., the adaptive construction of the tests?
Teachers, parents, students, and the public are encouraged to visit the non-secure Online HSA website (alohahsa.org) where information about how the adaptive Online HSA system works is posted under the “Important Information” heading on the homepage.
5) What is the purpose of making the HSA adaptive?
An adaptive test gives a more precise estimate of ability for most students than a comparable fixed form test. This provides better instructional information, more accurate measures of growth, and a challenging but accessible testing experience for each student.
6) Is there a purpose in giving students the test 3 times if they pass the first time? It is taking valuable instructional time that could be used for collaborative projects, art, music, social studies, etc., but instead continues to be used for "preparing to do better on the next test".
The Department of Education requires schools to administer the Online HSA Reading, Mathematics, and Science Assessments to the students in the identified grades only once.
7) If teachers are going to be evaluated on the progress of their students on the HSA test, how is the adaptive construct being factored in?
The adaptive design makes the Online HSA assessments among the fairest ways to measure student growth because the design can accurately measure students along a broader range of the proficiency continuum. This is not always the case with traditional tests that administer the same items to every student.
8) Is the testing office doing a study on the effect of testing students three times and what the value is in doing so?
The Student Assessment Section and its contractor, American Institutes for Research (AIR), routinely review test data of students’ first, second, and third opportunities and have found in general that students score higher on their second and third opportunities. The Student Assessment Section does not plan to do a formal study at the present time.
9) How does marking a question for review during the test affect the next question? Does it lower the level and value? Does it remain at the same difficulty level?
Marking a question for review does not in any way affect the selection of subsequent questions. It is simply a way for a student to make a note to himself or herself to review the initial answer for a question. Only a student’s initial response to a question (independent of whether the question is marked for review), which is used to update the student’s ability estimate, will affect the selection of subsequent questions. If a student changes the initial response to a marked or unmarked question, the change in response will result in an update of the student’s ability estimate, which will affect the selection of any additional questions.
Contact Us
Email: hsa@notes.k12.hi.us
U.S. Mail:
State of Hawaii Department of Education
Student Assessment Section
641 18th Avenue #V102
Honolulu, HI 96816
Telephone: (808) 733-4100
Monday through Friday, 7:45 AM - 4:30 PM HST
Fax: (808) 733-4483
