Office of Educational Accountability
Wisconsin Proficiency Score Standards Setting Final Summary Report (1997)
Standards Setting for Elementary, Middle, and High School at Grades 4, 8, and 10 (November 1997)
This Final Summary Report of the Proficiency Score Standards provides information to answer the "who, what, and how" questions about the standard setting activities that were held in Green Lake in April 1997 for the WKCE. This report has been updated with the final, approved cut-scores and estimates of score distributions for 1996-97. The proficiency category "partially proficient" has also been renamed "basic." This summary report is organized to answer the following questions:
Final results including descriptors of each of the four proficiency categories by subject and by grade and the scale score ranges associated with each category.
Additional copies of this Final Summary Report of the Proficiency Score Standards and other WSAS program related publications are available upon request from the Office of Educational Accountability. Please contact:
Jean Whitcomb (608) 267-2937
Office of Educational Accountability FAX (608) 266-8770
I. WHAT IS THE WSAS PROGRAM?
Section 118.30, Wisconsin Statutes, requires the state superintendent to adopt or approve examinations designed to measure pupil attainment of knowledge and concepts. The Wisconsin Student Assessment System (WSAS) Knowledge & Concepts Examinations measure student knowledge in the areas of reading, language arts (including writing), mathematics, science, and social studies. The purposes of the WSAS testing program are to:
The results of the WSAS testing are used by the Department of Public Instruction as an accountability measure for school improvement in order to:
What do the WSAS Knowledge & Concepts Examinations look like?
This new test series was based on the emerging national content standards, National Assessment of Educational Progress (NAEP) frameworks, and state curriculum guides. The WSAS Knowledge and Concepts Examinations have two components. One component combines multiple-choice items with short-answer items, and a second component requires the student to respond to a writing test. These examinations include:
Multiple-Choice and Short-Answer Examinations
Multiple-choice items allow the student to select a response from among four choices. They are machine-scored. Short-answer items allow students to demonstrate their skills in more complex levels of thinking and are scored by professional staff experienced in providing reliable and consistent hand-scoring services. Short-answer items allow partial credit for partially answered questions. Multiple-choice and short-answer items are thematically linked and are timed separately for all tests. The content areas included in the WSAS Knowledge and Concepts Examinations are briefly described below.
Reading/Language Arts:
Mathematics:
Science:
Social Studies:
Each subject area test includes approximately 75 percent multiple-choice and 25 percent constructed-response (short-answer) items. Each multiple-choice item requires students to select the best answer from among several choices. The constructed-response items are linked thematically with the multiple-choice items and give students the chance to produce their own responses to show what they know and what they can do with their knowledge. The short-answer questions are scored by professional staff, and partial credit is allowed.
Writing Essay Examination
Each writing prompt begins with an introduction that explains the task, followed by a reading passage that provides a context for the writing assignment. The short reading passages include excerpts from works of fiction and nonfiction by classic and contemporary writers, selected for their grade-level appropriateness and literary value. Each student's on-demand essay is scored holistically on a six-point scale. Holistic scoring provides a reliable and valid basis for evaluating writing: trained evaluators assign a numerical score to each essay based on the overall merit of the student's writing, guided by the scoring criteria and anchor papers. The on-demand essays assess narrative, informative, and persuasive writing, which were spiraled (randomly assigned) in 1996-97 within classrooms at each grade. Future administrations of the test will include an essay from these writing genres or modes.
Each writing sample is holistically scored at least twice, once each by two different evaluators. The rating scale ranges from 1 to 6, and if the two ratings are within one point of each other, they are averaged to obtain the final writing score. If the two ratings differ by more than one point, a third evaluator (a table leader or supervisor) also scores the writing and is responsible for assigning a final score.
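The score-resolution rule for the writing essay can be sketched in a few lines of Python. This is a minimal illustration, not the contractor's scoring software: the function name final_writing_score and its parameters are hypothetical, and it assumes that the third evaluator's score is taken directly as the final score, which is one reading of "responsible for assigning a final score."

```python
from typing import Optional


def final_writing_score(first: int, second: int,
                        third: Optional[float] = None) -> float:
    """Resolve two holistic ratings (1 to 6) into a final writing score.

    Hypothetical helper: names and error handling are illustrative only.
    """
    for rating in (first, second):
        if not 1 <= rating <= 6:
            raise ValueError("ratings must be between 1 and 6")

    # Ratings within one point of each other are averaged.
    if abs(first - second) <= 1:
        return (first + second) / 2

    # Otherwise a third evaluator (table leader or supervisor) decides.
    # Assumption: that evaluator's score is used as the final score.
    if third is None:
        raise ValueError("ratings differ by more than one point; "
                         "a third evaluator's score is required")
    return float(third)
```

For example, ratings of 4 and 5 average to 4.5, while ratings of 2 and 5 cannot be resolved until a third score is supplied.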
In each content area tested, the multiple choice and constructed response items are scaled together by the contractor using item response theory (IRT). For the purpose of the standard setting and proficiency reporting, the writing essay is scaled with the multiple choice/short answer Language Arts test to form a combined content area of Language plus Writing (Enhanced Language).
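As background on IRT scaling, Section III notes that scale scores take into account item difficulties, discrimination indices, and the chance of guessing on selected-response questions. That description is consistent with a three-parameter logistic (3PL) model for multiple-choice items, although this report does not state which model the contractor actually used. The sketch below is therefore an assumption-laden illustration: the function name p_correct and the parameter values are hypothetical, and constructed-response items, which earn partial credit, would need a different model that is not shown.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under a 3PL model.

    theta: student ability on the latent scale
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote (chance of guessing correctly)

    Assumption: plain logistic form without the 1.7 scaling constant
    that some formulations include.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical four-choice item: guessing parameter 0.25, average
# difficulty, answered by a student whose ability equals the difficulty.
example = p_correct(theta=0.0, a=1.0, b=0.0, c=0.25)
```

When ability equals item difficulty, the probability sits halfway between the guessing floor and 1, which is 0.625 for this hypothetical item.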
II. HOW WERE THE PROFICIENCY CUT SCORES SET?
The standard setting procedure selected for use in Wisconsin was designed and conducted by the testing contractor, CTB/McGraw-Hill. The panelists were asked to define the proficiency score standard(s) for each subject area using the Bookmark Procedure (Lewis, Mitzel, & Green, 1996), developed by CTB research scientists. The panelists placed "bookmarks" at the items in their ordered item booklets that represented the breaking points between the proficiency categories: minimal performance, basic, proficient, and advanced. A copy of the Lewis, Mitzel, & Green paper is provided in the Technical Report.
In April 1997, a representative group of Wisconsin educators and other citizens served on nine panels to study, debate, and recommend proficiency score standards based on their collective expertise. The results of their deliberations are summarized in this report. The proficiency score standards will be used with subsequent administrations of the current series of WSAS examinations. The Proficiency Score Standard Setting Panels comprised 185 panelists from 100 school districts; they included:
Participants were drawn from recommendations by professional organizations, self-nominations, and other referral sources. Each nominee was asked to provide some indication of their area of qualification and experience. From the nomination pool, the department selected the most qualified panelists who also represented various areas of Wisconsin, brought a variety of experience, and were as well balanced as possible for gender, race/ethnicity, and other factors. Section VI lists the districts and participant names of the panelists as well as official observers.
The panelists were asked to attend for the entire two- or three-day standard setting session as required. Panelists' expenses were reimbursed by the contractor; however, the panelists' time was volunteered as a professional service. It should be acknowledged that many districts, other employers, and other organizations provided support to enable the panelists' release time. Panelists received background information in the mail prior to the panel meeting. The fundamental tasks that participants performed under the Bookmark Procedure were to:
The overall schedule followed by the standard setting panels was:
Under the Bookmark Procedure, participants placed their proficiency scores in a manner that directly reflects expected student performance. The Bookmark Procedure helps panelists understand in an integrated way what the test measures, makes the judgmental task concrete, and makes the best use of the participants' professional experience. Evaluations showed that panelists were overwhelmingly positive about the process used and the resulting standards that were set.
The strength of the process stems from three sources. First and fundamental was the use of a specially mapped test booklet containing two years' test questions, ordered from the easiest to the most difficult items, which provided empirical data for the judgment process. Second, the makeup of the panels and the progression of deliberations in each round of rating (from individual-level judgments, to table-level discussion, to room-level discussion, to the provision of statewide context with "impact data") informed the panelists of others' views and experiences while maintaining the ratings at the individual level. The inclusion of adjacent grade-level teachers and the combined middle school/high school panel brought a range of viewpoints into the discussions. Third, the table leaders, room leaders, and other staff facilitated discussions which, on the whole, allowed all participants to be heard by the panel prior to placing their bookmarks. The goal of each round was to obtain individual judgments, and yet there was a high degree of consistency in the ratings, as evidenced by nearly identical mean and median bookmarks (documented in the Wisconsin Proficiency Score Standards Technical Report).
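The consistency check mentioned above, comparing mean and median bookmark placements, can be illustrated with a short Python sketch. The placements below are invented for illustration and are not panel data; the actual figures are in the Technical Report. The function name bookmark_summary is hypothetical, and this report does not say how a cut score was derived from the individual bookmarks, so the sketch only summarizes placements.

```python
from statistics import mean, median


def bookmark_summary(placements):
    """Summarize one round of bookmark placements for one cut point.

    placements: the position in the ordered item booklet where each
    panelist placed a bookmark (hypothetical representation).
    """
    m = mean(placements)
    md = median(placements)
    # A small gap suggests no single panelist is pulling the average.
    return {"mean": m, "median": md, "gap": abs(m - md)}


# Invented placements for eight panelists at one cut point.
hypothetical_bookmarks = [31, 33, 32, 34, 32, 33, 31, 34]
summary = bookmark_summary(hypothetical_bookmarks)
```

A mean and median that nearly coincide indicate that the placements are roughly symmetric and not dominated by a few outlying judgments.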
III. WHAT DO THE PROFICIENCY SCORES REPRESENT?
Proficiency cut-scores are stated in terms of TerraNova scale scores, the basic underlying score on this assessment. Scale scores are based on the students' correct responses and also take into account the item difficulties, discrimination indices, and chance of guessing on selected-response questions. The TerraNova scales for each content area range across the levels for which the test series was designed, from the sixth month of kindergarten to the end of twelfth grade. The proficiency cut-scores are based on the actual knowledge and skills, as represented by test items, that panelists expected of students by October of the particular grade. The content descriptors for each content area provide interpretations for those proficiency categories.
Beginning in January 1998, student scores on the WSAS Knowledge & Concepts Examinations will be reported in terms of four proficiency categories in addition to normative and other scores. The proficiency categories are described in general terms as follows (see Table 2).
Table 3 estimates the impact of the proficiency score standards based on the total statewide population reported for the fall 1996 test. The estimates were revised on October 20, 1997, to reflect the adjustments made to the cut scores for elementary social studies, middle level reading, and high school math.
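Impact estimates of this kind amount to applying the cut scores to a set of scale scores and reporting the percentage of students in each category. The Python sketch below shows the idea under stated assumptions: the cut scores and student scores are invented placeholders (the approved values are in Table 4), a score exactly at a cut is assumed to fall in the higher category, and the "not tested" group reported for Wisconsin is left out. The names categorize and impact are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

CATEGORIES = ["Minimal Performance", "Basic", "Proficient", "Advanced"]


def categorize(scale_score, cuts):
    """Map a scale score to a proficiency category.

    cuts: ascending lower bounds for Basic, Proficient, and Advanced.
    Assumption: a score equal to a cut falls in the higher category.
    """
    return CATEGORIES[bisect_right(cuts, scale_score)]


def impact(scores, cuts):
    """Percentage of students in each category for the given cuts."""
    counts = Counter(categorize(s, cuts) for s in scores)
    return {cat: 100 * counts[cat] / len(scores) for cat in CATEGORIES}


# Placeholder values only; these are NOT the Wisconsin cut scores.
hypothetical_cuts = [600, 650, 700]
hypothetical_scores = [580, 600, 649, 650, 699, 700, 720, 640]
percentages = impact(hypothetical_scores, hypothetical_cuts)
```

Rerunning the same calculation with adjusted cut scores shows how a revision shifts the distribution, which is the kind of update described for the October 1997 estimates.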
The differences in proficiency cut-scores between the content areas, as well as the percentages of students falling into each category, are the result of several factors. Among the variables are the content match of this particular test series with what is taught in local schools, the difficulty of the test content, and the makeup of the panel. The standard setting process was designed to minimize all of the extraneous factors that affect the panel judgments. The results of the standard setting process provide expectations of what proficient students should know and be able to demonstrate on this examination by the fall of the grade tested. This expectation is a content-based standard.
IV. HOW IS EACH PROFICIENCY CATEGORY DESCRIBED?
The Bookmark Procedure is designed to simplify the judgmental task required of participants and facilitate the writing of valid proficiency category descriptions. The descriptions were obtained by examining the specific test questions that participants of the standard setting workshop determined all students in a given performance level should know and be able to do. The curriculum experts and other panelists were able to determine the knowledge, skills, and abilities required to correctly answer a multiple-choice item or obtain a given score point on constructed-response items. For example, to write the description of student performance in the Proficient category, the committee members were asked to synthesize the content of items that fell within that category in the ordered item booklet. These items represent the content that most, but not all, students performing in the Proficient category know and are able to do. The same procedure was used for the other proficiency categories.
Then, to streamline the descriptors, the committee members were asked to express the descriptors in the following fashion:
After the standard setting, the proficiency descriptors drafted by the committee members go through a series of revisions. First, some continuity between grades is established by the large group leaders with input from Wisconsin DPI. Following this revision, the descriptors are considered the long version; a shorter version is then developed for the purposes of the reports. The long version follows in order to provide meaning to the proficiency categories. Scale scores for the proficiency categories appear in parentheses.
The descriptors that follow provide meaning to proficiency score
categories for each content area and grade level tested by the
Knowledge & Concepts Examinations.
Links to proficiency score descriptors:
Table 4: Proficiency Categories in Terms of TerraNova Scale Scores
Wisconsin Alternate Assessment for Students with Disabilities (WAA-S/Dis.)
Prerequisite Skill (PS) levels, based upon DPI-approved protocols and rubrics, are assigned to students with disabilities who are not able to demonstrate some of the knowledge and skills on the WKCE test in the subject area with allowable accommodations. These skill levels provide common benchmarks for describing where a student is currently functioning with regard to developmental expectations for all students.
PS Advanced - Student demonstrates a consistent understanding of the concepts and skills contained in the content area, but he or she is functioning at a level that is significantly below grade and/or developmental expectations. He or she requires minimal support to demonstrate his or her learning.
PS Proficient - Student demonstrates an emerging ability to understand and perform within the content area. Student's understanding of concepts and performance of most skills in the items is inconsistent, and he or she requires moderate support to demonstrate his or her learning.
PS Basic - Student attends to instruction in the content area and participates in activities with extensive support. Student responds or performs some skills in a limited number of settings.
PS Minimal - Student exhibits very few of the prerequisite skills and knowledge in the content area. He or she is unable to perform skills or demonstrate knowledge without full physical prompting in a highly structured setting.
V. HOW DO WISCONSIN STUDENTS COMPARE WITH OTHERS?
On the fall 1996 Knowledge & Concepts Examinations, Wisconsin students at all three grades scored well above the nationwide comparison group on the multiple-choice and short-answer sections of the test. The following graph indicates how well Wisconsin students compare with the nationwide comparison sample (national norm group) on these tests.
Using percentiles and comparing with a referent group is the way we've primarily looked at performance in the past. Statewide, Wisconsin students' scores generally rank higher than national averages on standardized achievement tests. But what does that mean in terms of the knowledge and concepts that they know and can demonstrate? Is their performance on important skills good enough? Proficiency score reports are designed to answer these questions about some important content as measured by these achievement tests. In the future, the primary method of reporting and using WSAS Knowledge & Concepts scores will be in terms of the proficiency score standards. The normative information that was provided in the past will continue to be provided.
Table 5 provides a context for the percentages of Wisconsin students performing in each proficiency category found in Table 3. In Table 5, the performance of the national norm group is described using the Wisconsin Proficiency Category cut scores. The national comparison group does not display a "not tested" category since the inclusion practices varied somewhat across the sampled schools. Therefore, the percentages of students are based on the total students in the national sample. In general, fewer students in the national norm group scored in the Proficient and Advanced categories than did Wisconsin students.
VI. WHO WERE THE PARTICIPANTS IN THE APRIL 1997 STANDARD SETTING PANELS?
Final: Nine panels, 185 panelists, 19-22 per panel (plus 4 canceled, 1 excused, 2 illnesses)
The following people were official observers, attending one or more days of panel proceedings:
Last updated on 9/19/2007 8:39:00 AM