By: Jessie Kember
Assessment is the process of collecting data and information about specific knowledge and/or skills. In schools, assessment can serve many purposes, including identifying students who might need intervention, improving instruction, and monitoring a student’s progress. Most importantly, assessment helps with decision-making, especially when using a Multi-Tier System of Support (MTSS). Within an MTSS, assessment provides data that can be used to identify which students need help and support, how much support they need, and whether the intervention and support implemented are effective.
Assessment is only useful if the data are interpreted accurately. As an educator, it is important that you know how to interpret data so that they can inform decisions and practice, because an assessment tool is only as useful as the educator’s interpretation and use of the data that it yields. In recent years, teachers have been expected to administer and interpret many new assessments, but they have not always been provided with instruction about how to understand student data. Assessment literacy refers to understanding data in ways that allow them to be applied to student instruction. Important aspects of assessment literacy include conceptual foundations, the types of assessment scores that are often used in schools, and score comparisons.
Effective assessment relies on two foundational concepts: reliability and validity. These are important when considering the quality of the assessment data because without them the data are not worth interpreting.
Reliability. Reliability is the extent to which an assessment tool measures a skill consistently. It is important to note that reliability is not a fixed property of the assessment measure itself. Instead, an assessment can have reliability evidence, based on the interpretation and use of the data. There are different types of reliability evidence, including test-retest, alternate form, inter-rater, internal consistency, and reliability of the slope. These types of reliability evidence are generated by comparing test performance scores over time, across raters, or between items on the assessment. All indicators of reliability seek to document how well the assessment provides the same type of information each time it is used.
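As one illustration of the reliability evidence described above, test-retest evidence is commonly summarized as the correlation between scores from two administrations of the same assessment. The sketch below is a minimal example under that assumption. The function name `pearson_r` and the student scores are made up for illustration and are not taken from FAST™.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of the products of each pair's deviations from the two means.
    covariation = sum(
        (a - mean_first) * (b - mean_second) for a, b in zip(first, second)
    )
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariation / (spread_first * spread_second)


# Hypothetical scores for the same five students, tested twice.
first_administration = [10, 20, 30, 40, 50]
second_administration = [12, 19, 33, 41, 48]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"Test-retest correlation: {test_retest_r:.2f}")
```

A value close to 1 suggests that students kept roughly the same relative standing across the two administrations. Other kinds of reliability evidence, such as internal consistency, use different calculations.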
Validity. Validity is the extent to which an assessment tool measures what it is intended to measure. For example, to be valid, a reading test needs to measure reading skills and not math skills. Validity evidence is important because it tells whether conclusions reached from the data are appropriate, meaningful, and useful. There are many different types of validity evidence, including content, criterion, predictive, construct, face, convergent, and discriminant validity. Each type is a way of showing whether the assessment measures what it is meant to measure, often by comparing its results with those of other tests of the same skill.
Types of Scores
In addition to understanding the quality of the data, it is important to understand how different types of assessment scores are interpreted. Interpreting FAST™ assessment data involves understanding the scores provided by the software. Having some background knowledge about these scores, and their accurate interpretation, can help you to make effective instructional and intervention decisions.
Raw Score. This is the primary score, and typically is expressed as the total number of items correct on the specific assessment. For some assessments, the raw score also tells the student’s rate by indicating the number of items correct per minute. Examples of FAST™ assessments that provide raw scores are CBMreading and CBMmath.
Composite Score. A FAST™ composite score consists of multiple subtest raw scores added together. Composite scores were developed to serve as the best predictors of achievement because they increase overall reliability. Sometimes, both individual subtest scores and composite scores can be interpreted. Examples of FAST™ assessments that use composite scores are earlyReading and earlyMath.
Scaled Score. Scaled scores are raw scores that have been converted to fit along a common range, or scale, of score values, which makes them easier to understand. For example, a raw score of 30 could be converted to a scaled score of 550. Examples of FAST™ assessments using scaled scores are aReading and aMath.
Standard Score. A standard score expresses how many standard deviations a score is above or below the average of a group of scores. FAST™ does not use standard scores, but many other assessments do. Most often, standard scores are reported on a scale with an average of 100 and a standard deviation of 15, but some tests use a scale with an average of 50 and a standard deviation of 10.
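The arithmetic behind a standard score can be sketched in a few lines. The example below is illustrative only: the function names and the group of raw scores are made up, and using the population standard deviation is an assumption, since test publishers define their own norm groups and methods.

```python
from statistics import mean, pstdev


def z_score(raw, group_scores):
    """Standard deviations that `raw` lies above (+) or below (-) the group mean."""
    # Assumption: population standard deviation of the comparison group.
    return (raw - mean(group_scores)) / pstdev(group_scores)


def to_standard_scale(z, scale_mean, scale_sd):
    """Place a z-score on a reporting scale with the given mean and SD."""
    return scale_mean + scale_sd * z


# Hypothetical raw scores for a comparison group (mean 50, SD 20).
group = [20, 40, 40, 40, 50, 50, 70, 90]

z = z_score(70, group)                   # one standard deviation above the mean
score_100_15 = to_standard_scale(z, 100, 15)
score_50_10 = to_standard_scale(z, 50, 10)

print(z, score_100_15, score_50_10)
```

In this made-up group, a raw score of 70 sits one standard deviation above the average, which corresponds to 115 on a scale with an average of 100 and a standard deviation of 15, and to 60 on a scale with an average of 50 and a standard deviation of 10.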
Percentile. Percentiles are not technically scores, but are rankings of scores according to their order. A percentile rank shows how a specific score compares to all the others collected from similar types of students. FAST™ produces percentiles based on the scores from all the students in the same class, school, or district as well as for the national database. For example, for a set of scores from 100 students in the same grade, a student who performs at the 72nd percentile performed as well as or better than 72 percent of the students who took the same test.
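The "as well as or better than" reading of a percentile rank can be sketched as follows. This is a minimal illustration, not the calculation FAST™ uses: the function name and scores are made up, and counting every score at or below the student's score (including the student's own) is one common convention among several.

```python
def percentile_rank(score, all_scores):
    """Percent of scores in `all_scores` that are at or below `score`."""
    # Assumption: ties and the student's own score count as "at or below".
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical scores from 100 students in the same grade: 1, 2, ..., 100.
grade_scores = list(range(1, 101))

rank = percentile_rank(72, grade_scores)
print(rank)
```

With these made-up scores, a student who scored 72 did as well as or better than 72 of the 100 students, so the percentile rank is 72.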
In order for assessment scores to be useful, they need to be compared to some expectation or standard. Through such comparisons, teachers can tell if a student’s current performance is below, at, or above expectations. FAST™ uses two main types of score comparisons: norms and benchmarks.
Norms. Norms, or normative scores, are average scores of large groups of students. Norms reflect typical performance of students at each grade level. It is important to remember that FAST™ norms are updated annually to reflect the most recent data. The purpose of norms is to establish a baseline against which a student’s score can be compared. FAST™ reports include notation to help interpret student performance in the context of norms. For example, student scores are color-coded to show whether each student’s score is below, at, or above the norms. Each of these color codes is matched to a specific range of percentile ranks, as follows:
- Red = below 20th
- Orange = 21st – 30th
- Green = 31st – 85th
- Blue = above 85th
FAST™ assessment reports include norms for the class, grade, school district, and nation. Teachers can then compare student performance in relation to each of these groups.
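The color bands listed above amount to a simple lookup from percentile rank to color. The sketch below shows one way to express that lookup. The function name is hypothetical, and because the listed ranges do not say where a rank of exactly 20 belongs, placing it in the red band is an assumption made here only so that every rank receives a color.

```python
def norm_color(percentile_rank):
    """Map a percentile rank to the color band described in the text."""
    if percentile_rank > 85:
        return "blue"    # above the 85th percentile
    if percentile_rank > 30:
        return "green"   # 31st through 85th
    if percentile_rank > 20:
        return "orange"  # 21st through 30th
    # Assumption: exactly the 20th percentile is grouped with "below 20th".
    return "red"


for rank in (10, 25, 50, 90):
    print(rank, norm_color(rank))
```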
Benchmarks. The other type of score comparison that FAST™ uses is a benchmark. Benchmarks are developed from norms but are specific to each school or district’s identified learning goals or standards. FAST™ provides benchmark scores by grade level and time of year for each available assessment, so there are different benchmarks for the fall, winter, and spring student assessments. Benchmark scores allow educators to identify which students are at risk for achievement difficulties. FAST™ sets default benchmark score levels so that all students’ scores are organized into three levels: high risk, some risk, and low risk. By default, the high risk level is set at the 15th percentile and the some risk level is set at the 40th percentile. It is possible for schools to set their own custom benchmark levels.
Instead of color codes, FAST™ uses exclamation marks to show a student’s risk level. Student scores without exclamation marks are at low risk, those with one exclamation mark are at some risk, and those with two exclamation marks are at high risk of achievement difficulties. There is no "no risk" category because it is impossible to predict with 100% accuracy whether a student will meet all learning targets and always be successful in school. When reviewing FAST™ score reports, teachers will know to take the most immediate steps to help those students with two exclamation marks. The sooner that intervention is provided for such students, the more likely they are to catch up to peers and reach the benchmark goal.
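The default benchmark levels and the exclamation-mark notation described above can be sketched as a small classification function. This is an illustration under stated assumptions, not FAST™'s implementation: the function and parameter names are hypothetical, the comparison is made on percentile ranks, and a score exactly at a cut point is treated as belonging to the lower-risk level.

```python
def risk_level(percentile_rank, high_risk_cut=15, some_risk_cut=40):
    """Return (risk label, report marker) for a percentile rank.

    The default cut points follow the 15th and 40th percentile defaults
    described in the text; schools could pass their own custom values.
    """
    # Assumption: ranks strictly below a cut point fall into that risk level.
    if percentile_rank < high_risk_cut:
        return ("high risk", "!!")
    if percentile_rank < some_risk_cut:
        return ("some risk", "!")
    return ("low risk", "")


for rank in (8, 30, 65):
    label, marker = risk_level(rank)
    print(rank, label, marker)

# A school using hypothetical custom cut points of 20 and 50.
print(risk_level(45, high_risk_cut=20, some_risk_cut=50))
```

Keeping the cut points as parameters mirrors the point that schools can replace the defaults with their own benchmark levels.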
Having a basic level of assessment literacy helps teachers to understand their students’ assessment reports. Big ideas to remember about student assessment data are:
- Assessment data should always be interpreted with caution. Even though FAST™ produces automated scores and reports, it is important for users to consider the accuracy of the assessment administration, the purpose of the assessment, the content of the assessment, the student completing the assessment, and how the data will be used. FastBridge Learning endorses the use of a multi-method, multi-assessment model. This means that all FAST™ scores should be compared to other sources of information about a student’s school progress.
- Reliability and validity are features of assessments that should be considered when selecting and using them to evaluate student achievement. Reliability refers to how consistent the assessment is across students and time. Validity is how accurately the assessment measures the target skill area.
- There are many different types of assessment scores possible. The first step in understanding student data is to know what type(s) of scores are reported.
- Scores can be compared to both norms and benchmarks. Norms show the average scores obtained by many students who took the same test. Benchmarks are specific score goals that the school sets for all students to meet.
With time, assessment reports become easier to understand. When a school conducts universal screening of all students, teachers become very skilled at understanding the assessment reports. More information about FAST™ assessments and reports can be found online at www.fastbridge.org.
Jessie Kember is a doctoral student at the University of Minnesota – Twin Cities. She was a part of FastBridge Learning’s research team for four years and contributed to the development and refinement of various FAST™ reading assessments. Jessie is currently completing an internship in school psychology at a school district in Colorado.