Confirmatory Factor Analysis of the School-Based Assessment Evaluation among Teachers

The school-based assessment (SBA) system is a holistic assessment system conducted in schools by subject teachers to assess students' cognitive (intellectual), affective (emotional and spiritual) and psychomotor (physical) aspects. To evaluate the implementation of SBA, a measurement scale was validated. The aim of this paper was to explore different factor structures of the SBA evaluation scale using second-order confirmatory factor analysis (CFA). A questionnaire was used as the instrument for data collection, and a total of 776 primary and secondary school teachers were selected as respondents using stratified random sampling. The results indicated that the SBA evaluation model was a valid and reliable scale. The input measurement model was validated with two factors (‘personnel qualifications’ and ‘physical infrastructure’), the process measurement model with six factors (‘attitude’, ‘understanding’, ‘skills’, ‘challenges’, ‘moderation’ and ‘monitoring’) and the product measurement model with two factors (‘students’ attitude’ and ‘students’ motivation’). This study supports the use of a valid instrument in evaluating the implementation of SBA in schools. Furthermore, the CFA procedures supported the conceptual framework set out earlier. The findings thus highlight the importance of evaluating any system across all the dimensions outlined in Stufflebeam's evaluation model.


1. Introduction
According to the aspiration of the Malaysian National Education Philosophy, education in Malaysia is meant to be an ongoing effort to further develop the potential of individuals in a holistic and integrated manner so that well-balanced individuals can be produced. To sustain such an effort, reforms must be formulated in the education system. The assessment system in Malaysian education has recently been reviewed, and the review concluded that assessment had previously focused only on summative assessment, with public examinations administered to all students in Years Six, Nine and Eleven (Ong, 2010). In recent years, formative assessment has been introduced in certain subjects at certain levels of schooling in all government schools.
The traditional concept of assessment has been found to be less effective in improving students' learning. This is because it focuses on public examinations, which have made students examination-oriented (Wiliam, 2001), and it evaluates students purely on their academic achievement, based on knowledge and skills, in a very time-limited situation (Fan, 2011). It is also seen as negatively affecting students' emotions and confidence (Stiggins, 2005) and hence as producing more passive students and teachers (Mercurio, 2008). Similarly, the Malaysian public examination orients the public towards examinations (Cheah, 2010). A new system of assessment that can determine the full potential of students and improve their learning is therefore greatly needed, which is why formative assessment is becoming increasingly popular. SBA takes two main forms: formative SBA and summative SBA. Formative SBA is school-based assessment that promotes students' learning (Lembaga Peperiksaan Malaysia, 2011). It is conducted alongside the teaching and learning process using various methods of gathering information, such as worksheets, observations, quizzes, checklists, assessment reports, homework and tests. By contrast, summative SBA is also school-based but provides a record of a student's overall achievement at the end of a month, semester or year, using monthly or semester tests (Harlen, 2004; Lembaga Peperiksaan Malaysia, 2011).
SBA, as now implemented in Malaysia, has two main objectives: to gain an overall picture of an individual's potential and to provide meaningful reporting on individual learning (Nor Hasnida, 2015). It includes both formative and summative assessment (Lembaga Peperiksaan, 2010). SBA that focuses more on formative than on summative assessment has been conducted in countries such as Australia, New Zealand, Hong Kong, Finland, the United Kingdom, the USA, Canada, Africa, Sweden, Scandinavia and Singapore (Assessment Support Material, 2001). Australia implemented SBA in the late 1960s (Mercurio, 2008), while Finland and Sweden introduced it in the early 1970s (Darling-Hammond and McCloskey, 2008). Malaysia took a bold decision when SBA was formally implemented in all government schools in 2011, with Year One students as the first cohort to undergo SBA. A valid and reliable measurement model to evaluate the implementation of the system is therefore increasingly needed. Instruments used to assess teachers' perceptions of a particular concept must be evaluated before they are administered. This ensures that the questionnaire is valid and reliable; in other words, that it measures what it is supposed to measure and that its scores are as free as possible of measurement error (Muijs, 2011). A valid and reliable instrument has been developed previously, but it addresses only teachers' attitude towards SBA (Nor Hasnida et al., 2012). Validity and reliability are the most important considerations when dealing with measurement (Barroon and Abd Rahman, 2015). The two are related: a test can be reliable without being valid, but it cannot be valid if it is not reliable (Jackson, 2003).
There are various types of reliability, but this study considers three: internal reliability, construct reliability (CR) and average variance extracted (AVE). For validity, it considers convergent validity, construct validity and discriminant validity. Internal reliability refers to the degree to which all of the items measure the same underlying construct (Pallant, 2007), whereas construct reliability assesses the extent to which a measuring instrument accurately measures the theoretical construct it was designed to measure (Jackson, 2003). Construct validity is the extent to which a set of items actually reflects the theoretical latent construct those items are designed to measure (Hair et al., 2006), whereas discriminant validity requires that each measured item represent only one latent construct.
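The reliability indices named above can be computed directly from CFA output. The sketch below assumes standardized factor loadings and uncorrelated measurement errors, and it uses Cronbach's alpha as the measure of internal reliability. The text cites Pallant (2007) but does not name the statistic, so treating it as alpha is an assumption. The function names are hypothetical, and the loadings in the usage example are illustrative, not values from this study.

```python
from statistics import variance


def construct_reliability(loadings: list[float]) -> float:
    """CR from standardized loadings: (sum λ)^2 / ((sum λ)^2 + sum(1 - λ^2))."""
    total = sum(loadings)
    # Error variance of each standardized item is 1 - λ^2 (assumes uncorrelated errors).
    error_variance = sum(1.0 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings: list[float]) -> float:
    """AVE: mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's summed score
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance / variance(totals))


if __name__ == "__main__":
    loadings = [0.84, 0.90, 0.95]  # hypothetical loadings, not results from this study
    print(round(construct_reliability(loadings), 3))       # 0.926
    print(round(average_variance_extracted(loadings), 3))  # 0.806
```

Because CR and AVE depend only on the standardized loadings, they can be recomputed whenever items are deleted during model modification.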
When a questionnaire is valid and reliable, a researcher can have confidence in the results obtained with it during data collection. Hence, the purpose of this study is to develop an instrument to evaluate teachers' perceptions of the factors concerning SBA implementation by exploring different factor structures of the evaluation scale using second-order confirmatory factor analysis.

Theoretical Underpinning of the Study
Under the CIPP Model developed by Daniel Stufflebeam, any system can be evaluated in four dimensions: context, input, process and product (Stufflebeam, 1971a). Stufflebeam holds that, in evaluating a system, one can look either at one dimension at a time or at all the dimensions simultaneously (Stufflebeam and Shinkfield, 1985). Furthermore, because evaluation involves decision-making, each dimension should contribute to a particular decision or conclusion (Stufflebeam, 1971a). Thus, context, input, process and product evaluation should serve planning, structuring, implementing and recycling decisions respectively. In other words, context evaluation asks 'Were important needs addressed?', input evaluation asks 'Was the effort guided by a defensible plan?', process evaluation asks 'Was the service design executed competently?', and product evaluation provides the overall conclusion on whether the effort succeeded ('Did the effort succeed?') (Stufflebeam, 2003). In this study, context evaluation includes two factors: school type (urban and rural schools) and school category (primary and secondary schools). Input evaluation includes three 1st-order factors ('material and personal needs', 'personnel qualifications' and 'physical infrastructure'), process evaluation includes ten 1st-order factors ('attitude', 'understanding', 'courses', 'in-house training', 'administration', 'challenges', 'moderation', 'monitoring', 'role of SBA' and 'importance of SBA') and product evaluation includes three 1st-order factors ('students' attitude', 'students' knowledge' and 'students' motivation').
All the factors involved in this implementation of SBA are supported by learning theories such as behaviorism, Piaget's learning theory, constructivism, multiple intelligences theory and brain research. In general, these learning theories agree, to some extent, that implementing SBA in the classroom could improve students' learning. In other words, the formative and summative assessments in SBA seem to fit all of these learning theories. One model that explains how the dynamic interaction between staff and students produces effective assessment practice was developed by Chris Rust and his colleagues at the Oxford Centre for Staff and Learning Development (Rust et al., 2005), based on the social constructivist approach. The model shows that a constructive assessment process needs explicit, well-defined assessment criteria, followed by active engagement of students and teachers with those criteria.
Active engagement with feedback is also important because feedback lies at the heart of the assessment process. The assessment process also includes marking and moderation by the teachers. The advantages of this model are that it applies to both formative and summative assessment and that the dynamic relationship between its elements reflects what actually happens in schools. However, the model is based solely on the social constructivist approach, which may have weaknesses of its own.

2. Research Methods
Survey questionnaires were distributed by post and by hand to primary and secondary schools in ten major districts of Kelantan, a state in the north-east of Peninsular Malaysia. Teachers were selected as respondents because, compared with other parties, they are the most involved in and the most concerned with the system. A total of 776 usable questionnaires were obtained for analysis. This sample size meets the recommendation of Kline (2005), who suggested that more than 200 participants is sufficient for SEM analysis. It also meets the guideline of Hair et al. (2010) that 500 participants is the minimum sample size for a study involving more than seven latent constructs, some of which have fewer than three items. Uni-dimensionality, reliability and validity were assessed for all measurement models. Uni-dimensionality is achieved when the factor loadings of the items on their respective latent construct are 0.5 or more (Zainuddin, 2012). Three types of reliability were considered (internal reliability, construct reliability (CR) and average variance extracted (AVE)), together with three types of validity (convergent validity, construct validity and discriminant validity). The requirements are shown in Table 1.
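The uni-dimensionality rule above amounts to screening each item's standardized loading against the 0.5 cut-off. A minimal sketch follows; the function name and the construct and item labels are hypothetical placeholders, not items from this questionnaire.

```python
MIN_LOADING = 0.5  # uni-dimensionality cut-off stated in the text (Zainuddin, 2012)


def items_to_review(loadings: dict[str, dict[str, float]],
                    cutoff: float = MIN_LOADING) -> dict[str, list[str]]:
    """Return, per construct, the items whose standardized loading is below the cut-off."""
    flagged = {}
    for construct, items in loadings.items():
        low = [item for item, lam in items.items() if lam < cutoff]
        if low:
            flagged[construct] = low
    return flagged


if __name__ == "__main__":
    model = {  # hypothetical standardized loadings
        "construct_A": {"x1": 0.78, "x2": 0.41, "x3": 0.66},
        "construct_B": {"y1": 0.55, "y2": 0.83},
    }
    print(items_to_review(model))  # {'construct_A': ['x2']}
```

In practice, flagged items are usually removed one at a time and the model re-estimated, since each deletion changes the remaining loadings.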

3. Statistical Analysis
In this study, AMOS version 18 and SPSS version 21 were used for the analysis. AMOS was used to assess the relationships between the latent and observed variables of each measurement model using confirmatory factor analysis, a theory-driven technique that determines the goodness of fit between the model and the sample data (Byrne, 2010). This type of analysis is preferable when the researcher already has some knowledge of the latent structure. Maximum likelihood estimation was used to generate parameter estimates for the measurement models. This estimation method is practical because of its ability to deal with complex models and its robustness to non-normal data (Brown, 2006). Several fit indices were used to discern how well the specified model reproduces the covariance matrix among the indicator items (Hair et al., 2006). They fall into three groups of measures: practical fit measures (the chi-square statistic or χ²/df), absolute fit indices (GFI, AGFI or RMSEA) and incremental fit indices (TLI or CFI). According to Hair et al. (2010), a study should report at least three fit indices, with at least one from each category. In addition, the values listed in Table 2 must be met for a model to be regarded as a good fit.
ISSN 2039-9340 (print) Mediterranean Journal of Social Sciences MCSER Publishing, Rome-Italy Vol 7 No 5 September 2016
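The chi-square-based indices can be written out explicitly. The following is a sketch using textbook formulas; AMOS's exact computations may differ in small details (for example, N versus N - 1 in RMSEA), and the function names are hypothetical. GFI and AGFI are omitted because they require the sample and model-implied covariance matrices.

```python
from math import sqrt


def chi2_df_ratio(chi2: float, df: int) -> float:
    """Normed chi-square (chi2 / df)."""
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate, using the common (N - 1) form."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index relative to the independence (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)  # also >= 0 because d_model >= 0
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def tli(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Tucker-Lewis index; unlike CFI it is not bounded to [0, 1]."""
    null_ratio = chi2_null / df_null
    return (null_ratio - chi2 / df) / (null_ratio - 1.0)
```

Applied to the initial input model (χ² = 157.756, df = 17, N = 776), `chi2_df_ratio` gives about 9.280 and `rmsea` about 0.103, matching the reported values. CFI and TLI additionally need the null-model χ², which the text does not report.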

4. Results
Nearly three-quarters (74.7 percent) of the participants were female and about one-quarter (24.6 percent) were male. The majority (93.6 percent) were Malay. Nearly half had 10 to 20 years of teaching experience. Most had between 0 and 3 years of experience practising SBA.

Input Evaluation
The input evaluation, as a 2nd-order measurement model, is proposed to measure the personnel, resources and procedures used in achieving SBA objectives (Stufflebeam, 1971a). Three factors are involved: material and personal needs ('mat'), appropriateness of qualification ('appr') and suitability of physical infrastructure and ICT ('suit'). These factors are measured by three, two and three items respectively, as shown in Figure 1 (initial model), giving a total of eight items for input evaluation. The model yields a chi-square statistic of χ² = 157.756 on 17 degrees of freedom. The model as a whole was over-identified, but in a hierarchical model the higher-order structure is just-identified; to resolve this, equality constraints were placed on particular parameters to yield more accurate estimates. The goodness-of-fit statistics (χ²/df = 9.280; GFI = 0.952; AGFI = 0.898; NFI = 0.928; CFI = 0.935; TLI = 0.892; RMSEA = 0.103) indicate a poor fit, so the model was modified, for example by deleting a construct or items, to gain a better fit. Modification indices were then examined to identify measurement errors between items that could be correlated. According to Arbuckle and Wothke (1999), such modifications must be justified by theory or common sense to avoid producing absurd parameter estimates. In the final measurement model (Figure 1), four items remain to measure input evaluation; they are listed in Table 3. These remaining items (a17, a18, a19 and a20) have factor loadings ranging from 0.53 to 0.92, indicating that the meaning of the factors has been preserved. The goodness-of-fit statistics show that this final measurement model has a very good fit (Table 6). Finally, uni-dimensionality, validity and reliability have been addressed, as shown in Table 5.
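The reported 17 degrees of freedom for the initial input model, and the statement that its higher-order part is just-identified, can be checked by counting moments and free parameters. This sketch assumes the usual scaling in which one loading per factor is fixed to 1, with no correlated errors or equality constraints in the initial model; these settings are assumptions, not ones stated in the text. Under them the count reproduces df = 17, and the reported χ² then gives the reported χ²/df and RMSEA.

```python
from math import sqrt

items_per_factor = [3, 2, 3]      # 'mat', 'appr', 'suit' in the initial model
n_items = sum(items_per_factor)   # 8 observed items
n_first = len(items_per_factor)   # 3 first-order factors

# Distinct variances and covariances among the 8 observed items.
moments = n_items * (n_items + 1) // 2  # 36

free_params = (
    (n_items - n_first)  # first-order loadings (one per factor fixed to 1)
    + n_items            # item error variances
    + (n_first - 1)      # second-order loadings (one fixed to 1)
    + n_first            # disturbances of the first-order factors
    + 1                  # variance of the second-order factor
)
df = moments - free_params  # 36 - 19 = 17

# Higher-order part on its own: the 3 first-order factors supply 6 variances
# and covariances, and 2 loadings + 3 disturbances + 1 variance use all 6,
# so this part is just-identified.
structural_moments = n_first * (n_first + 1) // 2
structural_params = (n_first - 1) + n_first + 1

chi2, n = 157.756, 776  # chi-square and sample size reported in the text
print(df, structural_moments, structural_params)            # 17 6 6
print(round(chi2 / df, 3))                                   # 9.28
print(round(sqrt(max(chi2 - df, 0.0) / (df * (n - 1))), 3))  # 0.103
```

The same count applies to the initial product model, which also has three first-order factors measured by three, two and three items and is likewise reported on 17 degrees of freedom.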

Process Dimension
The process evaluation, as a 2nd-order measurement model, is proposed to measure the process implemented in achieving the objectives of the programme (Stufflebeam, 1971a). Twelve major constructs were proposed (belief, feeling, readiness, understanding, skills, in-house training, administration, moderation, monitoring, challenges, role and importance of SBA), with a total of fifty-two items. When this measurement model was run, the results showed a poor fit. Therefore, principal component analysis (PCA) and confirmatory factor analysis (CFA) were conducted, and the models were also modified based on theory. Four measurement models were finally produced: process1, a 2nd-order model with three constructs identified as attitude, understanding and courses (skills); process2, a 2nd-order model with two constructs, moderation and monitoring; process3, a 2nd-order model with two constructs identified as the role and importance (crucial) of SBA; and, last, challenges, a 1st-order construct.
Model modification was applied to obtain the best-fitting models. The final measurement models for the process dimension are shown in Figure 2, and uni-dimensionality, validity and reliability are addressed in Table 5.

Product Dimension
The product evaluation, as a 2nd-order measurement model, is proposed to measure the programme outcomes. Three factors are considered: students' attitude towards SBA ('att'), students' knowledge of SBA ('know') and students' motivation towards learning ('mot'). These factors are measured by three, two and three items respectively, giving a total of eight items. The model yields a chi-square statistic of χ² = 138.876 on 17 degrees of freedom. The goodness-of-fit statistics (χ²/df = 8.169; GFI = 0.960; AGFI = 0.915; NFI = 0.971; CFI = 0.974; TLI = 0.958; RMSEA = 0.096) indicate a poor fit, so the model was modified, for example by deleting a construct or items, to achieve a better fit. Modification indices were then examined to identify measurement errors between items that could be correlated. As a result, five items remain to measure product evaluation in the final product measurement model (Figure 3); they are listed in Table 4. The remaining items (e32i, e32ii, e34i, e34ii and e34iii) have high factor loadings, ranging from 0.84 to 0.95, indicating that the meaning of the factors has been preserved. The goodness-of-fit statistics show that the final measurement model has a very good fit (Table 6). Finally, uni-dimensionality, validity and reliability are addressed in Table 5.

For the final measurement models (Table 6), the standard errors are in good order, all standardized estimates are above moderate strength, and the multivariate kurtosis values have improved to the required level: all are less than 50.0, indicating a multivariate normal distribution of the data set. However, the correlations between process3 and product (r = 0.939) and between process3 and process1 (r = 0.923) are high. Because this indicates multicollinearity, the process3 model was deleted.
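The decision to drop process3 can be expressed as a simple screen on the correlations between the second-order constructs. In the sketch below, the 0.85 cut-off is an assumed rule of thumb (the text does not state a threshold), the first two correlations are those reported above, and the third pair is a hypothetical placeholder. The function name is also hypothetical. Counting how often each construct appears among the flagged pairs singles out process3.

```python
from collections import Counter

CUTOFF = 0.85  # assumed rule-of-thumb threshold, not stated in the text


def flag_collinear(corr: dict[tuple[str, str], float], cutoff: float = CUTOFF):
    """Return construct pairs with |r| above the cut-off, and how often each construct appears."""
    flagged = [(a, b, r) for (a, b), r in corr.items() if abs(r) > cutoff]
    counts = Counter(name for a, b, _ in flagged for name in (a, b))
    return flagged, counts


correlations = {
    ("process3", "product"): 0.939,   # reported in the text
    ("process3", "process1"): 0.923,  # reported in the text
    ("process1", "product"): 0.70,    # hypothetical placeholder
}
flagged, counts = flag_collinear(correlations)
print(flagged)                # both pairs involving process3
print(counts.most_common(1))  # [('process3', 2)]
```

A screen like this only identifies candidates; which construct to remove still rests on theoretical judgement, as in the text.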

5. Discussion
The literature was reviewed to identify gaps in existing research on SBA implementation. Most evaluations examine only some dimensions, such as teachers' attitude towards SBA (Majid, 2011), teachers' leadership (Boon and Shaharuddin, 2011), and teachers' knowledge and best practices in SBA, among others, and so do not give a fully rounded indication of the effectiveness of the system implemented. To date, no studies have combined all four dimensions of evaluation. Therefore, this study developed an instrument and measured its psychometric properties. Selecting a validated instrument is easy, but obtaining one that suits the study objectives and context is difficult. Here, the aim is to develop and validate an instrument to measure teachers' perceptions of SBA implementation in the Malaysian context, for which many contextual factors must be considered. The results show that the final SBA evaluation model consists of five factors (input, process1, process2, challenges and product evaluation). The model is hierarchical, so it involves first- and second-order factors. Input comprises two 1st-order factors (personnel qualifications, and physical infrastructure and ICT), process1 consists of three 1st-order factors (attitude, understanding and skills), process2 consists of two 1st-order factors (moderation and monitoring), challenges consists of six strongly loading items, and product is made up of two 1st-order factors (students' attitude and students' motivation towards learning). Because all fit indices meet the required levels, all unstandardized estimates are statistically significant, all standard errors are in good order and all standardized estimates are above moderate strength, the results imply good reliability and validity of the instrument. Hence, the questionnaire is suitable for assessing school teachers' perceptions of the implementation of the SBA system in schools in the Malaysian context.

6. Conclusion
The psychometric properties of a new, extended SBA evaluation scale for assessing teachers' perceptions of the SBA system are presented. The instrument was developed after reviewing the relevant literature and consulting experts in measurement and evaluation. The findings demonstrate that the instrument has adequate psychometric properties (it is valid and reliable) and is suitable for use in the main study, as it was tested on a fairly large sample and analyzed using CFA. Furthermore, the CFA procedures used in this study supported the conceptual framework set out earlier. The findings thus highlight the importance of evaluating any system across all the dimensions outlined in Stufflebeam's evaluation model. Hence, the present study expands the existing body of knowledge on developing a measurement scale to evaluate the SBA system.
Nevertheless, this study has several limitations. First, the samples were taken only from teachers and not from other stakeholders, so the development and validation of the instrument might be limited. Furthermore, the data come only from teachers' perceptions, without observation of their actual practices. Second, items included in the survey were deleted during the CFA procedure. Item deletion was needed to make the models fit, and although the hypothesized models are acceptable, there may be other variables that are more influential than those we chose. There might also be other models that fit the data which we have not tested. Finally, the sample was collected only from government schools in one state in the north-east of Peninsular Malaysia. Although the education system may not differ much across the country, cultural differences might limit the generalization of the findings to other states.
The model reported here might be useful in the Asian context, especially in countries that are increasingly interested in aligning their assessment systems more closely with classroom learning, providing effective feedback and validly describing students' learning. The current instrument could be of great value to them. By contrast, in countries whose assessment systems are still examination-oriented, greater disagreement and discrepancy among teachers in accepting SBA can be expected, as teachers might see this new form of assessment as relevant but not to the extent of improving students' attitude, knowledge and motivation towards learning. The results of this survey indicate that students' knowledge of SBA is not consistent with the official Malaysian government policy concerning the objectives of the National Education Assessment System on improving students' learning. Based on this survey, we would expect practitioners, especially teachers, to be exposed more vigorously to professional development so that they are equipped with sufficient skills, especially in the use of feedback. If students do not understand the function of feedback in improving their learning, it will be difficult to achieve the desired objectives of SBA implementation.

Figure 1: Measurement model for input evaluation (initial model [left] and final model [right])

Table 3. Input Evaluation Items and their descriptions
Original Item | Item Label | Item Deleted
Material and personal needs: | |
It is easy to implement if teachers are supplied with a complete assessment document | a14 |
A teaching assistant is needed to help teachers in assessment | a15 |
Training of the personnel involved should be properly planned and implemented | a16 |

Figure 2: Four measurement models for process evaluation (final model)

Figure 3: Measurement model for product evaluation (final model)

Table 1. Requirements for the reliability and validity of the measurement model

Table 4. Product Evaluation Items and their descriptions

Table 5. CFA results for the measurement models
2nd-order factor | 1st-order factor | Standardized factor loading | Item | Standardized factor loading

Table 6 shows the characteristics of the six final measurement models. In general, the fit indices are at their best-fitting values, and all unstandardized estimates are statistically significant, with critical ratios greater than 1.96.

Table 6. Final characteristics of the measurement models