Using the Comprehensive Confirmatory Factor Analysis Method of Structural Equation Modeling in the Process of Graduates Employment

In the modern world, education is one of the most important foundations of the successful development of society. The sustained economic growth of any country and its competitiveness in a globalized world economy are impossible without a highly educated and highly trained workforce. Given the tasks facing higher education and education statistics, developing a practical basis for the statistical analysis of the labor market for specialists with higher education, and of the employment of such graduates in jobs in their degree field, appears both productive and extremely important for national and international studies. The experience of the United States of America is of interest because, while the introduction of a two-stage system of higher education is still under way in Russia, the United States has already accumulated such experience in training graduates. In our view, special attention should be paid to the criteria of education level used in the United States of America. The United States is a recognized leader among market economies and has a highly developed private sector. The considerable concentration of medium-sized and small businesses is in many respects explained by mentality: in the United States the choice of specialty is influenced by employment prospects, but the prestige of a business also influences the education one gets.


1. Introduction
In the modern world, education is one of the most important prerequisites for the successful development of society. The sustained economic growth of any country and its competitiveness in a globalized world economy are impossible without a highly educated and highly skilled workforce.
Given the tasks facing higher education and education statistics, developing a methodological basis for the statistical analysis of the labor market for specialists with higher education, and of the employment of such specialists in jobs in their degree field, appears both productive and extremely important for national and international studies.
The experience of the USA is of interest because, while the introduction of a two-stage system of higher education is still under way in the Russian Federation, the USA has already accumulated such experience in training specialists. In our view, special attention should be paid to the criteria of education quality used in the USA. The USA is a recognized leader among market economies and has a highly developed private sector. The considerable concentration of medium-sized and small businesses is in many respects explained by mentality: in the USA the choice of speciality is influenced by employment prospects, but the prestige of a business also influences the education one gets.
The organization of businesses by graduates of American higher education institutions is influenced by a whole set of factors. Taking the above into consideration, the first stage of our research is to define the factors that influence the founding of a new business by American university graduates.
The goal of this article is to define the set of factors which most strongly influence the probability that graduates of higher education institutions in the United States of America find (or start) self-employment. Different methods for reducing the number of independent variables are reviewed and compared. The first is correlation analysis, which was used to assess the influence of these factors on the self-employment of graduates. The second is classical factor analysis, and the third is Confirmatory Factor Analysis with application of Structural Equations. These methods were used to decrease the number of variables influencing self-employment, and Confirmatory Factor Analysis with application of Structural Equations was used to confirm the results obtained. The data were taken from the National Science Foundation of the United States of America. The article also describes the factors which have the most significant influence on the self-employment of American graduates.

2. Research Methodology
The following methods were used to identify the factors which most strongly influence the probability that graduates of higher education institutions in the USA find (or start) self-employment: correlation analysis, the Eigenvalue method and factor analysis. The method of structural equations was used to verify the results. The data were taken from the National Science Foundation of the USA (http://www.nsf.org).

3.
Our goal is to define the set of factors which most strongly influence the probability that graduates of higher education institutions in the USA find (or start) self-employment. We used the National Science Foundation database on specialists with higher education, which contains 447 parameters for each graduate. A regression model cannot be built on all 447 parameters, so the data have to be reduced; this reduction was carried out in several stages.
First, on the basis of expert estimation, the parameters which have no influence on the process of self-employment were removed from the database, for example "Taking courses during last (reference) week", "Living in the USA during last (reference) week" or "Reason for working less than 35 hours a week". At this stage the number of parameters was reduced from 447 to 224.
Next, using correlation analysis (pairwise correlation), the parameters which most strongly influence the probability of self-employment were selected from the 224 parameters. As a result, 36 parameters were selected whose pairwise correlation coefficient with self-employment has an absolute value of 0.1 or more (Table 1).
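The screening step described above can be sketched as follows. This is a minimal illustration on made-up data, not the authors' procedure: the function names `pearson` and `screen_by_correlation`, the toy columns and the coding of self-employment as 0/1 are all assumptions introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_by_correlation(columns, target, threshold=0.1):
    """Keep the columns whose |r| with the target is at least `threshold`."""
    selected = {}
    for name, values in columns.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected

# Hypothetical toy data: 1 = self-employed, 0 = not self-employed.
self_employed = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "age": [25, 30, 45, 50, 28, 48, 33, 52],   # strongly related to the target
    "noise": [1, 2, 1, 2, 2, 1, 1, 2],         # unrelated to the target
}
selected = screen_by_correlation(columns, self_employed)
print(sorted(selected))
```

With the 0.1 threshold from the text, the unrelated column is dropped and only the column correlated with the outcome survives the screening.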
Parameters that are highly correlated with one another were then removed from the selected 36 parameters: for example, the parameters "Age Group [5 year intervals]" and "Year, date of birth [recoded for public use]", as they correlate with the "Age" parameter (the correlation is 0.99). Moreover, some variables can be grouped together, for example "Employer size" or "Type of educational institution [employer]" (Table 1). Such parameters cannot be used as independent variables and should be removed. After the removal of highly correlated parameters and parameters that can be grouped together, only 18 parameters were left.
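One simple way to remove mutually correlated parameters is a greedy pass that keeps a variable only if it is not strongly correlated with any variable already kept. The sketch below assumes this greedy rule and a cutoff of 0.9; neither is stated in the text, and the column names and values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(columns, cutoff=0.9):
    """Greedily keep columns whose |r| with every kept column is below `cutoff`."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: birth_year is an exact linear function of age.
columns = {
    "age": [25, 30, 45, 50, 28, 48],
    "birth_year": [1988, 1983, 1968, 1963, 1985, 1965],
    "hours_worked": [40, 20, 35, 40, 25, 30],
}
kept = drop_redundant(columns)
print(kept)
```

The result depends on the order of the columns: the first member of each correlated group is the one that is kept, which mirrors keeping "Age" and dropping its recoded variants.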
It is noteworthy that the database initially contained 100,000 cases, but after the removal of cases with missing values only 569 cases were left. To check that the resulting sample is representative, we compared the histograms of the distribution of the variable "Age" for the general population (Pic. 1) and for the sample (Pic. 2).
Picture 1.
Since the histogram of the distribution of the variable "Age" for the general population does not differ considerably from the same histogram for the sample, we can conclude that the sample is representative.
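The visual comparison of the two histograms can be complemented by a simple numeric check. The sketch below is an assumption, not the authors' method: it bins both sets of ages into 5-year intervals and reports the largest difference in relative frequency between the population and the sample. The bin settings and the toy ages are hypothetical.

```python
def binned_shares(values, start, width, n_bins):
    """Relative frequency of `values` in consecutive bins of equal width."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

def max_share_gap(population, sample, start=20, width=5, n_bins=10):
    """Largest absolute difference between the two binned distributions."""
    p = binned_shares(population, start, width, n_bins)
    s = binned_shares(sample, start, width, n_bins)
    return max(abs(a - b) for a, b in zip(p, s))

# Hypothetical ages: the sample reproduces the population's age structure.
population_age = [22, 27, 27, 33, 38, 41, 44, 52, 57, 63] * 100
sample_age = [22, 27, 33, 38, 41, 44, 52, 57, 63, 27]
gap = max_share_gap(population_age, sample_age)
skewed_gap = max_share_gap(population_age, [22] * 10)
print(gap, skewed_gap)
```

A gap close to zero indicates that the two histograms have the same shape, while a sample concentrated in one age group produces a large gap.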
On the basis of the remaining 18 parameters, 3-factor and 2-factor models were obtained using factor analysis. The number of factors was determined using the Kaiser criterion (Kaiser, 1960), based on Eigenvalues (Table 2), and Cattell's scree test (Cattell, 1966). The scree plot of Eigenvalues is given in Pic. 3.
Picture 3.
The table of Eigenvalues shows that the greatest share of variance is explained by the first two factors. The scree plot confirms this: it bends at the second and third points (Pic. 3). Hence, it is most logical to consider the 2-factor and 3-factor models. Of the factor analysis methods, we selected principal component analysis with various rotations of axes for the 3-factor and 2-factor models. The following rotation options were used: Unrotated, Varimax raw, Varimax normalized, Biquartimax raw, Biquartimax normalized, Quartimax raw, Quartimax normalized, Equamax raw and Equamax normalized.
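The Kaiser criterion retains the factors whose Eigenvalue of the correlation matrix is greater than 1. The following sketch computes the Eigenvalues of a small symmetric matrix with the cyclic Jacobi method and applies the criterion. The 4 x 4 correlation matrix is invented for illustration and is not taken from Table 2; `jacobi_eigenvalues` is a hypothetical helper written for this example.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) element after the rotation.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: two pairs of correlated variables.
corr = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eigenvalues = jacobi_eigenvalues(corr)
retained = sum(1 for ev in eigenvalues if ev > 1.0)       # Kaiser criterion
explained = [ev / len(corr) for ev in eigenvalues]        # share of variance
print(eigenvalues, retained)
```

For a correlation matrix the Eigenvalues sum to the number of variables, so each Eigenvalue divided by that number is the share of total variance explained by the corresponding component.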
The rotation methods were compared by the values of the factor loadings: the method giving the greatest factor loadings is considered the best. A factor loading is considered high if its absolute value is 0.5 or more. Each variable was assigned to the factor on which it has the greatest loading. Moreover, the best model should have minimal correlation between factors.
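The assignment rule described above can be sketched in a few lines. The loadings below are invented for illustration and are not the values from the paper's tables; `assign_to_factors` is a hypothetical helper, and the 0.5 cutoff is the one stated in the text.

```python
def assign_to_factors(loadings, high=0.5):
    """Map each variable to (index of factor with largest |loading|, is_high)."""
    result = {}
    for name, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        result[name] = (best, abs(row[best]) >= high)
    return result

# Hypothetical loadings on three factors (rows: variables, columns: factors).
loadings = {
    "Age": [0.95, 0.02, -0.05],
    "Activity, Teaching": [0.03, 0.71, 0.10],
    "New business": [0.01, -0.04, 0.12],
}
assignment = assign_to_factors(loadings)
for name, (factor, is_high) in assignment.items():
    print(name, "-> factor", factor + 1, "(high)" if is_high else "(low)")
```

A variable can be assigned to a factor even when its largest loading is low, which is the situation the text later describes for some variables of the factor "Business characteristics".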
After analyzing the models with the various rotations of axes, we concluded that the optimal method is Varimax normalized, both for the 3-factor and the 2-factor model. Factor loadings for these models are given in Table 2 and Table 3 (the most significant loadings are in bold type).
Relying on the factor loadings, it is possible to assign variables to factors. Let us have a closer look at the resulting models and describe them in detail.
The resulting 3-factor model describes the dependence of the probability of self-employment on the following 3 factors: "Experience", "The Attitude to education and science" and "Business characteristics".
The factor "Experience" describes the work experience of a graduate and is a linear approximation of the following characteristics of a graduate: "Age", "Year of highest degree", "Year of receiving a high school diploma", "Year of first bachelor's degree", "Year of most recent degree", "Year of second highest degree", "Year of third highest degree". It is noteworthy that the factor loadings of the variables for this factor are higher than 0.9, which shows that the variables are well approximated by this factor.
The factor "The Attitude to education and science" shows how closely the activity of a graduate is connected with education and science. It is a linear approximation of the following characteristics: "Activity, Research, Development, and Teaching", "Activity, Teaching", "Employer is an educational institution", "Work activities in the principal job: teaching". The factor loadings for this factor are not as clear-cut as those of the previous one, but all of them are high and not lower than 0.6.
Finally, the third factor, "Business characteristics", describes the new business where a graduate is employed. It is a linear combination of the following characteristics: "Full-time/part-time status including all jobs during the reference week", "Reason for working outside the highest degree field: family-related reasons", "Reason for working outside the highest degree field: a desired job is not available", "Work activities in the principal job: accounting, finance, contracts", "Work activities in the principal job: professional services", "Work activities in the principal job: sales, purchasing, marketing", "New business". The factor loadings for the variables of this factor are less pronounced: each is at least twice the corresponding loadings on the other factors, but for some variables the loading does not exceed 0.15.
The resulting 2-factor model describes the dependence of the probability of self-employment on the following 2 factors: "Experience and environment conditions" and "Business characteristics".
The first factor, "Experience and environment conditions", describes the work experience of a graduate and the work conditions. It is a linear approximation of the following characteristics: "Age", "Year of highest degree", "Year of receiving a high school diploma", "Full-time/part-time status including all jobs during the reference week", "Reason for working outside the highest degree field: a desired job is not available", "Year of first bachelor's degree", "Year of most recent degree", "Year of second highest degree" and "Year of third highest degree". Practically all factor loadings of the variables for this factor are higher than 0.9. The exceptions are the parameters "Full-time/part-time status including all jobs during the reference week" and "Reason for working outside the highest degree field: a desired job is not available", whose factor loadings do not exceed 0.15, which shows that these parameters have little influence on this factor.
The second factor, "Business characteristics", like the third factor in the 3-factor model, characterizes the business in which a graduate is employed. This factor is a linear combination of the following parameters: "Activity, Research, Development, and Teaching", "Activity, Teaching", "Employer is an educational institution", "Reason for working outside the highest degree field: family-related reasons", "Work activities in the principal job: accounting, finance, contracts", "Work activities in the principal job: professional services", "Work activities in the principal job: sales, purchasing, marketing", "Work activities in the principal job: teaching", "New business". The situation with the factor loadings for this factor is similar to that of the first factor of this model. The factor loadings vary between 0.8 and 0.04, and the parameter "Reason for working outside the highest degree field: family-related reasons" has the least influence on this factor.
The correlations between factors for the 3-factor and the 2-factor models are given below (Table 6 and Table 7).
Considering the correlation between the probability of self-employment and the obtained factors (Table 7 and Table 8), we can draw the following conclusions. First, in the 3-factor model the third factor has the greatest influence on the probability of self-employment, followed by the first factor and then the second. Secondly, as the first or third factor grows, the probability decreases, and as the second factor grows, the probability increases. Thirdly, in the 2-factor model both factors have an equally negative influence on the probability of self-employment. Comparing the 3-factor and the 2-factor models, we conclude that although the correlation between factors in the 2-factor model is lower than in the 3-factor model, the 3-factor model is better in terms of the interpretability of the factors.
The reliability of the obtained factors was then verified by confirmatory factor analysis using structural equations. For both the 3-factor and the 2-factor models we employed maximum likelihood estimation together with the method of least squares, assuming no correlation between the factors and the residuals, since the correlation between factors was shown above to be low (Table 5 and Table 6). The input data for the analysis was the matrix of correlations between parameters. The results of the analysis of the constructed structural equations were the following.
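For reference, a confirmatory factor model of this kind is usually written in the following standard form. The article does not state its equations, so the notation below is an assumption: x is the vector of p observed parameters, \xi the vector of factors, \Lambda the matrix of factor loadings (with zeros where a parameter is not assigned to a factor), \delta the residuals, S the sample correlation matrix and \theta the vector of free parameters.

```latex
% Measurement model and implied correlation matrix (standard CFA notation, assumed here)
x = \Lambda \xi + \delta, \qquad
\Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Theta_{\delta},
\qquad \Phi = I \ \text{(uncorrelated, standardized factors)}, \quad
\Theta_{\delta} \ \text{diagonal}

% Maximum likelihood discrepancy function minimized over \theta
F_{ML}(\theta) = \ln\lvert \Sigma(\theta) \rvert
  + \operatorname{tr}\!\bigl( S \, \Sigma(\theta)^{-1} \bigr)
  - \ln\lvert S \rvert - p
```

Under these assumptions, setting \Phi = I corresponds to the absence of correlation between factors described in the text, and the estimated elements of \Lambda are the path coefficients whose significance is assessed next.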
For the 3-factor model the results are given in Table 9 (significant values are in bold type), and the normal probability plot of residuals is given in Pic. 4. Table 9 shows that the paths for the factors "Experience" and "The Attitude to education and science" are significant, as they have high t-statistics and low p-values. Hence, the model describes these factors well. However, the factor "Business characteristics" is described less well by the model: not all paths for this factor have high t-statistics and low p-values. As an additional check of the adequacy of the model, the normal probability plot of residuals (Pic. 4) was constructed. The plot shows that the residuals of the model lie close to the straight line of the normal distribution, which supports the adequacy of the constructed model. On the whole, the results obtained from the path diagram for the 3-factor model coincide with the results of the factor analysis for this model, which supports their correctness.
Let us now consider the 2-factor model. The parameters of the 2-factor model are given in Table 9 and the normal probability plot of residuals is given in Pic. 4. As in the factor analysis, the factor "Experience and environment conditions" is described well by the path diagram: practically all the paths have high t-statistics and low p-values. The only exceptions are the paths for the parameters "Full-time/part-time status including all jobs during the reference week" and "Reason for working outside the highest degree field: a desired job is not available". The second factor is described less well by the model. The paths for the parameters "Reason for working outside the highest degree field: family-related reasons", "Work activities in the principal job: professional services" and "New business" have low t-statistics and high p-values. The normal probability plot of residuals for this model is practically the same as that for the 3-factor model, which indicates that the model conforms to the real data. On the whole, the model confirms the results of the factor analysis.
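The significance check applied to each path can be illustrated as follows. This is a sketch under stated assumptions: the statistic is taken to be the estimate divided by its standard error, the p-value uses a two-sided normal approximation, and the estimates, standard errors and the 0.05 level are hypothetical rather than values from Table 9.

```python
import math

def path_significance(estimate, std_error, alpha=0.05):
    """Return (t, two-sided p-value under a normal approximation, is_significant)."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p, p < alpha

# Hypothetical path estimates: one clearly significant, one not.
strong = path_significance(0.90, 0.05)
weak = path_significance(0.10, 0.10)
print(strong)
print(weak)
```

A path with a large estimate relative to its standard error yields a high t-statistic and a p-value near zero, whereas a path whose estimate is comparable to its standard error is not distinguishable from zero.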

Picture 4
Picture 5
Considering all the results described above, we can conclude that the structure of correlations between the parameters and the factors which include them coincides with the structure obtained in the factor analysis for the 3-factor and the 2-factor models. Hence, the results of the confirmatory factor analysis using structural equations, on the whole, confirm the results of the factor analysis.
Full-time/part-time status including all jobs during the reference week   X   -0.13
NEDTP   Type of a non-educational institution [employer]   X   -0.57
NRFAM   Reason for working outside the highest degree field: family-related reasons   X   0.10
NROCNA   Reason for working outside the highest degree field: a desired job is not available   X

Table 4
Factor Loadings (Varimax normalized) Clusters of loadings are marked; they determine the oblique factors for hierarchical analysis