EducationThe science

Logistic regression: model and methods

и дискриминантного анализа используются тогда, когда необходимо четко дифференцировать респондентов по целевым категориям. Methods of logistic regression and discriminant analysis are used when it is necessary to clearly differentiate respondents by target categories. In this case, the groups themselves are represented by the levels of a single univariant parameter. а также выясним, для чего она нужна. Let's consider in detail the logistic regression model, and also find out why it is needed.

General information

, может выступать классификация респондентов по группам покупающих и не покупающих горчицу. An example of a problem in the solution of which logistic regression is used, can be the classification of respondents by groups of buyers who do not buy mustard. Differentiation is carried out in accordance with socio-demographic characteristics. Among them, in particular, include age, gender, number of relatives, income, etc. In operations there are differentiation criteria and a variable. The latter encodes the target categories, which, in fact, need to separate the respondents.

Nuances

, значительно уже, чем для дискриминантного анализа. It should be noted that the range of cases in which logistic regression is applied is significantly narrower than for discriminant analysis. In this regard, the use of the latter as a universal method of differentiation is considered preferable. Moreover, experts recommend starting classification studies with discriminant analysis. And only in case of uncertainty for the results you can use logistic regression. This need is due to some factors. используется при наличии четкого представления о типе независимых и зависимых переменных. Logistic regression is used when there is a clear understanding of the type of independent and dependent variables. In accordance with this, one of the 3 possible procedures is selected. With discriminant analysis, the researcher always deals with one static operation. It involves one dependent and several independent categorical variables with a scale of any type.

Kinds

, состоит в определении вероятности того, что определенный респондент будет отнесен к той или иной группе. The task of statistical research in which logistic regression is used is to determine the probability that a particular respondent will be assigned to a particular group. Differentiation is carried out according to certain parameters. In practice, according to the values of one or several independent factors, it is possible to classify respondents into two groups. . In this case, binary logistic regression takes place. Also, the specified parameters can be used for allocation to groups that are more than two. In this situation, there is a multinomial logistic regression. The resulting groups are expressed by the levels of a single variable.

Example

Let's say there are respondents' answers to the question of whether they are interested in the offer to acquire a land plot in a suburb of Moscow. The options are "no" and "yes". It is necessary to find out which factors exert a primary influence on the decision of potential buyers. To do this, the interviewees are asked questions about the infrastructure of the territory, the distance to the capital, the area of the plot, the presence / absence of a residential structure, etc. Using binary regression, respondents can be divided into two groups. The first will include those who are interested in acquiring - potential buyers, and the second, respectively, those who are not interested in such a proposal. For each respondent, in addition, the probability of referring to a category will be calculated.

Comparative characteristics

The difference from the two variants mentioned above consists in a different number of groups and the type of dependent and independent variables. In a binary regression, for example, the dependence of the dichotomous factor on one or more independent conditions is studied. The latter can have any type of scale. Multinomial regression is considered a variation of this classification option. In it to the dependent variable there are more than 2 groups. Independent factors should have either an ordinal or a nominal scale.

Logistic regression in spss

In the statistical package 11-12, a new version of the analysis was introduced: sequential. This method is used in the case when the dependent factor refers to the same (ordinal) scale. In this case, independent variables are selected of one specific type. They must be either ordinal or nominal. Classification by several categories is considered the most universal. This method can be used in all studies in which logistic regression is applied . , однако, можно только с помощью всех трех приемов. To improve the quality of the model , however, it is possible only with the help of all three techniques.

Ordinal classification

It is worth mentioning that earlier in the statistical package there was not provided a typical possibility of performing specialized analysis for dependent factors with ordinal scale. For all variables with the number of groups greater than 2, a multinomial variant was used. A relatively recent ordinal analysis has a number of features. They take into account the specifics of the scale. часто не рассматривается как отдельный прием. Meanwhile, in procedural manuals, ordinal logistic regression is often not considered a separate technique. This is due to the following: ordinal analysis does not have any significant advantages over multinomial. The researcher may well use the latter if there is both an ordinal and a nominal dependent variable. At the same time, classification processes themselves do not differ much from each other. This means that carrying out ordinal analysis will not cause any difficulties.

Analysis option

Consider a simple case - binary regression. Suppose, in the process of marketing research, the relevance of graduates of a certain high school in the capital is estimated. In the questionnaire, the respondents were asked questions, including:

  1. Are you working? (Ql).
  2. Indicate the year of graduation (q 21).
  3. What is the average graduation point (aver).
  4. Sex (q22).

позволит оценить воздействие независимых факторов aver, q 21 и q 22 на переменную ql. Logistic regression will allow to evaluate the impact of independent factors aver, q 21 and q 22 on the variable ql. Simply put, the purpose of the analysis will be to determine the probable employment of graduates on the basis of information about the field, the year of graduation and the average score.

Logistic Regression

To set parameters using binary regression, use the Analyze►Regression►Binary Logistic menu. In the Logistic Regression window, you need to select the dependent factor in the left list of available variables. He is ql. This variable must be placed in the Dependent field. After that, independent factors should be introduced into the Covariates site - q 21, q 22, aver. Then you need to choose a way to include them in the analysis. If the number of independent factors is greater than 2, then the method of simultaneous introduction of all variables, which is installed by default, is not a step-by-step method. The most popular way is considered to be Backward: LR. Using the Select button, you can not include all respondents in the study, but only a specific target category.

Define Categorical Variables

The Categorical button should be used when one of the independent variables is nominal and the number of categories is greater than 2. In this situation, in the Define Categorical Variables window, this parameter is placed on the Categorical Covariates section. In this example, there is no such variable. After that, in the drop-down list of Contrast, select Deviation and press the Change button. As a result, several dependent variables will be formed from each nominal factor. Their number corresponds to the number of categories of the original condition.

Save New Variables

Using the Save button, you create new parameters in the main research dialog box. They will contain indicators calculated in the regression process. In particular, you can create variables that define:

  1. Belonging to a specific category of classification (Groupmembership).
  2. The probability of assigning a respondent to each study group (Probabilities).

When using the Options button, the researcher does not receive any significant features. Accordingly, it can be ignored. After clicking the "OK" button, the analysis results will be displayed in the main window.

Quality control of adequacy and logistic regression

Consider the Omnibus Testsof Model Coefficients table. It displays the results of the analysis of the approximation quality of the model. In connection with the fact that a step-by-step version was specified, it is necessary to look at the results of the last stage (Step2). A positive result will be considered that results in an increase in the Chi-square indicator when going to the next stage with a high degree of significance (Sig. <0.05). The quality of the model is evaluated in the Model line. If a negative value is obtained, but it is not considered to be significant at the general high importance of the model, the latter can be considered practically useful.

Tables

The Model Summary makes it possible to estimate the index of the aggregate variance that the constructed model describes (the R Square indicator). It is recommended to apply the value of Nagelker. A positive indicator is the parameter Nagelkerke R Square, if it is above 0.50. After that, the results of the classification are evaluated, in which the actual indicators of belonging to one or other of the studied categories are compared with those predicted on the basis of the regression model. To do this, use the Classification Table. It also allows us to draw conclusions about the correctness of differentiation for each group under consideration. . The following table provides an opportunity to clarify the statistical significance of the independent factors introduced in the analysis, as well as each non-standardized coefficient of logistic regression . Based on these indicators, it is possible to predict the belonging of each respondent in the sample to a particular group. Using the Save button, you can enter new variables. They will contain information about belonging to a specific classification category (Predictedcategory) and the probability of inclusion in these groups (Predicted probabilities membership). After clicking "OK", the results of calculations appear in the main window of the Multinomial Logistic Regression.

The first table, in which there are important indicators for the researcher, is Model Fitting Information. A high level of statistical significance will indicate the high quality and suitability of using the model in solving practical problems. Another important table is Pseudo R-Square. It allows us to estimate the share of the total variance in the dependent factor, which is determined by the independent variables chosen for the analysis. According to the table Likelihood Ratio Tests you can draw conclusions about the statistical significance of the latter. In the Parameter Estimates, non-standardized coefficients are reflected. They are used in the construction of the equation. In addition, for each combination of variables, the statistical significance of their effect on the dependent factor has been determined. Meanwhile, in marketing research, there is often a need to differentiate by category of respondents, not individually, but as part of the target group. To do this, use the Observedand Predicted Frequencies table.

Practical use

This method of analysis is widely used in the work of traders. In 1991, an indicator of logistic sigmoid regression was developed. It is an easy-to-use and efficient tool with which you can predict the likely prices before they "overheat." The indicator is represented on the graph in the form of a channel formed by two lines running in parallel. They are removed at an equal distance from the trend. The width of the corridor will depend solely on the timeframe. The indicator is used when working with almost all assets - from currency pairs to precious metals.

In practice, two key strategies for the application of the tool have been developed: to breakdown and reverse. In the latter case, the trader will be guided by the dynamics of the price change within the channel. As the cost approaches the support line or resistance, the bet is made on the probability that the movement will start in the opposite direction. If the price closely approaches the upper border, then the asset can be disposed of. If it is at the lower limit, then it is worth thinking about the acquisition. The strategy for breakdown involves the use of orders. They are set outside the limits of a relatively small distance. Taking into account that the price in a number of cases violates them for a short time, you should be safe and install stop-loss. At the same time, of course, regardless of the chosen strategy, the trader needs to be as cool as possible to perceive and evaluate the situation that has arisen in the market.

Conclusion

Thus, the application of logistic regression allows you to quickly and simply classify respondents into categories in accordance with specified parameters. In the analysis, you can use any particular method. In particular, multinomial regression is versatile. However, experts recommend using all the above methods in the complex. This is due to the fact that in this case the quality of the model will be much higher. This, in turn, will expand the range of its application.

Similar articles

 

 

 

 

Trending Now

 

 

 

 

Newest

Copyright © 2018 en.birmiss.com. Theme powered by WordPress.