*Department of Statistics, College of administration and Economic, University of Sulaimani, Sulaymaniyah, Iraq*

Received: 16-05-2020 Accepted: 26-07-2020 Published: 01-07-2020

**ABSTRACT**

In this study, the Tobit model, a statistical regression model, was used to study the factors affecting blood pressure (BP) in patients with renal failure. The data were collected from 300 patients at Shar Hospital in Sulaimani city. The records contain the BP of each patient with renal failure as the response variable (Y), measured in millimeters of mercury (mmHg), and four explanatory variables: age (years), blood urea (milligrams per deciliter, mg/dl), body mass index (BMI, in kg/m^{2}), and waist circumference (centimeters, cm). Two BP levels, high and low, were taken from each patient, and the mean arterial pressure (MAP) was used to combine them into a single average. Only patients whose average BP was greater than or equal to 93.33 mmHg kept their observed values in the dataset; 93.33 mmHg is the MAP that corresponds to the normal BP of 120/80 mmHg. The remaining values were censored to zero, i.e., left censored. The same data were also truncated from below: in the truncated sample, only the cases at risk (BP greater than or equal to 93.33 mmHg) were recorded, and the others were omitted from the dataset. The Tobit model was then applied to the censored and truncated data using the R program, version 3.6.1, with the data censored and truncated from the left at a point equal to zero. The results show that age and blood urea have significant effects on BP, while BMI and waist circumference have no significant effect on the dependent variable (Y). Furthermore, a multiple regression model was fitted to the same data by ordinary least squares (OLS) using the Stratigraphy program, version 11. The OLS results show that multiple regression is not a suitable model for censored and truncated data, whereas the Tobit model is an efficient technique for describing the relationship between explanatory variables and a truncated or censored dependent variable.

**Index Terms:** Tobit Model, Censored Regression, Truncated Regression, Renal Failure, Blood Pressure

Many types of regression models are applied in economic and social research, and the choice among them depends on the nature of the data. The Tobit model is regarded as the most appropriate statistical model for cases in which the dependent variable is censored or truncated [1]. Tobit regression has attracted great theoretical interest and has numerous practical applications; it has been developed and used in many fields, such as econometrics, finance, and medicine [1], [2]. It can be regarded as a linear regression model in which the response variable is incompletely observed, typically because it is censored at zero.

Kidney disease is common worldwide; it is a global public health problem affecting 750 million persons [3]. The kidneys play an important role in preserving normal body functions, yet most people are not aware of impaired kidney function. In fact, kidney failure is a “silent illness” that sometimes has no obvious early symptoms. Many people with kidney disease are not aware that they are at high risk of kidney failure, which could require dialysis or transplantation. Diseases such as diabetes, together with high blood pressure (BP), may cause kidney damage. Hypertension (high BP) is both a cause and a consequence of renal disease, and its types are difficult to distinguish clinically [4]. Hence, the importance of this research lies in studying the factors that affect BP in patients with renal disease and in identifying its real causes. This is crucial for medical staff and specialists (doctors) seeking to reduce complications and limit the spread of kidney disease. In this study, we examine the influence of each independent variable on the dependent variable (BP). The normal BP is known to be 120/80 mmHg [5]. This value changes due to many factors, and any departure from it can cause health problems. Therefore, everybody should take care to control BP and to know the factors that affect it.

In this study, the data were collected from patients in a dialysis center at Shar Hospital in Sulaimani city. Two BP levels, high and low, were taken from each patient (as the dependent variable), together with some independent variables (age, blood urea, body mass index [BMI], and waist circumference). Because each patient has both a high and a low BP reading, the two could not be analyzed separately; the mean arterial pressure (MAP), an average arterial pressure that combines the high and low BP [6], was used instead. A threshold of 93.33 mmHg was determined from the MAP equation [7]; it corresponds to 120/80 mmHg, the normal BP. Any value below this threshold was set to zero. The Tobit regression model is therefore used, because the dependent variable equals zero for a number of observations, a phenomenon generally termed censored or truncated data. A multiple regression model was then fitted to the same data by ordinary least squares (OLS), and it was found not to be suitable because a number of observations of the dependent variable equal zero. With censored samples, OLS estimates are biased, and the bias depends on the number of zeros [8].
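The 93.33 mmHg threshold can be checked numerically. The text does not reproduce the MAP equation itself, so the sketch below assumes the common approximation MAP = diastolic + (systolic - diastolic)/3 and a normal reading of 120/80 mmHg (the value the text writes as 12/8); the function name is illustrative.

```python
def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Approximate MAP (mmHg) as diastolic plus one third of the pulse pressure.

    This formula is an assumption; the paper only cites a MAP equation.
    """
    return diastolic + (systolic - diastolic) / 3.0


# Normal blood pressure of 120/80 mmHg (high/low reading)
THRESHOLD = mean_arterial_pressure(120, 80)

print(round(THRESHOLD, 2))  # 93.33
```

Under this assumption, the threshold quoted in the text follows directly from the normal reading.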

The aim of this study is to detect the impact of the independent variables (age, blood urea, BMI, and waist circumference) on the dependent variable (BP) in patients with renal failure using a statistical model (the Tobit model), and to put the results before specialists to help address the problem. It also aims to identify which independent variable has the greatest effect on the dependent variable, and to compare the OLS and Tobit estimates to determine which model is more suitable.

Odah *et al*. [9] identified the most significant factors affecting loans provided by Iraqi banks and the best methods for estimating the data, using a Tobit regression model and the OLS method. Liquidity and loan repayment were found to affect loans from Iraqi banks, while the effects of interest rate and borrowers were not statistically significant. The outcome of the Tobit and OLS estimations indicates that bias will result when Iraqi bank loans are estimated by OLS if bank loans are limited.

Prahutama *et al*. [10] used a Tobit regression model to study the factors that affect household expenditure on education in Semarang city. The dependent variable in that study is household expenditure on education. The independent variables include the education of the head of the household, the occupation of the head of the household, the number of household members, the number of working household members, the proportion of household members attending junior high school, senior high school, and college, food expenditure in the household, and region. According to the Tobit regression analysis, the proportion of household members attending college makes the most significant contribution to high household expenditure on education.

Ahmed [11] applied Tobit (truncated and censored) regression models and multiple regression with the least squares method to persons whose levels exceed 120 g/dl and who are at risk of diabetes, in a sample (*n* = 500), on the assumption that blood sugar (y) depends on the explanatory variables (Age: X1, Cholesterol: X2 gram/deciliter, and Triglycerides: X3 gram/deciliter). The results revealed that the censored regression model was more applicable than the other regression models (truncated and multiple regression), and that two factors (age and triglycerides) have highly significant effects on blood sugar.

Ahmad *et al*. [12] used Tobit regression analysis and data envelopment analysis (DEA) to address some of the important working capital management policies and efficiency in the manufacturing sector of Pakistan. To that end, data from 37 firms were taken for the period 2009–2014. The Tobit regression analysis concludes that the average period has a significant negative impact on efficiency, whereas the current ratio, gross working capital turnover ratio, and financial leverage ratio have significant positive impacts on efficiency.

Samsudin *et al*. [13] applied the Tobit model and DEA to examine the efficiency of public hospitals in Malaysia and to identify the factors affecting their performance. The analysis was based on 25 public hospitals in the northern region of Malaysia. The study found that the daily average number of admissions, the number of outpatients per doctor, and hospital classification have significant influences on hospital inefficiency.

Odah *et al*. [14] investigated the factors that affect the decision to divorce and determined the most important factors causing divorce in Iraq, using the Tobit regression model and the probit regression model. The data were collected by questionnaire. According to the Tobit regression results, marital infidelity is the main reason for the increase in divorce cases, along with the preoccupation of couples with social networking sites. With the probit model, it was found that age, social media sites, and income have a significant impact on the decision to divorce.

Zorlutuna *et al*. [15] applied Tobit regression analysis to measurements on lung cancer patients. The data, taken from the Sivas Cumhuriyet University Faculty of Medicine Research and Application Hospital Oncology Center, consist of 535 patients with lung cancer. The Tobit regression results show that, when the dependent variable is the stage of the patient’s disease, the patient’s gender, the patient’s condition, and the pathological consequences of the disease are statistically significant variables. The sex of the patient has a positive effect on the stage of the disease, while the pathological condition has a negative influence.

Anastasopoulos *et al*. [16] provided a demonstration of Tobit regression as a methodological approach to gain new insights into the factors that significantly influence accident rates. Using 5 years of vehicle accident data from Indiana, the estimation results show that many factors relating to pavement condition, roadway geometrics, and traffic characteristics significantly affect vehicle accident rates.

Regression analysis is one of the statistical methods used to explain the relationship between explanatory variables and a dependent variable, so choosing an appropriate model for the available data is essential. In many statistical analyses of individual data, the dependent variable is censored. In that case, a conventional regression model leads to biased parameter estimates; therefore, the best model for this type of data is the Tobit model [17]. The Tobit family of statistical regression models defines the relationship between a censored or truncated continuous dependent variable and some independent variables [18]. It has been used in many areas of application, including dental health, medical research, and economics [2]. The Tobit model refers to a regression model in which the range of the dependent variable is limited in some way [16]. The model was devised by Tobin, who supposed that a number of the values of the dependent variable are clustered at a limiting value, usually zero [19]. It was first introduced into the statistical literature in the 1950s, where it was called the “censored normal regression model,” and it has been used in health studies since the 1980s. The Tobit model is an efficient method for estimating the relationship between explanatory variables and a truncated or censored dependent variable. It originates from probit analysis and multiple regression, and its benefit is that it uses all the information that probit (or logit) models and OLS would each provide separately [20].

The structural equation in the Tobit model is
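A minimal statement of this equation in standard notation, assuming $X_i$ denotes the row vector of explanatory variables for observation $i$ and $\beta$ the coefficient vector, is:

$$
y_i^{*} = X_i \beta + e_i
$$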

Where e_{i} ~ N(0, σ^{2}) and y^{*} is a latent variable that is observed for values greater than τ and censored otherwise. The observed y is defined by the following measurement equation:
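Assuming the usual left-censoring setup, with $\tau$ the censoring point and $\tau_y$ the value recorded for censored cases, the measurement equation can be sketched as:

$$
y_i =
\begin{cases}
y_i^{*} & \text{if } y_i^{*} > \tau \\
\tau_y & \text{if } y_i^{*} \le \tau
\end{cases}
$$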

In the typical Tobit model, we assume that τ = 0, i.e., the data are censored at 0. Thus, we have
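Setting $\tau = \tau_y = 0$ in the standard formulation (assumed here) gives:

$$
y_i =
\begin{cases}
y_i^{*} & \text{if } y_i^{*} > 0 \\
0 & \text{if } y_i^{*} \le 0
\end{cases}
$$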

The likelihood function for the censored normal distribution is
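In standard notation, and assuming $d_i$ is an indicator equal to 1 for uncensored observations and 0 for censored ones, with $\phi$ and $\Phi$ the standard normal pdf and cdf, this likelihood can be written as:

$$
L = \prod_{i=1}^{N}
\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\mu}{\sigma}\right)\right]^{d_i}
\left[1-\Phi\!\left(\frac{\mu-\tau}{\sigma}\right)\right]^{1-d_i}
$$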

Where τ is the censoring point. In the traditional Tobit model, we set τ = 0 and parameterize μ as X_{i}β. This gives us the likelihood function for the Tobit model:
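Substituting $\tau = 0$ and $\mu = X_i\beta$ into the censored normal likelihood, and assuming $d_i = 1$ marks an uncensored observation, a sketch of the result is:

$$
L = \prod_{i=1}^{N}
\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\right]^{d_i}
\left[1-\Phi\!\left(\frac{X_i\beta}{\sigma}\right)\right]^{1-d_i}
$$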

The log-likelihood function for the Tobit model is
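Taking logarithms of the Tobit likelihood, with the assumed notation $d_i = 1$ for uncensored observations and $\phi$, $\Phi$ the standard normal pdf and cdf, gives as a sketch:

$$
\ln L = \sum_{i=1}^{N}
\left\{
d_i\left[-\ln\sigma + \ln\phi\!\left(\frac{y_i-X_i\beta}{\sigma}\right)\right]
+ (1-d_i)\,\ln\!\left[1-\Phi\!\left(\frac{X_i\beta}{\sigma}\right)\right]
\right\}
$$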

The overall log-likelihood is made up of two parts. The first part corresponds to the classical regression for the uncensored observations, while the second part corresponds to the probabilities that the remaining observations are censored.
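The two parts of the log-likelihood can be made concrete with a short Python sketch. It assumes left-censoring at zero and normal errors; the function name, argument layout, and toy data are illustrative and are not taken from the study, which used R.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf() is phi, cdf() is Phi


def tobit_log_likelihood(y, X, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    y     : observed responses (censored values are recorded as 0)
    X     : list of covariate rows, each the same length as beta
    beta  : coefficient vector (put a leading 1 in each row for an intercept)
    sigma : standard deviation of the latent error term (must be > 0)
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i > 0:
            # Uncensored part: log of (1/sigma) * phi((y_i - x_i'beta) / sigma)
            total += -math.log(sigma) + math.log(STD_NORMAL.pdf((y_i - xb) / sigma))
        else:
            # Censored part: log P(y* <= 0) = log(1 - Phi(x_i'beta / sigma)),
            # computed as log Phi(-x_i'beta / sigma) by symmetry
            total += math.log(STD_NORMAL.cdf(-xb / sigma))
    return total


# Hypothetical toy data: one censored and two uncensored observations
y = [0.0, 1.5, 2.0]
X = [[1.0, 0.2], [1.0, 1.0], [1.0, 1.4]]
print(tobit_log_likelihood(y, X, beta=[0.0, 1.0], sigma=1.0))
```

In practice this function would be maximized over `beta` and `sigma` by a numerical optimizer; the sketch only evaluates it at given parameter values.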

The leading causes of incompletely observed data are truncation and censoring.

Truncation occurs when the observed sample is drawn only from a subset of a larger population [23]. In other words, a dependent variable is truncated if observations cannot be seen when it takes values in a certain range; neither the independent nor the dependent variables are observed when the dependent variable is in that range [24]. There are two types of truncation: from below (left truncation) and from above (right truncation). Figs. 1 and 2 illustrate the probability distribution under truncation from below [11].

**Fig. 1.** Probability distribution truncated from below (threshold = 3) [11].

**Fig. 2.** Truncated normal distribution [11].

The idea of “censoring” is that some data above or below the threshold are misreported at the threshold. Hence, the observed data are generated by a mixed distribution with both a continuous and a discrete component. The censoring process may be explicit in the data collection process, or it may be a by-product of economic constraints involved in constructing the data set [24]. When the dependent variable is censored, values in a certain range are all transformed to (or reported as) a single value [25]. Fig. 3 illustrates the probability distribution under censoring from below [11].

**Fig. 3.** Probability distribution censored from below (threshold = 5) [11].
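The difference between censoring and truncation from below can be illustrated on a small list of values. This is a minimal sketch; the helper names and the MAP readings are hypothetical, and the censored value of zero follows the convention described in the text.

```python
def left_censor(values, threshold, censored_value=0.0):
    """Keep every observation, but report values below the threshold as censored_value."""
    return [v if v >= threshold else censored_value for v in values]


def left_truncate(values, threshold):
    """Drop observations below the threshold entirely."""
    return [v for v in values if v >= threshold]


map_readings = [88.0, 93.33, 101.7, 76.7, 110.0]  # hypothetical MAP values (mmHg)

print(left_censor(map_readings, 93.33))    # [0.0, 93.33, 101.7, 0.0, 110.0]
print(left_truncate(map_readings, 93.33))  # [93.33, 101.7, 110.0]
```

The censored sample keeps all five cases, two of them recorded as zero, whereas the truncated sample retains only the three cases at or above the threshold.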

Having formally considered the Tobit model, we need some results about the truncated and censored normal distributions. These distributions are at the foundation of most models for truncation and censoring. The results are given for censoring and truncation on the left, which translate into censoring from below in the Tobit model. Corresponding formulas can be derived for censoring and truncation on the right, and on both the left and the right.

Let *y* denote the observed value of the dependent variable. Unlike in normal regression, *y* is the incompletely observed value of a latent dependent variable *y**. Recall that with truncation, our sample data are drawn from a subset of a larger population. With truncation from below, we only observe *y* = *y** if *y** is larger than the truncation point τ. In effect, we lose the observations on *y** that are smaller than or equal to τ. When this is the case, we typically assume that the variable y | y > τ follows a truncated normal distribution. Thus, if a continuous random variable y has pdf f(*y*) and τ is a constant, then we have:
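By the definition of a conditional density, and assuming $\tau$ denotes the truncation point, this is:

$$
f(y \mid y > \tau) = \frac{f(y)}{P(y > \tau)}
$$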

We know that
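For a normally distributed $y$ with mean $\mu$ and standard deviation $\sigma$ (the case assumed in this section), the probability of exceeding the truncation point is:

$$
P(y > \tau) = 1 - \Phi\!\left(\frac{\tau-\mu}{\sigma}\right) = 1 - \Phi(\alpha),
\qquad \alpha = \frac{\tau-\mu}{\sigma}
$$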

Where α = (τ − μ)/σ and Φ(.) is the standard normal cdf. The density of the truncated normal distribution is
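With $\alpha = (\tau-\mu)/\sigma$ and $\phi(\cdot)$ the standard normal pdf (standard notation, assumed here), the density can be written as:

$$
f(y \mid y > \tau) = \frac{f(y)}{1-\Phi(\alpha)}
= \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y-\mu}{\sigma}\right)}{1-\Phi(\alpha)}
$$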

Where φ(.) is the standard normal pdf.

The likelihood function for the truncated normal distribution is
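For a sample of $N$ independent observations, and taking $\alpha = (\tau-\mu)/\sigma$, a sketch of this likelihood is:

$$
L = \prod_{i=1}^{N} \frac{f(y_i)}{1-\Phi(\alpha)}
$$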

Or
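Equivalently, in logarithmic form and assuming $N$ independent observations with $\alpha = (\tau-\mu)/\sigma$, it may be written as:

$$
\ln L = \sum_{i=1}^{N} \left[\ln f(y_i) - \ln\bigl(1-\Phi(\alpha)\bigr)\right]
$$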

When a distribution is censored on the left, observations with values at or below τ are set to τ_{y}:
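In symbols, assuming $y^{*}$ is the latent (uncensored) variable, this rule is:

$$
y =
\begin{cases}
y^{*} & \text{if } y^{*} > \tau \\
\tau_y & \text{if } y^{*} \le \tau
\end{cases}
$$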

The use of τ and τ_{y} is just a generalization of having both τ and τ_{y} set to 0. If a continuous variable y has a pdf f(y) and τ is a constant, then we have
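A compact way to write this density, assuming $F$ is the cdf of $y^{*}$ and $d$ is an indicator equal to 1 when the observation is uncensored, is:

$$
f(y) = \left[f(y^{*})\right]^{d}\,\left[F(\tau)\right]^{1-d}
$$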

In other words, the density of y is the same as that of y^{*} for y > τ and is equal to the probability of observing y^{*} < τ if y = τ. Here d is an indicator variable that equals 1 if y > τ (the observation is uncensored) and 0 if y = τ (the observation is censored).
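For a normal latent variable with mean $\mu$ and standard deviation $\sigma$ (assumed here), the probability that an observation is censored is:

$$
P(\text{censored}) = P(y^{*} \le \tau)
= \Phi\!\left(\frac{\tau-\mu}{\sigma}\right)
= 1 - \Phi\!\left(\frac{\mu-\tau}{\sigma}\right)
$$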

And
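Assuming again a normal latent variable with mean $\mu$ and standard deviation $\sigma$, the probability that an observation is uncensored is:

$$
P(\text{uncensored}) = P(y^{*} > \tau)
= 1 - \Phi\!\left(\frac{\tau-\mu}{\sigma}\right)
= \Phi\!\left(\frac{\mu-\tau}{\sigma}\right)
$$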

Thus, the likelihood function can be written as
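Combining the uncensored density with the censoring probability, and assuming $d_i$ equals 1 for uncensored and 0 for censored observations, a sketch of this likelihood is:

$$
L = \prod_{i=1}^{N}
\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\mu}{\sigma}\right)\right]^{d_i}
\left[1-\Phi\!\left(\frac{\mu-\tau}{\sigma}\right)\right]^{1-d_i}
$$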

The estimated (β_{k}) vector shows the effect of (

1. Marginal effect on the latent dependent variable, y^{*}:

Thus, the reported Tobit coefficients indicate how a one-unit change in an independent variable alters the latent dependent variable.

2. Marginal effect on the expected value for y for uncensored observations:

3. Marginal effect on the expected value for y (censored and uncensored):
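For the standard Tobit model censored from below at zero, the three marginal effects listed above are usually written as follows. This is a sketch in assumed notation, with $\lambda_i = \phi(X_i\beta/\sigma)/\Phi(X_i\beta/\sigma)$ denoting the inverse Mills ratio:

$$
\begin{aligned}
\frac{\partial E[y^{*} \mid x]}{\partial x_k} &= \beta_k \\
\frac{\partial E[y \mid y > 0, x]}{\partial x_k} &= \beta_k\left\{1 - \lambda_i\left[\frac{X_i\beta}{\sigma} + \lambda_i\right]\right\} \\
\frac{\partial E[y \mid x]}{\partial x_k} &= \beta_k\,\Phi\!\left(\frac{X_i\beta}{\sigma}\right)
\end{aligned}
$$

The second and third effects are smaller in magnitude than $\beta_k$, because each scales the coefficient by a factor between 0 and 1.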

In this part, the results of the applied side of the study are presented. They were obtained using the statistical package R (version 3.6.1) and the Stratigraphy program (version 11).

Table 1 shows the sample taken from 300 patients with kidney disease in the dialysis center at Shar Hospital. Two BP levels, high and low, were taken from each patient (as the dependent variable), together with some independent variables (age, blood urea, BMI, and waist circumference). The average BP was computed with the MAP equation, which combines the high and low BP. The two levels could not be analyzed separately because the threshold of 93.33 mmHg, which corresponds to the normal BP of 120/80 mmHg, is itself defined through the MAP equation.

**TABLE 1** Sample taken from 300 patients

Table 2 shows the descriptive statistics: the minimum, maximum, mean, and median of age, blood urea, BMI, and waist circumference. The minimum values are 18, 17.40, 13.84, and 30, respectively, and the maximum values are 87, 404, 42.97, and 150, respectively. The means are 51.39, 118.86, 25.46, and 68.4, and the medians are 49.00, 117.00, 23.44, and 60.0, respectively.

**TABLE 2** Descriptive statistics of dependent and independent variables in the study

Table 3 shows a *P*-value of 2.22e-16, a log-likelihood of -1278.455 on 6 Df, a Wald statistic of 173.8 on 4 Df, and an Akaike information criterion (AIC) of 2566.91, where AIC = -2(log-likelihood) + 2K and K is the number of model parameters plus the intercept. The log-likelihood is a measure of model fit (the higher the value, the better the fit), and the model with the minimum AIC is the best model. The mean square error (MSE) is 0.9305.

**TABLE 3** Results of censored regression model: censored (formula=Y~X, left=0, right=Infinity, data=my data)

Table 4 shows a log-likelihood of -1476.9 on 6 Df, an AIC of 2963.8, and an MSE of 0.993.

**TABLE 4** Results of the truncated regression model
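The AIC values reported for Tables 3 and 4 can be reproduced from the log-likelihoods with the formula AIC = -2(log-likelihood) + 2K. The sketch below assumes K = 5 (four explanatory variables plus the intercept), which is the value that matches the reported figures; the function name is illustrative.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: -2 * log-likelihood + 2 * K."""
    return -2.0 * log_likelihood + 2.0 * k


K = 5  # assumption: four explanatory variables plus the intercept

print(round(aic(-1278.455, K), 2))  # censored model (Table 3): 2566.91
print(round(aic(-1476.9, K), 2))    # truncated model (Table 4): 2963.8
```

Because both models use the same K here, the ranking by AIC is the same as the ranking by log-likelihood, and the censored model has the lower AIC.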

Tables 5-7 show the results of fitting a multiple linear regression model to describe the relationship between BP and the four independent variables. Since the *P*-value in the ANOVA table is <0.05, there is a statistically significant relationship between the variables at the 95.0% confidence level. In Table 6, the R-squared statistic indicates that the model as fitted explains 39.9258% of the variability in BP. The adjusted R-squared statistic, which is more suitable for comparing models with different numbers of independent variables, is 39.1112%. The standard error of the estimate shows the standard deviation of the residuals to be 34.8559. Table 7 shows the analysis of variance for the model.

**TABLE 5** Fitting multiple regression model (OLS) using Stratigraphy program

**TABLE 6** Model summary

**TABLE 7** Analysis of variance

Fig. 4 shows the standardized residuals for the multiple regression model fitted by OLS. It is clear that OLS is not a suitable method when the data are censored.

**Fig. 4.** Standardized residuals for the multiple regression model using ordinary least squares.

Analyzing medical data that have a threshold point with a Tobit model helps experts (doctors and medical staff) to identify the factors affecting blood pressure in patients with kidney failure. In this study, the Tobit (censored and truncated) regression models and a multiple regression model estimated by ordinary least squares (OLS) were applied to data of size n = 300, for the cases whose BP is greater than or equal to 93.33. Under the hypothesis that BP (y) depends on the explanatory variables (age, blood urea, BMI, and waist circumference), the results were compared, and the following important points were concluded.

The results in Table 2 give the descriptive statistics: the minimum, maximum, mean, and median of age, blood urea, BMI, and waist circumference. The minimum values are 18, 17.40, 13.84, and 30, respectively, and the maximum values are 87, 404, 42.97, and 150, respectively. The means are 51.39, 118.86, 25.46, and 68.4, and the medians are 49.00, 117.00, 23.44, and 60.0, respectively.

The results of the censored regression model in Table 3 give the parameter estimates, the t values, and the significant factors affecting BP, with *P* = 2.22e-16, log-likelihood = -1278.455 on 6 Df, Wald statistic = 173.8 on 4 Df, and AIC = 2566.91. The log-likelihood is a measure of model fit (the higher, the better), and the minimum AIC identifies the best model. The MSE is 0.9305. The coefficient (b) expresses the relationship between the response variable and a covariate: a positive b means a positive relationship, and a negative b means a negative relationship. Table 3 shows that age and blood urea have positive relationships with the dependent variable (BP), and both have highly significant effects on BP. The relationship between BMI and BP is negative: a one-unit increase in BMI is associated with a decrease in BP of 0.44822. However, BMI and waist circumference appeared to have no significant effects on BP.

Tables 3-8 show that the censored regression model is more suitable for the sample than the other regression models (truncated, marginal, and multiple). This result was found by comparing their AIC, log-likelihood, and MSE values.

The marginal effects of the censored model in Table 8 show that two variables (age and blood urea) have highly significant effects. Each additional year of age increases BP significantly, by 0.77%, with a standard error of 0.16%. Furthermore, each one-unit increase in blood urea increases BP by 0.46%, with a standard error of 0.04.

**TABLE 8** Results of marginal effects
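How marginal effects of this kind are obtained from Tobit estimates can be sketched in Python. The sketch assumes censoring from below at zero and normal errors; the function name and the numbers in the usage line are hypothetical and are not the estimates behind Table 8.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf() is phi, cdf() is Phi


def tobit_marginal_effects(beta_k, xb, sigma):
    """Return the three Tobit marginal effects for coefficient beta_k.

    xb is the linear predictor x_i'beta and sigma the standard deviation
    of the latent error; censoring is assumed to be from below at zero.
    """
    z = xb / sigma
    lam = STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)       # inverse Mills ratio
    on_latent = beta_k                                # effect on E[y*]
    on_uncensored = beta_k * (1.0 - lam * (z + lam))  # effect on E[y | y > 0]
    on_observed = beta_k * STD_NORMAL.cdf(z)          # effect on E[y]
    return on_latent, on_uncensored, on_observed


# Hypothetical values, for illustration only
print(tobit_marginal_effects(beta_k=0.8, xb=1.2, sigma=2.0))
```

The effects depend on the linear predictor, so in practice they are evaluated at chosen covariate values (for example, the sample means) or averaged over the sample.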

In the multiple regression model estimated by OLS, the *P*-value in the ANOVA table is <0.05, so there is a statistically significant relationship between the variables at the 95% confidence level, and the R-squared statistic indicates that the model as fitted explains about 39.9% of the variability in BP. Theoretically, however, the OLS (unconditional) estimates are biased.

In this study, both Tobit regression analysis and OLS analysis were used to study the factors affecting BP. The data were collected from 300 patients in a dialysis center at Shar Hospital in Sulaimani city. Two BP levels, high and low, were taken from each patient (as the dependent variable), together with some independent variables (age, blood urea, BMI, and waist circumference). Because each patient has both a high and a low BP reading, the two could not be analyzed separately, so the MAP, an average arterial pressure that combines the high and low BP, was used. With BP as the dependent variable, the data are censored at zero, and in this case the Tobit model is the most suitable model to use. It was found that two factors (age and blood urea) have highly significant effects on BP, whereas BMI and waist circumference appeared to have no significant effect on the dependent variable. The comparison of the Tobit and OLS estimates shows that bias can result when BP is estimated by OLS if BP is restricted at the threshold point.

[1]. T. Amemiya. “Tobit models: A survey”.

[2]. W. Wang and M. E. Griswold. “Natural interpretations in Tobit regression models using marginal estimation methods”.

[3]. D. C. Crews, A. K. Bello and G. Saadi. “2019 World kidney day editorial-burden, access, and disparities in kidney disease”.

[4]. R. A. Preston, I. Singer and M. Epstein. “Renal parenchymal hypertension: Current concepts of pathogenesis and management”.

[5]. J. A. Staessen, Y. Li, A. Hara, K. Asayama, E. Dolan and E. O'Brien. “Blood pressure measurement anno 2016”.

[6]. R. N. Kundu, S. Biswas and M. Das. “Mean arterial pressure classification: A better tool for statistical interpretation of blood pressure related risk covariates”.

[7]. D. Yu, Z. Zhao and D. Simmons. “Interaction between mean arterial pressure and HbA1c in prediction of cardiovascular disease hospitalisation: A population-based case-control study”.

[8]. C. Wilson and C. A. Tisdell. “OLS and Tobit estimates: When is substitution defensible operationally?” In:

[9]. M. H. Odah, A. S. M. Bager and B. K. Mohammed. “Tobit regression analysis applied on Iraqi bank loans”.

[10]. A. Prahutama, A. Rusgiyono, M. A. Mukid and T. Widiharih. “Analysis of Household Expenditures on Education in Semarang City, Indonesia Using Tobit Regression Model”. In:

[11]. N. M. Ahmed. “Limited Dependent Variable Modelling (Truncated and censored Regression models) with Application”. Vol. 7377. Cambridge University Press, New York, pp. 82-96, 2018.

[12]. M. F. Ahmad, M. Ishtiaq, K. Hamid, M. U. Khurram and A. Nawaz. “Data envelopment analysis and Tobit analysis for firm efficiency in perspective of working capital management in manufacturing sector of Pakistan”.

[13]. S. Samsudin, A. S. Jaafar, S. D. Applanaidu, J. Ali and R. Majid. “Are public hospitals in Malaysia efficient? An application of DEA and Tobit analysis”.

[14]. M. H. Odah, A. S. M. Bager and B. K. Mohammed. “Studying the determinants of divortiality in Iraq. A two-stage estimation model with tobit regression”.

[15]. P. Zorlutuna, N. A. Erilli and B. Yücel. “Lung cancer study with tobit regression analysis: Sivas case”.

[16]. P. C. Anastasopoulos, A. P. Tarko and F. L. Mannering. “Tobit analysis of vehicle accident rates on interstate highways”.

[17]. A. Henningsen. “Estimating censored regression models in R using the censReg Package”.

[18]. A. C. Michalos.

[19]. M. H. Odah. “Asymptotic least squares estimation of tobit regression model. An application in remittances of Iraqi immigrants in Romania”.

[20]. C. Ekstrand and T. E. Carpenter. “Using a tobit regression model to analyse risk factors for foot-pad dermatitis in commercially grown broilers”.

[21]. J. S. Long.

[22]. A. Flaih, J. Guardiola, H. Elsalloukh and C. Akmyradov. “Statistical inference on the ESEP tobit regression model”.

[23]. B. R. Humphreys. “Dealing with zeros in economic data”.

[24]. K. A. M. Gajardo. “

[25]. W. H. Greene.