Department of Economics
Columbia University
UN3412
Spring 2024

Problem Set 2
Introduction to Econometrics
for all sections
(due on Thursday, Feb 15th at 10 am)
__________________________________________________________________________________________

Please make sure to select the page number for each question when you upload your solutions to Gradescope. Otherwise, your answers are hard to grade, and you will lose 5 points.

1. (25p) In Problem Set 1 last week, you calculated the intercept and slope of the sample regression of lung cancer deaths in 1950 on cigarettes consumed per capita in 1930 for the five countries given below:

| Observation # | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Country | Switzerland | Finland | Great Britain | Canada | Denmark |
| Cigarettes consumed per capita in 1930 (X) | 530 | 1115 | 1145 | 510 | 380 |
| Lung cancer deaths per million people in 1950 (Y) | 250 | 350 | 465 | 150 | 165 |

This week, please calculate the same statistics using STATA. On the STATA output file, find and label the following items.

(a) (3p) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.

(b) (3p) The standard deviations of X and Y, $s_X$ and $s_Y$.

(c) (3p) The correlation coefficient, r, between X and Y.

(d) (4p) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.

(e) (4p) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.

(f) (4p) $\hat{Y}_i$, $i = 1, \ldots, n$, the predicted values for each country from the regression.

(g) (4p) $\hat{u}_i$, the OLS residual for each country.

STATA HINTS: First load STATA and type edit, which brings up something that looks like a spreadsheet. Enter the smoking and cancer values in the first two columns. Double-click the column headers to enter variable names (e.g. smoke, death). Close the editor window when you are done.
The following commands will be useful:

- list: lists the data (to be sure you typed it in correctly)
- summarize: computes sample means and standard deviations (the option , detail gives additional statistics, including the sample variance)
- correlate: produces correlation coefficients (with the option , covariance this command produces covariances)
- regress: estimates a regression by OLS
- predict: computes OLS predicted values and residuals

Note that STATA has on-line help. Do not be concerned if you do not yet understand all the statistics shown in the output; we will discuss them in class in due course.

2. (10p) Using the graph twoway command in STATA, graph the scatterplot of the five data points and the regression line. Interpret the sample slope and the sample intercept. If you are using R, there is a similar command to graph these points; the TAs will review it in the R recitations.

3. (28p) In a simple (one X) regression, when X is binary, $\beta_1$ is the difference between the expected value of Y when X = 1 and the expected value of Y when X = 0. In chapter 3 you learned how to compute the variance of a difference in sample means for a large sample. In this question you will compute it both with the methods of chapter 3 and with regression. Find the Earn_Train dataset. One variable in this dataset is the average hourly earnings of a male worker who is just moving into the workforce after graduating from high school. The other is an indicator variable, for which a value of 1 indicates that the student participated in a newly developed training program offered by the high school to prepare workers for the manufacturing of machine tools.

(a) (3p) Using spreadsheet software or your statistical software, calculate the difference in means between those who had the training and those who did not.

(b) (3p) If the students volunteered to be in the training program, would we expect this estimate of the difference to be an unbiased estimate of the causal effect of the training program?
(A simple answer is sufficient.)

(c) (3p) Continue with the scenario in which the students volunteered to be in the training program. As is done in Angrist and Pischke chapter 1, assume that the treatment effect is the same for all individuals (a constant-effects assumption). With this assumption, what is actually being measured by $\hat{\beta}_1$?

(d) (3p) Change the scenario so that the students were randomly assigned to the training program, and keep this scenario for all of the remaining question parts. Calculate the sample variance for those students who did not participate in the training program. Separately, calculate the sample variance for those students who did participate in the training program.

(e) (4p) Following the large-sample approach outlined in Stock and Watson chapter 3, use these sample variance estimates to create an estimate of the variance and the standard deviation of the difference of the means.

(f) (4p) Now follow the small-sample approach, also outlined in Stock and Watson chapter 3. Note that in order to make any headway at all, we must 1) assume normal distributions, 2) assume that the variances of the two groups are the same (with $n_1 \neq n_2$), 3) use the pooled variance estimator, and 4) refer to the Student-t distribution with $n_1 + n_2 - 2$ degrees of freedom for inference. Nevertheless, take this opportunity to calculate the pooled variance. Start by calculating the sum of squares for each of the two groups. For each group in turn, take the sample variance calculated above and multiply it by $n - 1$ for that group. Add the two sums of squares together and divide by $n_1 + n_2 - 2$. This is the sample variance, assumed to be common, of the observations. What is this number?

(g) (4p) Now use this pooled variance number in place of $s_0^2$ and also in place of $s_1^2$, and calculate the variance and standard error of the difference of the means in the same way (using the same large-sample equation) that you did in part (e).
(h) (4p) Import the data and run the regression of earn on train in your statistical software. Copy and paste your work below, including both the call (command) to the software and the response (output). Did the standard error of your $\hat{\beta}_1$ match (or come very close to) either the large-sample or the pooled variance estimate that you created?

(i) No points, but impressing your grader can only help, never hurt, your score. Can you re-run the regression to produce (or come very close to) the other of the two variance estimates?

4. (22p) For many years, housing economists believed that households spend a constant fraction of income on housing, as in

housing expenditure = $\beta$ (income) + u

The file housing.dta contains housing expenditures (housing) and total expenditures (income) for a sample of 19th century Belgian workers collected by Edouard Ducpetiaux¹. The differences in housing expenditures from one observation to the next are in the variable dhousing; the differences in total expenditures are in the variable dincome.

¹ Edouard Ducpetiaux, Budgets Economiques de Classes de Ouvrieres en Belgique (Brussels, Hayaz 1855)

(a) (3p) Compute the means of total expenditure and housing expenditure in this sample.

(b) (3p) Estimate $\beta$ using total expenditure for total income (i.e. estimate housing expenditure from total expenditure).

(c) (4p) If income rises by 100 (it averages around 900 in this sample), what change in estimated expected housing expenditure results according to your estimate in (b)?

(d) (4p) Interpret the $R^2$.

(e) (4p) What economic argument would you make against housing absorbing a constant share of income?

(f) (4p) What are some determinants of housing captured by u?

5. (15p) The dataset sleep75.dta (sleep75.Rda for those of you using R) contains data from Biddle and Hamermesh (1990) that they used to study the tradeoff between time spent sleeping per week and time spent in paid work. Let's start by studying the regression model

$\text{sleep}_i = \beta_0 + \beta_1 \, \text{totwrk}_i + u_i$

where sleep and totwrk (total work) are measured in minutes per week.

(a) (4p) Run this regression. Interpret the coefficient on totwrk (i.e. use it in a sentence).

(b) (3p) If someone works five more hours per week, by how many minutes is sleep predicted to fall?

(c) (8p) Would you say that totwrk explains much of the variation in sleep? What other factors might affect the time spent sleeping? Are these likely to be correlated with totwrk?

The following questions will not be graded; they are for practice and will be discussed at the recitation:

6. [Practice question, not graded] SW Exercise 4.1

7. [Practice question, not graded] SW Empirical Exercise 4.2

8. [Practice question, not graded] SW Empirical Exercise 5.1
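
As a cross-check on the STATA output for Question 1, the quantities in parts (a) through (g) can also be computed directly from their textbook formulas. The following is a minimal sketch in Python (not STATA), using only the five observations from the table in Question 1. The variable names (x_bar, beta1_hat, and so on) are illustrative and are not part of the problem set.

```python
# Hand computation of the Question 1 statistics from their definitions.
# Data: the five country observations given in Question 1.
from math import sqrt

x = [530, 1115, 1145, 510, 380]  # cigarettes consumed per capita in 1930 (X)
y = [250, 350, 465, 150, 165]    # lung cancer deaths per million in 1950 (Y)
n = len(x)

# (a) sample means
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products around the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (b) sample standard deviations (n - 1 divisor)
sd_x = sqrt(s_xx / (n - 1))
sd_y = sqrt(s_yy / (n - 1))

# (c) sample correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

# (d), (e) OLS slope and intercept
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# (f), (g) predicted values and residuals for each country
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
u_hat = [yi - yhi for yi, yhi in zip(y, y_hat)]

print("means:", x_bar, y_bar)
print("standard deviations:", sd_x, sd_y)
print("correlation:", r)
print("slope, intercept:", beta1_hat, beta0_hat)
for xi, yi, yhi, ui in zip(x, y, y_hat, u_hat):
    print(xi, yi, round(yhi, 2), round(ui, 2))
```

If the numbers printed here differ from what summarize, correlate, regress, and predict report, a data-entry error in the STATA editor is the most likely cause. With an intercept in the regression, the residuals should also sum to zero up to rounding.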
