For this week you should read Chapter 14 of the textbook, excluding Section 14.4 and Questions 14.5 and 14.6 of Section 14.5. When you read text that involves running R scripts, you are expected to run the code yourself on your computer, in parallel to reading it in the textbook, and compare what you get with the output presented in the textbook.
part 1
Some statisticians prefer complex models, models that try to fit the data as closely as possible. Others prefer simple models. They claim that although simpler models are more remote from the data, they are easier to interpret and thus provide more insight.
What do you think? Which type of model is best to use?
When formulating your answer to this question you may think of a situation in which you carry out inference and need to present it to other people. Would the consumers of your analysis benefit more from your having used a complex model or from your having used a simpler model? What would be the best way to report your findings and explain them to the consumers?
part 2
For the assignment you should complete the following 8 tasks. Tasks 1-3 refer to the problem of comparing two samples and Tasks 4-7 refer to regression analysis. In Task 8 the relation between two variables is investigated. Your answers should be short and clear. We recommend that you copy and paste the tasks below into the form titled “Submit your Assignment using this Form”.
You can then write your answers to the tasks in the designated positions that are marked in the text:
Tasks
Comparing Two Samples:
1. Apply the function “plot” to the formula that relates the response “frequency” to the explanatory variable “march2007” in order to produce the two box-plots of the response. Redo the plotting with “frequency” replaced by “log(frequency)”. The distribution of the variable “log(frequency)” is: __ More symmetric, __ Less symmetric, compared to the distribution of the variable “frequency”. Mark the most appropriate option and attach the R code that produces the two plots:
2. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject: (Reject/Don’t Reject) H0: The expectation of “frequency” is the same in the two subsets, (Reject/Don’t Reject) H0: The expectation of “log(frequency)” is the same in the two subsets. Explain your answer:
3. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject: (Reject/Don’t Reject) H0: The variance of “frequency” is the same in the two subsets, (Reject/Don’t Reject) H0: The variance of “log(frequency)” is the same in the two subsets. Explain your answer:
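The mechanics behind Tasks 1-3 can be sketched on made-up data. The block below is only an illustration: the data frame `donors` and its simulated values are assumptions, not the assignment data, and `t.test` and `var.test` are one common choice of tests for comparing expectations and variances between two subsets. Note that `plot` draws box-plots only when the explanatory variable is a factor; if it is stored as a number, wrap it in `factor()`.

```r
# Simulated stand-in data (hypothetical); NOT the assignment data set.
set.seed(1)
n <- 200
donors <- data.frame(
  march2007 = factor(sample(c(0, 1), n, replace = TRUE)),
  frequency = rpois(n, lambda = 5) + 1  # strictly positive, so log() is defined
)

# A formula with a factor on the right-hand side yields side-by-side box-plots.
plot(frequency ~ march2007, data = donors)
plot(log(frequency) ~ march2007, data = donors)

# Two-sample t-test (Welch version by default) for equality of expectations.
t_out <- t.test(frequency ~ march2007, data = donors)

# F-test for equality of variances in the two subsets.
v_out <- var.test(frequency ~ march2007, data = donors)

# Compare each p-value with the 5% significance level.
t_out$p.value
v_out$p.value
```

The same calls work with `log(frequency)` on the left-hand side of the formula.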
Linear Regression:
4. Apply the function “plot” to the formula that relates the response “frequency” to the explanatory variable “time” in order to produce the scatter plot. Add the regression line to the plot. The variability of the variable “frequency”, for larger values of the explanatory variable, is: __ Smaller, __ Larger, __ Constant. Mark the most appropriate option and attach the R code that produces the plot:
5. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject: (Reject/Don’t Reject) H0: The slope of “time” in the regression line of the response “frequency” is equal to zero, (Reject/Don’t Reject) H0: The slope of “time” in the regression line of the response “log(frequency)” is equal to zero. Explain your answer:
6. The 95%-confidence interval of the slope of “time” in the regression line of the response “log(frequency)” is: Lower end = ____, Upper end = ____. Attach the R code that produces the confidence interval:
7. The regression line between “time” as an explanatory variable and “log(frequency)” as a response is: __ Increasing, __ Decreasing, __ Constant. Mark the most appropriate option and explain your answer:
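The regression steps in Tasks 4-7 follow a common pattern, sketched here on made-up data. The data frame `donors`, the simulated relationship, and the object names `fit`, `fit_log` and `ci` are all assumptions for illustration; the numbers say nothing about the assignment data.

```r
# Simulated stand-in data (hypothetical); NOT the assignment data set.
set.seed(2)
n <- 200
donors <- data.frame(time = runif(n, min = 2, max = 100))
donors$frequency <- exp(0.5 + 0.01 * donors$time + rnorm(n, sd = 0.5))

# Scatter plot of the response against a numeric explanatory variable.
plot(frequency ~ time, data = donors)

# Fit the linear regression and add the fitted line to the existing plot.
fit <- lm(frequency ~ time, data = donors)
abline(fit)

# Regression with the log-transformed response.
fit_log <- lm(log(frequency) ~ time, data = donors)

# Coefficient table: estimate, standard error, t value and p-value.
# The p-value in the "time" row tests H0: slope equal to zero.
summary(fit_log)$coefficients

# 95% confidence intervals; the "time" row holds the interval for the slope.
ci <- confint(fit_log, level = 0.95)
ci["time", ]
```

The sign of the estimated slope, together with whether its confidence interval contains zero, indicates whether the fitted line is increasing, decreasing, or not distinguishable from constant.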
The Relation Between Two Variables:
8. Apply the function “plot” to the formula that relates the response “frequency” to the explanatory variable “monetary” in order to produce the scatter plot. Add the regression line to the plot. The points in the scatter plot are:__ All on the same line, __ Show a linear trend but are not on the same line, __ Don’t show a linear trend. Mark the most appropriate option and attach the R code that produces the plot:
part 3
The Learning Journal is a tool for self-reflection on the learning process. In addition to completing directed tasks, you should use the Learning Journal to document your activities, record problems you may have encountered, and draft answers for Discussion Forums and Assignments.
Your learning journal entry must be a reflective statement that considers the following questions:
1. Describe what you did. This does not mean that you copy and paste from what you have posted or the assignments you have prepared. You need to describe what you did and how you did it.
2. Describe your reactions to what you did.
3. Describe any feedback you received or any specific interactions you had. Discuss how they were helpful.
4. Describe your feelings and attitudes.
5. Describe what you learned.
Another set of questions to consider in your learning journal statement includes:
1. What surprised me or caused me to wonder?
2. What happened that felt particularly challenging? Why was it challenging to me?
3. What skills and knowledge do I recognize that I am gaining?
4. What am I realizing about myself as a learner?
5. In what ways am I able to apply the ideas and concepts gained to my own experience?
Finally, describe one important thing that you are thinking about in relation to the activity.
Your Learning Journal should be a minimum of 500 words.