data analyst

You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The team's coach and management have asked you to build regression models that predict the number of wins in a regular season from the performance metrics included in the data set. These regression models will inform key decisions to improve the team's performance. You will use the Python programming language to perform the statistical analyses and then prepare a report of your findings to present to the team's management. Since the managers are not data analysts, you will need to interpret your findings and describe their practical implications.

Note: This data set has been “cleaned” for the purposes of this assignment.

Reference

FiveThirtyEight. (April 26, 2019). FiveThirtyEight NBA Elo dataset. Kaggle. Retrieved from https://www.kaggle.com/fivethirtyeight/fivethirtyeight-nba-elo-dataset

Directions

For this project, you will submit the Python script you used to make your calculations and a summary report explaining your findings.

  1. Python Script: To complete the tasks listed below, open the Project Three Jupyter Notebook link in the Assignment Information module.  This notebook contains your data set and the Python scripts for your project. In the notebook, you will find step-by-step instructions and code blocks that will help you complete the following tasks:
    • Simple Linear Regression
      • Create scatterplots
      • Compute the correlation coefficient
      • Conduct a linear regression
    • Multiple Regression
      • Create scatterplots
      • Compute the correlation matrix
      • Conduct a multiple regression analysis
  2. Summary Report: Once you have completed all the steps in your Python script, you will create a summary report to present your findings. Use the provided template to create your report. You must complete each of the following sections:
    • Introduction: Set the context for your scenario and the analyses you will be performing.
    • Scatterplots and Correlation: Discuss relationships between variables using scatterplots and correlation coefficients.
    • Simple Linear Regression: Create a simple linear regression model to predict the response variable.
    • Multiple Regression: Create a multiple regression model to predict the response variable.
    • Conclusion: Summarize your findings and explain their practical implications.
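
As a rough illustration of the analyses listed above, the sketch below computes a correlation coefficient, a simple linear regression, and a multiple regression on synthetic data. The variable names (`avg_pts`, `avg_pts_allowed`, `total_wins`) and the numbers used to generate them are assumptions made for this example only; they are not taken from the project data set, and the notebook's own code may use a different library. Scatterplots are omitted to keep the example dependency-free.

```python
import numpy as np

# Synthetic stand-in data. The variable names and generating values are
# hypothetical and are NOT the columns of the actual project data set.
rng = np.random.default_rng(0)
n_seasons = 200
avg_pts = rng.normal(100.0, 5.0, n_seasons)          # hypothetical predictor 1
avg_pts_allowed = rng.normal(100.0, 5.0, n_seasons)  # hypothetical predictor 2
noise = rng.normal(0.0, 3.0, n_seasons)
total_wins = (41.0 + 2.0 * (avg_pts - 100.0)
              - 2.0 * (avg_pts_allowed - 100.0) + noise)

# Pearson correlation coefficient between one predictor and the response.
r = np.corrcoef(avg_pts, total_wins)[0, 1]

# Simple linear regression: total_wins = intercept + slope * avg_pts.
slope, intercept = np.polyfit(avg_pts, total_wins, deg=1)

# Multiple regression by ordinary least squares.
# The first column of ones estimates the intercept.
X = np.column_stack([np.ones(n_seasons), avg_pts, avg_pts_allowed])
coefs, *_ = np.linalg.lstsq(X, total_wins, rcond=None)

# Coefficient of determination (R-squared) for the multiple regression.
fitted = X @ coefs
ss_res = np.sum((total_wins - fitted) ** 2)
ss_tot = np.sum((total_wins - total_wins.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("correlation coefficient =", round(r, 4))
print("simple regression: slope =", round(slope, 4), "intercept =", round(intercept, 4))
print("multiple regression coefficients =", np.round(coefs, 4))
print("R-squared =", round(r_squared, 4))
```

In a report for non-technical managers, the slope is usually the most useful number to interpret: it estimates how many additional wins are associated with a one-unit increase in the predictor, holding the other predictors fixed in the multiple regression case.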

What to Submit

To complete this project, you must submit the following:

Python Script: Your Jupyter Notebook Python script contains all the statistical analyses you completed for this project. You downloaded your work as an HTML file. Review the file to make sure that every step and all your outputs are included. Submit the HTML file as part of your submission. Review the Jupyter Notebook in Codio Tutorial in the Supporting Materials section if you need help.

Summary Report: Use the provided template to create your summary report. The template contains guiding questions to help you complete each section. Be sure to remove these questions before submitting your report. Your summary report should be submitted as a 3- to 5-page Microsoft Word document. It should include an APA-style cover page and APA citations for any sources used. Use double spacing, 12-point Times New Roman font, and one-inch margins.

Python Codes:

Step 1: Generating sample data

This block of Python code will generate a unique sample of size 50 that you will use in this discussion. Note that your sample will be unique and therefore your answers will be unique as well. The numpy module in Python allows you to create a data set using a Normal distribution. Note that the mean and standard deviation were chosen for you. The data set will be saved in a Python dataframe that will be used in later calculations.

Click the block of code below and hit the Run button above.

In [1]:
import pandas as pd
import numpy as np
import math
import scipy.stats as st

# create 50 randomly chosen values from a Normal distribution. (arbitrarily using mean=2.48 and standard deviation=0.50).
diameters = np.random.normal(2.4800,0.500,50)

# convert the array into a dataframe with the column name "diameters" using pandas library.
diameters_df = pd.DataFrame(diameters, columns=['diameters'])
diameters_df = diameters_df.round(2)

# print the dataframe (note that the index of dataframe starts at 0).
print("Diameters data frame\n")
print(diameters_df)

Diameters data frame

    diameters
0        2.10
1        2.77
2        2.74
3        2.55
4        2.58
5        2.50
6        2.96
7        2.74
8        1.77
9        2.97
10       2.62
11       2.46
12       2.40
13       0.90
14       2.07
15       2.08
16       1.73
17       2.11
18       1.93
19       2.06
20       2.10
21       2.57
22       1.51
23       2.76
24       2.41
25       3.17
26       2.41
27       2.96
28       2.26
29       2.43
30       2.19
31       2.14
32       2.48
33       0.96
34       2.05
35       2.29
36       2.96
37       2.37
38       2.06
39       2.29
40       2.66
41       2.54
42       2.80
43       1.99
44       2.07
45       1.78
46       3.84
47       2.39
48       3.20
49       2.79

Step 2: Constructing confidence intervals

You will assume that the population standard deviation is known and that the sample size is sufficiently large. Then you will use the Normal distribution to construct these confidence intervals. You will use the submodule scipy.stats to construct confidence intervals using your sample data.
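
The interval described above has the form sample mean ± z* × σ/√n, where z* is the Normal critical value for the chosen confidence level. The sketch below computes it by hand and cross-checks the result against `scipy.stats.norm.interval`. The helper name `z_interval` is hypothetical, and the sample mean of 2.3694 is an illustrative value (the midpoint of the 90% interval printed in this step's output); your own sample will give a different mean.

```python
import math
import scipy.stats as st

def z_interval(sample_mean, sigma, n, level):
    """Two-sided z interval for a mean when the population sigma is known.

    Hypothetical helper for illustration; not part of the notebook.
    """
    stderr = sigma / math.sqrt(n)
    # Critical value z*: leaves (1 - level) / 2 probability in each tail.
    z_star = st.norm.ppf(1 - (1 - level) / 2)
    margin = z_star * stderr
    return sample_mean - margin, sample_mean + margin

sample_mean = 2.3694  # illustrative value; your sample mean will differ
sigma = 0.5           # population standard deviation given in Step 1
n = 50                # sample size

lo90, hi90 = z_interval(sample_mean, sigma, n, 0.90)
lo99, hi99 = z_interval(sample_mean, sigma, n, 0.99)
print("90% interval by hand =", (round(lo90, 2), round(hi90, 2)))
print("99% interval by hand =", (round(lo99, 2), round(hi99, 2)))

# Cross-check against scipy's built-in interval method.
ref90 = st.norm.interval(0.90, loc=sample_mean, scale=sigma / math.sqrt(n))
print("90% interval from scipy =", ref90)
```

The 99% interval is wider than the 90% interval because a higher confidence level requires a larger critical value z*.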

Click the block of code below and hit the Run button above.

In [3]:
# Python methods that calculate confidence intervals require the sample mean and the standard error as inputs.

# calculate the sample mean
mean = diameters_df['diameters'].mean()

# input the population standard deviation, which was given in Step 1.
std_deviation = 0.5000

# calculate standard error = standard deviation / sqrt(n)   where n is the sample size.
stderr = std_deviation/math.sqrt(len(diameters_df['diameters']))

# construct a 90% confidence interval.
conf_int_90 = st.norm.interval(0.90, mean, stderr)
print("90% confidence interval (unrounded) =", conf_int_90)
print("90% confidence interval (rounded) = (", round(conf_int_90[0], 2), ",", round(conf_int_90[1], 2), ")")
print("")

# construct a 99% confidence interval.
conf_int_99 = st.norm.interval(0.99, mean, stderr)
print("99% confidence interval (unrounded) =", conf_int_99)
print("99% confidence interval (rounded) = (", round(conf_int_99[0], 2), ",", round(conf_int_99[1], 2), ")")

90% confidence interval (unrounded) = (2.2530912846323328, 2.4857087153676676)
90% confidence interval (rounded) = ( 2.25 , 2.49 )

99% confidence interval (unrounded) = (2.1872613632281555, 2.551538636771845)
99% confidence interval (rounded) = ( 2.19 , 2.55 )

Step 3: Performing hypothesis testing for the population mean

Since you were given the population standard deviation in Step 1 and the sample size is sufficiently large, you can use the z-test for population means. The ztest function in the statsmodels.stats.weightstats submodule runs the z-test. Its inputs are the sample data and the value under the null hypothesis. Its outputs are the test statistic and the two-tailed p-value.
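
For reference, the z statistic with a known population standard deviation is z = (sample mean − hypothesized mean) / (σ/√n). The sketch below computes it and the two-tailed p-value by hand. The helper name `z_test_known_sigma` is hypothetical, and the sample mean of 2.3694 is an illustrative value rather than your own. To the best of my understanding, statsmodels' `ztest` estimates the standard deviation from the sample instead of taking a known σ as input, so its statistic can differ slightly from this hand calculation.

```python
import math
import scipy.stats as st

def z_test_known_sigma(sample_mean, mu0, sigma, n):
    """Two-tailed z-test for a mean with known population sigma.

    Hypothetical helper for illustration; not part of the notebook.
    """
    stderr = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / stderr
    # Two-tailed p-value: probability of a statistic at least this extreme.
    p_two_tailed = 2 * st.norm.sf(abs(z))
    return z, p_two_tailed

# Illustrative inputs: sample mean 2.3694, null value 2.30,
# sigma 0.5 from Step 1, sample size 50.
z_stat, p_val = z_test_known_sigma(2.3694, 2.30, 0.5, 50)
print("test-statistic =", round(z_stat, 2))
print("two tailed p-value =", round(p_val, 4))
```

A p-value above the chosen significance level (for example, 0.05) means the sample does not provide enough evidence to reject the null hypothesis.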

Click the block of code below and hit the Run button above.

In [4]:
from statsmodels.stats.weightstats import ztest

# run z-test hypothesis test for population mean. The value under the null hypothesis is 2.30.
test_statistic, p_value = ztest(x1 = diameters_df['diameters'],  value = 2.30)

print("z-test hypothesis test for population mean")
print("test-statistic =", round(test_statistic,2))
print("two tailed p-value =",round(p_value,4))

z-test hypothesis test for population mean
test-statistic = 0.94
two tailed p-value = 0.3481
