Assessment Task 2: Data exploration and preparation

Due date: 11:59 pm Friday, 11 September 2020
Marks: Out of 100, weighted to 35% of your final mark.
Submission format: A report in Adobe PDF (preferable) or MS Word format. Please also upload the Excel spreadsheet containing your results.
Filename: Report: ida_a2_xxxxxxxx.pdf or ida_a2_xxxxxxxx.doc, where xxxxxxxx is your student id. Spreadsheet: ida_a2_xxxxxxxx.xls or ida_a2_xxxxxxxx.xlsx. You may need to zip files to submit to UTS Canvas.
Report format: Around 20-25 pages with the information described below. Use 11 or 12 point Times or Arial fonts.
Submit to: UTS Canvas assignment submission button.
This assignment is individual work. Each of you will be working with an individual dataset that you
can download from the link below.
Scenario
You have just started working as a data miner/analyst in the Analytics Unit of a company. The
Head of the Analytics Unit has brought you a dataset [a welcome present ;-))]. The dataset
includes two files: a description of the attributes and a table with the actual values of these
attributes. The Head of the Analytics Unit has mentioned to you that this is some sort of
demographic data that a potential client has provided for analysis. The Head of the Analytics Unit
would like a report with some insights about the data, which he/she can deliver to the
client. Your tasks include:
understanding the specifics of the dataset;
extracting information about each of the attributes, possible associations between them and
other specifics of the dataset.
The tasks in the assignment are specified below.
Datasets
For this dataset you only have the attribute headings and a brief description of what they mean, which you can
find here: Attribute_Description_Assignment2.docx . Each student is assigned an individual
table with the actual values of these attributes. You will find your individual dataset in this zip file:
Datasets.zip, with your student ID as the filename.
Tasks
1A. Initial data exploration
1. Identify the attribute type of each attribute in your dataset. If it’s not clear, you may need to
justify why you chose the type.
2. Identify the values of the summarising properties for the attributes, including frequency,
location and spread (e.g. value ranges of the attributes, frequency of values, distributions,
medians, means, variances, percentiles, etc. – the statistics that have been covered in the
lectures and materials given). Note that not all of these summary statistics will make sense for
all the attribute types, so use your judgement! Where necessary, use proper visualisations for
the corresponding statistics.
3. Using KNIME or other tools, explore your dataset and identify any outliers, clusters of similar
instances, “interesting” attributes and specific values of those attributes. Note that you may
need to ‘temporarily’ recode attributes to numeric or from numeric to nominal. In the report
include the corresponding snapshots from the tools and explanation of what has been
identified there.
Present your findings in the assignment report.
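As an illustration of the location and spread statistics and the outlier check described in items 2 and 3, here is a minimal Python sketch using only the standard library. The `prices` list is made-up data, the function names `summarise` and `iqr_outliers` are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging outliers, not a method prescribed by this assignment.

```python
import statistics


def summarise(values):
    """Return location and spread statistics for a numeric attribute."""
    # Quartiles using the "inclusive" method (the data are treated as the
    # whole population, so min and max are the 0th and 100th percentiles).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": q2,
        "variance": statistics.variance(values),  # sample variance
        "q1": q1,
        "q3": q3,
    }


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up example values, not taken from any assignment dataset.
prices = [210, 250, 265, 270, 300, 310, 330, 1500]
summary = summarise(prices)
outliers = iqr_outliers(prices)
```

These statistics only make sense for numeric attributes; for nominal attributes a frequency count (for example with `collections.Counter`) is the more appropriate summary.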
1B. Data preprocessing
Perform each of the following data preparation tasks (each task applies to the original data) using
your choice of tool:
1. Use the following binning techniques to smooth the values of the PRICE attribute:
equi-width binning
equi-depth binning.
In the assignment report, for each of these techniques you need to illustrate your steps. In your
Excel workbook file place the results in separate columns in the corresponding spreadsheet. Use
your judgement in choosing the appropriate number of bins – and justify this in the report.
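The two binning techniques can be sketched as follows. This is a minimal illustration on made-up values, assuming smoothing by bin means (smoothing by bin medians or bin boundaries are alternatives); the function names are hypothetical and the choice of three bins is arbitrary.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index using bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_depth_bins(values, n_bins):
    """Assign bin indices so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def smooth_by_bin_means(values, bins):
    """Replace each value with the mean of the bin it belongs to."""
    sums, counts = {}, {}
    for v, b in zip(values, bins):
        sums[b] = sums.get(b, 0) + v
        counts[b] = counts.get(b, 0) + 1
    return [sums[b] / counts[b] for b in bins]


# Made-up example values, not taken from any assignment dataset.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
width_bins = equal_width_bins(prices, 3)
depth_bins = equal_depth_bins(prices, 3)
smoothed = smooth_by_bin_means(prices, depth_bins)
```

Note that this rank-based equi-depth assignment can place equal values in different bins when a tie falls on a bin boundary; whether that is acceptable is a judgement call to state in the report.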
2. Use the following techniques to normalise the attribute PRICE:
min-max normalisation to transform the values onto the range [0.0, 1.0].
z-score normalisation to transform the values.
In the assignment report provide an explanation of each of the applied techniques. In your Excel
workbook file place the results in separate columns in the corresponding spreadsheet.
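Both normalisation techniques can be sketched in a few lines of Python. The values below are made up, the function names are hypothetical, and the z-score version assumes the population standard deviation by default; using the sample standard deviation instead is equally defensible as long as the report says which one was used.

```python
import statistics


def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalisation is undefined for constant data")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalise(values, sample=False):
    """Centre values on the mean and scale by the standard deviation."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation unless sample=True.
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    if sd == 0:
        raise ValueError("z-score normalisation is undefined for constant data")
    return [(v - mean) / sd for v in values]


# Made-up example values, not taken from any assignment dataset.
prices = [200, 300, 400, 500, 600]
scaled = min_max_normalise(prices)
z_scores = z_score_normalise(prices)
```

After min-max normalisation the smallest value maps to 0.0 and the largest to 1.0; after z-score normalisation the values have mean 0 and standard deviation 1.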
3. Discretise the AYB attribute into the following categories: Very Old=0-1850; Old=1851-1950;
New=1951-2000; Very New= 2000+. Provide the frequency of each category in your dataset.
In the assignment report provide an explanation of each of the applied techniques. In your Excel
workbook file place the results in a separate column in the corresponding spreadsheet.
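The discretisation can be sketched as a simple threshold function. This assumes AYB holds a year as an integer. The stated ranges overlap at 2000, so the sketch makes the assumption that 2000 belongs to New and that Very New starts at 2001; check this against the intended definition. The `ayb` list is made-up data and `discretise_ayb` is a hypothetical name.

```python
from collections import Counter


def discretise_ayb(year):
    """Map a year onto the four categories given in the task."""
    if year <= 1850:
        return "Very Old"
    if year <= 1950:
        return "Old"
    if year <= 2000:  # assumption: the boundary year 2000 counts as New
        return "New"
    return "Very New"


# Made-up example values, not taken from any assignment dataset.
ayb = [1840, 1850, 1851, 1925, 1950, 1951, 2000, 2001, 2015]
categories = [discretise_ayb(y) for y in ayb]
frequency = Counter(categories)
```

Missing AYB values are not handled here; they would need to be filtered out or given their own category before the function is applied.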
4. Binarise the CNDTN_D variable [with values “0” or “1”].
In the assignment report provide an explanation of the applied binarisation technique. In your
Excel workbook file place the results in separate columns in the corresponding spreadsheet.
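One common way to binarise a nominal attribute is one-hot encoding, where each distinct category becomes its own 0/1 column. The sketch below assumes that interpretation; the category labels in `cndtn` are made up for illustration and are not the actual values of CNDTN_D, and `one_hot` is a hypothetical name.

```python
def one_hot(values):
    """Return one 0/1 column per distinct category, keyed by category name."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Made-up category labels, not the real values of CNDTN_D.
cndtn = ["Good", "Average", "Good", "Poor"]
encoded = one_hot(cndtn)

# Each row has exactly one column set to 1.
row_sums = [sum(col[i] for col in encoded.values()) for i in range(len(cndtn))]
```

Each key of `encoded` corresponds to one new spreadsheet column, which matches the instruction to place the results in separate columns.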
1C. Summary
At the end of the report include a summary section in which you summarise your findings. The
summary is not a narrative of what you have done, but a condensed informative section of what
you have found about the data that you should report to the Head of the Analytics Unit. The
summary may include the most important findings (specific characteristics (or values) of some
attributes, important information about the distributions, some clusters identified visually that you
propose to examine, associations found that should be investigated more rigorously, etc.).
Deliverables
The deliverables are:
A report, for which the structure should follow the tasks of the assignment, and
An Excel workbook file with individual spreadsheets for each task (spreadsheets should be
labelled according to the task names, for example, “1A”). Each of the results of parts 1
through 4 in task 1B should be presented in a separate spreadsheet (and, correspondingly, a separate table
in the assignment report).
In the report, include a section (starting with a section title) for each of the tasks in the assignment.
Your report will likely be 20-25 pages in length using an 11 or 12 point font, including title page and
graphs. On average you will require between 15 and 23 hours to complete this assignment.
