INFO411/911: Data Mining – R-script Files – Visualization and Clustering Techniques – IT Assessment Answer

IT Assessment Answer
Task: 1
Analysing the results of urban mobility simulations can provide valuable information for identifying and addressing problems in an urban road network. Public transport vehicles such as buses and taxis are often equipped with GPS location devices, and the location data is submitted to a central server for analysis.
The metropolitan city of Rome, Italy collected location data from 320 taxi drivers that work in the center of Rome. Data was collected during the period from 01/Feb/2014 until 02/March/2014. An extract of the dataset is found in taxi.csv. The dataset contains 4 attributes:

ID of a taxi driver. This is a unique numeric ID.

Date and time in the format Y:m:d H:m:s.msec+tz, where msec is micro-seconds, and tz is a timezone adjustment. (You may have to change the format of the date into one that R can understand).

Latitude

Longitude

For a further description of this dataset: http://crawdad.org/roma/taxi/20140717/
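As a minimal sketch of the date conversion mentioned above: assuming the timestamps look like the hypothetical strings below (the exact layout in taxi.csv should be checked first), base R can parse fractional seconds with %OS once the trailing timezone offset has been removed. The names raw, no_tz and stamp, and the sample values, are illustrative only.

```r
# Hypothetical raw timestamps; the real layout in taxi.csv may differ.
raw <- c("2014-02-01 00:00:00.739166+01",
         "2014-02-01 00:00:15.036+01")

# Drop a trailing two-digit timezone offset such as "+01".
no_tz <- sub("[+-][0-9]{2}$", "", raw)

# %OS reads seconds including the fractional part.
# Assumption: every record carries the same offset, so treating the
# wall-clock time as UTC leaves time differences unchanged.
stamp <- as.POSIXct(no_tz, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Elapsed seconds between consecutive records.
gap_secs <- as.numeric(diff(stamp), units = "secs")
print(gap_secs)
```

If the records do not all share one offset, the offset would have to be parsed and applied rather than discarded.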
Questions:
By using the data in taxi.csv perform the following tasks:
(a) Plot the location points (2D plot using all of the latitude,longitude value pairs in the dataset). Clearly indicate points that are invalid, outliers or noise points. The plot should be informative! Clearly explain the rationale that you used when identifying invalid points, noise points, and outliers. Remove invalid points, outliers and noise points before answering the subsequent questions.
(b) Compute the minimum, maximum, and mean location values.
(c) Obtain the most active, least active, and average activity of the taxi drivers (most time driven, least time driven, and mean time driven). Explain the rationale of your approach and explain your results.
(d) Look at the file Student_Taxi_Mapping.txt. The file contains two columns. The first column is a 4-digit code, the second column is the ID of a taxi driver. Use the first digit and the last three digits of your student number to obtain a 4-digit code. Locate that code in the first column of the file Student_Taxi_Mapping.txt, then use the corresponding ID of the taxi driver listed in column 2. Thus, for example, if your student number is 52345856 then you would look up 5856 in file Student_Taxi_Mapping.txt to find that the corresponding taxi ID is 50. Use the taxi ID that is listed next to your 4-digit student code to answer the following questions:

Plot the location points for taxi=ID

Compare the mean, min, and max location value of taxi=ID with the global mean, min, and max.

 Compare total time driven by taxi=ID with the global mean, min, and max values.

Compute the distance traveled by taxi=ID. To compute the distance between two points (lat1, lon1) and (lat2, lon2) on the surface of the earth use the following method, with all angles in radians:
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
distance = R * c (where R is the radius of the Earth)
Assume that R = 6,371,000 meters.
With each of your answers: Explain what knowledge can be derived from your answer.
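The distance method given above can be sketched in R as follows. This is one possible transcription, not a reference solution: the function names deg2rad, haversine and track_length are made up for illustration, the inputs are assumed to be in decimal degrees, and the pmin guard against rounding error is an addition that the task does not ask for.

```r
R_EARTH <- 6371000  # radius of the Earth in meters, as given in the task

deg2rad <- function(deg) deg * pi / 180

# Distance in meters between two points given in decimal degrees.
haversine <- function(lat1, lon1, lat2, lon2) {
  lat1 <- deg2rad(lat1); lon1 <- deg2rad(lon1)
  lat2 <- deg2rad(lat2); lon2 <- deg2rad(lon2)
  dlat <- lat2 - lat1
  dlon <- lon2 - lon1
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  a <- pmin(1, a)  # guard: rounding could push a marginally above 1
  ang <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R_EARTH * ang
}

# Total length of a track: sum of distances between consecutive points.
# Assumes lat and lon are already sorted by time.
track_length <- function(lat, lon) {
  n <- length(lat)
  if (n < 2) return(0)
  sum(haversine(lat[-n], lon[-n], lat[-1], lon[-1]))
}

# A quarter of the equator should equal R * pi / 2.
print(haversine(0, 0, 0, 90))
```

Because track_length adds up every consecutive pair, GPS jitter while a taxi is standing still also accumulates as distance, which is worth keeping in mind when interpreting the result.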

Task: 2
Preface: Banks often face the problem of deciding whether or not a client is creditworthy. Banks commonly employ data mining techniques to classify a customer into risk categories such as category A (highest rating) or category C (lowest rating).
A bank collects data from past credit assessments. The file “creditworthiness.csv” contains 2500 such assessments. Each assessment lists 46 attributes of a customer. The last attribute (the 47th attribute) is the result of the assessment. Open the file and study its contents. You will notice that the columns are coded by numeric values. The meaning of these values is defined in the file “definitions.txt”. For example, a value 3 in the 47th column means that the customer's creditworthiness is rated “C”. Any value of attributes not listed in definitions.txt is “as is”.
This poses a “prediction” problem. A machine is to learn from the outcomes of past assessments and, once the machine has been trained, to assess any customer who has not yet been assessed. The value in column 47 also indicates when a customer has not yet been assessed.
Question 1: 
Analyse the general properties of the dataset and obtain an insight into the difficulty of the prediction task. Create a statistical analysis of the attributes and their values, then list 5 of the most interesting (most valuable) attributes. Explain the reasons that make these attributes interesting.
Note: A set of R-script files is provided with this assignment (included in the zip-file). These are similar to the scripts used in labs. The scripts provided will allow you to produce some first results. However, virtually none of the parameters used in these scripts are suitable for obtaining a good insight into the general properties of the given dataset. Hence your task is to modify the scripts such that informative results can be obtained from which conclusions about the learning problem can be made. Note that finding a good set of parameters is often very time consuming in data mining.
An additional challenge is to make a correct interpretation of the results. This is what you need to do: Find a good set of parameters (i.e. through a trial and error approach), obtain informative results, then offer an interpretation of the results. Write down your approach to conducting the experiments, explain your results, and offer a comprehensive interpretation of the results. Do not forget that you are also to provide an insight into the degree of difficulty of this learning problem (i.e. from the results that you obtained, can it be expected that a prediction model will be able to achieve a 100% prediction accuracy?). Always explain your answers.
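One possible way to start ranking attributes for Question 1 is to measure how much each attribute reduces the uncertainty about the rating. The sketch below uses information gain on discrete values; this criterion is an assumption for illustration, not the method of the provided scripts, and the toy data frame with its column names (a_perfect, a_useless, rating) is invented. Continuous attributes would need to be binned first.

```r
# Shannon entropy (in bits) of a discrete vector without missing values.
entropy <- function(x) {
  counts <- table(x)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Information gain: H(class) - sum over v of P(attr = v) * H(class | attr = v)
info_gain <- function(attr, class) {
  h_cond <- tapply(class, attr, entropy)  # entropy of class per attribute value
  w <- table(attr) / length(attr)         # relative frequency of each value
  entropy(class) - sum(as.numeric(w[names(h_cond)]) * as.numeric(h_cond))
}

# Invented toy data: one attribute determines the rating, one is unrelated.
toy <- data.frame(
  a_perfect = c(1, 1, 2, 2, 3, 3),
  a_useless = c(1, 2, 1, 2, 1, 2),
  rating    = c(1, 1, 2, 2, 3, 3)
)

gain <- sapply(toy[, c("a_perfect", "a_useless")], info_gain,
               class = toy$rating)
print(sort(gain, decreasing = TRUE))
```

A gain close to zero for an attribute on its own does not rule out that it is useful in combination with others, so a ranking like this is only a first indication of which attributes are interesting.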
Question 2: 
Deploy a prediction model to predict the creditworthiness of customers who have not yet been assessed. The prediction capabilities of the MLP in lab4 were very poor. Your task is to:
a.) Describe a valid strategy that maximises the accuracy of predicting the credit rating. Explain why your strategy can be expected to maximise the prediction capabilities.
b.) Use your strategy to train MLP(s) then report your results. Give an interpretation of your results. What is the best classification accuracy (expressed in % of correctly classified data) that you can obtain for data that were not used during training (i.e. the test set)?
c.) You will find that 100% accuracy cannot be obtained on the test data. Explain why 100% accuracy could not be obtained on this test dataset. What would be needed to get the prediction accuracy closer to 100%?
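Reporting accuracy on data not used during training requires a held-out test set. The following base R sketch shows one way to build a stratified split (each rating keeps the same share in both sets) and to compute the percentage of correctly classified cases. The helper names stratified_split and accuracy, the 80/20 ratio, and the toy labels are assumptions for illustration; customers who have not yet been assessed would have to be excluded before splitting.

```r
# Return train and test row indices, sampling the same fraction of each class.
stratified_split <- function(labels, train_frac = 0.8) {
  idx <- seq_along(labels)
  train <- unlist(lapply(split(idx, labels), function(i) {
    i[sample.int(length(i), size = floor(train_frac * length(i)))]
  }))
  train <- sort(unname(train))
  list(train = train, test = setdiff(idx, train))
}

# Percentage of correctly classified cases.
accuracy <- function(truth, predicted) 100 * mean(truth == predicted)

# Invented labels standing in for the rating column.
set.seed(42)
labels <- rep(1:3, times = c(10, 20, 30))
s <- stratified_split(labels)

print(table(labels[s$train]))  # class counts in the training part
print(table(labels[s$test]))   # class counts in the test part
print(accuracy(c(1, 2, 3, 3), c(1, 2, 3, 1)))
```

Only the rows in s$train would be used to fit the MLP, and accuracy would then be evaluated on the rows in s$test, so that the reported figure reflects unseen data.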