CN7022: Big Data Analytics – Analytics Using PySpark – Data Analytics Assignment Help

Task: 
(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 dataset were created with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. The tcpdump tool was used to capture 100 GB of raw traffic (pcap files). The dataset contains nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.

a) The features are described here.

b) The number of records per traffic type is described here.

c) In this coursework, we use a total of 2,540,044 records stored in the CSV file (download). The total size is 560 MB, which is big enough to employ big data methodologies for analysis. As big data specialists, we would first like to read and understand its features, and then apply modeling techniques. If you want to see a few records of this dataset, you can import it into Hadoop HDFS and then run a Hive query that prints the first 5-10 records.
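For a quick look at the raw file before (or alongside) loading it into HDFS, a streaming preview avoids reading all 560 MB. The sketch below is plain Python rather than Hive or PySpark; `preview_rows` and `preview_csv` are hypothetical helper names, and because it makes no assumption about whether the file has a header row, it simply returns the first raw rows. In Hive the equivalent check would be a `SELECT * FROM <table> LIMIT 5;` query, and in PySpark `spark.read.csv(path).show(5)`.

```python
import csv
from itertools import islice
from typing import Iterable, List


def preview_rows(lines: Iterable[str], n: int = 5) -> List[List[str]]:
    """Parse and return only the first n CSV rows from an iterable of text lines."""
    # csv.reader is lazy, so islice stops parsing after n rows
    return list(islice(csv.reader(lines), n))


def preview_csv(path: str, n: int = 5) -> List[List[str]]:
    """Return the first n rows of a CSV file without loading the whole file."""
    # newline="" is how the csv module expects files to be opened
    with open(path, newline="") as f:
        return preview_rows(f, n)
```

For example, `preview_csv("UNSW-NB15_1.csv", 5)`, where the file name is only a placeholder for wherever the downloaded CSV is saved.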

(2) Big Data Query & Analysis by Apache Hive
This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first understand the dataset carefully. Then, write at least four Hive queries that extract information from this big dataset. Apply appropriate visualization tools to present your findings numerically and graphically, and briefly interpret your findings. Finally, include screenshots of your scripts/code in the report.

Tip: the mark for this section depends on the complexity of the Hive queries; for instance, a simple select query will not earn full marks.
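As an illustration of a query that goes beyond a plain `select`, a grouped aggregation summarises traffic per category. The HiveQL in the comment uses a hypothetical table name (`unsw_nb15`) and assumes the columns `attack_cat` and `sbytes` from the UNSW-NB15 feature list; the Python function is a small reference implementation of the same GROUP BY / COUNT / AVG / ORDER BY logic, useful for sanity-checking Hive output on a sample. It does not reproduce Hive's NULL handling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# HiveQL of the kind this sketch mirrors (table and column names are assumptions):
#   SELECT attack_cat, COUNT(*) AS n, AVG(sbytes) AS avg_sbytes
#   FROM unsw_nb15
#   GROUP BY attack_cat
#   ORDER BY n DESC;


def group_count_avg(
    rows: Iterable[Dict[str, str]], key: str, value: str
) -> List[Tuple[str, int, float]]:
    """Count rows and average one numeric column per group, largest group first."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for row in rows:
        k = row[key]
        counts[k] += 1
        sums[k] += float(row[value])  # CSV fields arrive as strings
    result = [(k, counts[k], sums[k] / counts[k]) for k in counts]
    # ORDER BY n DESC; ties are broken by key so the output is deterministic
    result.sort(key=lambda t: (-t[1], t[0]))
    return result
```

Rows in this shape are what `csv.DictReader` yields. The same aggregation in PySpark would be `df.groupBy("attack_cat").agg(F.count("*"), F.avg("sbytes"))`, with `F` being `pyspark.sql.functions`.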

(3) Advanced Analytics using PySpark 

In this section, you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data 

a) We need to learn and understand the data through 3-4 descriptive analysis methods. You need to present your work numerically and graphically. Apply tooltip text, a legend, a title, X-Y labels, etc., as appropriate to help end users gain insights.
b) Apply 3-4 advanced statistical analysis methods (e.g., correlation, hypothesis testing, density estimation and so on) to interpret the data precisely. You need to write a report on your methods and their configurations, and interpret your findings.
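The summary statistics and two of the advanced methods named above can be sketched in plain Python to make their definitions explicit. `describe` mirrors the count/mean/stddev/min/max summary that PySpark's `DataFrame.describe()` reports (stddev in this sketch is the sample standard deviation); `pearson` is the Pearson correlation coefficient that `DataFrame.stat.corr` or `pyspark.ml.stat.Correlation.corr` compute at scale; `gaussian_kde` is a Gaussian kernel density estimate, the estimator behind `pyspark.mllib.stat.KernelDensity`. For hypothesis testing, `pyspark.ml.stat.ChiSquareTest` is one option for categorical features against the label. The function names and any bandwidth you pass are illustrative choices, not part of the brief.

```python
import math
import statistics
from typing import Dict, List, Sequence


def describe(xs: Sequence[float]) -> Dict[str, float]:
    """Count, mean, sample standard deviation, min and max of one numeric column."""
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "stddev": statistics.stdev(xs),  # divides by n - 1 (sample stddev)
        "min": min(xs),
        "max": max(xs),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/(n - 1) factors cancel, so plain sums of deviations are enough
    return cov / (sx * sy)


def gaussian_kde(
    sample: Sequence[float], bandwidth: float, points: Sequence[float]
) -> List[float]:
    """f(x) = 1/(n*h) * sum(phi((x - x_i) / h)) with a standard normal kernel phi."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - s) / bandwidth) ** 2) for s in sample)
        for p in points
    ]
```

The bandwidth controls smoothing: too small and the estimate follows individual records, too large and real modes in a feature such as flow duration are blurred together.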

3.2. Design and Build a Classifier 

a) Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form.

b) How do you evaluate the performance of the model? 

c) How do you verify the accuracy and the effectiveness of your model? 

d) Apply a multi-class classifier to classify the data into ten classes: one normal and nine attack classes (i.e., Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model, with supporting statements on its parameters, accuracy and effectiveness.

(4) Individual Assessment
Discuss (1) what you learned from this coursework, (2) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (3) what surprisingly new thinking was evoked and/or neglected on your part.
Tip: add the individual assessment of each member to the same report.

(5) Documentation

Document all your work. Your final report must follow the 5 sections detailed in the “format of final submission” section (refer to the next page). Your work must demonstrate an appropriate understanding of academic writing and integrity.