I’m trying to study for my Statistics course and I need some help to understand this question.
I need a tutor to write a ten-page statistical data science report (images also count toward the ten-page limit). I will provide reference notes, code, and output images to the tutor, so all you need to do is complete the written part for this data.
##### Below are the writing instructions and the dataset:
Students are to explore the Superconductivity data set, also from the UCI Machine Learning Data Repository. The following is the description of this dataset (from UCI): “There are two files: (1) train.csv contains 81 features extracted from 21263 superconductors along with the critical temperature in the 82nd column. …. The goal here is to predict the critical temperature based on the features extracted.” Superconductivity data: https://archive.ics.uci.edu/ml/datasets/Supercondu…
##### Below is the code; you can run it in JupyterLab. If you provide an email address, I can send you the .py and .ipynb code files, which should be easier.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeRegressor, export_graphviz, plot_tree
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.max_rows", 100)
df = pd.read_csv("train.csv")
df.shape
## predictor variables (the first 81 columns)
df_x = df.iloc[:,0:81]
## response variable (the 82nd column)
df_y = df.iloc[:,81]
## The target variable is critical temperature
df.columns[-1]
plt.figure(figsize = (40, 10))
plt.plot(df.index, df.iloc[:, 81])
plt.xlabel("index")
plt.ylabel("critical_temp")
## check the data types of all variables; we can treat all of them as continuous
df.dtypes
# ## Preprocessing
# ### Standardization
# Standardize features by removing the mean and scaling to unit variance
#
# The standard score of a sample x is calculated as:
#
# z = (x - u) / s, where u is the mean and s is the standard deviation of the feature
from sklearn.preprocessing import StandardScaler
## Standardize the Data
scaler = StandardScaler()
scaler.fit(df)
# Apply the transform to the full data frame (features and target), before the train/test split.
df_scale = scaler.transform(df)
df_scale = pd.DataFrame(df_scale)
df_scale.columns = df.columns
df_scale.head()
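The standard score defined above can be checked by hand. The following is a minimal sketch using only the standard library; the `standardize` helper and the temperature values are hypothetical and not taken from train.csv. It uses the population standard deviation, which is also what `StandardScaler` uses.

```python
import statistics

def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical critical temperatures (illustrative values, not from train.csv).
temps = [10.0, 20.0, 30.0, 40.0, 50.0]
z = standardize(temps)

# After standardization the values have mean 0 and unit variance.
print([round(v, 3) for v in z])
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
```

Note that the script above standardizes the target column together with the features, so every score reported later is on the scaled critical temperature.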
## predictor variables
df_x_scale = df_scale.iloc[:,0:81]
## response variable
df_y_scale = df_scale.iloc[:,81]
## train test split
x_scale_train, x_scale_test, y_scale_train, y_scale_test = train_test_split(df_x_scale, df_y_scale, test_size=0.3, random_state=42)
x_scale_train.shape
x_scale_test.shape
# ### PCA
# Principal component analysis (PCA).
# Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space
## PCA
from sklearn.decomposition import PCA
## Apply PCA. Notice that the code below passes .95 as the number-of-components parameter.
## It means that scikit-learn chooses the minimum number of principal components such that 95% of the variance is retained.
pca = PCA(0.95)
principalComponents = pca.fit_transform(df_x_scale)
df_x_pca = pd.DataFrame(principalComponents)
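To see what `PCA(0.95)` selects, the sketch below compares it with a manual count based on the cumulative explained variance ratio. The data are synthetic (10 features driven by 2 hypothetical latent factors), so the numbers say nothing about the superconductivity data itself.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 10 observed features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# Fit all components, then find the smallest k whose cumulative ratio exceeds 0.95.
full = PCA().fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
k_manual = int(np.searchsorted(cumulative, 0.95, side="right") + 1)

# Let scikit-learn pick the number of components itself.
auto = PCA(0.95).fit(X)
print(k_manual, auto.n_components_)
```

On the real data, `pca.n_components_` after the fit above reports how many components were kept.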
## train test split
x_pca_train, x_pca_test, y_pca_train, y_pca_test = train_test_split(df_x_pca, df_y_scale, test_size=0.3, random_state=42)
# ## Regression Tree
# Decision tree for regression but with different criterion.
#
# Loss function:
# $$L(t,v)= \frac{n_{left}}{n} MSE_{left} + \frac{n_{right}}{n} MSE_{right}$$
#
# MSE:
# $$\frac{1}{|node|}\sum_{x_i \in node}(y_i-\hat{y})^2$$
#
# Prediction:
# $$\hat{y}=\frac{1}{|node|}\sum_{x_i \in node}y_i$$
#
# The prediction is simply the average response value of the samples associated with this leaf node.
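The split criterion above can be worked through on a toy example. This is a minimal sketch in plain Python; the helper names `mse` and `split_loss` and the six response values are hypothetical, chosen only to illustrate the weighted MSE of a split.

```python
def mse(values):
    """Mean squared error of a node around its own mean prediction."""
    prediction = sum(values) / len(values)
    return sum((y - prediction) ** 2 for y in values) / len(values)

def split_loss(left, right):
    """Weighted average of the child MSEs, as in L(t, v) above."""
    n = len(left) + len(right)
    return len(left) / n * mse(left) + len(right) / n * mse(right)

# Hypothetical responses, already ordered by the feature being split on.
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Try every threshold position and keep the one with the smallest loss.
losses = {i: split_loss(y[:i], y[i:]) for i in range(1, len(y))}
best = min(losses, key=losses.get)
print(best, losses[best])
```

The best split separates the three small values from the three large ones, and each leaf then predicts the mean of its own samples.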
## Since our target variable is continuous, we fit a regression tree with maximum depth 3
tree_reg_scale = DecisionTreeRegressor(max_depth=3)
tree_reg_scale.fit(x_scale_train, y_scale_train)
y_hat_tree_reg_scale = tree_reg_scale.predict(x_scale_test)
# Here we use 5-fold cross-validation to verify the model accuracy
# by returning the coefficient of determination R^2 of the prediction.
scale_score_list = cross_val_score(tree_reg_scale, df_x_scale, df_y_scale, cv = 5)
scale_score_list.mean()
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_tree_reg_scale, label = "predicted scaled critical_temp using decision tree")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
## calculate the test score
tree_reg_scale.score(x_scale_test, y_scale_test)
## The test score is high but the mean cross-validation score is low. A likely reason is that cross_val_score with cv = 5
## splits the rows into consecutive, unshuffled folds, so the data in some folds may not represent the full range of the features.
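The gap between the test score and the cross-validation score can be reproduced on synthetic data. The sketch below assumes rows that are sorted by the target, which is a hypothetical ordering and not a verified property of train.csv. It compares the default consecutive folds with shuffled folds.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic data whose rows are sorted by the target, mimicking an ordered file.
x = np.sort(rng.uniform(0, 10, size=1000))
X = x.reshape(-1, 1)
y = 3.0 * x + rng.normal(scale=0.5, size=1000)

tree = DecisionTreeRegressor(max_depth=3, random_state=0)

# cv=5 on a regressor means KFold without shuffling: five consecutive blocks.
ordered = cross_val_score(tree, X, y, cv=5).mean()
# Shuffling first makes every fold cover the whole range of the data.
shuffled = cross_val_score(
    tree, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
).mean()
print(ordered, shuffled)
```

If the same pattern shows up on the real data, passing a shuffled `KFold` as `cv` would give a cross-validation score that is comparable with the random train/test split.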
## for PCA applied on data
tree_reg_pca = DecisionTreeRegressor(max_depth=3)
tree_reg_pca.fit(x_pca_train, y_pca_train)
pca_score_list = cross_val_score(tree_reg_pca, df_x_pca, df_y_scale, cv = 5)
pca_score_list.mean()
## calculate the test score
tree_reg_pca.score(x_pca_test, y_pca_test)
# Comparing the R^2 results for the regression tree, the standardized data alone gives a better score than the PCA-transformed data.
## Visualize the tree
## Here we can only use the tree model fitted on the standardized data (the PCA components have no feature names)
plot_tree(tree_reg_scale, feature_names=list(df_x_scale.columns), filled=True, fontsize=6)
def test_depth(depth, x, y):
    scale_score = []
    for i in depth:
        tree_reg = DecisionTreeRegressor(max_depth=i)
        # Here we use 5-fold cross-validation to verify the model accuracy
        # by returning the coefficient of determination R^2 of the prediction.
        scale_score_list = cross_val_score(tree_reg, x, y, cv = 5)
        scale_score.append(scale_score_list.mean())
    return scale_score
depth = range(2,20,2)
test_depth_score = test_depth(depth, df_x_scale, df_y_scale)
plt.plot(depth, test_depth_score)
plt.title("Compare different depths in regression tree vs R^2 score")
plt.xlabel("depth in regression tree")
plt.ylabel("R^2 score")
# From the plot, we can see that a larger depth in the regression tree model may lead to a better model.
# But a depth that is too large may cause overfitting and a long running time. Hence we select a depth of 10 here and in the following analysis.
# ## Ensemble Learning
#
# Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
#
# ### 1. Voting Classifier/Regressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import VotingRegressor
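# The features are already standardized, so no extra normalization is needed
# (the normalize argument is no longer available in recent scikit-learn releases).
linear = LinearRegression()
tree_reg_scale = DecisionTreeRegressor(max_depth=10)
voting = VotingRegressor(estimators=[("LinearRegression", linear),
                                     ("RegressionTree", tree_reg_scale)])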
voting.fit(x_scale_train, y_scale_train)
y_hat_voting = voting.predict(x_scale_test)
for reg in (linear, tree_reg_scale, voting):
    cv_score = cross_val_score(reg, df_x_scale, df_y_scale, cv = 5)
    print(reg.__class__.__name__, cv_score.mean())
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_voting, label = "predicted scaled critical_temp using voting")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
# Hence the ensemble (combining the linear regression and the regression tree) is better than the regression tree model alone
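A voting regressor with default weights simply averages the predictions of its fitted base models. The sketch below checks this on synthetic data; the data-generating formula and the estimator names "linear" and "tree" are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic regression problem with a linear and a nonlinear component.
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

voting = VotingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ]
)
voting.fit(X, y)

# estimators_ holds the fitted clones of the base models.
individual = np.column_stack([est.predict(X) for est in voting.estimators_])
manual_average = individual.mean(axis=1)
print(np.allclose(manual_average, voting.predict(X)))
```

Averaging helps when the base models make different kinds of errors, which is one possible reason the combination can beat the single tree here.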
# ### 2. Bagging
# Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
#
# Bagging introduces a bit more diversity in the subsets that each predictor is trained on, so it ends up with a slightly higher bias, but it also reduces variance and helps to avoid overfitting.
#
# Out-Of-Bag (OOB):
#
# A bootstrap sample draws n instances with replacement, so only about 63% of the distinct instances are sampled on average. The remaining 37% of instances that are not sampled are called Out-Of-Bag instances.
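The 63% / 37% figures follow from the probability that a given instance is never drawn, (1 - 1/n)^n, which tends to 1/e. A minimal simulation with the standard library, where the sample size n = 10,000 is an arbitrary choice for illustration:

```python
import random

random.seed(0)
n = 10_000
# Draw one bootstrap sample: n indices chosen with replacement.
sample = [random.randrange(n) for _ in range(n)]
in_bag = len(set(sample)) / n
out_of_bag = 1 - in_bag

# Theoretical in-bag share: 1 - (1 - 1/n)^n, which tends to 1 - 1/e.
expected_in_bag = 1 - (1 - 1 / n) ** n
print(round(in_bag, 3), round(out_of_bag, 3), round(expected_in_bag, 3))
```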
from sklearn.ensemble import BaggingRegressor
bag = BaggingRegressor(DecisionTreeRegressor(max_depth=10),
                       n_estimators=30,
                       oob_score=True)
bag.fit(x_scale_train, y_scale_train)
y_hat_bag = bag.predict(x_scale_test)
bag_cv_score = cross_val_score(bag, df_x_scale, df_y_scale, cv=5)
print(bag_cv_score.mean(), bag_cv_score.std())
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_bag, label = "predicted scaled critical_temp using bag")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
# Without Cross-Validation or training-validation splitting, we can just use OOB for model evaluation.
bag2 = BaggingRegressor(DecisionTreeRegressor(max_depth=10),
                        n_estimators=30,
                        bootstrap=True,
                        oob_score=True)
bag2.fit(df_x_scale, df_y_scale)
bag2.oob_score_
# Here the OOB score is not very close to the CV score, which may indicate overfitting.
# Compared to the single tree or the voting regressor, bagging seems better.
# ### 3.Random Forest
# Besides sampling instances as in bagging, we can also randomly sample features, so each regressor does not need to use all of the predictors. Random Forests use this scheme with an ensemble of Decision Trees. Compared with Bagging, it introduces extra randomness and results in a greater tree diversity, which once again trades a higher bias for a lower variance. Besides, Random Forests train each decision tree very deep, so the bias is kept extremely low.
from sklearn.ensemble import RandomForestRegressor
RF = RandomForestRegressor(n_estimators = 200,
                           # "squared_error" is the current name of the former "mse" criterion.
                           criterion="squared_error",
                           # The maximum depth of the tree.
                           # If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
                           max_depth=None,
                           ## If "sqrt", then max_features=sqrt(n_features)
                           max_features="sqrt",
                           oob_score=True)
RF.fit(x_scale_train, y_scale_train)
y_hat_RF = RF.predict(x_scale_test)
RF_cv_score=cross_val_score(RF, df_x_scale, df_y_scale, cv=5)
print(RF_cv_score.mean(), RF_cv_score.std())
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_RF, label = "predicted scaled critical_temp using RF")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
RF = RandomForestRegressor(n_estimators=200,
                           criterion="squared_error",
                           max_depth=None,
                           max_features="sqrt",
                           oob_score=True)
RF.fit(df_x_scale, df_y_scale)
RF.oob_score_
# Feature importance
name_list = []
importance_list = []
# Pair each predictor name with its importance (the target column is excluded).
for name, importance in zip(df_x_scale.columns, RF.feature_importances_):
    name_list.append(name)
    importance_list.append(importance)
plt.figure(figsize = (10, 20))
plt.barh(name_list, importance_list, align="center")
# ### 4.Boosting
# #### 4.1 Adaboosting
# AdaBoost, short for “Adaptive Boosting”
from sklearn.ensemble import AdaBoostRegressor
# base learner is a decision tree with maximum depth 10
# train 30 sequential trees
ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=10), n_estimators=30)
ada.fit(x_scale_train, y_scale_train)
y_hat_ada = ada.predict(x_scale_test)
ada_cv_score = cross_val_score(ada, df_x_scale, df_y_scale, cv=5)
print(ada_cv_score.mean(), ada_cv_score.std())
## Here the RandomForest is slightly better than the AdaBoostRegressor
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_ada, label = "predicted scaled critical_temp using ada")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
## compare different numbers of sequential trees in AdaBoost
def test_seq(seq, x, y):
    ada_cv_score_list = []
    for i in seq:
        ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth = 10), n_estimators = i)
        ada_cv_score = cross_val_score(ada, x, y, cv = 5).mean()
        ada_cv_score_list.append(ada_cv_score)
    return ada_cv_score_list
seq = range(10, 50, 10)
seq_score = test_seq(seq, df_x_scale, df_y_scale)
plt.plot(seq, seq_score)
plt.title("Compare different numbers of sequential trees vs R^2 score")
plt.xlabel("number of sequential trees")
plt.ylabel("R^2 score")
## So here 30 sequential trees may lead to a good model
# ### 4.2 Gradient Boosting
# Gradient boosting builds the trees sequentially; each new tree is a greedy step that fits the errors left by the current ensemble.
from sklearn.ensemble import GradientBoostingRegressor
gbr = GradientBoostingRegressor(max_depth = 10, n_estimators = 30, learning_rate = 0.5)
gbr.fit(x_scale_train, y_scale_train)
y_hat_gbr = gbr.predict(x_scale_test)
gbr_cv_score = cross_val_score(gbr, df_x_scale, df_y_scale, cv = 5).mean()
gbr_cv_score
## There is a trade-off between learning_rate and n_estimators.
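The trade-off can be illustrated on synthetic data: with a small learning rate each tree contributes little, so many more trees are needed to reach the same fit. This is a sketch under assumed settings; the data, the `holdout_r2` helper, and the specific rates and tree counts are hypothetical and not tuned for the superconductivity data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic nonlinear regression problem.
X = rng.uniform(-3, 3, size=(600, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def holdout_r2(learning_rate, n_estimators):
    """Fit a gradient boosting model and return R^2 on the held-out split."""
    model = GradientBoostingRegressor(
        max_depth=3,
        learning_rate=learning_rate,
        n_estimators=n_estimators,
        random_state=0,
    )
    return model.fit(X_train, y_train).score(X_test, y_test)

few_small_steps = holdout_r2(0.01, 30)     # small rate, few trees: underfits
few_large_steps = holdout_r2(0.5, 30)      # large rate, few trees
many_small_steps = holdout_r2(0.01, 1000)  # small rate, many trees
print(few_small_steps, few_large_steps, many_small_steps)
```

A smaller learning rate with more trees often generalizes at least as well as a large rate with few trees, at the cost of a longer training time.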
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_gbr, label = "predicted scaled critical_temp using gbr")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
## refit on the training split and calculate the test score
gbr.fit(x_scale_train, y_scale_train)
gbr.score(x_scale_test, y_scale_test)
# ### 4.3 XGBoost
#
# XGBoost is similar to the gradient boosting algorithm, but it has a few tricks up its sleeve that make it stand out from the rest.
#
# Features of XGBoost are:
#
# * Clever Penalisation of Trees
# * A Proportional shrinking of leaf nodes
# * Newton Boosting
# * Extra Randomisation Parameter
# XGBoost
from xgboost import XGBRegressor
# max_depth=10, learning_rate = 0.5, n_estimators=30
xgb = XGBRegressor(max_depth = 10, learning_rate = 0.5, n_estimators = 30)
xgb.fit(x_scale_train, y_scale_train)
y_hat_xgb = xgb.predict(x_scale_test)
xgb_cv_score = cross_val_score(xgb, df_x_scale, df_y_scale, cv = 5)
print(xgb_cv_score.mean(), xgb_cv_score.std())
plt.figure(figsize = (40, 10))
plt.plot(range(0, 6379), y_scale_test, label = "scaled critical_temp")
plt.plot(range(0, 6379), y_hat_xgb, label = "predicted scaled critical_temp using xgb")
plt.xlabel("test index")
plt.ylabel("scaled critical_temp")
plt.legend()
# Hence in this report, we find that for a given dataset, each model, each choice of parameters, and each approach to preprocessing the data may lead to a different result.
