
[FastCampus Course Review] Data Analysis Online Course 100% Refund Challenge: Mission 39



50. ch06. sklearn - Ensemble - 10. Weighted Blending to 54. ch06. sklearn - Ensemble - 14. Hyperparameter Tuning


50. ch06. sklearn - Ensemble - 10. Weighted Blending

Compute the final output by multiplying each model's predictions by a weight and summing the results.

final_outputs = {
    'elasticnet': poly_pred,
    'randomforest': rfr_pred,
    'gbr': gbr_pred,
    'xgb': xgb_pred,
    'lgbm': lgbm_pred,
    'stacking': stack_pred,
}




# weighted sum of each model's predictions (weights sum to 1.0)
final_prediction = \
    final_outputs['elasticnet'] * 0.1 + \
    final_outputs['randomforest'] * 0.1 + \
    final_outputs['gbr'] * 0.2 + \
    final_outputs['xgb'] * 0.2 + \
    final_outputs['lgbm'] * 0.2 + \
    final_outputs['stacking'] * 0.2


mse_eval('Weighted Blending', final_prediction, y_test)
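mse_eval is a helper defined earlier in the course and not shown in this post. A minimal sketch of what such a helper might look like, assuming it simply computes and prints the MSE for a named set of predictions (the course version also tracks results for comparison plots):

from sklearn.metrics import mean_squared_error

def mse_eval(name, pred, actual):
    # Report the mean squared error for a named set of predictions.
    mse = mean_squared_error(actual, pred)
    print('{}: MSE = {:.4f}'.format(name, mse))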



51. ch06. sklearn - Ensemble - 11. Overall Review of Ensemble Models


52. ch06. sklearn - Ensemble - 12. Cross Validation

K-fold cross validation splits the data so that every sample is used in the test set at least once.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from lightgbm import LGBMRegressor

n_splits = 5  # split the data into 5 folds
kfold = KFold(n_splits=n_splits, shuffle=True, random_state=42)  # random_state requires shuffle=True

df.head()

X = np.array(df.drop('MEDV', axis=1))
Y = np.array(df['MEDV'])

lgbm_fold = LGBMRegressor(random_state=42)



i = 1
total_error = 0
for train_index, test_index in kfold.split(X):
    x_train_fold, x_test_fold = X[train_index], X[test_index]
    y_train_fold, y_test_fold = Y[train_index], Y[test_index]
    lgbm_pred_fold = lgbm_fold.fit(x_train_fold, y_train_fold).predict(x_test_fold)
    error = mean_squared_error(y_test_fold, lgbm_pred_fold)
    print('Fold = {}, prediction score = {:.2f}'.format(i, error))
    total_error += error
    i += 1
print('---' * 10)
print('Average Error: %s' % (total_error / n_splits))
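The same per-fold loop can be written more compactly with scikit-learn's cross_val_score; a sketch using the X, Y, and kfold defined above (the 'neg_mean_squared_error' scorer returns negated MSE values, so the sign is flipped back at the end):

from sklearn.model_selection import cross_val_score

scores = cross_val_score(LGBMRegressor(random_state=42), X, Y,
                         cv=kfold, scoring='neg_mean_squared_error')
print('Average Error: %s' % -scores.mean())  # negate to recover the MSE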




53. ch06. sklearn - Ensemble - 13. Hyperparameter Tuning


RandomizedSearchCV


Samples a fixed number of parameter settings from the specified distributions.


params = {
    'n_estimators': [200, 500, 1000, 2000],
    'learning_rate': [0.1, 0.05, 0.01],
    'max_depth': [6, 7, 8],
    'colsample_bytree': [0.8, 0.9, 1.0],
    'subsample': [0.8, 0.9, 1.0],
}


from sklearn.model_selection import RandomizedSearchCV


clf = RandomizedSearchCV(LGBMRegressor(), params, random_state=42, cv=3, n_iter=25, scoring='neg_mean_squared_error')
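For scale: the lists above define far more settings than the search will actually try. A quick count, assuming the params dict above with n_iter=25 and cv=3:

n_settings = 4 * 3 * 3 * 3 * 3  # sizes of the five parameter lists above
print(n_settings)  # 324 possible settings
print(25 * 3)      # 75 model fits actually run (n_iter=25, cv=3)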


clf.fit(x_train, y_train)



clf.best_score_

clf.best_params_
> {'colsample_bytree': 0.8,
   'learning_rate': 0.01,
   'max_depth': 7,
   'n_estimators': 2000,
   'subsample': 0.8}

Plug these best parameters into the model as-is.


lgbm_best = LGBMRegressor(n_estimators=2000, subsample=0.8, max_depth=7, learning_rate=0.01, colsample_bytree=0.8)
lgbm_best_pred = lgbm_best.fit(x_train, y_train).predict(x_test)
mse_eval('RandomSearch LGBM', lgbm_best_pred, y_test)
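Retyping the parameters by hand works, but the fitted search object already exposes the winning model. A sketch of the same step using attributes scikit-learn provides:

# With refit=True (the default), the search keeps the best model
# retrained on all of x_train.
lgbm_best = clf.best_estimator_
lgbm_best_pred = lgbm_best.predict(x_test)

# Alternatively, rebuild the model from the parameter dict:
lgbm_best = LGBMRegressor(**clf.best_params_)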


54. ch06. sklearn - Ensemble - 14. Hyperparameter Tuning


GridSearchCV

- Attempts an exhaustive search over every combination of the given parameter values.


params = {
    'n_estimators': [500, 1000],
    'learning_rate': [0.1, 0.05, 0.01],
    'max_depth': [7, 8],
    'colsample_bytree': [0.8, 0.9],
    'subsample': [0.8, 0.9],
}



from sklearn.model_selection import GridSearchCV


grid_search = GridSearchCV(LGBMRegressor(), params, cv=3, n_jobs=-1, scoring='neg_mean_squared_error')
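Unlike the randomized search, every combination here is fitted. A quick count of the work this grid implies, given cv=3:

n_settings = 2 * 3 * 2 * 2 * 2  # sizes of the five parameter lists above
print(n_settings)      # 48 parameter settings
print(n_settings * 3)  # 144 model fits with cv=3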

grid_search.fit(x_train, y_train)

grid_search.best_score_

grid_search.best_params_
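Note that with scoring='neg_mean_squared_error', best_score_ is reported as a negated MSE (so that higher is always better); flip the sign to read it as an ordinary error:

print('Best CV MSE: {:.2f}'.format(-grid_search.best_score_))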




lgbm_best = LGBMRegressor(n_estimators=500, subsample=0.8, max_depth=8, learning_rate=0.1, colsample_bytree=0.9)
lgbm_best_pred = lgbm_best.fit(x_train, y_train).predict(x_test)
mse_eval('GridSearch LGBM', lgbm_best_pred, y_test)



FastCampus data analysis course link:
bit.ly/3imy2uN