DRAFT.ipynb - Transform

DRAFT

Content in this section is based on a general ML project pipeline

This tutorial covers lithofacies prediction from well log data.
It runs through a typical machine learning workflow:
- Getting data
- Data Cleaning
- Data Visualization and Exploratory Data Analysis
- Data Preparation
- Model Training and Prediction
- Model Evaluation
- Model Performance Improvement with different techniques
  - Data Augmentation
  - Cross-validation technique
  - Model regularization for better training (for example, weight decay)
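
The workflow listed above can be sketched in miniature. The example below is only an illustration under stated assumptions: the data is synthetic (two made-up features and three made-up classes standing in for well logs and lithofacies), the model is a simple nearest-centroid classifier rather than the model used in the tutorial, and every function name is hypothetical. It uses only the Python standard library.

```python
import random
from statistics import mean

def make_synthetic_logs(n_per_class=60, seed=0):
    """Getting data: hypothetical stand-in for well logs (2 features, 3 classes)."""
    rng = random.Random(seed)
    centers = {0: (1.0, 1.0), 1: (4.0, 1.5), 2: (2.5, 4.0)}
    rows = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            rows.append(([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)], label))
    # Blank out a few values to mimic missing log readings.
    for i in range(0, len(rows), 25):
        rows[i][0][1] = None
    rng.shuffle(rows)
    return rows

def clean(rows):
    """Data cleaning: drop rows with any missing feature."""
    return [(x, y) for x, y in rows if all(v is not None for v in x)]

def kfold_indices(n, k):
    """Cross-validation: yield (train_idx, val_idx) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def fit_centroids(rows):
    """Model training: one mean feature vector per class."""
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}

def predict(centroids, x):
    """Prediction: the class whose centroid is closest to x."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

def accuracy(y_true, y_pred):
    """Model evaluation: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rows = clean(make_synthetic_logs())
scores = []
for train_idx, val_idx in kfold_indices(len(rows), k=5):
    model = fit_centroids([rows[i] for i in train_idx])
    preds = [predict(model, rows[i][0]) for i in val_idx]
    scores.append(accuracy([rows[i][1] for i in val_idx], preds))
cv_mean = mean(scores)
print(f"fold accuracies: {[round(s, 3) for s in scores]}, mean: {cv_mean:.3f}")
```

Reporting the per-fold scores alongside their mean shows how much the result depends on which rows end up in the validation fold.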

Content in this section focuses on the FORCE competition

It covers key things to watch out for when working with large subsurface datasets in the format of the FORCE dataset, for example:
- Using gradient-boosted trees (the training loss may fail to converge)
- How K-fold cross-validation can help prevent that
- How selecting a proper cross-validation technique helps in making more confident decisions
- Preventing overfitting by:
  - Creating different validation sets (explaining why each method of creating them was chosen)
  - Making proper evaluations on validation sets with different metrics
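
The two overfitting points above can be illustrated with a small sketch. It makes two assumptions that do not come from the tutorial itself: validation folds are built by holding out whole wells (one possible way of creating validation sets, useful when samples from the same well are strongly correlated), and accuracy is compared with macro-averaged F1 as the "different metrics". The well names, class labels, and function names are all hypothetical.

```python
def group_kfold(groups, k):
    """Yield (train_idx, val_idx) so that no group (well) is on both sides."""
    unique = sorted(set(groups))
    if k > len(unique):
        raise ValueError("k cannot exceed the number of groups")
    # Assign each group to a fold in round-robin order.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, val

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical well identifier for each sample (two samples per well).
wells = ["W1", "W1", "W2", "W2", "W3", "W3", "W4", "W4"]
splits = list(group_kfold(wells, k=2))
for train_idx, val_idx in splits:
    print("train wells:", sorted({wells[i] for i in train_idx}),
          "| validation wells:", sorted({wells[i] for i in val_idx}))

# A model that always predicts the majority class looks fine on accuracy
# but poor on macro F1 (illustrative labels, not FORCE results).
y_true = ["shale"] * 8 + ["sandstone"] * 2
y_pred = ["shale"] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy: {acc:.3f}, macro F1: {f1:.3f}")
```

Holding out whole wells checks how the model behaves on wells it has never seen, and comparing the two metrics exposes a model that ignores the minority class.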

T21 - Big Data Lithology