A couple of weeks back, I did a POC (proof of concept) & shared its learnings in this blog. In the present blog, I want to highlight the activities a functional consultant performs as part of an ML assignment.
Most of us tend to think that ML assignments are purely technical, with little or no functional activity. I disagree, and hence want to outline the functional consulting work a consultant has to perform as part of a typical ML project/assignment.
The tasks highlighted below are confined to the ML subject & do not include regular functional consulting work in a typical SAP Activate project implementation. Also, the functional activities listed below are more pertinent to projects/POCs done from scratch; some of them may not be relevant for an SAP-delivered best practice scope item, e.g. Predict PO delivery date (3FY), as these scope items are pre-delivered by SAP.
Generally, from a delivery perspective, ML assignments are led by a functional consultant with support from the technical team. Active involvement of the functional consultant is required right from understanding the pain point, through realization, to final deployment of the model.
An ML assignment is divided into the following phases, and I have listed the functional tasks performed in each of these phases.
Business & data understanding
- Understanding the business background of the problem is very important for a functional consultant to drive the right solution. The consultant can leverage a structured problem-solving framework like “5W” (Who, What, When, Where, and Why) to understand the full gamut of the business problem. There are other frameworks & interviewing techniques for understanding the fundamentals of a problem, like 5W + 1H, issue trees etc. A functional consultant is expected to lead all customer interactions, understand the pain points & draft the scope of work.
- Data Source:
- Identify the source of data & possible ways to extract it. Onboarding of technical consultants depends a lot on this activity. Example – if data is exposed as a CDS view, a report may need to be created to download the data; if it is available on the cloud, a web service developer may be handy in extracting it. A functional consultant plays an important role in onboarding the right technical resources.
- Establish a bridge/connection to the data source & get connection credentials from the customer.
- Check sensitivity of data – is it production data, and is data masking required because the data is confidential?
- Work with the customer to get enough data for the model to train & test. Data is at the heart of a model's ability to learn and produce accurate output. A functional consultant should always strive to get as much data as possible from customers.
- Data Dictionary:
- The functional consultant should understand the data dictionary or metadata of the dataset, if it is given by the customer. If it is not, prepare a data dictionary and get it vetted by the customer.
- Identify the target/output variable (to be predicted) from the data dictionary.
- A data dictionary typically contains the business significance of each column & each variable's possible values/ranges. Also check the importance of fields which might be helpful in predicting the target variable.
In the data preparation phase, we hit the ground running with the activities below.
- Identify missing values & check that the actual values in the dataset are a subset of the list of values given in the data dictionary. Perform basic sanity checks on the dataset to find junk values, coded values, invalid values etc.
- Missing value treatment – Check & discuss with the customer how to treat missing values, with possible options listed below. Before deleting rows/columns with a high share of missing values, check whether the dataset is scarce & whether the remaining data is sufficient to predict the target variable.
- Replace missing values with Mean/Median/Mode
- Delete columns/rows whose share of missing values is higher than a certain threshold. Ex – delete columns/rows from the dataset if missing value % > 30
- Feature Engineering – Based on business understanding (Step 1), guide the technical team to create new columns that capture additional insight from combinations of other columns. Example – group-by with functions like sum/mean/max/min etc., or extracting date/day/time from timestamp columns.
- Outlier treatment – Outliers are data points whose values fall below the lower whisker or above the upper whisker of a box plot (commonly 1.5 × IQR beyond the first & third quartiles). Discuss & chalk out a strategy for treating outliers with the customer, as outliers can have a large impact on model accuracy. The following options can be explored:
- Delete rows containing outliers
- Replace outlier values with mean/median/mode. If rows are deleted instead, check whether the remaining dataset is still sufficient to predict the outcome.
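The data preparation checks above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy rows, the column names `plant` and `lead_time_days`, and the allowed-value set are all hypothetical, and a real project would more likely use a dataframe library chosen by the technical team.

```python
import statistics

# Hypothetical toy dataset: each row is a dict, None marks a missing value.
rows = [
    {"plant": "P1", "lead_time_days": 4.0},
    {"plant": "P2", "lead_time_days": 5.0},
    {"plant": "P1", "lead_time_days": None},
    {"plant": "XX", "lead_time_days": 6.0},
    {"plant": "P2", "lead_time_days": 5.5},
    {"plant": "P1", "lead_time_days": 4.5},
    {"plant": "P2", "lead_time_days": 60.0},
    {"plant": "P1", "lead_time_days": 5.0},
]

# Hypothetical data dictionary entry: allowed values for a coded column.
ALLOWED_PLANTS = {"P1", "P2"}


def invalid_values(rows, column, allowed):
    """Return values in `column` that are not listed in the data dictionary."""
    return sorted({r[column] for r in rows if r[column] is not None} - allowed)


def missing_share(rows, column):
    """Fraction of rows whose `column` is missing (None)."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return a copy of rows with missing `column` values replaced by the median."""
    median = statistics.median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the box plot whiskers)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


bad_plants = invalid_values(rows, "plant", ALLOWED_PLANTS)
share = missing_share(rows, "lead_time_days")
clean = impute_median(rows, "lead_time_days")
outliers = iqr_outliers([r["lead_time_days"] for r in clean])
print(bad_plants, share, outliers)
```

The output of such checks (invalid codes, missing share per column, flagged outliers) is what the functional consultant would take back to the customer before agreeing on a treatment.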
Modeling is the process of identifying the right model with appropriate hyperparameters to meet the model evaluation criteria.
- Work with the technical team & identify a list of probable algorithms to be used.
- Explain & share pros & cons of the various algorithms available to solve a supervised or unsupervised learning problem. Ex – how a random forest compares with a decision tree: a random forest is suitable when we have a large dataset and interpretability is not a major concern, whereas decision trees are much easier to interpret and understand.
- Freeze the correct mix of train/test data. If data is scarce, cross-validation methods can be employed.
- There are multiple algorithms available to solve a given regression problem. Ex – linear regression vs decision tree vs random forest. Hence it is required to try out all shortlisted models before freezing the best-fit model. It is important to share findings & results of all the models with the customer to get their buy-in.
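The idea of comparing candidate models with cross-validation before freezing one can be sketched as follows. This is an illustrative sketch only: the two candidates (a mean baseline and a hand-written least-squares line) are simple stand-ins for the algorithms named above, the data is made up, and the function names are hypothetical. In practice the technical team would typically rely on an ML library rather than hand-rolled code.

```python
import statistics


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean = statistics.fmean(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Simple least-squares line y = a + b*x fitted on the training data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_val_mae(fit, xs, ys, k=5):
    """Average mean absolute error over k folds (fold i holds every k-th row)."""
    fold_errors = []
    for i in range(k):
        test_idx = set(range(i, len(xs), k))
        train_x = [x for j, x in enumerate(xs) if j not in test_idx]
        train_y = [y for j, y in enumerate(ys) if j not in test_idx]
        model = fit(train_x, train_y)
        errors = [abs(model(xs[j]) - ys[j]) for j in test_idx]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Hypothetical data: delivery delay (days) that grows with order quantity.
xs = [float(q) for q in range(1, 21)]
ys = [2.0 + 0.5 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

results = {
    "mean baseline": cross_val_mae(fit_mean, xs, ys),
    "linear regression": cross_val_mae(fit_line, xs, ys),
}
best = min(results, key=results.get)
for name, mae in results.items():
    print(f"{name}: MAE = {mae:.2f}")
print("best:", best)
```

A side-by-side table of such scores per candidate model is a convenient artifact to share with the customer when seeking buy-in.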
Model Evaluation & deployment:
- Define model evaluation & project success criteria with the customer. Ex – model AUC should be greater than 80%, accuracy > 90%, overall execution time < 90 mins.
- There are various ways to evaluate model output. Ex – accuracy, precision/recall, sensitivity/specificity, F1 score etc. Before freezing the model evaluation methodology & success criteria, it is important to understand the cost function, i.e. which of the incorrect predictions is more detrimental: the false positive or the false negative (in other words, which performance measure is more important: precision or recall). Explain the cost function to the customer & freeze the right model evaluation methodology. Check the accuracy paradox for more details: on imbalanced data, a model that always predicts the majority class can score high accuracy while being useless. The functional consultant & the customer can together outline thresholds of model evaluation parameters to qualify a model for productive usage.
- Hyperparameter tuning is an iterative process between modeling & evaluation through which the best-fit model is identified. Share the hyperparameters with the customer. Also explain the time & resource consumption if extensive ranges of hyperparameters are searched.
- Sometimes a model tends to memorize the training data: it performs exceptionally well on training data but does not produce the desired result on test data. This phenomenon is called overfitting; hence always test the model on fresh, unseen data before production deployment.
- There are multiple ways to deploy a model, and the choice depends on factors like infrastructure, turnaround time & other use cases. Hence it is important to discuss the pros & cons of the deployment options below with the customer before actual model deployment.
- On demand predictive service
- Batch processing mode
- As embedded services on IoT or edge devices.
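The evaluation point made above (accuracy versus precision/recall) is easier to explain to a customer with numbers. The sketch below is a minimal illustration with made-up labels: it assumes a hypothetical binary target "delivery is late" where only 5 of 100 purchase orders are late, and compares a model that never predicts "late" with one that catches most late orders at the cost of some false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 5 late deliveries (1) out of 100 purchase orders.
y_true = [1] * 5 + [0] * 95

# Model A never predicts "late"; model B catches 4 of 5 late orders with 6 false alarms.
pred_a = [0] * 100
pred_b = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

scores_a = metrics(y_true, pred_a)
scores_b = metrics(y_true, pred_b)
print("A:", scores_a)
print("B:", scores_b)
```

Model A reaches 95% accuracy yet has zero recall, while model B has slightly lower accuracy (93%) but a recall of 0.8. Which of the two is acceptable depends on whether a missed late delivery or a false alarm costs the business more, which is exactly the discussion to have with the customer.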
- These activities were performed as part of a small SAP S/4HANA on-premise ML assignment.
- In this blog, I have tried to compile functional tasks which are specific to an ML assignment. However, they can be included in an SAP Activate project implementation plan.
- Timelines & effort required to execute each task are subjective & may vary from project to project. Hence timelines & efforts are not specified.
This may not be the complete ML functional scope of work; feel free to add in the comments if I have missed anything obvious.