**What is feature engineering?**

Feature engineering is the practice of using mathematical transformations of raw input data to create new features to be used in an ML model. The following are examples of such transformations:
- Dividing total dollar amount by total number of payments to get a ratio of dollars per payment
- Counting the occurrence of a particular word across a text document
- Computing statistical summaries (such as mean, median, standard deviation and skew) of a distribution of user ping times to assess network health
- Joining two tables (for example, payments and support) on user ID
- Applying sophisticated signal-processing tools to an image and summarizing their output (for example, histogram of gradients)
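A couple of the simpler transformations above can be sketched in plain Python; the payment amounts below are made-up sample data for illustration:

```python
import statistics

# Hypothetical raw data: individual payment amounts for one customer
payments = [120.0, 80.0, 100.0, 60.0, 140.0]

# Ratio feature: total dollar amount divided by total number of payments
dollars_per_payment = sum(payments) / len(payments)

# Statistical-summary features of the same distribution
summary_features = {
    "mean": statistics.mean(payments),
    "median": statistics.median(payments),
    "stdev": statistics.stdev(payments),
}

print(dollars_per_payment)  # 100.0
```

Each of these scalars can then be fed to a model as a single numeric feature alongside the raw inputs.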
Before diving into a few examples to demonstrate feature engineering in action, let's consider a simple question: why use feature engineering?

**Five reasons to use feature engineering**

This section describes a few ways that feature engineering provides value in a machine-learning application. This list isn't exhaustive, but it introduces a few of the primary ways that feature engineering can boost the accuracy and computational efficiency of your ML models.

**Transform original data to relate to the target**

You can use feature engineering to produce transformations of your original data that are more closely related to the target variable. Take, for instance, a personal finance dataset that contains the current bank account balance and credit debt of each customer. If you are building a model to predict whether each customer will become delinquent in payments three months from now, then the engineered feature

debt-to-balance ratio = amount of debt / amount of balance

would likely be highly predictive of the target. Although the raw inputs are present in the original dataset, the ML model will have an easier time finding the relationship between the debt-to-balance ratio and future delinquency if the engineered feature is used directly as an input. This will result in improved accuracy of predictions.

**Bring in external data sources**

Feature engineering enables practitioners to bring external data sources into their ML models. Imagine that you run an internet subscription service. The first time each customer logs in, you want to predict the lifetime value of that customer. Among a variety of metrics, you could capture the geographic location of each user. Although this data could be fed in directly as a categorical feature (for example, IP address or postal code), the model will likely have a difficult time determining the location-based signals that matter (in this case, those might be the average income of each location, or urban versus rural).
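To make the location idea concrete, here is a minimal sketch of joining an external average-income table onto user records by postal code; the postal codes and income figures are invented for illustration:

```python
# Hypothetical user records captured at first login
users = [
    {"user_id": 1, "postal_code": "10001"},
    {"user_id": 2, "postal_code": "59801"},
]

# Hypothetical external data source: average income keyed by postal code
avg_income_by_postal = {"10001": 85000, "59801": 52000}

# Engineered feature: enrich the raw categorical code with a numeric signal
# the model can use directly (missing codes fall back to None)
for user in users:
    user["avg_income"] = avg_income_by_postal.get(user["postal_code"])

print([u["avg_income"] for u in users])  # [85000, 52000]
```

The same lookup-and-attach pattern applies to any external table, whether it comes from a census dataset, a vendor feed, or another table in your own warehouse.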
**Use unstructured data sources**

Feature engineering enables you to use unstructured data sources in ML models. Unstructured data such as text, time series, images, video, log data, and clickstreams account for the vast majority of data that's created. Feature engineering is what enables ML practitioners to produce ML feature vectors out of these kinds of raw data streams.

**Create features that are more easily interpreted**

Feature engineering empowers ML practitioners to create features that are more interpretable and actionable. Using ML to find patterns in data can be useful for making accurate predictions, but you may face limitations in the interpretability of the model and its ultimate utility for driving changes. In these cases, it may be more valuable to engineer new features that are more indicative of the processes that generate the data and of the link between the raw data and the target variable.

**Enhance creativity by using large sets of features**

Feature engineering empowers you to throw in large sets of features to see what sticks. You can create as many features as you can dream up and see which of them carry predictive power when used to train a model.
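As one small example of turning unstructured text into a feature vector, a bag-of-words count can be built with the standard library alone; the documents and vocabulary below are made up for illustration:

```python
from collections import Counter

# Hypothetical raw text documents (e.g., support-ticket messages)
docs = [
    "the payment failed and the payment was retried",
    "the login succeeded",
]

# Fixed vocabulary shared across documents (in practice, learned from a corpus)
vocab = ["payment", "login", "failed"]

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

features = [bag_of_words(doc, vocab) for doc in docs]
print(features)  # [[2, 0, 1], [0, 1, 0]]
```

Each document is now a fixed-length numeric vector, which is exactly the form a standard ML model expects as input.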