by Pablo Duboue, PhD
This book is structured into two parts. The first part presents feature engineering ideas and approaches that are as domain-independent as feature engineering can possibly be. The second part exemplifies different techniques in key domains through case studies.
It summarizes, in one place, dozens of blogs, videos, and forum posts under a unified view and nomenclature. The book references more than 300 sources.
It helps practitioners obtain better end-to-end performance than tuning model parameters alone.
It helps practitioners work with sets, lists, trees, and graphs, which are traditionally problematic for statistical machine learning.
Practitioners working in new domains can study solutions from other domains to help them build their own. Each domain uses its own language, and the book bridges these interdisciplinary barriers.
It helps readers compare techniques across domains as different as text and images. Instructors can reuse the dataset for their class examples.
Readers can look at the code for lower-level details; instructors can extend it and adapt it for their own classroom use.
This part focuses on domain-independent techniques and the overall process, where careful data analysis can steer practitioners away from bad assumptions and yield high-performing models.
Topics: machine learning cycle, f-measure, precision, recall, error analysis, feature ideation, feature creation, feature extraction, feature engineering, domain modelling, data preparation
Topics: normalization, binning, outliers, outlier detection, histogram, descriptive statistics, whitening, zca whitening, scaling, standardization
Topics: computable features, feature imputation, kernels, target rate encoding, one hot encoding, training expansion, tidy data
Topics: feature selection, feature utility, recursive feature elimination, ablation study, dimensionality reduction, lasso, elasticnet, embeddings, word2vec, non-negative matrix factorization
Topics: variable length feature vector, encoding lists, encoding sets, automated feature engineering, featuretools, deep learning, autoencoders
Tapping into domain expertise makes it possible to avoid known problems in a target domain. This part seeks to learn from well-understood domains to help practitioners tackle new, less-understood domains.
Topics: graph data, machine learning on graphs, variable-length feature vector, one hot encoding examples, error analysis examples, exploratory data analysis examples, dbpedia, population prediction
Topics: time stamped data, time series, machine learning for time series, lagged features, autoregressive models, moving averages, windowing features, historical population prediction, arma models, arch models
Topics: natural language processing, information extraction, feature selection examples, mutual information, stemming, word embeddings, tsne, tf-idf, feature weighting
Topics: image processing, satellite images, non-photographic satellite images, image feature extraction, histograms, image gradients, histogram of gradients, local feature extractors, corner detection, nuisance variations
Topics: Video feature engineering, GIS feature engineering, feature engineering for preferences, geographical information system, high performance feature extraction, keypoints, preference imputation