Special Edition Data Science and Machine Learning Interview Questions Solved in Python and Spark with Deep Learning and Reinforcement Learning Bonus Questions
From the Back Cover:
Introduction
1. What are the most important machine learning techniques? (Solution)
2. Why is it important to have a robust set of metrics for machine learning? (Solution, Code)

Feature Engineering and the ETL Process (Extraction, Transformation, Loading)
3. Why are feature extraction and engineering so important in machine learning? (Solution)
4. Can you provide an example of feature extraction? (Solution, Code)
5. What is the mean, the variance, and the covariance? (Solution, Code)
6. What are percentiles and quartiles? (Solution, Code)
7. Why are vectors and norms used in machine learning? (Solution, Code)
8. What is a convenient tool for performing data statistics? (Solution, Code)
9. How is it convenient to visualize data statistics? (Solution, Code)
10. How to compute covariance and correlation matrices with pandas? (Solution, Code)
11. What is a TFxIDF? (Solution, Code)
12. What is "features hashing"? And why is it useful for BigData? (Solution)
13. What is "continuous features binning"? (Solution)
14. What is an LP normalization? (Solution, Code)
15. What is a Chi Square Selection? (Solution)
16. What is mutual information and how can it be used for features selection? (Solution)
17. How to deal with categorical features? And what is one-hot-encoding? (Solution, Code)
18. Can you transform an XML file into Python Pandas? (Solution, Code)
19. Can you read HTML into Python Pandas? (Solution, Code)
20. Can you read JSON into Python Pandas? (Solution, Code)
21. Can you draw a function from Python? (Solution, Code)
22. What is a Gaussian? (Solution, Code)
23. What is a Standard Scaling? (Solution, Code)
24. Why are statistical distributions important? (Solution, Code)
25. Can you compare your data with some distribution? What is a qq-plot? (Solution, Code)
26. Can you provide an example of connection to the Twitter API? (Solution, Code)
27. Can you provide an example of connection to the LinkedIn API? (Solution, Code)
28. Can you provide an example of connection to the Facebook API? (Solution, Code)
29. What is Parquet? (Solution, Code)

Machine learning basics
30. What is a Bias - Variance tradeoff? (Solution)
31. What is a training set, a validation set, a test set and a gold set in supervised and unsupervised learning? (Solution)
32. What is a cross-validation and what is an overfitting? (Solution, Code, Code)
33. Why is Grid Search important? (Solution, Code)

Spark and python
34. What is an Ipython notebook? (Solution, Code)
35. What are Numpy, Scipy and Spark essential datatypes? (Solution, Code)
36. Can you provide an example for Map and Reduce in Spark? (Let's compute the Mean Square Error) (Solution, Code)
37. Can you provide examples for other computations in Spark? (Solution, Code)
38. How does Python interact with Spark? (Solution)
39. What is Spark support for Machine Learning? (Solution)
40. How does Spark work in a parallel environment? (Solution, Code)
41. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search? (Solution, Code)

Linear Models and Regression
42. What is a loss function, what are linear models, and what do we mean by regularization parameters in machine learning? (Solution)
43. What is an odds ratio? (Solution)
44. What is a sigmoid function and what is a logistic function? (Solution, Code)
45. What is a Linear Least Square Regression? (Solution, Code)
46. What are Lasso, Ridge, and ElasticNet regularizations? (Solution)
47. What is a Logistic Regression? (Solution, Code)
48. What is a stepwise regression? (Solution)
49. What is an isotonic regression? (Solution, Code)
50. How to include nonlinear information into linear models? (Solution)
51. What are generalized linear models and what is an R Formula? (Solution, Code)
52. What is LARS? (Solution)
53. What is GLMNET? (Solution)

Optimization techniques
54. What is a gradient descent? (Solution)
55. What is a stochastic gradient descent? (Solution, Code)
56. What is momentum? (Solution)
57. What is Conjugate Gradient? (Solution)
58. What are Adagrad, RMSProp, Adam, and L-BFGS?

Classification
59. What is a Naïve Bayes classifier? (Solution)
60. What is a Bernoulli and a Multivariate Naïve Bayes? (Solution, Code)
61. What is a Gaussian Naïve Bayes? (Solution)
62. What is another way to use Naïve Bayes with continuous data? (Solution)
63. What is the Nearest Neighbor classification? (Solution, Code)
64. What are Support Vector Machines (SVM)? (Solution, Code)
65. What are SVM Kernel tricks? (Solution)
67. What is SVM with soft margins? (Solution)
66. Can you provide an example for Text Classification with Spark? (Solution, Code)

Clustering
67. What is K-Means Clustering? (Solution, Code)
68. What is the DBSCAN clustering algorithm? (Solution, Code)
69. What is a Streaming K-Means? (Solution, Code)
70. What is Canopy Clustering? (Solution)
71. What is Bisecting K-Means? (Solution)
72. What is the Expectation Maximization Clustering algorithm? (Solution)
73. What is a Gaussian Mixture? (Solution, Code)

Boosting and Ensembles
74. What are the Ensembles? (Solution)
75. What is an AdaBoost classification algorithm? (Solution)

Decision Trees, Gradient Boosted Trees and Random Forests
76. What are the Decision Trees? (Solution, Code)
77. What is a Gradient Boosted Tree? (Solution)
78. What is a Gradient Boosted Trees Regressor? (Solution, Code)
79. Gradient Boosted Trees Classification (Solution, Code)
80. What is a Random Forest? (Solution, Code)

Recommendations
81. What is a recommender system? (Solution)
82. What is a collaborative filtering ALS algorithm? (Solution, Code)

Dimensional Reduction
83. What is the PCA Dimensional reduction technique? (Solution, Code)
84. What is the SVD Dimensional reduction technique? (Solution, Code)
85. What is a Latent Semantic Analysis (LSA)? (Solution)
86. What is the Latent Dirichlet Allocation topic model? (Solution, Code)

Associative Rules
87. What is the Associative Rule Learning? (Solution)
88. What is FP-growth? (Solution, Code)

Graph Mining
89. Can you represent a graph in Python? (Solution, Code)
90. How to use the GraphX Library? (Solution)
91. What is PageRank? And how to compute it with GraphX? (Solution, Code, Code)
92. What is a Power Iteration Clustering? (Solution, Code)

Neural Networks
93. What is a Perceptron? (Solution)
94. What is an ANN (Artificial Neural Network)? (Solution)
95. What are the activation functions? (Solution)
96. How many types of Neural Networks are known?
97. How to train a Neural Network? (Solution)
98. Which are the possible ANNs applications? (Solution)