BigData and Machine Learning in Python and Spark
Table of Contents
1. What are the most important machine learning techniques? [Solution]
2. Why is it important to have a robust set of metrics for machine learning? [Solution, Code]
3. Why are Features extraction and engineering so important in machine learning? [Solution]
4. Can you provide an example of features extraction? [Solution, Code]
5. What is a training set, a validation set, a test set and a gold set in supervised and unsupervised learning? [Solution]
6. What is a Bias - Variance tradeoff? [Solution]
7. What is a cross-validation and what is an overfitting? [Solution, Code]
8. Why are vectors and norms used in machine learning? [Solution, Code]
9. What are Numpy, Scipy and Spark essential datatypes? [Solution, Code]
10. Can you provide an example for Map and Reduce in Spark? (Let's compute the Mean Square Error) [Solution, Code]
11. Can you provide examples for other computations in Spark? [Solution, Code]
12. How does Python interact with Spark [Solution]
13. What is Spark support for Machine Learning? [Solution]
14. How does Spark work in a parallel environment [Solution, Code]
15. What is the mean, the variance, and the covariance? [Solution, Code]
16. What are percentiles and quartiles? [Solution, Code]
17. Can you transform an XML file into Python Pandas? [Solution, Code]
18. Can you read HTML into Python Pandas? [Solution, Code]
19. Can you read JSON into Python Pandas? [Solution, Code]
20. Can you draw a function from Python? [Solution, Code]
21. Can you represent a graph in Python? [Solution, Code]
22. What is an Ipython notebook? [Solution, Code]
23. What is a convenient tool for performing data statistics? [Solution, Code]
24. How is it convenient to visualize data statistics [Solution, Code]
25. How to compute covariance and correlation matrices with pandas [Solution, Code]
26. Can you provide an example of connection to the Twitter API? [Solution, Code]
27. Can you provide an example of connection to the LinkedIn API? [Solution, Code]
28. Can you provide an example of connection to the Facebook API? [Solution, Code]
29. What is a TFxIDF? [Solution, Code]
30. What is "features hashing"? And why is it useful for BigData? [Solution]
31. What is "continuous features binning"? [Solution]
32. What is an LP normalization? [Solution, Code]
33. What is a Chi Square Selection? [Solution]
34. What is mutual information and how can it be used for features selection? [Solution]
35. What is a loss function, what are linear models, and what do we mean by regularization parameters in machine learning? [Solution]
36. What is an odd ratio?
37. What is a sigmoid function and what is a logistic function? [Code]
38. What is a gradient descent? [Solution]
39. What is a stochastic gradient descent? [Solution, Code]
40. What is a Linear Least Square Regression? [Solution, Code]
41. What are Lasso, Ridge, and ElasticNet regularizations? [Solution]
42. What is a Logistic Regression? [Solution, Code]
43. What is a stepwise regression? [Solution]
44. How to include nonlinear information into linear models [Solution]
45. What is a Naïve Bayes classifier? [Solution]
46. What is a Bernoulli and a Multivariate Naïve Bayes? [Solution, Code]
47. What is a Gaussian? [Solution, Code]
48. What is a Standard Scaling? [Solution, Code]
49. Why are statistical distributions important? [Solution, Code]
50. Can you compare your data with some distribution? What is a qq-plot? [Solution, Code]
51. What is a Gaussian Naïve Bayes? [Solution]
52. What is another way to use Naïve Bayes with continuous data? [Solution]
53. What is the Nearest Neighbor classification? [Solution, Code]
54. What are Support Vector Machines (SVM)? [Solution, Code]
55. What are SVM Kernel tricks? [Solution]
56. What is K-Means Clustering? [Solution, Code]
57. Can you provide an example for Text Classification with Spark? [Solution, Code]
58. Where to go from here
Appendix A
59. Ultra-Quick introduction to Python
60. Ultra-Quick introduction to Probabilities
61. Ultra-Quick introduction to Matrices and Vectors
FREE shipping within U.S.A.
Seller: ThriftBooks-Dallas, Dallas, TX, U.S.A.
Paperback. Condition: Good. No Jacket. Pages can have notes/highlighting. Spine may show signs of wear. ~ ThriftBooks: Read More, Spend Less 0.3. Seller Inventory # G1517216710I3N00
Quantity: 1 available
Seller: ThriftBooks-Dallas, Dallas, TX, U.S.A.
Paperback. Condition: Fair. No Jacket. Readable copy. Pages may have considerable notes/highlighting. ~ ThriftBooks: Read More, Spend Less 0.3. Seller Inventory # G1517216710I5N00
Quantity: 1 available
Seller: SecondSale, Montgomery, IL, U.S.A.
Condition: Good. Item in good condition. Textbooks may not include supplemental items i.e. CDs, access codes etc. Seller Inventory # 00050085500
Quantity: 2 available
Seller: Seattle Goodwill, Seattle, WA, U.S.A.
Paperback. Condition: Good. May have some shelf-wear due to normal use. Your purchase funds free job training and education in the greater Seattle area. Thank you for supporting Goodwill's nonprofit mission! Seller Inventory # 0KVOGF005K8S_ns
Quantity: 1 available
Seller: California Books, Miami, FL, U.S.A.
Condition: New. Print on Demand. Seller Inventory # I-9781517216719
Quantity: Over 20 available
Seller: AwesomeBooks, Wallingford, United Kingdom
Condition: Very Good. This book is in very good condition and will be shipped within 24 hours of ordering. The cover may have some limited signs of wear but the pages are clean, intact and the spine remains undamaged. This book has clearly been well maintained and looked after thus far. Money back guarantee if you are not satisfied. See all our books here, order more than 1 book and get discounted shipping. Seller Inventory # 7719-9781517216719
Quantity: 1 available
Seller: Bahamut Media, Reading, United Kingdom
Condition: Very Good. Shipped within 24 hours from our UK warehouse. Clean, undamaged book with no damage to pages and minimal wear to the cover. Spine still tight, in very good condition. Remember if you are not happy, you are covered by our 100% money back guarantee. Seller Inventory # 6545-9781517216719
Quantity: 1 available
Seller: THE SAINT BOOKSTORE, Southport, United Kingdom
Paperback / softback. Condition: New. This item is printed on demand. New copy - Usually dispatched within 5-9 working days 152. Seller Inventory # C9781517216719
Quantity: Over 20 available
Seller: CitiRetail, Stevenage, United Kingdom
Paperback. Condition: New. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Seller Inventory # 9781517216719
Quantity: 1 available