A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning (A Collection of Programming Interview Questions) - Softcover

  • 3.33 out of 5 stars
    15 ratings by Goodreads
 
9781517216719: A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning (A Collection of Programming Interview Questions)

Synopsis

BigData and Machine Learning in Python and Spark

"synopsis" may belong to another edition of this title.

From the Inside Flap

Table of Contents

1. What are the most important machine learning techniques? 10
Solution 10
2. Why is it important to have a robust set of metrics for machine learning? 11
Solution 11
Code 12
3. Why are feature extraction and engineering so important in machine learning? 12
Solution 12
4. Can you provide an example of feature extraction? 14
Solution 14
Code 14
5. What is a training set, a validation set, a test set and a gold set in supervised and unsupervised learning? 15
Solution 15
6. What is a Bias - Variance tradeoff? 16
Solution 16
7. What is a cross-validation and what is an overfitting? 17
Solution 17
Code 18
8. Why are vectors and norms used in machine learning? 18
Solution 18
Code 19
9. What are NumPy, SciPy and Spark essential datatypes? 19
Solution 19
Code 20
10. Can you provide an example for Map and Reduce in Spark? (Let's compute the Mean Square Error) 20
Solution 20
Code 21
11. Can you provide examples for other computations in Spark? 22
Solution 22
Code 25
12. How does Python interact with Spark? 26
Solution 26
13. What is Spark support for Machine Learning? 26
Solution 26
14. How does Spark work in a parallel environment? 27
Solution 27
Code 27
15. What is the mean, the variance, and the covariance? 27
Solution 27
Code 28
16. What are percentiles and quartiles? 28
Solution 28
Code 28
17. Can you transform an XML file into Python Pandas? 29
Solution 29
Code 29
18. Can you read HTML into Python Pandas? 30
Solution 30
Code 30
19. Can you read JSON into Python Pandas? 31
Solution 31
Code 31
20. Can you draw a function from Python? 31
Solution 31
Code 31
21. Can you represent a graph in Python? 32
Solution 32
Code 32
22. What is an IPython notebook? 33
Solution 33
Code 33
23. What is a convenient tool for performing data statistics? 34
Solution 34
Code 34
24. How is it convenient to visualize data statistics? 35
Solution 35
Code 35
25. How to compute covariance and correlation matrices with pandas 36
Solution 36
Code 36
26. Can you provide an example of connection to the Twitter API? 37
Solution 37
Code 37
27. Can you provide an example of connection to the LinkedIn API? 39
Solution 39
Code 39
28. Can you provide an example of connection to the Facebook API? 39
Solution 39
Code 40
29. What is a TFxIDF? 40
Solution 40
Code 40
30. What is "features hashing"? And why is it useful for BigData? 41
Solution 41
31. What is "continuous features binning"? 42
Solution 42
32. What is an LP normalization? 42
Solution 42
Code 42
33. What is a Chi Square Selection? 42
Solution 42
34. What is mutual information and how can it be used for features selection? 43
Solution 43
35. What is a loss function, what are linear models, and what do we mean by regularization parameters in machine learning? 43
Solution 43
36. What is an odds ratio? 46
37. What is a sigmoid function and what is a logistic function? 46
Code 47
38. What is a gradient descent? 47
Solution 47
39. What is a stochastic gradient descent? 49
Solution 49
Code 49
40. What is a Linear Least Square Regression? 50
Solution 50
Code 51
41. What are Lasso, Ridge, and ElasticNet regularizations? 52
Solution 52
42. What is a Logistic Regression? 52
Solution 52
Code 53
43. What is a stepwise regression? 54
Solution 54
44. How to include nonlinear information into linear models 54
Solution 54
45. What is a Naïve Bayes classifier? 55
Solution 55
46. What is a Bernoulli and a Multivariate Naïve Bayes? 57
Solution 57
Code 58
47. What is a Gaussian? 59
Solution 59
Code 59
48. What is a Standard Scaling? 60
Solution 60
Code 60
49. Why are statistical distributions important? 61
Solution 61
Code 63
50. Can you compare your data with some distribution? What is a qq-plot? 63
Solution 63
Code 63
51. What is a Gaussian Naïve Bayes? 64
Solution 64
52. What is another way to use Naïve Bayes with continuous data? 64
Solution 64
53. What is the Nearest Neighbor classification? 65
Solution 65
Code 66
54. What are Support Vector Machines (SVM)? 66
Solution 66
Code 68
55. What are SVM Kernel tricks? 68
Solution 68
56. What is K-Means Clustering? 70
Solution 70
Code 71
57. Can you provide an example for Text Classification with Spark? 71
Solution 71
Code 71
58. Where to go from here 72
Appendix A 75
59. Ultra-Quick introduction to Python 75
60. Ultra-Quick introduction to Probabilities 76
61. Ultra-Quick introduction to Matrices and Vectors 76

"About this title" may belong to another edition of this title.

Buy Used

Condition: Fair
Readable copy. Pages may have considerable...

FREE shipping within U.S.A.

Search results for A collection of Data Science Interview Questions Solved...

Gulli, Antonio
ISBN 10: 1517216710 ISBN 13: 9781517216719
Used Paperback

Seller: ThriftBooks-Dallas, Dallas, TX, U.S.A.

Seller rating: 5 out of 5 stars

Paperback. Condition: Fair. No Jacket. Readable copy. Pages may have considerable notes/highlighting. ~ ThriftBooks: Read More, Spend Less 0.3. Seller Inventory # G1517216710I5N00

Buy Used

US$ 13.12
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Gulli, Antonio
ISBN 10: 1517216710 ISBN 13: 9781517216719
Used Paperback

Seller: ThriftBooks-Dallas, Dallas, TX, U.S.A.

Seller rating: 5 out of 5 stars

Paperback. Condition: Good. No Jacket. Pages can have notes/highlighting. Spine may show signs of wear. ~ ThriftBooks: Read More, Spend Less 0.3. Seller Inventory # G1517216710I3N00

Buy Used

US$ 13.12
Shipping: FREE
Within U.S.A.

Quantity: 1 available

Gulli, Antonio
ISBN 10: 1517216710 ISBN 13: 9781517216719
Used Softcover

Seller: SecondSale, Montgomery, IL, U.S.A.

Seller rating: 4 out of 5 stars

Condition: Good. Item in good condition. Textbooks may not include supplemental items i.e. CDs, access codes etc. Seller Inventory # 00050085500

Buy Used

US$ 13.29
Shipping: FREE
Within U.S.A.

Quantity: 3 available

Gulli, Antonio
ISBN 10: 1517216710 ISBN 13: 9781517216719
New Softcover
Print on Demand

Seller: California Books, Miami, FL, U.S.A.

Seller rating: 5 out of 5 stars

Condition: New. Print on Demand. Seller Inventory # I-9781517216719

Buy New

US$ 20.00
Shipping: FREE
Within U.S.A.

Quantity: Over 20 available

Antonio Gulli
ISBN 10: 1517216710 ISBN 13: 9781517216719
Used Softcover

Seller: AwesomeBooks, Wallingford, United Kingdom

Seller rating: 5 out of 5 stars

Condition: Very Good. This book is in very good condition and will be shipped within 24 hours of ordering. The cover may have some limited signs of wear but the pages are clean, intact and the spine remains undamaged. This book has clearly been well maintained and looked after thus far. Money back guarantee if you are not satisfied. See all our books here, order more than 1 book and get discounted shipping. Seller Inventory # 7719-9781517216719

Buy Used

US$ 14.05
Shipping: US$ 6.82
From United Kingdom to U.S.A.

Quantity: 1 available

Antonio Gulli
ISBN 10: 1517216710 ISBN 13: 9781517216719
Used Softcover

Seller: Bahamut Media, Reading, United Kingdom

Seller rating: 5 out of 5 stars

Condition: Very Good. Shipped within 24 hours from our UK warehouse. Clean, undamaged book with no damage to pages and minimal wear to the cover. Spine still tight, in very good condition. Remember if you are not happy, you are covered by our 100% money back guarantee. Seller Inventory # 6545-9781517216719

Buy Used

US$ 14.05
Shipping: US$ 9.54
From United Kingdom to U.S.A.

Quantity: 1 available

Antonio Gulli
ISBN 10: 1517216710 ISBN 13: 9781517216719
New Paperback / softback
Print on Demand

Seller: THE SAINT BOOKSTORE, Southport, United Kingdom

Seller rating: 5 out of 5 stars

Paperback / softback. Condition: New. This item is printed on demand. New copy - Usually dispatched within 5-9 working days 152. Seller Inventory # C9781517216719

Buy New

US$ 23.88
Shipping: US$ 11.30
From United Kingdom to U.S.A.

Quantity: Over 20 available

Antonio Gulli
ISBN 10: 1517216710 ISBN 13: 9781517216719
New Paperback

Seller: CitiRetail, Stevenage, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: New. BigData and Machine Learning in Python and Spark. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Seller Inventory # 9781517216719

Buy New

US$ 28.84
Shipping: US$ 50.57
From United Kingdom to U.S.A.

Quantity: 1 available
