Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that's so clouded in hype? This insightful book, based on Columbia University's Introduction to Data Science class, tells you what you need to know.
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you're familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O'Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
About the Authors
Rachel Schutt is the Senior Vice President for Data Science at News Corp. She earned a PhD in Statistics from Columbia University, and was a statistician at Google Research for several years. She is an adjunct professor in Columbia's Department of Statistics and a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. She holds several pending patents based on her work at Google, where she helped build user-facing products by prototyping algorithms and building models to understand user behavior.
Cathy O'Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector.
What's the animal featured on the cover?
The animal on the cover of Doing Data Science is a nine-banded armadillo (Dasypus novemcinctus), a mammal widespread throughout North, Central, and South America. From Latin, novemcinctus literally translates to “nine-banded” (after the telescoping rings of armor around the midsection), though the animal can actually have between 7 and 11 bands. The three-banded armadillo native to South America is the only armadillo that can roll into a ball for protection; other species have too many plates. The armadillo’s skin is perhaps its most notable feature. Brownish-gray and leathery, it is composed of scaly plates called scutes that cover everything but its underside.
The animals also have powerful digging claws, and are known to create several burrows within their territory, which they mark with scent glands. Nine-banded armadillos typically weigh between 5.5 and 14 pounds, and are around the size of a large domestic cat. Their diet is largely made up of insects, though they will also eat fruit, small reptiles, and eggs. Females almost always have a litter of four: quadruplets of the same sex, because the zygote splits into four embryos after implantation. Young armadillos have soft skin when they are born, but it hardens as they get older. They are able to walk within a few hours of birth. Nine-banded armadillos are capable of jumping three to four feet in the air if startled. Though this reaction can scare off natural predators, it is usually fatal for the armadillo if an approaching car is what has frightened it, as it will collide with the underside of the vehicle. Another unfortunate connection between humans and nine-banded armadillos is that armadillos are among the few animals other than humans known to carry leprosy; it is not unheard of for people to become infected when they eat or handle armadillos. The cover image is from Shaw’s Zoology, and was reinterpreted in color by Karen Montgomery.
Cathy O’Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.
Rachel Schutt is the Senior Vice President for Data Science at News Corp. She earned a PhD in Statistics from Columbia University, and was a statistician at Google Research for several years. She is an adjunct professor in Columbia’s Department of Statistics and a founding member of the Education Committee for the Institute for Data Sciences and Engineering at Columbia. She holds several pending patents based on her work at Google, where she helped build user-facing products by prototyping algorithms and building models to understand user behavior. She has a master's degree in mathematics from NYU, and a master's degree in Engineering-Economic Systems and Operations Research from Stanford University. Her undergraduate degree is in Honors Mathematics from the University of Michigan.