Effectively extract and manipulate HTML content with the jsoup library
Overview
In Detail
As you might know, many Java libraries can parse HTML content. Jsoup is yet another HTML parsing library, but it offers more functionality and more interesting features than many of the alternatives. Give it a try, and you will see the difference!
Instant jsoup How-to provides simple and detailed instructions on how to use the Jsoup library to manipulate HTML content to suit your needs. You will learn the basic aspects of data crawling, as well as the various concepts of Jsoup so you can make the best use of the library to achieve your goals.
Instant jsoup How-to will help you learn step by step using real-world, practical problems. You will begin with several basic topics, such as getting input from a URL, a file, or a string, and using DOM navigation to search for data. You will then move on to more advanced topics, such as using CSS selectors and cleaning dirty HTML data. HTML data is not always safe, so you will also learn how to sanitize dirty documents to guard against XSS attacks.
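The topics listed above can be sketched in a few lines of Java. This is a minimal illustration, not code from the book: the class name JsoupSketch, its method names, and the sample HTML are hypothetical, and it assumes a recent jsoup release (1.14.1 or later) in which the sanitizing allow-list class is called Safelist (older releases, including those current when the book was written, call it Whitelist).

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper class, written only for illustration.
public class JsoupSketch {

    // Input from a string, then a CSS selector: collect the text of every
    // <a> element that carries an href attribute.
    // (For other inputs, jsoup also provides Jsoup.connect(url).get() for a
    // URL and Jsoup.parse(file, charsetName) for a file; neither is run here.)
    public static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a[href]").eachText();
    }

    // DOM navigation: look up an element by id and return its combined text,
    // or null when no element has that id.
    public static String textById(String html, String id) {
        Element el = Jsoup.parse(html).getElementById(id);
        return el == null ? null : el.text();
    }

    // Sanitizing untrusted HTML: keep only the tags and attributes permitted
    // by jsoup's built-in "basic" safelist; script elements are dropped.
    public static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<div id=\"intro\"><p>Hello "
                + "<a href=\"https://example.com\">example</a></p></div>";
        System.out.println(linkTexts(html));
        System.out.println(textById(html, "intro"));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Choosing Safelist.basic() here is only one option; jsoup ships several predefined safelists with different levels of strictness, and which one fits depends on how much markup the application needs to preserve.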
Instant jsoup How-to is a book for every Java developer who wants to learn HTML manipulation quickly and effectively. This book includes the sample source code for you to refer to with a detailed explanation of every feature of the library.
What you will learn from this book
Approach
This book is filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. It takes a how-to approach, focusing on recipes that demonstrate Jsoup.
Who this book is written for
If you work in data scraping, data crawling, or a similar area using Java, then this book is for you. It acts as a fast-paced, simple guide to improving your HTML manipulation skills using one of the most well-known libraries, Jsoup.
Pete Houston holds a B.S. in Computer Science from a university in South Korea. He has worked in the IT industry for 10 years, and his experience includes medical image research for diagnosing cancer symptoms using technologies such as C, C++, COM/DLL, ActiveX Control, and C#.NET 3.0. Pete has also designed and created an Android mobile platform. Currently, he researches and implements search algorithms for data mining, working with C, Apache, Python, and Hadoop. Pete has also worked as Technical Leader for backend systems that provide information services. He has worked with Java, Jsoup, PHP, SimpleXML, and the Yii/Slim Framework.