Data is the new oil, but like oil it comes crude. To do anything meaningful with it - modeling, visualization, or machine learning for predictive analysis - you first need to wrangle the data. This book teaches the essentials of data wrangling using Python.
Key Features
- Focuses on the essentials of data wrangling to get you up and running with analysis in no time
- Teaches the tricks and know-how for solving data wrangling problems
- Includes bonus topics: random data generation and data integrity checks
Book Description
To practice high-quality science with data, you first need to make sure the data is properly sourced, cleaned, formatted, and pre-processed. This book teaches you the essentials of this invaluable component of the data science pipeline: data wrangling.
What you will learn
- Manipulate simple and complex data structures using Python and its built-in functions
- Use fundamental and advanced features of Pandas DataFrames and numpy.array
- Manipulate DataFrames and arrays at run time
- Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
- Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
- Perform advanced string search and manipulation using Python and regular expressions (regex)
- Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
- Apply basic descriptive statistics and plotting techniques in Python for quick examination of data
- Practice data wrangling and modeling using random data generation techniques
Who This Book Is For
This book is for software professionals, web developers, database engineers, and business analysts who want to move toward a career as a full-fledged data scientist or analytics expert, and for anyone who wants to use data analytics or machine learning to enrich their current personal or professional projects. Prior experience with Python is not an absolute requirement; however, knowledge of at least one object-oriented programming language (e.g. C/C++/Java/JavaScript) and high-school-level math is highly preferred. A rudimentary understanding of relational databases and SQL is a bonus. Even seasoned Python app/web developers can benefit from this book, as it focuses on data engineering aspects.