Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark

Author: Parsian, Mahmoud
Availability: In stock
Regular Price: AED 350.00
Special Price: AED 332.50
Cash on Delivery in UAE
Dispatches in 3 to 5 Working Days.

BISAC Categories:
Data Science | Machine Learning

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark.

In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script.

With this book, you will:

  • Learn how to select Spark transformations for optimized solutions
  • Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
  • Understand data partitioning for optimized queries
  • Design machine learning algorithms including Naive Bayes, linear regression, and logistic regression
  • Build and apply a model using PySpark design patterns
  • Apply motif-finding algorithms to graph data
  • Analyze graph data by using the GraphFrames API
  • Apply PySpark algorithms to clinical and genomics data (such as DNA-Seq)

Publisher Name O'Reilly Media
Author Name Parsian, Mahmoud
Format Audio
Bisac Subject Major COM
Language ENG
ISBN-10 1492082384
ISBN-13 9781492082385
Page Count 500

Mahmoud Parsian, who holds a Ph.D. in Computer Science, is a practicing software professional with 30 years of experience as a developer, designer, architect, and author. For the past 15 years, he has worked on server-side Java, databases, MapReduce, Spark, PySpark, and distributed computing. Dr. Parsian currently leads Illumina's Big Data team, which focuses on large-scale genome analytics and distributed computing using Spark and PySpark. He leads and develops scalable regression algorithms and DNA sequencing pipelines using Java, MapReduce, PySpark, Spark, and open source tools. He is the author of Data Algorithms (O'Reilly, 2015), PySpark Algorithms (Amazon.com, 2019), JDBC Recipes (Apress, 2005), and JDBC Metadata Recipes (Apress, 2006). Dr. Parsian is also an Adjunct Professor at Santa Clara University, where he teaches Big Data Modeling and Analytics and Machine Learning in the MSIS program using Spark, PySpark, Python, and scikit-learn.
