By Uri Laserson, Josh Wills
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.
You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
- Recommending music and the Audioscrobbler data set
- Predicting forest cover with decision trees
- Anomaly detection in network traffic with K-means clustering
- Understanding Wikipedia with Latent Semantic Analysis
- Analyzing co-occurrence networks with GraphX
- Geospatial and temporal data analysis on the New York City Taxi Trips data
- Estimating financial risk through Monte Carlo simulation
- Analyzing genomics data and the BDG project
- Analyzing neuroimaging data with PySpark and Thunder