Spark ML Basics

Slide 4

RDD Basics

Slide 5

RDD Basics

Slide 6

RDD Basics

Slide 7

RDD Basics

Slide 8

DataFrames

Slide 9

Datasets

Slide 10

SQL vs. DataFrame vs. Dataset

Slide 11

Spark ML Pipelines

Slide 12

Spark ML Pipelines

Transformer

Slide 13

Spark ML Pipelines

Transformer

Estimator
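The Transformer/Estimator split on these slides can be illustrated without a cluster. The plain-Python sketch below is not Spark's API: the classes Transformer, Estimator, Pipeline and PipelineModel and the toy stages Doubler and MeanCenter are hypothetical stand-ins that mimic the contract Spark ML documents, namely that a Transformer maps one dataset to another with transform(), an Estimator's fit() returns a fitted Transformer (a model), and a Pipeline fits its stages in order, replacing each Estimator with the model it produced.

```python
# Hypothetical, plain-Python illustration of the Transformer/Estimator contract.
# None of these classes are Spark's; a "dataset" is just a list of dict rows.
from typing import Dict, List

Row = Dict[str, float]


class Transformer:
    """Maps a dataset to a new dataset (e.g. by adding or rewriting a column)."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Learns state from data; fit() returns a fitted Transformer (a "model")."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class Doubler(Transformer):
    """Stateless stage: writes 2 * input_col into output_col, no fitting needed."""

    def __init__(self, input_col: str, output_col: str):
        self.input_col, self.output_col = input_col, output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.output_col: 2 * r[self.input_col]} for r in rows]


class MeanCenterModel(Transformer):
    """Fitted stage: subtracts a mean that was learned at fit time."""

    def __init__(self, col: str, mean: float):
        self.col, self.mean = col, mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.col: r[self.col] - self.mean} for r in rows]


class MeanCenter(Estimator):
    """Estimator stage: learns the column mean and returns a MeanCenterModel."""

    def __init__(self, col: str):
        self.col = col

    def fit(self, rows: List[Row]) -> Transformer:
        mean = sum(r[self.col] for r in rows) / len(rows)
        return MeanCenterModel(self.col, mean)


class PipelineModel(Transformer):
    """Applies already-fitted stages one after another."""

    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits stages in order; each stage sees the output of the previous ones."""

    def __init__(self, stages: list):
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are replaced by the model they produce; Transformers pass through.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)
```

For example, fitting Pipeline([Doubler("x", "y"), MeanCenter("y")]) on rows with x = 1.0 and x = 3.0 learns a mean of 4.0 for y, so the fitted PipelineModel outputs y = -2.0 and y = 2.0. Keeping fit() and transform() separate is what lets the same fitted model be applied to new data later.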

Slide 14

Spark ML Pipelines

Slide 15

Spark ML Pipelines

Slide 16

Spark ML Pipelines

Slide 17

Spark ML Core

Slide 18

Field Metadata and Attributes

Slide 19

Prediction Model

Slide 20

“My Spark ML Model”

Slide 21

Spark ML Features

ETL
  SQLTransformer
  SqlFilter
  ColumnsExtractor
Numerization
  OneHotEncoder
  StringIndexer
  MultinomialExtractor
Vectorization
  VectorAssembler
  FeatureHasher
  AutoAssembler
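To make the Numerization and Vectorization stages concrete, here is a plain-Python sketch of what StringIndexer, OneHotEncoder and VectorAssembler compute on a single categorical column. It is an illustration under assumptions, not Spark code: the function names are hypothetical and vectors are dense Python lists (Spark produces sparse vectors). The ordering rule (most frequent label gets index 0) and dropping the last category are meant to follow Spark's documented defaults, stringOrderType="frequencyDesc" and dropLast=true; breaking frequency ties alphabetically is this sketch's choice.

```python
# Hypothetical helpers illustrating StringIndexer / OneHotEncoder / VectorAssembler
# semantics in plain Python; not Spark's implementation.
from collections import Counter
from typing import Dict, List, Sequence, Union


def fit_string_indexer(labels: Sequence[str]) -> Dict[str, int]:
    """Index labels by descending frequency; ties are broken alphabetically."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """Dense one-hot vector; with drop_last the last category becomes all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def assemble(*parts: Union[float, List[float]]) -> List[float]:
    """Concatenate scalars and vectors into one flat feature vector."""
    out: List[float] = []
    for part in parts:
        out.extend(part if isinstance(part, list) else [float(part)])
    return out
```

For instance, the labels ["b", "a", "c", "a", "b", "a"] index to a=0, b=1, c=2; with the last category dropped, "c" encodes as [0.0, 0.0], and assemble(5, one_hot(0, 3)) yields [5.0, 1.0, 0.0].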

Feature Normalization
  MaxAbsScaler
  MinMaxScaler
  Normalizer
  QuantileDiscretizer
  StandardScaler
Missing values
  Imputer
  NullToDefaultReplacer
  NaNToMeanReplacer
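The normalization and missing-value stages reduce to short per-feature formulas. The sketch below applies them to one feature (a plain list) at a time: min-max rescaling to [lo, hi] with a constant column mapped to the midpoint, standard scaling by the sample (n - 1) standard deviation with centering off by default, and mean imputation of None/NaN. These choices are intended to mirror the defaults documented for Spark's MinMaxScaler, StandardScaler (withMean=false, withStd=true) and Imputer (strategy "mean"), but the function names are hypothetical and this is not Spark's code.

```python
# Hypothetical per-feature helpers in plain Python; not Spark's implementation.
import math
from statistics import mean, stdev
from typing import List, Optional


def min_max_scale(xs: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Rescale to [lo, hi]; a constant column maps to the midpoint."""
    e_min, e_max = min(xs), max(xs)
    if e_max == e_min:
        return [0.5 * (hi + lo)] * len(xs)
    return [(x - e_min) / (e_max - e_min) * (hi - lo) + lo for x in xs]


def standard_scale(xs: List[float], with_mean: bool = False,
                   with_std: bool = True) -> List[float]:
    """Optionally center, then divide by the sample (n - 1) standard deviation."""
    mu, sd = mean(xs), stdev(xs)
    out = []
    for x in xs:
        v = x - mu if with_mean else x
        if with_std:
            v = v / sd if sd > 0 else 0.0  # this sketch maps zero-variance columns to 0.0
        out.append(float(v))
    return out


def impute_mean(xs: List[Optional[float]]) -> List[float]:
    """Replace None and NaN with the mean of the observed values."""
    def missing(x: Optional[float]) -> bool:
        return x is None or math.isnan(x)

    fill = mean(x for x in xs if not missing(x))
    return [fill if missing(x) else x for x in xs]
```

On the column [1.0, 3.0, 5.0], min-max scaling gives [0.0, 0.5, 1.0], and standard scaling with centering gives [-1.0, 0.0, 1.0] because the sample standard deviation is 2.0.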

Slide 22

Spark ML Features

Feature Engineering
  DCT
  ElementwiseProduct
  Interaction
  VectorIndexer
  PolynomialExpansion
Feature Selection
  ChiSqSelector
  FoldedFeaturesSelector
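Two of the feature-engineering transformers listed above are simple enough to state in a few lines. The plain-Python sketch below illustrates PolynomialExpansion (all monomials of the input features up to a given degree) and ElementwiseProduct (multiplying each feature by a fixed weight). The function names are hypothetical, and the monomials here are ordered by degree, which is not necessarily the order Spark's PolynomialExpansion emits.

```python
# Hypothetical plain-Python illustrations; term order differs from Spark's output.
from itertools import combinations_with_replacement
from math import prod
from typing import List


def polynomial_expansion(v: List[float], degree: int) -> List[float]:
    """All monomials of degree 1..degree, grouped by degree."""
    terms = []
    for d in range(1, degree + 1):
        # Each multiset of d feature indices is one monomial, e.g. (0, 1) -> x0 * x1.
        for idx in combinations_with_replacement(range(len(v)), d):
            terms.append(prod(v[i] for i in idx))
    return terms


def elementwise_product(v: List[float], scaling: List[float]) -> List[float]:
    """Hadamard product of a feature vector with a fixed weight vector."""
    if len(v) != len(scaling):
        raise ValueError("vector and scaling vector must have the same length")
    return [a * b for a, b in zip(v, scaling)]
```

For two features (2.0, 3.0) at degree 2 this yields x, y, x*x, x*y, y*y = [2.0, 3.0, 4.0, 6.0, 9.0]; the output grows combinatorially with the degree, which is worth keeping in mind before expanding wide vectors.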

Dimension reduction
  PCA
  MinHashLSHModel
  BucketedRandomProjectionLSH
  RandomProjectionsHasher
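The LSH entries above rely on hashing similar inputs to the same values. As an illustration of the MinHash idea behind MinHashLSH, the sketch below builds MinHash signatures from random linear hash functions h(x) = (a * x + b) mod p and estimates Jaccard similarity as the fraction of matching signature positions. The prime, the seed and the function names are assumptions made for this sketch rather than Spark's internals, and inputs are assumed to be non-empty sets of non-negative integers (e.g. hashed token ids) smaller than p.

```python
# Hypothetical MinHash sketch in plain Python; constants are not taken from Spark.
import random
from typing import List, Set, Tuple

PRIME = 2_147_483_647  # 2**31 - 1, an assumption for this sketch


def make_hashes(num_hashes: int, seed: int = 42) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for h(x) = (a * x + b) mod PRIME, with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(items: Set[int], hashes: List[Tuple[int, int]]) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def estimated_jaccard(sig1: List[int], sig2: List[int]) -> float:
    """Fraction of signature positions where the two sets agree."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)


def exact_jaccard(s1: Set[int], s2: Set[int]) -> float:
    """|A & B| / |A | B|, for comparison with the estimate."""
    return len(s1 & s2) / len(s1 | s2)
```

Because a is non-zero modulo a prime, each h is a permutation of [0, p), so identical sets always match on every position and disjoint sets never do; for partially overlapping sets the match rate approximates the exact Jaccard similarity, with accuracy improving as more hash functions are used.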

Slide 23

Spark ML Features

Text extraction
  Tokenizer
  RegexTokenizer
  NGram
  StopWordsRemover
NLP in Pravda-ML
  LanguageDetectorTransformer
  LanguageAwareAnalyzer
  NGramExtractor
  URLElimminator
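The text-extraction stages form a small chain: tokenize, drop stop words, then build n-grams. The plain-Python sketch below approximates that chain. The function names are hypothetical; the regex tokenizer here splits on non-word characters (Spark's RegexTokenizer splits on whitespace by default), and the stop-word list is supplied by the caller rather than taken from Spark's built-in lists.

```python
# Hypothetical plain-Python approximations of the text-extraction stages.
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace, in the spirit of Tokenizer."""
    return text.lower().split()


def regex_tokenize(text: str, pattern: str = r"\W+", min_token_length: int = 1) -> List[str]:
    """Lowercase, split on a separator regex, drop tokens that are too short."""
    return [t for t in re.split(pattern, text.lower()) if len(t) >= min_token_length]


def remove_stop_words(tokens: List[str], stop_words: Iterable[str]) -> List[str]:
    """Case-insensitive stop-word filtering."""
    stop = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop]


def ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """Space-joined n-grams; fewer than n tokens yields an empty list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Run on "The pipeline is fast, the model is small!" with stop words ["the", "is"], the chain produces the tokens pipeline, fast, model, small and the bigrams "pipeline fast", "fast model", "model small".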

Text vectorization
  CountVectorizer
  HashingTF
  IDF
Text embedding
  Word2Vec
Clustering
  LDA
  KMeans/BisectingKMeans
  GaussianMixture
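HashingTF and IDF together produce TF-IDF vectors. The sketch below shows the arithmetic with sparse vectors stored as Python dicts: each token is hashed to one of num_features buckets and counted, and IDF weights bucket j by log((m + 1) / (df(j) + 1)), where m is the number of documents and df(j) the number of documents containing bucket j (the smoothed formula in Spark's IDF documentation). zlib.crc32 stands in for the MurmurHash3 function Spark uses, so bucket indices will not match Spark's; the function names are hypothetical.

```python
# Hypothetical TF-IDF sketch; crc32 replaces Spark's MurmurHash3 bucket hashing.
import math
import zlib
from collections import Counter
from typing import Dict, List

SparseVec = Dict[int, float]  # bucket index -> value


def hashing_tf(tokens: List[str], num_features: int = 1 << 18) -> SparseVec:
    """Count tokens into hashed buckets (colliding tokens simply add up)."""
    counts: Counter = Counter()
    for token in tokens:
        counts[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return dict(counts)


def fit_idf(docs: List[SparseVec]) -> SparseVec:
    """Smoothed inverse document frequency per bucket seen in the corpus."""
    m = len(docs)
    df: Counter = Counter()
    for tf in docs:
        df.update(tf.keys())  # each bucket counted at most once per document
    return {j: math.log((m + 1) / (df[j] + 1)) for j in df}


def tf_idf(tf: SparseVec, idf: SparseVec) -> SparseVec:
    """Scale term frequencies by IDF; this sketch gives unseen buckets weight 0.0."""
    return {j: v * idf.get(j, 0.0) for j, v in tf.items()}
```

A token present in every document gets weight log(1) = 0, so it contributes nothing to the TF-IDF vector, while a token found in one of two documents gets log(3 / 2).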

Slide 24

Spark ML Features

Regression

Classification

Slide 25

Spark ML Features

Recommendations
  ALS
  FPGrowth
Evaluation
  BinaryClassificationEvaluator
  ClusteringEvaluator
  MulticlassClassificationEvaluator
  RegressionEvaluator
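Each evaluator reduces a column of predictions to a single metric. As an illustration, the sketch below computes two of them in plain Python: the area under the ROC curve (areaUnderROC, the default metric of BinaryClassificationEvaluator) through its rank interpretation, i.e. the probability that a random positive is scored above a random negative with ties counting one half, and root mean squared error (rmse, the default metric of RegressionEvaluator). The function names are hypothetical, and Spark computes AUC from a possibly down-sampled ROC curve, so results on large data may differ slightly.

```python
# Hypothetical metric helpers in plain Python; O(n^2) AUC is for illustration only.
import math
from typing import Sequence


def area_under_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Root mean squared error."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels))
```

With scores [0.1, 0.4, 0.35, 0.8] and labels [0, 0, 1, 1], three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.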

Tuning
  ParamGridBuilder
  CrossValidator
More from Pravda-ML
  CombinedModel
  PartitionedRankingEvaluator
  CRRSampler
  XGBoost
  StochasticHyperopt
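ParamGridBuilder and CrossValidator amount to a simple loop: enumerate the Cartesian product of candidate parameter values, score each combination by its average metric over k train/validation folds, and keep the best. The plain-Python sketch below shows that loop with a deliberately tiny model, a one-feature least-squares fit y = w * x with an L2 penalty reg, scored by RMSE (lower is better). The model, the parameter name reg, the function names and the round-robin fold assignment are assumptions for this sketch; Spark's CrossValidator assigns rows to folds randomly and refits the best combination on the full dataset.

```python
# Hypothetical grid search + k-fold cross-validation in plain Python.
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, float]


def param_grid(grid: Dict[str, Sequence[float]]) -> List[Params]:
    """Cartesian product of candidate values, one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def fit_ridge_1d(xs: List[float], ys: List[float], reg: float) -> float:
    """Closed-form w minimizing sum((w * x - y)^2) + reg * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + reg)


def rmse(pred: List[float], actual: List[float]) -> float:
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def cross_validate(xs: List[float], ys: List[float], grid: Dict[str, Sequence[float]],
                   num_folds: int = 3) -> Tuple[Params, float]:
    """Return the combination with the lowest mean validation RMSE, and that RMSE."""
    best_params, best_score = None, math.inf
    for params in param_grid(grid):
        fold_scores = []
        for k in range(num_folds):
            # Round-robin folds: row i is validated in fold i % num_folds.
            train = [i for i in range(len(xs)) if i % num_folds != k]
            valid = [i for i in range(len(xs)) if i % num_folds == k]
            w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], params["reg"])
            fold_scores.append(rmse([w * xs[i] for i in valid], [ys[i] for i in valid]))
        score = sum(fold_scores) / num_folds
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

On noise-free data y = 2x, the unpenalized candidate reg = 0.0 recovers w = 2 exactly and wins with a validation RMSE of 0.0, whatever its position in the grid.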
