Slide 13: Spark ML Pipelines
Transformer
Estimator
![Spark ML Pipelines Transformer Estimator](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-12.jpg)
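The Transformer/Estimator split can be illustrated with a small plain-Python sketch. It assumes the usual Spark ML contract:

- A Transformer exposes `transform()`.
- An Estimator exposes `fit()` and returns a fitted model that is itself a Transformer.
- A Pipeline is an Estimator whose `fit()` yields a PipelineModel.

Lists of dicts stand in for DataFrames. The stage classes `Doubler` and `MeanCenterer` are hypothetical. This is not Spark's API.

```python
from typing import Dict, List

Row = Dict[str, float]  # hypothetical stand-in for a DataFrame row


class Transformer:
    """Maps a dataset to a new dataset without learning anything."""
    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Learns parameters from data and returns a fitted Transformer (a 'model')."""
    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class Doubler(Transformer):
    """Hypothetical stateless stage: adds <col>_x2 = 2 * <col>."""
    def __init__(self, col: str):
        self.col = col

    def transform(self, rows):
        return [{**r, self.col + "_x2": 2 * r[self.col]} for r in rows]


class MeanCentererModel(Transformer):
    """Fitted model: subtracts the mean learned during fit()."""
    def __init__(self, col: str, mean: float):
        self.col, self.mean = col, mean

    def transform(self, rows):
        return [{**r, self.col + "_centered": r[self.col] - self.mean} for r in rows]


class MeanCenterer(Estimator):
    """Hypothetical estimator: fit() computes the column mean."""
    def __init__(self, col: str):
        self.col = col

    def fit(self, rows):
        mean = sum(r[self.col] for r in rows) / len(rows)
        return MeanCentererModel(self.col, mean)


class PipelineModel(Transformer):
    """Applies the fitted stages in order."""
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits stages in order; each stage sees the output of the previous ones."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 3.0}]
model = Pipeline([Doubler("x"), MeanCenterer("x_x2")]).fit(train)
print(model.transform([{"x": 5.0}]))  # [{'x': 5.0, 'x_x2': 10.0, 'x_x2_centered': 6.0}]
```

Spark's own `Pipeline.fit` skips the transform step after the last Estimator. The sketch transforms after every stage for brevity.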
Slide 21: Spark ML Features
ETL
SQLTransformer
SqlFilter, ColumnsExtractor
Numerization
OneHotEncoder
StringIndexer
MultinomialExtractor
Vectorization
VectorAssembler
FeatureHasher
AutoAssembler
Feature Normalization
MaxAbsScaler
MinMaxScaler
Normalizer
QuantileDiscretizer
StandardScaler
Missing values
Imputer
NullToDefaultReplacer
NaNToMeanReplacer
![Spark ML Features ETL SQLTransformer SqlFilter, ColumnsExtractor Numerization OneHotEncoder StringIndexer MultinomialExtractor Vectorization](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-20.jpg)
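Below is a plain-Python sketch of what four of these stages compute on a single column. It assumes Spark's documented defaults:

- StringIndexer orders labels by descending frequency, so index 0 is the most frequent label.
- OneHotEncoder drops the last category (`dropLast=true`).
- Imputer uses the mean strategy.
- MinMaxScaler rescales to [0, 1] and maps a constant column to the midpoint.

The function names are hypothetical. Breaking indexer ties alphabetically is an assumption. Real Spark stages operate on DataFrame columns and vectors, not Python lists.

```python
import math
from collections import Counter
from typing import List, Optional


def string_index(values: List[str]) -> dict:
    """StringIndexer-like mapping: most frequent label -> 0 (ties broken alphabetically here)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """One-hot vector; with drop_last the last category becomes all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    return [1.0 if i == index else 0.0 for i in range(size)]


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None/NaN with the mean of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in values]


def min_max_scale(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Rescale to [lo, hi]; a constant column maps to the midpoint 0.5 * (lo + hi)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [0.5 * (lo + hi)] * len(values)
    return [(v - vmin) / (vmax - vmin) * (hi - lo) + lo for v in values]


cities = ["msk", "spb", "msk", "kzn", "msk", "spb"]
index = string_index(cities)
print(index)  # {'msk': 0, 'spb': 1, 'kzn': 2}
print([one_hot(index[c], len(index)) for c in cities[:4]])
# [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

ages = impute_mean([20.0, None, 40.0, float("nan")])
print(ages)                 # [20.0, 30.0, 40.0, 30.0]
print(min_max_scale(ages))  # [0.0, 0.5, 1.0, 0.5]
```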
Slide 22: Spark ML Features
Feature Engineering
DCT
ElementwiseProduct
Interaction
VectorIndexer
PolynomialExpansion
Feature Selection
ChiSqSelector
FoldedFeaturesSelector
Dimension reduction
PCA
MinHashLSHModel
BucketedRandomProjectionLSH
RandomProjectionsHasher
![Spark ML Features Feature Engineering DCT ElementwiseProduct Interaction VectorIndexer PolynomialExpansion Feature Selection](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-21.jpg)
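PolynomialExpansion maps an n-dimensional vector to all monomials of total degree 1 through d. That gives C(n + d, d) - 1 output features, since the constant term is not included.

Below is a minimal plain-Python sketch. The function name is hypothetical. The term order here is an assumption: Spark's implementation emits the same set of terms, possibly in a different order.

```python
from itertools import combinations_with_replacement
from math import comb, prod
from typing import List


def polynomial_expand(features: List[float], degree: int) -> List[float]:
    """All monomials of total degree 1..degree (no constant term).

    Terms are ordered by degree, then by index tuple; Spark's
    PolynomialExpansion may order the same terms differently.
    """
    out = []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(features)), d):
            out.append(prod(features[i] for i in idx))
    return out


x = [2.0, 3.0]
expanded = polynomial_expand(x, 2)
print(expanded)  # [2.0, 3.0, 4.0, 6.0, 9.0]  i.e. x, y, x*x, x*y, y*y
# Output size is C(n + d, d) - 1 for n features and degree d
print(len(expanded) == comb(len(x) + 2, 2) - 1)  # True
```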
Slide 23: Spark ML Features
Text extraction
Tokenizer
RegexTokenizer
NGram
StopWordsRemover
NLP in Pravda-ML
LanguageDetectorTransformer
LanguageAwareAnalyzer
NGramExtractor
URLElimminator
Text vectorization
CountVectorizer
HashingTF
IDF
Text embedding
Word2Vec
Clustering
LDA
KMeans/BisectingKMeans
GaussianMixture
![Spark ML Features Text extraction Tokenizer RegexTokenizer NGram StopWordsRemover NLP in Pravda-ML](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-22.jpg)
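The text stages chain naturally: tokenize, drop stop words, optionally form n-grams, count terms, then reweight by IDF. The plain-Python sketch below mirrors three documented behaviours:

- Tokenizer lowercases the text and splits on whitespace.
- NGram joins n consecutive tokens with a space.
- Spark ML's smoothed IDF is log((m + 1) / (df(t) + 1)) for m documents.

Several parts are assumptions. The stop-word list is a tiny illustrative set. Plain counts stand in for CountVectorizer/HashingTF, with the hashing step omitted. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

STOP_WORDS = {"the", "a", "is", "of", "and"}  # tiny illustrative list, not Spark's default


def tokenize(text: str) -> List[str]:
    """Like Spark's Tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """Like NGram: join n consecutive tokens with a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequencies(docs: List[List[str]]) -> List[Counter]:
    """Raw term counts per document (stand-in for CountVectorizer/HashingTF)."""
    return [Counter(doc) for doc in docs]


def idf(docs: List[List[str]]) -> dict:
    """Smoothed IDF: log((m + 1) / (df(t) + 1)); a term present in every doc gets 0."""
    m = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}


corpus = ["Spark is fast", "the Spark ML pipeline", "a fast pipeline"]
docs = [remove_stop_words(tokenize(t)) for t in corpus]
print(docs)             # [['spark', 'fast'], ['spark', 'ml', 'pipeline'], ['fast', 'pipeline']]
print(ngrams(docs[1]))  # ['spark ml', 'ml pipeline']

weights = idf(docs)
tfidf = [{t: tf * weights[t] for t, tf in c.items()} for c in term_frequencies(docs)]
```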
Slide 24: Spark ML Features
Regression
Classification
![Spark ML Features Regression Classification](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-23.jpg)
Slide 25: Spark ML Features
Recommendations
ALS
FPGrowth
Evaluation
BinaryClassificationEvaluator
ClusteringEvaluator
MulticlassClassificationEvaluator
RegressionEvaluator
Tuning
ParamGridBuilder
CrossValidator
More from Pravda-ML
CombinedModel
PartitionedRankingEvaluator
CRRSampler
XGBoost
StochasticHyperopt
![Spark ML Features Recommendations ALS FPGrowth Evaluation BinaryClassificationEvaluator ClusteringEvaluator MulticlassClassificationEvaluator RegressionEvaluator Tuning](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/1090462/slide-24.jpg)
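ParamGridBuilder produces the Cartesian product of the listed parameter values. CrossValidator with k folds then works through that grid:

- It fits one model per parameter map per fold, so a grid of G maps means G × k fits.
- It averages the evaluator's metric over the folds.
- It refits the best parameter map on the full dataset.

The plain-Python sketch below follows that loop. A hypothetical one-feature ridge model stands in for a real Spark regressor, scored by RMSE (the default RegressionEvaluator metric). The strided fold assignment is an assumption; Spark splits folds randomly.

```python
from itertools import product
from typing import Dict, List


def param_grid(grid: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Like ParamGridBuilder: the Cartesian product of all listed parameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def fit_ridge_1d(data, reg_param):
    """Hypothetical model: y ~ w * x with an L2 penalty, closed form w = sum(xy) / (sum(x^2) + lambda)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + reg_param)


def rmse(w, data):
    return (sum((y - w * x) ** 2 for x, y in data) / len(data)) ** 0.5


def cross_validate(data, grid, num_folds=3):
    """Average held-out RMSE per parameter map, pick the lowest, refit on all data."""
    folds = [data[i::num_folds] for i in range(num_folds)]
    scores = []
    for params in grid:
        fold_scores = []
        for k in range(num_folds):
            train = [row for j, f in enumerate(folds) if j != k for row in f]
            w = fit_ridge_1d(train, params["regParam"])
            fold_scores.append(rmse(w, folds[k]))
        scores.append(sum(fold_scores) / num_folds)
    best = min(range(len(grid)), key=lambda i: scores[i])  # lower RMSE is better
    return grid[best], fit_ridge_1d(data, grid[best]["regParam"]), scores


data = [(float(x), 2.0 * x) for x in range(1, 10)]  # exact y = 2x, no noise
grid = param_grid({"regParam": [0.0, 1.0, 10.0]})
best_params, best_w, scores = cross_validate(data, grid)
print(best_params, best_w)  # {'regParam': 0.0} 2.0
```

With noise-free data any penalty only shrinks `w` away from 2, so the unregularized map wins. On real data the trade-off is what the grid search is for.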