Slide 13: Spark ML Pipelines
Transformer
Estimator
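The Transformer/Estimator split can be pictured with a tiny plain-Python model of the contract: an Estimator's fit returns a fitted Transformer, and a Pipeline fits its stages in order, feeding each stage's output to the next. The classes below (including the hypothetical MeanCenter stage) only mimic Spark's design and are not the pyspark.ml API.

```python
from typing import Dict, List

Row = Dict[str, float]
Dataset = List[Row]


class Transformer:
    """Maps one dataset to another (Spark's transform)."""
    def transform(self, data: Dataset) -> Dataset:
        raise NotImplementedError


class Estimator:
    """Learns from data and returns a fitted Transformer (Spark's fit)."""
    def fit(self, data: Dataset) -> Transformer:
        raise NotImplementedError


class MeanCenterModel(Transformer):
    def __init__(self, col: str, mean: float):
        self.col, self.mean = col, mean

    def transform(self, data: Dataset) -> Dataset:
        return [{**row, self.col: row[self.col] - self.mean} for row in data]


class MeanCenter(Estimator):
    # hypothetical estimator: learns the column mean, its model subtracts it
    def __init__(self, col: str):
        self.col = col

    def fit(self, data: Dataset) -> Transformer:
        mean = sum(row[self.col] for row in data) / len(data)
        return MeanCenterModel(self.col, mean)


class PipelineModel(Transformer):
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, data: Dataset) -> Dataset:
        for stage in self.stages:
            data = stage.transform(data)
        return data


class Pipeline(Estimator):
    """A Pipeline is itself an Estimator: fitting it fits every stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data: Dataset) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            model = stage.fit(data) if isinstance(stage, Estimator) else stage
            data = model.transform(data)  # output feeds the next stage
            fitted.append(model)
        return PipelineModel(fitted)


model = Pipeline([MeanCenter("x")]).fit([{"x": 1.0}, {"x": 3.0}])
centered = model.transform([{"x": 5.0}])  # [{"x": 3.0}]
```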
Slide 21: Spark ML Features
ETL
SQLTransformer
SqlFilter, ColumnsExtractor
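SQLTransformer takes a SQL statement in which the placeholder `__THIS__` stands for the input DataFrame. The sketch below imitates that substitution with Python's built-in sqlite3 in place of Spark SQL; the table name and the helper are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def sql_transform(statement: str, columns: List[str], rows: List[Tuple]) -> List[Tuple]:
    """Run `statement` with __THIS__ bound to the input rows (SQLTransformer-style)."""
    conn = sqlite3.connect(":memory:")
    try:
        table = "input_table"  # hypothetical name standing in for Spark's temporary view
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(statement.replace("__THIS__", table)).fetchall()
    finally:
        conn.close()


result = sql_transform(
    "SELECT id, v1 + v2 AS v3 FROM __THIS__ WHERE v1 > 0 ORDER BY id",
    ["id", "v1", "v2"],
    [(0, 1.0, 3.0), (1, 2.0, 5.0), (2, -1.0, 4.0)],
)
```

A single statement both derives a column and filters rows, which is the kind of ETL step the stages on this slide cover.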
Numerization
OneHotEncoder
StringIndexer
MultinomialExtractor
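StringIndexer and OneHotEncoder are usually chained. A minimal sketch of their logic, assuming Spark's default frequencyDesc ordering (most frequent label gets index 0, ties broken alphabetically) and OneHotEncoder's default dropLast=true. Spark emits sparse vectors; dense lists are used here for readability.

```python
from collections import Counter
from typing import Dict, List


def fit_string_indexer(values: List[str]) -> Dict[str, int]:
    # frequencyDesc: most frequent label -> 0; equal counts sorted alphabetically
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    # with drop_last the last category becomes the all-zero vector
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


colors = ["red", "blue", "red", "green", "blue", "red"]
labels = fit_string_indexer(colors)  # red -> 0, blue -> 1, green -> 2
encoded = [one_hot(labels[c], len(labels)) for c in colors]
```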
Vectorization
VectorAssembler
FeatureHasher
AutoAssembler
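VectorAssembler concatenates columns into one vector, while FeatureHasher maps features into a fixed-size space with the hashing trick. In this sketch zlib.crc32 stands in for the MurmurHash3 that Spark uses, so the indices will not match Spark's; the rule of hashing "column=value" for strings and the column name for numbers follows the FeatureHasher documentation.

```python
import zlib
from typing import Dict, List, Union


def assemble(row: Dict[str, Union[float, List[float]]], input_cols: List[str]) -> List[float]:
    # VectorAssembler-style: concatenate scalar and vector columns in the given order
    out: List[float] = []
    for col in input_cols:
        value = row[col]
        out.extend(value if isinstance(value, list) else [float(value)])
    return out


def hash_features(row: Dict[str, Union[float, str]], num_features: int = 16) -> Dict[int, float]:
    # hashing trick; crc32 is a stand-in for Spark's MurmurHash3
    vec: Dict[int, float] = {}
    for col, value in row.items():
        if isinstance(value, str):
            key, weight = f"{col}={value}", 1.0  # categorical: hash "column=value", weight 1
        else:
            key, weight = col, float(value)      # numeric: hash the column name, keep the value
        idx = zlib.crc32(key.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + weight    # colliding features are summed
    return vec
```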
Feature Normalization
MaxAbsScaler
MinMaxScaler
Normalizer
QuantileDiscretizer
StandardScaler
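The scalers differ only in the statistic they divide by. The per-column formulas below sketch three of them, assuming Spark's documented behavior: MinMaxScaler maps a constant column to the midpoint of the target range, and StandardScaler uses the sample (n - 1) standard deviation with centering off by default (withMean=false).

```python
import math
from typing import List


def min_max_scale(col: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    # rescale to [lo, hi]; a constant column maps to the midpoint
    cmin, cmax = min(col), max(col)
    if cmax == cmin:
        return [0.5 * (hi + lo)] * len(col)
    return [(x - cmin) / (cmax - cmin) * (hi - lo) + lo for x in col]


def max_abs_scale(col: List[float]) -> List[float]:
    # divide by the largest absolute value: keeps sign and zeros (sparsity)
    m = max(abs(x) for x in col)
    return [x / m if m else 0.0 for x in col]


def standard_scale(col: List[float], with_mean: bool = False) -> List[float]:
    # divide by the sample standard deviation; subtract the mean only if asked
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
    shift = mean if with_mean else 0.0
    return [(x - shift) / std if std else 0.0 for x in col]
```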
Missing values
Imputer
NullToDefaultReplacer
NaNToMeanReplacer
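Imputer fills missing entries with a statistic of the observed values ("mean" by default, or "median"). A minimal sketch that treats both None and NaN as missing; note that Spark computes the median approximately, whereas statistics.median here is exact.

```python
import math
import statistics
from typing import List, Optional


def impute(col: List[Optional[float]], strategy: str = "mean") -> List[float]:
    # None and NaN count as missing; fill them with a statistic of the rest
    def missing(x: Optional[float]) -> bool:
        return x is None or math.isnan(x)

    observed = [x for x in col if not missing(x)]
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [fill if missing(x) else x for x in col]
```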
Slide 22: Spark ML Features
Feature Engineering
DCT
ElementwiseProduct
Interaction
VectorIndexer
PolynomialExpansion
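Three of these stages are plain arithmetic on vectors: ElementwiseProduct multiplies by a fixed weight vector, Interaction multiplies one value from each input in every combination, and PolynomialExpansion adds all monomials up to a degree. The sketch shows the math only; its output ordering is an assumption and differs from Spark's (for example, Spark orders the degree-2 expansion of (x, y) as x, x², y, xy, y²).

```python
from itertools import combinations_with_replacement, product
from math import prod
from typing import List


def elementwise_product(vec: List[float], scaling: List[float]) -> List[float]:
    # Hadamard product with a fixed scaling vector
    return [v * s for v, s in zip(vec, scaling)]


def interaction(*vectors: List[float]) -> List[float]:
    # product of one entry from each input vector, for every combination
    return [prod(combo) for combo in product(*vectors)]


def polynomial_expansion(vec: List[float], degree: int = 2) -> List[float]:
    # every monomial of total degree 1..degree (ordering differs from Spark)
    out: List[float] = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(vec, d):
            out.append(prod(combo))
    return out
```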
Feature Selection
ChiSqSelector
FoldedFeaturesSelector
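ChiSqSelector scores categorical features by a chi-squared test of independence against the label. The sketch computes Pearson's statistic from a contingency table and keeps the top-k features by that statistic; this is a simplification, since Spark ranks features by the test's p-value.

```python
from collections import Counter
from typing import List, Sequence


def chi_square(feature: Sequence, label: Sequence) -> float:
    # Pearson chi-square statistic of the feature/label contingency table
    n = len(label)
    joint = Counter(zip(feature, label))
    f_counts, l_counts = Counter(feature), Counter(label)
    stat = 0.0
    for f, fc in f_counts.items():
        for lab, lc in l_counts.items():
            expected = fc * lc / n
            observed = joint.get((f, lab), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_k(features: List[Sequence], label: Sequence, k: int) -> List[int]:
    # keep the indices of the k features with the largest statistic
    scores = [chi_square(col, label) for col in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])[:k]
```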
Dimensionality reduction
PCA
MinHashLSH
BucketedRandomProjectionLSH
RandomProjectionsHasher
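MinHashLSH rests on the fact that, for a random hash function, the probability that two sets share the same minimum hash equals their Jaccard similarity. The sketch uses the common universal family (a·x + b) mod p with a seeded random generator; the prime and the seeding are assumptions, and Spark's own constants differ.

```python
import random
from typing import List, Set, Tuple

PRIME = 2_147_483_647  # 2^31 - 1, a prime larger than any element used here


def make_hashes(num_hashes: int, seed: int = 42) -> List[Tuple[int, int]]:
    rng = random.Random(seed)
    return [(rng.randint(1, PRIME - 1), rng.randint(0, PRIME - 1)) for _ in range(num_hashes)]


def signature(items: Set[int], hashes: List[Tuple[int, int]]) -> List[int]:
    # MinHash signature: the minimum of each hash function over the set
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    # the fraction of agreeing minima estimates the Jaccard similarity
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


hashes = make_hashes(128)
sig_ab = signature({1, 2, 3, 4}, hashes)
sig_cd = signature({3, 4, 5, 6}, hashes)  # true Jaccard similarity is 2/6
```

Because each hash is a bijection modulo the prime, disjoint sets can never share a minimum, so their estimate is exactly 0.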
Slide 23: Spark ML Features
Text feature extraction
Tokenizer
RegexTokenizer
NGram
StopWordsRemover
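A sketch of the basic text chain: tokenize, drop stop words, build n-grams. The tokenizer mirrors RegexTokenizer's defaults (lowercase, split on the pattern \s+); NGram joins each run of n consecutive tokens with a space and yields nothing when there are fewer than n tokens; stop-word matching is case-insensitive, as in StopWordsRemover's default.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # lowercase, then split on runs of whitespace, dropping empty tokens
    return [t for t in re.split(r"\s+", text.lower()) if t]


def remove_stop_words(tokens: List[str], stop_words: Iterable[str]) -> List[str]:
    # case-insensitive comparison
    stops = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stops]


def ngrams(tokens: List[str], n: int = 2) -> List[str]:
    # join each run of n consecutive tokens with a space
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The  quick brown Fox")
```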
NLP in Pravda-ML
LanguageDetectorTransformer
LanguageAwareAnalyzer
NGramExtractor
URLElimminator
Text vectorization
CountVectorizer
HashingTF
IDF
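CountVectorizer builds a vocabulary ordered by corpus term frequency and counts terms per document; IDF then down-weights terms that occur in many documents using Spark's smoothed formula idf = ln((m + 1) / (df + 1)), where m is the number of documents. Breaking vocabulary ties alphabetically is an assumption of this sketch. HashingTF would replace the vocabulary with a hash of each term.

```python
import math
from collections import Counter
from typing import Dict, List


def fit_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    # terms ordered by frequency across the corpus (ties alphabetical here)
    counts = Counter(t for doc in docs for t in doc)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {t: i for i, t in enumerate(ordered)}


def term_frequencies(doc: List[str], vocab: Dict[str, int]) -> List[float]:
    vec = [0.0] * len(vocab)
    for t in doc:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec


def fit_idf(tf_vectors: List[List[float]]) -> List[float]:
    # smoothed IDF with natural log; a term present in every document gets 0
    m = len(tf_vectors)
    df = [sum(1 for v in tf_vectors if v[j] > 0) for j in range(len(tf_vectors[0]))]
    return [math.log((m + 1) / (d + 1)) for d in df]


docs = [["a", "b"], ["a", "c"]]
vocab = fit_vocabulary(docs)
tf = [term_frequencies(d, vocab) for d in docs]
idf = fit_idf(tf)
tfidf = [[x * w for x, w in zip(v, idf)] for v in tf]
```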
Text embedding
Word2Vec
Clustering
LDA
KMeans/BisectingKMeans
GaussianMixture
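KMeans alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. The sketch runs these Lloyd iterations from caller-supplied initial centers; Spark instead initializes with k-means|| by default, and BisectingKMeans and GaussianMixture use different procedures.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def closest(p: Point, centers: List[Point]) -> int:
    # index of the center with the smallest squared Euclidean distance
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 20) -> List[Point]:
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # new center = mean of assigned points; an empty cluster keeps its old center
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments no longer move centers
            break
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```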
Slide 24: Spark ML Features
Regression
Classification
Slide 25: Spark ML Features
Recommendations
ALS
FPGrowth
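FPGrowth outputs every itemset whose support (the fraction of transactions containing it) reaches minSupport, and association rules are scored by confidence = support(A ∪ B) / support(A). The sketch below produces the same kind of output by brute-force enumeration, not with the FP-tree that gives FPGrowth its speed, so it only suits tiny transactions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[List[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    # count every subset of every transaction, keep those with enough support
    n = len(transactions)
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c / n >= min_support}


def confidence(itemsets: Dict[FrozenSet[str], int], antecedent: List[str], consequent: List[str]) -> float:
    # confidence of the rule antecedent => consequent
    both = frozenset(antecedent) | frozenset(consequent)
    return itemsets[both] / itemsets[frozenset(antecedent)]


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
itemsets = frequent_itemsets(transactions, 0.5)
```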
Evaluation
BinaryClassificationEvaluator
ClusteringEvaluator
MulticlassClassificationEvaluator
RegressionEvaluator
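BinaryClassificationEvaluator reports areaUnderROC by default. The sketch uses the rank-based formulation: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half. It matches the trapezoidal ROC area when the curve is not down-sampled; Spark may bin very large inputs (numBins), so results can differ slightly there.

```python
from typing import List


def area_under_roc(scores: List[float], labels: List[int]) -> float:
    # pairwise comparison of every positive score with every negative score
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```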
Tuning
ParamGridBuilder
CrossValidator
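ParamGridBuilder expands parameter values into their Cartesian product, and CrossValidator averages a metric over k train/validation splits for each grid point and keeps the best one. In this sketch `fit_and_score` is a hypothetical callback standing in for "fit the estimator, evaluate on the held-out fold", folds are assigned round-robin (Spark splits randomly), and larger metrics are assumed to be better (Spark asks the evaluator via isLargerBetter).

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def param_grid(grid: Dict[str, Sequence]) -> List[Dict]:
    # Cartesian product of all parameter values
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def cross_validate(data: List, params_list: List[Dict],
                   fit_and_score: Callable[[Dict, List, List], float],
                   num_folds: int = 3) -> Tuple[Dict, float]:
    folds = [data[i::num_folds] for i in range(num_folds)]  # round-robin folds
    best_params, best_metric = None, float("-inf")
    for params in params_list:
        metrics = []
        for k in range(num_folds):
            train = [x for j, f in enumerate(folds) if j != k for x in f]
            metrics.append(fit_and_score(params, train, folds[k]))
        avg = sum(metrics) / num_folds  # average validation metric for this grid point
        if avg > best_metric:
            best_params, best_metric = params, avg
    return best_params, best_metric


grid = param_grid({"reg": [0.1, 1.0], "iters": [10, 20]})
best, metric = cross_validate(list(range(9)), [{"c": 1}, {"c": 2}],
                              lambda p, train, test: -abs(p["c"] - 2))
```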
More from Pravda-ML
CombinedModel
PartitionedRankingEvaluator
CRRSampler
XGBoost
StochasticHyperopt