Содержание
- 2. 我们会成功 We will succeed ! У нас все получится [U nas vse poluchitsya ] ! Без
- 3. . Lecture3. Data preproccessing and machine learning with Scikit-learn
- 4. Извлечение признаков и масшатбирование, будущая выборка, уменьшение размерности выборки
- 5. Training set and testing set Machine learning is about learning some properties of a data set
- 6. Reading a Dataset
- 8. Data Description : Attribute Information: 1. sepal length in cm 2. sepal width in cm 3.
- 9. A basic table is a two-dimensional grid of data, in which the rows represent individual elements
- 10. Target array In dataset we also work with a label or target array, which by convention
- 11. Basic Data Analysis : The dataset provided has 150 rows Dependent Variables : Sepal length.Sepal Width,Petal
- 12. The dataset is divided into Train and Test data with 80:20 split ratio where 80% data
- 13. Each training point belongs to one of N different classes. The goal is to construct a
- 14. What is scikit-learn? The scikit-learn library provides an implementation of a range of algorithms for Supervised
- 15. You can watch the Pandas and scikit-learn features documentation on this site. https://pandas.pydata.org/pandas-docs/stable/ https://scikit-learn.org/stable/documentation.html
- 16. Preprocessing Data: missing data Real world data is filled with missing values. You will often need
- 18. Method 1: Mean or Median A common method of imputation with numeric features is to replace
- 20. Imputation Method 2: Zero Depending on where your data are coming from, a missing value may
- 22. Imputation for Categorical Data For categorical features, using mean, median, or zero-imputation doesn’t make much sense.
- 24. Imputation Method 1: Most Common Class One approach to imputing categorical features is to replace missing
- 26. Imputation Method 2: “Unknown” Class Similar to how it’s sometimes most appropriate to impute a missing
- 28. Column-Specific Imputation Rules You can combine any of the above methods by imputing specific columns rather
- 29. Preprocessing Data If data set are strings We saw in our initial exploration that most of
- 30. The image below represents a dataframe that has one column named ‘color’ and three records ‘Red’,
- 31. #import the necessary module from sklearn import preprocessing # create the Labelencoder object le = preprocessing.LabelEncoder()
- 32. Training Set & Test Set A Machine Learning algorithm needs to be trained on a set
- 33. But before doing all this splitting, let’s first separate our features and target variables. #import the
- 34. Watch subtitled video https://www.coursera.org/lecture/machine-learning/what-is-machine-learning-Ujm7v
- 38. Скачать презентацию