Contents
- 2. My interests: data analysis and visualization; machine learning; cybersecurity-related data analytics. Topic is important because: application
- 3. Terms. Malware: software that is specifically designed to disrupt, damage, or gain unauthorized access to a
- 4. Main Steps: 01 Dataset collection; 02 Data reduction; 03 Building a machine learning model
- 5. Dataset collection 01. With data collection, “the sooner the better”, is always the best answer. —Marissa
- 6. Problem. Create a dataset with features that will help the system distinguish between good and bad
- 7. Solution. Found: 3077 binary malicious files, 1952 binary benign files. Collected from “VX Heavens Virus Collection”
- 8. Solution. Extracted: 100 features from binary Portable Executable files (.exe, .dll, .sys, etc.) using “pefile” Python
- 9. Dataset reduction 02. Redundancy is expensive but indispensable. —Jane Jacobs
- 10. Problem. Select features that yield the most accurate results: apply data reduction algorithms; obtain dataset with
- 11. Solution. Applied: feature importance technique based on the Gini importance metric; principal component analysis (PCA) for input
- 12. Solution. Obtained: 10 features with the highest scores; the higher the score, the more important the feature
- 13. Solution. Obtained: reduced the dimensionality of the data from 8 to 2. Principal component 1 -
- 14. Building a machine learning model 03. What we want is a machine that can learn from
- 15. Problem. Determine which file is malicious and which is benign: apply a machine learning algorithm; split
- 16. Solution. The data was split into 5 equal folds. Each fold was used for both training
- 17. Solution. Applied: Decision Tree Classifier algorithm. Built decision tree. Classification rate (accuracy score): 0.9371
- 18. Libraries & frameworks used: pandas, NumPy, pefile, scikit-learn, Matplotlib, math
- 19. Resources: presentation template; M. Zubair Shafiq et al. (2009), PE-Miner: Mining Structural Information to Detect Malicious
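
The reduction and modelling steps named in slides 11 to 17 (Gini-based feature importance, PCA, 5-fold cross-validation, a decision tree classifier) could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the deck's actual code: the data is synthetic (`make_classification` stands in for the 100 pefile-derived features), an `ExtraTreesClassifier` is one assumed way to obtain Gini importances, and the outline does not say which feature set the final tree was trained on, so the top-10 subset is used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the PE feature table (assumption: NOT the deck's data).
# 100 columns mirror the "100 features" mentioned in the outline.
X, y = make_classification(
    n_samples=600, n_features=100, n_informative=10, random_state=0
)

# Step 1: rank features by Gini importance (mean decrease in impurity).
# Using a tree ensemble for this is an assumption; the outline only names the metric.
forest = ExtraTreesClassifier(n_estimators=100, criterion="gini", random_state=0)
forest.fit(X, y)
top10 = np.argsort(forest.feature_importances_)[::-1][:10]
X_top = X[:, top10]  # keep the 10 highest-scoring features

# Step 2: project onto 2 principal components, e.g. for plotting.
# The deck reports a reduction from 8 to 2; here PCA is applied to the
# 10 selected columns purely for illustration.
X_2d = PCA(n_components=2).fit_transform(X_top)

# Step 3: 5-fold cross-validated accuracy of a decision tree.
# With an integer cv and a classifier, scikit-learn uses stratified folds.
tree = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(tree, X_top, y, cv=5, scoring="accuracy")

print("selected feature indices:", sorted(top10.tolist()))
print("2-D projection shape:", X_2d.shape)
print("mean accuracy over 5 folds:", round(float(scores.mean()), 4))
```

On real data, `X` and `y` would come from the extracted feature table and its malicious/benign labels; the accuracy printed here reflects only the synthetic data and is not comparable to the 0.9371 reported on slide 17.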