Hello kernelmode,
We are developing a new malware analysis service for malware researchers: https://malwareanalyser.io.
Its main purpose is to tag anomalies in PE files (x86/x64) and show extended reports.
Only static analysis of PE files is available for now.
We use both heuristic rules and machine learning to classify a file as malicious or clean.
For example, we tested this sample: viewtopic.php?f=21&t=5574
Result: https://malwareanalyser.io/report/39ab1 ... a9112d9cd0
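To give an idea of what tagging anomalies in x86/x64 PE files can look like, here is a minimal Python sketch that reads the COFF file header and applies a few toy checks. It is not our C++ engine: the function names and the three rules (unknown machine type, zero sections, zero timestamp) are assumptions chosen only for illustration.

```python
import struct

# COFF machine types for the two architectures mentioned above.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def read_coff_header(data: bytes) -> dict:
    """Parse the DOS stub pointer and the 20-byte COFF file header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 24 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine, sections, timestamp, _sym_ptr, _sym_count,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "arch": MACHINE_NAMES.get(machine, "unknown"),
        "sections": sections,
        "timestamp": timestamp,
        "optional_header_size": opt_size,
        "characteristics": characteristics,
    }


def tag_anomalies(header: dict) -> list:
    """Illustrative heuristic rules only (hypothetical, not the service's rule set)."""
    tags = []
    if header["machine"] not in MACHINE_NAMES:
        tags.append("unknown_machine")
    if header["sections"] == 0:
        tags.append("no_sections")
    if header["timestamp"] == 0:
        tags.append("zero_timestamp")
    return tags
```

Usage would be something like tag_anomalies(read_coff_header(open(path, "rb").read())); a real engine would of course also look at the optional header, section table, imports and so on.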
We kindly ask the community to help us test the service on different files and to suggest features and improvements.
The heuristic core is written in C++ from scratch (more than 10k lines of code).
The prediction core is a Random Forest ensemble trained on more than 70 major features to classify whether a file is malicious or not.
To train it we used a dataset of about 1k malware samples and ~1k clean samples (from Program Files, Windows, etc.).
The prediction rate is about 97% on the training set.
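Since that figure is measured on the training set itself, a held-out split would give a more honest estimate. A minimal sketch of a stratified train/test split plus an accuracy helper, using only the Python standard library. The names stratified_split and accuracy, the 20% test fraction and the seed are assumptions for illustration, not part of our pipeline.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("empty label list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With ~1k malicious and ~1k clean samples this would hold out about 200 files of each class; the classifier is then trained on the train indices only and scored with accuracy() on the test indices.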
We also have a full database of VirusShare samples (100k+), but we need roughly the same number of clean samples to build a better dataset.
It would be great if someone could tell us how to get 100k clean, real-world PE program files :).
Some parts of the engine will be open-sourced on my GitHub (https://github.com/progressionnetwork).