Thanks. We give theoretical guarantees on the false positive probability, which can be useful when choosing the parameters. Some use cases of the project include detecting:
1. Intrusions
2. Fake Ratings
3. Financial Fraud
If anyone would be interested in trying to apply these techniques to our COVID behavior change & anti-misinformation effort ProjectDomino.org, we'd be happy to share data - this may be quite helpful! Just jump into the Slack (open invite) and we can start getting you situated.
How do you think this can help change the landscape of security? Judging by the fast speeds (as low as 0.13s on DARPA), I imagine it will help block larger numbers of suspicious activities.
Gephi has a realtime stream importer for Twitter. Would it be possible for this tool to be a Gephi plugin that could be used in realtime on the same graph?
Definitely. It will need only very small changes to the code. I would love to add it as a plugin. Can you point to some resources that can help in incorporating MIDAS into Gephi?
This will propel important research into anomaly detection using dynamic graphs. Existing static graph methods have huge flaws; this would fix some of them.
Nice suggestion. Will definitely try to refactor. Thanks!
In most cases, timestamps should come with the data itself (assuming it's a dynamic graph). If timestamps have to be chosen, one can pick the tick size by looking at how many edges usually arrive in one time tick (second, minute, etc.).
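As a rough illustration of picking a tick size, the sketch below buckets raw timestamps into integer ticks and prints how many edges land in each tick for two candidate widths. The function name `to_ticks`, the tuple layout, and the sample edges are assumptions made for this example; they are not part of MIDAS.

```python
from collections import Counter

def to_ticks(edges, tick_seconds):
    """Map (src, dst, unix_seconds) edges to (src, dst, tick), with ticks starting at 1."""
    if not edges:
        return []
    start = min(ts for _, _, ts in edges)
    return [(src, dst, (ts - start) // tick_seconds + 1) for src, dst, ts in edges]

# Hypothetical edge stream: (source, destination, unix timestamp in seconds).
edges = [
    ("a", "b", 1000), ("a", "c", 1010), ("b", "c", 1075),
    ("a", "b", 1130), ("c", "d", 1135), ("a", "b", 1139),
]

# Compare how many edges fall into each tick for 1-second and 60-second ticks.
for tick_seconds in (1, 60):
    ticked = to_ticks(edges, tick_seconds)
    per_tick = Counter(tick for _, _, tick in ticked)
    print(tick_seconds, dict(sorted(per_tick.items())))
```

If almost every tick holds a single edge, the tick is probably too fine for count-based comparisons; a coarser tick gives each tick a more meaningful count.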
Timestamps don't affect any parameters other than alpha (the temporal decay factor). You may want to check out how the contribution of past edges to the anomalousness of the current edge is decayed. If there is a lot of granularity in the timestamps, a smaller alpha should be chosen. Hope it helps.
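To make the role of alpha concrete, here is a minimal sketch of decayed edge counts. It is an illustration only, not the MIDAS code: the class name `DecayedEdgeCounter`, the use of exact dictionaries, and the choice to apply one decay step per elapsed tick are all assumptions.

```python
from collections import defaultdict

class DecayedEdgeCounter:
    """Per-edge counts that shrink by a factor of alpha for every tick that passes."""

    def __init__(self, alpha):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        self.alpha = alpha
        self.counts = defaultdict(float)
        self.current_tick = None

    def add(self, src, dst, tick):
        """Record one edge at `tick` and return its decayed count."""
        if self.current_tick is not None and tick > self.current_tick:
            # Assumption: one decay step per elapsed tick.
            factor = self.alpha ** (tick - self.current_tick)
            for edge in self.counts:
                self.counts[edge] *= factor
        if self.current_tick is None or tick > self.current_tick:
            self.current_tick = tick
        self.counts[(src, dst)] += 1.0
        return self.counts[(src, dst)]

def count_after_gap(alpha):
    """Ten edges in tick 1, then one more in tick 3; return the decayed count."""
    counter = DecayedEdgeCounter(alpha)
    for _ in range(10):
        counter.add("a", "b", tick=1)
    return counter.add("a", "b", tick=3)

for alpha in (0.2, 0.8):
    print(alpha, count_after_gap(alpha))
```

With the smaller alpha, the ten old edges contribute far less to the count two ticks later, so history fades faster per tick.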
I'm looking forward to M-Stream for multi-dimensional data - but I have one question for that. Is there some preferred approach for selecting features in multi-dimensional anomaly detection?
Because I wonder if, given enough dimensions, everything would be anomalous. Kind of like how p-hacking works (at p=0.05, about one in twenty hypotheses is falsely accepted by sheer luck).
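The arithmetic behind that worry can be sketched as follows, under the simplifying assumption that each dimension is tested independently at the same level (the function names are made up for this example):

```python
def prob_at_least_one_false_positive(num_tests, level=0.05):
    """Chance that at least one of `num_tests` independent true-null tests is flagged."""
    return 1.0 - (1.0 - level) ** num_tests

def expected_false_positives(num_tests, level=0.05):
    """Average number of true-null tests flagged by chance alone."""
    return num_tests * level

for k in (1, 5, 20, 100):
    print(k, round(prob_at_least_one_false_positive(k), 3), expected_false_positives(k))
```

With twenty independent tests at 0.05, the expected number of false alarms is one, and the chance of seeing at least one is roughly 64%. Correlated features would change these numbers, which is why the independence assumption matters here.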
Interesting question. With an increase in dimensions, we consider the correlation between the features in addition to considering them individually. The work is currently under review. Feel free to get in touch and I can update you once we release the MStream work.
We detect suddenly appearing bursts of activity that share many repeated nodes or edges, which we refer to as microclusters. Examples include denial of service (DoS) attacks in network traffic data and lockstep behavior.
Also, we detect scenarios where an individual edge may not be anomalous but along with other edges it acts as an anomalous community. For example, in the animation at https://github.com/bhatiasiddharth/MIDAS/ it may be possible that an individual edge is not anomalous but together the three malicious entities do a coordinated DoS attack.
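A toy version of this idea, for illustration only: keep a total count and a current-tick count per key, and score how far the current tick deviates from the key's average rate using a chi-squared style statistic. This sketch uses exact dictionaries and a hypothetical `BurstScorer` class; it is a simplification and not the MIDAS implementation. The key can be an edge or a destination node, which is how the coordinated case shows up.

```python
from collections import defaultdict

class BurstScorer:
    """Scores how bursty a key is in the current tick relative to its history."""

    def __init__(self):
        self.total = defaultdict(int)    # occurrences of each key over all ticks
        self.current = defaultdict(int)  # occurrences of each key in the current tick
        self.tick = 1

    def score(self, key, tick):
        if tick > self.tick:
            self.current.clear()  # a new tick starts with empty current counts
            self.tick = tick
        self.current[key] += 1
        self.total[key] += 1
        a, s, t = self.current[key], self.total[key], self.tick
        if t == 1:
            return 0.0  # no history yet to compare against
        # Chi-squared style deviation of the current count from the mean rate s / t.
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))

edge_scorer, node_scorer = BurstScorer(), BurstScorer()

# Hypothetical background traffic: "x" contacts "victim" once per tick for 9 ticks.
for t in range(1, 10):
    edge_scorer.score(("x", "victim"), t)
    node_scorer.score("victim", t)

# Tick 10: three sources each send 5 edges to the same destination.
edge_scores, node_scores = [], []
for _ in range(5):
    for src in ("x", "y", "z"):
        edge_scores.append(edge_scorer.score((src, "victim"), 10))
        node_scores.append(node_scorer.score("victim", 10))

print(max(edge_scores), node_scores[-1])
```

In this made-up stream, each individual edge looks only moderately unusual, while the destination node accumulates all fifteen arrivals and scores noticeably higher.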
Thanks. MIDAS can be used to detect intrusions, fake ratings, and fraud: basically, finding anomalous and suspicious behavior in a dynamic (time-evolving) graph.
We have also extended MIDAS to detect group anomalies in higher-dimensional records e.g. event-log data or multi-attributed graphs. We will release it soon.
We assume (like any anomaly detection algorithm) that the majority of samples are normal. In your context, the normal samples will be considered outliers and will therefore be caught by the algorithm. One way to mitigate this is to swap the labels. Another way is to sample a subset of the anomalies and then try.
Currently MIDAS is available in Rust, Python, Ruby and R at https://github.com/bhatiasiddharth/MIDAS. If someone is interested in porting MIDAS to other languages, please feel free to do so and let me know so that I can add a link in the repository.
> Up to 48% more accurate and 644 times faster than the state of the art approaches