Sophos has made four of its artificial intelligence (AI) cybersecurity technologies freely available to researchers, fellow vendors, and security service providers.
The move is part of a larger effort to foster collaboration in the fight against cybercrime by sharing tools and techniques that others can use to produce innovations of their own, according to Sophos CTO Joe Levy.
“We think that it sets a good example for the rest of the industry, and we encourage others to do similar sorts of things,” he says. “If we all act this way, it’s going to have the greatest possible benefit for anyone who’s looking to apply this sort of science to cybersecurity.”
It will also, he continues, give cybersecurity research a greater claim to calling itself a science in the first place by enabling bedrock scientific practices like peer review and verification of evidence.
“It’s a tendency of our industry to be a little protectionist, I’m afraid, when it comes to the threat intelligence that we’re producing or the AI that we’re building,” Levy says, noting that the result is claims about the effectiveness of security techniques that can’t be tested or confirmed.
Publishing internally developed resources, he continues, doesn’t preclude Sophos or other vendors from using them in legally protected ways that confer competitive advantage.
“We differentiate ourselves through the products and the services that we deliver, which can implement these innovations,” Levy says. It’s those products and services, he maintains, rather than the fundamental tools and insights they’re based on, that security vendors should handle as intellectual property.
One of the four resources made available today, called SOREL-20M and developed in partnership with threat intelligence vendor ReversingLabs, is a collection of 20 million Windows Portable Executable files, including 10 million disarmed malware samples, that researchers can use to train machine learning-based malware detection models. According to Sophos, it’s the first production-scale malware research data set, complete with associated metadata, available to the general public.
“It solves the problem of where do researchers get access to a well-curated and well-labeled data set that they can use to train models,” Levy says.
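To show the kind of workflow such a labeled corpus enables, the toy sketch below trains a logistic-regression classifier on fabricated two-dimensional feature vectors standing in for per-file attributes. The features, data, and training loop are all illustrative assumptions, not the actual SOREL-20M format or tooling.

```python
import math
import random

random.seed(0)

# Toy stand-ins for labeled samples: (feature_vector, is_malware).
# Real corpora like SOREL-20M pair each binary with rich metadata;
# the two fabricated features here loosely mimic signals such as
# "suspicious imports" and "section entropy" (purely illustrative).
def make_sample(is_malware):
    base = 0.8 if is_malware else 0.2
    return ([base + random.gauss(0, 0.1),
             base + random.gauss(0, 0.1)], is_malware)

train = [make_sample(i % 2 == 0) for i in range(200)]

# Minimal logistic regression via gradient descent (no external deps).
w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(300):
    for x, y in train:
        z = w[0] * x[0] + w[1] * x[1] + b
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - (1.0 if y else 0.0)
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z)) > 0.5

accuracy = sum(predict(x) == y for x, y in train) / len(train)
```

The point of the sketch is the shape of the task, not the model: given binaries paired with trustworthy labels, detection becomes a supervised learning problem that any researcher can attack with standard tooling.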
An AI-powered impersonation protection method also shared by Sophos today uses transformer technology of the kind recently popularized by the non-profit OpenAI research lab to combat business email compromise (BEC) attacks, in which attackers adopt assumed identities to trick people into transferring funds or handing over protected information. BEC scams cost businesses more than $1.7 billion last year, according to the FBI’s 2019 Internet Crime Report.
“This particular class of email attack has been historically quite challenging to identify,” Levy says. “We’ve leveraged this fairly recent advancement within AI based on transformer technologies that allows us to detect this class of attack in a more effective way than any previous attempts in the industry.”
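The transformer models Sophos describes learn impersonation cues from message content itself; as a much simpler illustration of one classic signal such systems can capture, the sketch below flags sender domains within a small edit distance of a legitimate domain (lookalike-domain spoofing). The domain list and threshold here are hypothetical, and this heuristic is a stand-in, not Sophos’ method.

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic program (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical list of domains the organization actually uses.
LEGIT_DOMAINS = ["example.com"]

def looks_like_impersonation(sender):
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in LEGIT_DOMAINS:
        return False
    # Flag near-misses such as "examp1e.com" or "exarnple.com".
    return any(edit_distance(domain, legit) <= 2 for legit in LEGIT_DOMAINS)
```

A rule like this catches only one narrow trick; the appeal of learned models is that they pick up many such signals, including ones in the wording of the message, without each being hand-coded.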
A “digital epidemiology” methodology published today provides a statistical model for determining the likelihood that a given class of malware or malicious behavior will appear within a given population of endpoints. Analysts can use those figures as a benchmark for assessing the effectiveness of their detection methodologies.
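A minimal version of that kind of estimate treats infections across a fleet as binomial draws and puts a normal-approximation confidence interval around the expected count. The prevalence rate and fleet size below are invented for illustration, not figures from Sophos’ model.

```python
import math

# Hypothetical inputs: an assumed background prevalence for one
# malware class and the number of monitored endpoints.
prevalence = 0.002      # assumed rate: 0.2% of endpoints affected
endpoints = 50_000

# Under a binomial model, the expected infection count and a ~95%
# interval (normal approximation) give a benchmark: observed
# detections far below the lower bound suggest the detection
# method is missing cases.
expected = endpoints * prevalence
stddev = math.sqrt(endpoints * prevalence * (1 - prevalence))
low = expected - 1.96 * stddev
high = expected + 1.96 * stddev
```

With these invented numbers the model expects roughly 100 infections, so a telemetry feed reporting, say, 40 detections would fall well outside the interval and warrant a closer look at detection coverage.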