Date: 2020-04-09

Fake news has become an increasingly prevalent problem, and in recent years a series of technological advances based on artificial intelligence has unfortunately multiplied the seriousness of this threat.

These technologies have a lot of potential for good, but they are being exploited for evil. Fake images, audio and videos, the famous ‘deepfakes’, are beginning to flood social networks, and the most serious problem right now is that they are harder to detect than you might think, while the techniques behind them keep advancing.

When detecting more than 80% of manipulated videos is difficult

At Genbeta we spoke with Andres Torrubia, co-founder and CEO of Fixr.es, known on Twitter as @antor, where he constantly posts resources on a topic he is passionate about: artificial intelligence, machine learning and deep learning.

Andres, who has taken part in the most important AI competitions worldwide with leading teams from China and the United States, recently commented from his account that one of the world's biggest competitions to detect deepfakes, hosted on Kaggle, ended a few days ago, and the results leave us with worrying thoughts.

A few days ago one of the competitions with the highest prizes, $1 million in total, ended on @kaggle. The problem: the detection of DEEP FAKES ("ultra-false" in Spanish). Here are my reflections 👇 pic.twitter.com/kh7JrsUl8Y – Andres Torrubia (@antor) April 7, 2020

In his thread Andres explains that the best scores of the experts who took part in this deepfake detection competition correspond to a hit rate of around 80%. That may sound high, but we are talking about fake videos that the average person finds practically impossible to distinguish from real ones.

This also contrasts with the deepfake detection rates in most published academic articles, which are around 99%. The reason is the data set used.

“In machine learning and deep learning there is one key word: generalization. Generalizing means that the system you have trained on your training data (the training set) then has to work just as well with real data (in this case, the deepfakes you find out there). In many academic articles, the data set used to test deepfake detection systems is very similar to the one used to train them, which is why you see very high hit rates. But that does not carry over to reality, among other things because the systems that generate deepfakes keep evolving.”
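The gap Torrubia describes can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not a real detector: each video is reduced to a single hypothetical "artifact score", and the constants `OLD_GENERATOR_SHIFT` and `NEW_GENERATOR_SHIFT` are made up to represent older, cruder fakes and newer, subtler ones. A simple threshold is fitted on fakes from the old generator and then tested twice: on more fakes from the same generator, and on fakes from the unseen newer one.

```python
import random

def sample_videos(rng, n, fake_shift):
    """Return (artifact_score, label) pairs; label 1 = fake, 0 = real.

    Real videos score around 0, fakes around `fake_shift` (toy assumption).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = fake_shift if label == 1 else 0.0
        data.append((rng.gauss(center, 1.0), label))
    return data

def fit_threshold(train):
    """Place the decision threshold midway between the two class means."""
    real = [x for x, y in train if y == 0]
    fake = [x for x, y in train if y == 1]
    return (sum(real) / len(real) + sum(fake) / len(fake)) / 2

def accuracy(threshold, data):
    """Fraction of videos whose predicted label matches the true label."""
    hits = sum((x > threshold) == (y == 1) for x, y in data)
    return hits / len(data)

rng = random.Random(0)
OLD_GENERATOR_SHIFT = 4.0  # assumed: older fakes leave obvious artifacts
NEW_GENERATOR_SHIFT = 1.0  # assumed: newer fakes leave subtler artifacts

# Train only on fakes produced by the old generator.
train = sample_videos(rng, 2000, OLD_GENERATOR_SHIFT)
threshold = fit_threshold(train)

# Test set 1 resembles the training data; test set 2 does not.
same_source_test = sample_videos(rng, 2000, OLD_GENERATOR_SHIFT)
new_source_test = sample_videos(rng, 2000, NEW_GENERATOR_SHIFT)

in_dist_acc = accuracy(threshold, same_source_test)
shifted_acc = accuracy(threshold, new_source_test)
print(f"same generator as training: {in_dist_acc:.2f}")
print(f"newer, unseen generator:    {shifted_acc:.2f}")
```

In this toy setup the detector looks excellent when the test data resembles the training data and drops sharply on the unseen generator, which is the same pattern as the difference between the figures reported in papers and the competition results.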

What use are today's deepfake detection systems?

For now the vast majority of online deepfakes are porn, but those of an electoral nature are increasingly concerning. The potential for viral manipulation in the political arena is increasingly alarming. You only have to look at the progress shown by parodies like ‘Equipo E’, a viral video featuring Spanish political leaders, which shows that these videos are already unstoppable.

The big problem here is that citizens are going to depend on detection algorithms, created by these experts, that still let up to 20% of deepfakes slip through, and the damage this can cause to societies is enormous. And that 20% is not going to stop there because, as Andres explains, the systems continue to evolve.

The challenge is to establish whether the systems that detect deepfakes today will also detect deepfakes made with techniques that appear in the next two or three months.

The authors of academic articles choose the data sets on which they train and also those on which they validate their results; in many cases the validation sets do not represent the worst scenarios found in reality. Although it may be a bit controversial to say, in research it is considered a good result to exceed the performance reported by another research group, so there is no great incentive for you as a researcher to use a much harder validation set than the ones other researchers have used.

How do you work to combat this?

This is where the importance of competitions like those on Kaggle comes in. The organizers of the deepfake challenge there include companies like Facebook, Amazon, Microsoft and more, along with a committee of researchers from the university community who clearly know the current situation.

Torrubia explains that competitions are a very interesting way to catalyze innovation because results are rewarded, not effort. The 80% detection rates come about because participants do not have access to the deepfakes on which their algorithms will be tested; they cannot even see them.

“This is critical because it closely resembles reality: in practice your algorithm has to work well with any video, not only with those that you yourself have chosen a priori; and the best way to verify this is to do it the way they have done.”
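The idea of a test set that participants cannot see can be sketched in a few lines. This is a hypothetical illustration only: the class `HiddenTestSet`, the clip names and the labels are invented, and the article does not describe the competition's actual infrastructure or scoring metric. The point is simply that the organizer keeps videos and labels private and returns nothing but an overall score.

```python
from typing import Callable, Sequence

class HiddenTestSet:
    """Organizer-side evaluator: participants never see videos or labels."""

    def __init__(self, videos: Sequence[str], labels: Sequence[int]):
        if len(videos) != len(labels):
            raise ValueError("videos and labels must have the same length")
        self._videos = list(videos)  # private: placeholder video identifiers
        self._labels = list(labels)  # private: 1 = fake, 0 = real

    def score(self, detector: Callable[[str], int]) -> float:
        """Run a submitted detector and return only the overall hit rate."""
        hits = sum(detector(v) == y for v, y in zip(self._videos, self._labels))
        return hits / len(self._labels)

# Toy data, invented for illustration only.
hidden = HiddenTestSet(
    videos=["clip_a", "clip_b", "clip_c", "clip_d", "clip_e"],
    labels=[1, 0, 1, 1, 0],
)

def always_fake(video: str) -> int:
    """A naive submission that flags every video as a deepfake."""
    return 1

leaderboard_score = hidden.score(always_fake)
print(leaderboard_score)
```

Because a submission only ever receives the aggregate score, it cannot be tuned to the specific test videos, which is what makes the result closer to performance in the wild.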

There is also the monetary component: “In total there is $1 million in prizes. In addition to the reputation that being in a good position to win would give you, the prizes are substantial enough to attract both experts on the problem itself and other participants from all walks of life.”

Deepfake detection is an unresolved issue, and one that has also become a cat-and-mouse game in which the counterfeiters have the advantage. The more the techniques for creating these videos advance, the more the algorithms capable of detecting them must advance. And we can no longer believe everything we see or hear.

Cover image: Deeptrace