INTERPOL announces successful final field test of language-independent voice recognition system for identifying unknown speakers
Using a database of real audio recordings, the UK’s Metropolitan Police Service and the Portuguese Polícia Judiciária demonstrated how unknown speakers talking in different languages could be identified from social media or lawfully intercepted audio using a fusion of key markers such as gender, age, language and accent.
The four-year (May 2014 – April 2018) European Union-funded research project is run by an international consortium of 19 partners, comprising Law Enforcement Agencies or LEAs (INTERPOL and police forces from the UK, Italy, Portugal and Germany), SMEs, industrial hi-tech companies and academic institutes.
As a full project partner, INTERPOL focuses on ensuring that the speaker identification technology meets the operational needs and requirements of law enforcement agencies, while guaranteeing that the legal aspects of the technology are compatible with existing national legislation including INTERPOL’s Rules for the Processing of Data and safeguards for individual privacy.
The technology and its utility
SiiP is a probabilistic, language-independent voice recognition system that uses a novel Speaker-Identification (SID) engine and a Global Info Sharing Mechanism (GISM) to identify unknown speakers who are captured in lawfully intercepted calls, in recorded crime or terror arenas, on social media and in any other type of speech source.
The system’s speaker identification technology combines multiple speech analytic algorithms (Speaker-model-Identification, Gender-Identification, Age-Identification, Language-Identification and Accent-Identification), which are provided by different vendors. This fusion results in highly reliable, high-confidence detection, keeping false positives and false negatives to a minimum.
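The fusion idea can be illustrated with a minimal sketch of score-level fusion. It is not the SiiP implementation: the engine names, the weights, the 0-to-1 score scale and the decision threshold are all assumptions made for illustration. Each engine is assumed to report how well an unknown recording matches a candidate speaker, and the scores are combined as a weighted average.

```python
# Hypothetical per-engine weights, chosen only for illustration.
# The actual SiiP fusion rule and its parameters are not described in this text.
DEFAULT_WEIGHTS = {
    "speaker": 0.6,
    "gender": 0.1,
    "age": 0.1,
    "language": 0.1,
    "accent": 0.1,
}


def fuse_scores(scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted average of per-engine scores, each in [0, 1].

    Engines missing from `scores` are skipped and the remaining weights
    are renormalised, so an absent engine does not lower the result.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for engine, score in scores.items():
        if engine not in weights:
            raise KeyError(f"unknown engine: {engine}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {engine} must be in [0, 1]")
        weighted_sum += weights[engine] * score
        total_weight += weights[engine]
    if total_weight == 0.0:
        raise ValueError("no usable engine scores")
    return weighted_sum / total_weight


def is_match(scores, threshold=0.8):
    """Declare a match when the fused score reaches the (assumed) threshold."""
    return fuse_scores(scores) >= threshold


if __name__ == "__main__":
    candidate = {"speaker": 0.9, "gender": 1.0, "age": 0.8,
                 "language": 1.0, "accent": 0.7}
    print(fuse_scores(candidate), is_match(candidate))
```

In a scheme like this, raising the threshold trades false positives for false negatives, which is why the operating point would normally be tuned on labelled recordings rather than fixed in advance.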
SiiP enables LEAs to overcome two main challenges they face today:
- The Evasion Challenge - The use of hidden, fake and arbitrary identities by terrorists and criminals on telephony and Internet media in order to avoid lawful interception, identification and tracking by LEAs. These include, among others, the use of arbitrary nicknames in various Internet VoIP applications (e.g. Skype, Viber), the use of face masks on social media (e.g. YouTube) and the frequent changing of SIM cards in cell phones.
- The Second-Side Problem - The difficulty in identifying unknown participants in a lawfully intercepted call of a known speaker.
Subject to an adequate judicial warrant and in accordance with the legal and ethical frameworks, the system can be run on any speech source and channel (Internet, social media, PSTN (public switched telephone network), cellular and SATCOM) and provide LEAs with better intelligence and improved judicially admissible evidence.
It can use the speaker model as a search criterion on social media in order to find more information about the speaker of interest. Each speaker identity can be associated with rich metadata (identifiers used by the speaker, personal details, location profiles, social connections and many more), taken from a variety of sources on the web, in social media and in telephony. Suspect voice samples and metadata from Internet and telephony sources, including social media (e.g. YouTube), can be added.
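As a rough illustration of how a speaker identity might be tied to its metadata, the sketch below defines a simple record type. The class name, field names and merge behaviour are hypothetical and are not taken from the SiiP system. They only mirror the metadata categories listed above.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerRecord:
    """Hypothetical container linking a speaker model ID to its metadata."""
    speaker_id: str
    identifiers: set = field(default_factory=set)          # nicknames, numbers
    personal_details: dict = field(default_factory=dict)
    location_profiles: list = field(default_factory=list)
    social_connections: set = field(default_factory=set)
    sources: set = field(default_factory=set)              # e.g. "telephony"

    def add_identifier(self, identifier, source):
        """Record an identifier together with the source it came from."""
        self.identifiers.add(identifier)
        self.sources.add(source)

    def merge(self, other):
        """Fold another record for the same speaker into this one."""
        if other.speaker_id != self.speaker_id:
            raise ValueError("records describe different speakers")
        self.identifiers |= other.identifiers
        self.personal_details.update(other.personal_details)
        self.location_profiles.extend(other.location_profiles)
        self.social_connections |= other.social_connections
        self.sources |= other.sources


if __name__ == "__main__":
    web = SpeakerRecord("spk-001")
    web.add_identifier("nickname42", "social-media")
    phone = SpeakerRecord("spk-001")
    phone.add_identifier("sim-0001", "telephony")
    web.merge(phone)
    print(sorted(web.identifiers), sorted(web.sources))
```

Keeping the source alongside each addition is one way to preserve provenance when records from the web, social media and telephony are combined.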
In accordance with INTERPOL regulations, the system establishes a secure global info sharing mechanism for speaker models and their associated metadata between LEAs around the world, via INTERPOL, thereby expediting the investigation process and achieving significant resource savings for the LEAs.