Applications – AILab@TUL

Software APPLICATIONS

RECOM Service

with Tritius in LibriBot project

An upcoming service that will enable content-based recommendation of similar literary works directly within a library web catalog. Recommendations will be based on a vector representation of each work’s content, calculated using a proprietary language (vector) model and various metadata of the target literary works.
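As a rough illustration of content-based recommendation over vector representations, the sketch below ranks works by cosine similarity to a query work. The proprietary model is not public, so the vectors, the identifiers (`work_a` and so on), and the function names are hypothetical placeholders, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(query_id, work_vectors, top_k=3):
    """Return up to top_k (work_id, score) pairs most similar to query_id."""
    query = work_vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in work_vectors.items()
        if other_id != query_id  # never recommend the work itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real content vectors would come from a language model.
catalog = {
    "work_a": [0.9, 0.1, 0.0],
    "work_b": [0.8, 0.2, 0.1],
    "work_c": [0.0, 0.1, 0.9],
}
recommendations = recommend_similar("work_a", catalog, top_k=2)
```

Here `work_b` ranks first for `work_a` because its vector points in nearly the same direction, while `work_c` is almost orthogonal.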

Another upcoming service that will use a specialized vectorization model for searching and retrieving literary works using the RAG (retrieval-augmented generation) method. This functionality will significantly improve the experience of readers searching library catalogs, where search currently relies mainly on keywords.

RAGAI Service

with Tritius in LibriBot project

SW tools for gear development optimization

with LENAM

An upcoming software tool that will allow engineers to accurately emulate the results of computationally demanding simulations. The tool will contain machine learning-based models able to predict the behavior of machine components and parts under various conditions, enabling engineers to optimize designs more quickly and efficiently than with time-consuming calculations and physical experiments.

The module enables efficient, accurate, and semantically rich searches in large corpora of judicial decisions. For this purpose, it uses a classic lexical index for fast filtering and precise queries on metadata (e.g., file numbers, dates) and a vector index that represents the semantic content of texts in a multidimensional space, enabling searches based on semantic similarity.
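The combination of a lexical index for metadata and a vector index for meaning can be sketched as a two-stage search: filter on metadata first, then rank the survivors by semantic similarity. This is only a minimal in-memory illustration. The `Decision` record, the file numbers, the two-dimensional embeddings, and the `hybrid_search` function are all invented for the example and do not describe the module's actual interfaces.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Decision:
    file_number: str   # hypothetical identifier format
    decided_on: date
    embedding: list    # semantic vector of the decision text


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def hybrid_search(decisions, query_embedding, decided_after=None,
                  file_number_prefix=None, top_k=5):
    """Filter by metadata (lexical stage), then rank by similarity (vector stage)."""
    candidates = [
        d for d in decisions
        if (decided_after is None or d.decided_on >= decided_after)
        and (file_number_prefix is None
             or d.file_number.startswith(file_number_prefix))
    ]
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_embedding, d.embedding),
                    reverse=True)
    return ranked[:top_k]


corpus = [
    Decision("A-001/2019", date(2019, 5, 2), [1.0, 0.0]),
    Decision("A-002/2021", date(2021, 3, 15), [0.9, 0.1]),
    Decision("B-003/2022", date(2022, 7, 1), [0.0, 1.0]),
]
hits = hybrid_search(corpus, [1.0, 0.0], decided_after=date(2020, 1, 1))
```

Filtering before ranking keeps the expensive similarity computation restricted to decisions that already satisfy the exact metadata constraints.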

SW module for searching judicial decisions using natural language

SW for online speaker diarization in audiovisual data streams

This software enables real-time division of input audiovisual data streams into single-speaker segments. For processing purely acoustic data, the system uses an x-vector extractor based on the ResNet architecture, which is optimized for online operation with constant delay using caching. The extracted vectors are then clustered using a block online k-means algorithm. For processing visual data, the SyncNet model is employed in conjunction with vectors representing the speakers’ faces.
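To give a feel for clustering speaker vectors as they arrive, here is a simplified online k-means sketch that processes vectors block by block. It is not the lab's algorithm: the distance threshold for opening a new cluster, the running-mean centroid update, the Euclidean metric, and the class name `OnlineKMeans` are all assumptions made for illustration, and the two-dimensional points stand in for real x-vectors.

```python
import math


class OnlineKMeans:
    """Sequential k-means that can open new clusters up to a fixed limit."""

    def __init__(self, new_cluster_distance, max_clusters):
        self.new_cluster_distance = new_cluster_distance  # assumed threshold
        self.max_clusters = max_clusters
        self.centroids = []
        self.counts = []

    def _nearest(self, vector):
        """Return (index, distance) of the closest centroid, or (-1, inf)."""
        best_index, best_distance = -1, math.inf
        for index, centroid in enumerate(self.centroids):
            distance = math.dist(vector, centroid)
            if distance < best_distance:
                best_index, best_distance = index, distance
        return best_index, best_distance

    def assign(self, vector):
        """Assign one vector to a cluster and update that cluster's centroid."""
        index, distance = self._nearest(vector)
        can_grow = len(self.centroids) < self.max_clusters
        if index == -1 or (distance > self.new_cluster_distance and can_grow):
            self.centroids.append(list(vector))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Running mean: move the centroid towards the new vector.
        self.counts[index] += 1
        rate = 1.0 / self.counts[index]
        self.centroids[index] = [
            c + rate * (v - c) for c, v in zip(self.centroids[index], vector)
        ]
        return index

    def process_block(self, block):
        """Label every vector in a block, in arrival order."""
        return [self.assign(vector) for vector in block]


clusterer = OnlineKMeans(new_cluster_distance=1.0, max_clusters=4)
labels = clusterer.process_block(
    [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
)
```

Each label is a cluster index that would correspond to one speaker; the state kept between blocks is only the centroids and their counts, which is what makes the approach suitable for streams.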

These modules utilize an End-to-End (E2E) Conformer neural network combining CTC (connectionist temporal classification) and AED (attention-based encoder-decoder) methods and employ an optimized decoder for real-time transcription with latency under 2 seconds. The modules are trained on thousands of hours of speech data and use transformer-based models for automatic punctuation and capitalization restoration.
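For readers unfamiliar with CTC, the sketch below shows the simplest way to turn frame-level CTC outputs into text: pick the best symbol per frame, merge consecutive repeats, and drop the blank symbol. This greedy rule is a textbook simplification, not the optimized decoder mentioned above, and it ignores the AED branch entirely. The alphabet, the `_` blank symbol, and the function name are assumptions.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(frame_posteriors, alphabet):
    """Greedy CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    best_per_frame = [
        alphabet[max(range(len(frame)), key=frame.__getitem__)]
        for frame in frame_posteriors
    ]
    # groupby merges runs of identical consecutive symbols into one.
    collapsed = [symbol for symbol, _ in groupby(best_per_frame)]
    return "".join(symbol for symbol in collapsed if symbol != BLANK)


alphabet = [BLANK, "a", "h", "o", "j"]
# One row of scores per audio frame; the highest score in each row wins.
frames = [
    [0.1, 0.8, 0.0, 0.1, 0.0],  # a
    [0.1, 0.7, 0.1, 0.1, 0.0],  # a (repeat, merged)
    [0.9, 0.0, 0.1, 0.0, 0.0],  # blank
    [0.1, 0.0, 0.8, 0.1, 0.0],  # h
    [0.0, 0.1, 0.1, 0.8, 0.0],  # o
    [0.0, 0.0, 0.1, 0.1, 0.8],  # j
]
text = ctc_greedy_decode(frames, alphabet)
```

The blank between two identical symbols is what allows genuine double letters to survive the merging step.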

SW modules for ASR of various European languages

with NEWTON Technologies

MyVoice 2.0 + MyDictate

We have developed a voice control system for Microsoft Windows (MyVoice 2.0) and a dictation tool (MyDictate) for users with specific needs. These programmes are recognized as compensatory aids and are distributed by SILOU HLASU, z.s.

Newton Dictate Engine is a modular real-time speech recognition engine that uses a hybrid acoustic model combining Hidden Markov Models and deep neural networks. Its language model can be extended with domain-specific vocabularies. The decoder employs an optimized Viterbi algorithm for fast processing, while a post-processing module performs text formatting. The engine powers the NEWTON Dictate software, currently available for Czech, Slovak, Polish, Croatian, and Serbian.
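The Viterbi algorithm finds the most likely sequence of hidden states for a sequence of observations. The sketch below is the plain textbook version on a toy two-state model, not the optimized decoder used in the engine. The states (`sil`, `speech`), the observations (`low`, `high`), and all probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path, computed in the log domain."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to the start.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
path = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
```

Working with log probabilities turns products into sums and avoids numerical underflow on long observation sequences.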

NEWTON Dictate Engine

older SOFTWARE applications

2011 – now

Real-time transcription of audio streams in various languages

Modular system for real-time transcription of TV and radio audio streams.
Uses deep neural networks for tasks such as voice activity detection, speaker diarization and voice-to-text.

2011 – 2014

NAKI – Historical CRo Archive

A complex platform that can transcribe, index, and make the historical archives of Czech and Czechoslovak Radio searchable.
The archive covers 90 years of public broadcasting and contains hundreds of thousands of audio documents.

2009

MobilDictate – Very Large Vocabulary Voice Dictation for Mobile Devices

A standalone speech recognizer for Czech that was practically deployable on PDAs and smart mobile phones available in 2009.
The program worked with vocabularies having 250K+ words and made voice input faster than typing with a stylus.

2007 – now

ATT (Audio Transcription Toolkit)

System for automatic broadcast news collection, annotation, and indexing with full-text search.
ATT has been used at Newton Media since 2007.

2005, 2020

MyDictate – program for discrete-speech dictation

A discrete-speech dictation tool. It can be used as a standalone program or as an extension to MyVoice.
The programme is recognised as a compensatory aid and is distributed by SILOU HLASU, z.s.

2005, 2020

MyVoice – voice-controlled PC tool

A program for voice control of a PC, used by dozens of users with disabilities.
It allows them to control any application installed on their computers entirely by voice.
A new version, MyVoice2, which enables control of Windows 10, was launched in 2020.
The programme is recognised as a compensatory aid and is distributed by SILOU HLASU, z.s.

2007 – 2008

SmartRoom – Voice Controlled Center for Homes of Motor-Handicapped Persons

Home-control prototype mainly for motor-handicapped people.
The system consists of two modules: the MyVoice program and a unit for controlling external electric devices.

2004

ProtoATT

Prototype of a system for automatic transcription of Czech broadcast

2004

Voice dictation to a computer

The first discrete-speech dictation system for Czech.
The dictionary contains the 400,000 most common words and word forms, which cover almost 99% of the entire vocabulary of the Czech language.

2003

Dundis – Internet speech recognizer

An early distributed speech recognition system.
The user’s computer records the speech and the server performs the recognition.
The client is thus relieved of the recognition algorithms, which consume a lot of computing power and memory.

2003 – 2004

Chatter – The 3-D Artificial Talking Head

A fully parametric 3D model of a computerized talking head for the Czech language

2002

ConRec 0.1 – Czech Continuous Speech Recognition in Real Time

The first continuous speech recognition system for Czech.
The vocabulary contained up to 20,000 of the most frequent words.
On a 2 GHz computer, the system displayed the recognized text with a latency of less than 1 s and 80 % accuracy.

2001 – earlier

For Speechlab applications dating back before 2001, see the older Speechlab webpage.