New Text Mining Software Gives Quick Overview of Legal Texts

(c) Iñaki del Olmo on Unsplash

By Hildegard Suntinger

Computer scientists at the University of Vienna have developed new text mining software that gives users a quick overview of legal texts. It is intended in particular to help companies implement new regulatory requirements.

The General Data Protection Regulation (GDPR; in German, DSGVO), which took effect in 2018, showed how demanding it can be to deal with legal regulations. Everyone was affected, from clubs and companies to private individuals, and all of them faced the same extensive body of documentation from which they had to laboriously extract the information relevant to them.

Simplifying Work

For companies, this situation recurs regularly. Reviewing legal documents costs time and resources that are often not available. Computer scientist Stefanie Rinderle-Ma has been working on this problem for years. At the Faculty of Computer Science, she conducts research at the interface between science and business practice, with a focus on supporting and simplifying everyday work.

“Especially at the beginning, when a new regulation comes into force, it is often unclear what exactly needs to be done. Everyone reads and interprets (…),” says Rinderle-Ma.

New Approach

In her Research Group Workflow Systems and Technology, she and junior researcher Karolin Winter developed and implemented a new text mining method. It allows users to automatically extract relevant information from the texts. “Users want to get a quick overview or, when regulations change, quickly compare different text documents,” says the scientist.

What is new about this approach is how the document is prepared: the original structure of the input document is broken up, which makes it possible to compare rules and sections within the text. This was not possible before. In addition, text mining can now, for the first time, be applied to individual documents with complex structures.

Reducing the Amount of Text

Like a search engine, the program filters the text by topic or by the groups of people and organizations it addresses. For a company, for instance, provisions addressed to the state are irrelevant. However, the provisions directed at companies are scattered throughout the entire document and are therefore hard to collect. The text mining method structures the document so that irrelevant content is hidden, which significantly reduces the reading effort.
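The article does not describe how the filtering works internally. As a rough illustration only, a keyword-based filter can select the sentences that address a given group and keep each sentence's position so it can be linked back to its place in the document. The keyword lists and the function names `split_sentences` and `filter_by_addressee` are hypothetical and are not the CRISP method.

```python
import re

# Hypothetical addressee keywords, for illustration only (not taken from CRISP).
ADDRESSEE_TERMS = {
    "company": ["controller", "processor", "undertaking", "enterprise"],
    "state": ["member state", "supervisory authority"],
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_by_addressee(text: str, addressee: str) -> list[tuple[int, str]]:
    """Return (sentence index, sentence) pairs that mention the addressee.

    Keeping the index lets each hit be linked back to its position in the
    original document; all other sentences are simply hidden.
    """
    terms = ADDRESSEE_TERMS[addressee]
    hits = []
    for i, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        if any(term in lowered for term in terms):
            hits.append((i, sentence))
    return hits
```

For the text "Each Member State shall provide for a supervisory authority. The controller shall keep records of processing. The processor shall assist the controller.", filtering for `"company"` keeps sentences 1 and 2 and hides the provision addressed to the state. A real system would need far more than substring matching, for example recognizing who the grammatical subject of an obligation is.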

Anyone who wants to read the context in which a rule is described does not need to search the entire document: the program marks the relevant sentences and links them to the corresponding passages in the document.

If users want to learn about a specific topic, sections of the text can be grouped together to provide a quick overview. In this way, passages from several different documents can even be processed at the same time.
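Grouping passages by topic across several documents can be pictured as building an index from each topic to the passages that mention it. The sketch below does this with hypothetical keyword lists (`TOPICS`) and a hypothetical `group_by_topic` function. It is a simple stand-in to show the idea, not the researchers' actual technique.

```python
from collections import defaultdict

# Hypothetical topic keywords, for illustration only.
TOPICS = {
    "retention": ["retain", "retention", "keep records"],
    "consent": ["consent"],
}


def group_by_topic(documents: dict[str, list[str]]) -> dict[str, list[tuple[str, int]]]:
    """Map each topic to (document name, passage index) pairs across all documents."""
    groups = defaultdict(list)
    for doc_name, passages in documents.items():
        for i, passage in enumerate(passages):
            lowered = passage.lower()
            for topic, keywords in TOPICS.items():
                # A passage may belong to several topics at once.
                if any(k in lowered for k in keywords):
                    groups[topic].append((doc_name, i))
    return dict(groups)
```

Given a regulation and an internal policy as input, the result lists, per topic, where each relevant passage sits in each document, so a reader can go through all retention-related passages together regardless of their source.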

Comparing Text Passages

The new method also makes comparison possible. Documents often contain similar passages; the program marks them and groups them together so that users can review and compare them one after another. Conflicts are highlighted in the process, which makes it possible to quickly spot contradictory rules as well as changes from one version of a document to the next.
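To illustrate version comparison, one simple approach pairs each passage of a new version with its most similar passage in the old version and flags pairs that are similar but not identical. This sketch uses the standard library's `difflib.SequenceMatcher`; the function `match_passages` and the 0.6 threshold are assumptions for illustration, and the article does not say which similarity measure CRISP uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Character-level similarity ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, a, b).ratio()


def match_passages(old: list[str], new: list[str], threshold: float = 0.6):
    """Pair each new passage with its most similar old passage.

    Returns (old_index, new_index, score) for pairs that score at or above
    the threshold but are not identical, i.e. candidate changes to review
    side by side. Identical passages (score 1.0) are treated as unchanged.
    """
    changes = []
    if not old:
        return changes
    for j, new_p in enumerate(new):
        scores = [similarity(old_p, new_p) for old_p in old]
        best = max(range(len(old)), key=scores.__getitem__)
        if threshold <= scores[best] < 1.0:
            changes.append((best, j, round(scores[best], 2)))
    return changes
```

Comparing "Records must be kept for five years." with "Records must be kept for ten years." yields one flagged pair, while an unchanged passage is skipped. String similarity only surfaces candidates; deciding whether two rules actually conflict requires understanding their content.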

Building on the project's results, the computer scientists are now researching further applications in business and science. The method is to be extended continuously so that it can also help other fields cope with the daily flood of texts. One example is the paperwork that accumulates in the healthcare sector. According to Rinderle-Ma, the time required there can be reduced by about 60 percent, while quality improves at the same time because all obligations are accounted for. She gives a similar estimate for the reading effort required for regulatory documents, which drops significantly when the text can be structured by topic.

The project, called CRISP, is funded by the Vienna Science, Research and Technology Fund (WWTF) and will run until the end of the year.