Automatic fuzzy classification of millions of documents

BACKGROUND

Companies accumulate millions of text documents of many kinds: emails, presentations, reports, technical requirements, working instructions, etc.

Problem

These documents are disorganized, which makes searching for precise information time-consuming.

Benefit

Automatically grouping the files into categories can save users and companies a lot of time (knowledge management).

METHODOLOGY & RESULTS

Architecture: On-premises deployment using local indices based on Lucene.

Development language: Java

ML techniques: Unsupervised learning algorithms, fuzzy logic and NLP techniques.
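
The project does not state which algorithms were used. As one illustration of how unsupervised grouping and fuzzy logic can be combined, the sketch below computes fuzzy c-means style membership degrees, so that a document can belong to several groups to different degrees instead of exactly one. Everything in it is an assumption made for illustration: the class name FuzzyMembershipSketch, the helper methods, the simple term-count vectors, the cosine distance and the fuzzifier value are hypothetical and are not taken from the project.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only (hypothetical names, not the project's code):
// fuzzy c-means style memberships of a document to several group centroids.
final class FuzzyMembershipSketch {

    // Hypothetical tokenizer: lowercase, split on non-letters, count terms.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine distance (1 - cosine similarity) between two sparse term vectors.
    static double cosineDistance(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // an empty document is treated as maximally distant
        }
        return Math.max(0.0, 1.0 - dot / Math.sqrt(normA * normB));
    }

    // Standard fuzzy c-means membership update:
    // u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)), with fuzzifier m > 1.
    static double[] memberships(double[] distances, double fuzzifier) {
        double[] u = new double[distances.length];
        // A document that coincides with a centroid belongs fully to it.
        for (int i = 0; i < distances.length; i++) {
            if (distances[i] == 0.0) {
                u[i] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (fuzzifier - 1.0);
        for (int i = 0; i < distances.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < distances.length; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }
}
```

In use, the distances from one document to each group centroid would be passed to memberships, and the result is a set of degrees that sum to 1. A full fuzzy c-means run would alternate this step with recomputing the centroids; that loop is omitted here for brevity.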

Results: Depending on the complexity of the texts, their associated vocabulary and the number of files, the system groups the documents with an accuracy between 80% and 90%.

This project was carried out by AI Shepherds' team members before the company was founded.