Automatic fuzzy classification of millions of documents


Companies accumulate millions of text documents of many kinds: emails, presentations, reports, technical requirements, working instructions, etc.


These documents are usually disorganized, which makes searching for and finding precise information time-consuming.


Automatically grouping the files into categories can save users and companies a lot of time (knowledge management).


Architecture: On-premises development using local Lucene-based indices.

Development language: Java

ML techniques: Unsupervised learning algorithms, fuzzy logic and NLP techniques.
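
To illustrate what "fuzzy" grouping can mean in practice, here is a minimal sketch in Java. It assumes a fuzzy c-means style membership over cosine distances between term-weight vectors; the project description does not state which algorithm was actually used, so the class `FuzzyMembership`, its methods, and the fuzzifier parameter `m` are hypothetical names chosen for illustration. The point is that a document receives a degree of membership in every group rather than a single hard label.

```java
import java.util.Arrays;

// Illustrative sketch only: the class, method, and parameter names are assumptions,
// not the project's actual code.
public class FuzzyMembership {

    // Cosine distance (1 - cosine similarity) between two term-weight vectors.
    public static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 1.0; // an empty vector is treated as maximally distant
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Fuzzy c-means membership of one document in each cluster:
    // u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), with fuzzifier m > 1.
    public static double[] memberships(double[] doc, double[][] centroids, double m) {
        int c = centroids.length;
        double[] dist = new double[c];
        int zeros = 0;
        for (int j = 0; j < c; j++) {
            dist[j] = cosineDistance(doc, centroids[j]);
            if (dist[j] < 1e-12) {
                zeros++;
            }
        }
        double[] u = new double[c];
        // If the document coincides with one or more centroids,
        // split the full membership evenly among those centroids.
        if (zeros > 0) {
            for (int j = 0; j < c; j++) {
                u[j] = dist[j] < 1e-12 ? 1.0 / zeros : 0.0;
            }
            return u;
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < c; j++) {
            double sum = 0;
            for (int k = 0; k < c; k++) {
                sum += Math.pow(dist[j] / dist[k], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // Two hypothetical cluster centroids over a three-term vocabulary.
        double[][] centroids = { {1, 0, 0}, {0, 1, 0} };
        double[] doc = {3, 1, 0};
        // Prints one membership degree per cluster; the degrees sum to 1.
        System.out.println(Arrays.toString(memberships(doc, centroids, 2.0)));
    }
}
```

In a setup like the one described, the term-weight vectors would presumably be derived from the local index, and a document could be assigned to every group whose membership exceeds a chosen threshold.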

Results: Depending on the complexity of the texts, their associated vocabulary, and the number of files, the system groups documents with an accuracy between 80% and 90%.

This project was carried out by AI Shepherds' team members before the company was founded.