BACKGROUND
A company typically holds several types of document repositories containing valuable information (contracts, reports, invoices…).
Problem
These repositories are distributed and not integrated under a common interface. Search is mostly limited to looking up files by name or by file properties.
Benefit
A common interface lets users reach all these repositories through a single access point. Inference capabilities are added to support queries such as “what are the most expensive invoices?”
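As an illustration of the single access point, the sketch below fans one query out to several repositories and ranks the merged results by amount. It is a minimal example in plain Java, not the project's actual code: the names UnifiedSearch, Doc, Repository, findByType and mostExpensive are hypothetical, the sample data is made up, and the in-memory lists stand in for real back ends such as a search index or a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class UnifiedSearch {

    /** Minimal document view shared by all repositories (hypothetical fields). */
    static final class Doc {
        final String source;
        final String type;
        final String title;
        final double amount;

        Doc(String source, String type, String title, double amount) {
            this.source = source;
            this.type = type;
            this.title = title;
            this.amount = amount;
        }
    }

    /** Hypothetical adapter each repository implements to join the common interface. */
    interface Repository {
        List<Doc> findByType(String type);
    }

    private final List<Repository> repositories = new ArrayList<>();

    void register(Repository repository) {
        repositories.add(repository);
    }

    /** Asks every registered repository for documents of the given type
     *  and keeps the n documents with the highest amount. */
    List<Doc> mostExpensive(String type, int n) {
        return repositories.stream()
                .flatMap(r -> r.findByType(type).stream())
                .sorted(Comparator.comparingDouble((Doc d) -> d.amount).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    /** Wraps a fixed list of documents as a repository; stands in for a real back end. */
    static Repository inMemory(Doc... docs) {
        List<Doc> all = Arrays.asList(docs);
        return type -> all.stream()
                .filter(d -> d.type.equals(type))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UnifiedSearch search = new UnifiedSearch();
        // Made-up sample data, for illustration only.
        search.register(inMemory(
                new Doc("file-share", "invoice", "INV-001", 1200.0),
                new Doc("file-share", "report", "Q1 report", 0.0)));
        search.register(inMemory(
                new Doc("erp-db", "invoice", "INV-002", 8500.0),
                new Doc("erp-db", "invoice", "INV-003", 430.0)));

        for (Doc d : search.mostExpensive("invoice", 2)) {
            System.out.println(d.title + " (" + d.source + "): " + d.amount);
        }
    }
}
```

The caller never needs to know which repository holds an invoice. Adding a new repository means registering one more adapter.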
METHODOLOGY & RESULTS
Architecture: A flexible architecture built around a local search engine based on Lucene, which allows different datasets, databases or textual repositories to be integrated in a customised way.
Development language: Java
ML techniques: Unsupervised learning algorithms, fuzzy logic and NLP techniques.
Results: Depending on the datasets, databases or textual repositories integrated into the system, different types of queries can be configured, with the best information sources selected for each.
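One possible way to picture the configuration of query types and the selection of sources is a small routing table, sketched below in plain Java. This is an assumption about how such a configuration could look, not a description of the system itself: QueryRouter, configure, sourcesFor and all source and query-type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryRouter {

    // Query type -> source names in order of preference (all names are hypothetical).
    private final Map<String, List<String>> preferredSources = new LinkedHashMap<>();

    /** Configures a query type with its candidate sources, best first. */
    void configure(String queryType, List<String> sourcesBestFirst) {
        preferredSources.put(queryType, new ArrayList<>(sourcesBestFirst));
    }

    /** Returns the configured sources that are actually integrated,
     *  keeping the configured preference order. Unknown query types yield an empty list. */
    List<String> sourcesFor(String queryType, List<String> integratedSources) {
        List<String> selected = new ArrayList<>();
        for (String source : preferredSources.getOrDefault(queryType, Collections.emptyList())) {
            if (integratedSources.contains(source)) {
                selected.add(source);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        QueryRouter router = new QueryRouter();
        // Made-up configuration, for illustration only.
        router.configure("invoice-amount", Arrays.asList("erp-db", "file-share"));
        router.configure("contract-text", Arrays.asList("file-share"));

        // Only some sources are integrated in this deployment.
        List<String> integrated = Arrays.asList("file-share", "crm");
        System.out.println(router.sourcesFor("invoice-amount", integrated));
    }
}
```

Keeping the preference order in configuration rather than in code means the same query type can resolve to different sources in different deployments, which matches the dependence on integrated repositories described above.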

