Pororoca

Period

2025

The Artificial Intelligence Institute at LNCC develops and maintains an integrated infrastructure for extracting, organizing, and making scientific knowledge available, focused on analyzing large volumes of articles across different fields. This infrastructure combines multimodal semantic OCR models, semantic search mechanisms, and Retrieval-Augmented Generation (RAG) pipelines, enabling researchers to perform complex queries and obtain evidence-based answers directly extracted from original sources. The system is modular, capable of operating across multiple domains and integrating new datasets or models as needed, and it was designed to run both in high-performance computing (HPC) environments and in optimized cloud deployments.

Within this ecosystem, Pororoca is one of the specialized modules, dedicated to meteorological science. It is a natural-language question-answering tool that uses summaries of scientific articles in the field as its knowledge base. It applies RAG, which pairs semantic search with response generation so that answers are drawn from the retrieved literature rather than from the language model's memory alone.

In the first stage (Retrieval), Pororoca searches its databases for text passages relevant to the user’s query, ranking them by relevance and favoring recent studies without disregarding important classics. In the second stage (Generation), a language model processes the retrieved passages and produces a coherent, contextualized response that cites the original sources.
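The two stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not Pororoca's actual implementation: the `Passage` fields, the `recency_weight` formula (an exponential decay with a floor, one possible way to favor recent work while keeping classics eligible), the half-life and floor values, and the prompt wording. Relevance scores are assumed to come from a separate semantic search step.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str       # citation label (hypothetical placeholder)
    year: int         # publication year
    text: str         # excerpt retrieved from the article
    relevance: float  # similarity to the query in [0, 1], assumed precomputed


def recency_weight(year, current_year=2025, half_life=10.0, floor=0.5):
    """Decay from 1.0 toward `floor` as a paper ages.

    The floor is an assumed mechanism that keeps highly relevant
    classics from being pushed out by age alone.
    """
    age = max(0, current_year - year)
    decay = 0.5 ** (age / half_life)
    return floor + (1.0 - floor) * decay


def rank(passages, top_k=2):
    """Retrieval stage: order passages by relevance times recency weight."""
    ordered = sorted(
        passages,
        key=lambda p: p.relevance * recency_weight(p.year),
        reverse=True,
    )
    return ordered[:top_k]


def build_prompt(question, passages):
    """Generation stage input: numbered excerpts the model must cite."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered excerpts and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical candidates; texts and labels are placeholders.
candidates = [
    Passage("recent study, p. 3", 2024, "excerpt A", 0.70),
    Passage("classic paper, p. 12", 1985, "excerpt B", 0.95),
    Passage("weak match, p. 1", 2025, "excerpt C", 0.20),
]
top = rank(candidates)
prompt = build_prompt("example question", top)
```

With these assumed values the recent study ranks first, and the 1985 paper still outranks the weakly relevant 2025 one, which is the behavior the paragraph above describes. The resulting `prompt` would then be passed to whichever language model the deployment uses.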

This approach allows Pororoca to overcome a limitation of conventional language models, which rely solely on knowledge acquired during training, by delivering answers grounded in scientific evidence. The system also accelerates research by reducing literature screening time and by making methods, figures, and tables easier to retrieve. It keeps answers transparent by citing the exact excerpts and article pages used, with links to the original PDFs.

With a two-step semantic search (abstracts → passages) and a ranking system that prioritizes recent publications, Pororoca serves as a bridge between specialized databases and the generation of high-quality answers, shortening the research cycle and supporting decision-making. For technical and methodological details, see https://doi.org/10.21203/rs.3.rs-7055155/v1.
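The abstracts → passages search can be sketched as follows. This is a toy illustration under stated assumptions: `embed` uses word counts as a stand-in for a real semantic embedding model, the article records and their texts are invented, and the function names and `top_*` parameters are hypothetical rather than taken from Pororoca.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def two_step_search(query, articles, top_articles=1, top_passages=1):
    q = embed(query)
    # Step 1: shortlist articles by how well their abstract matches the query.
    shortlist = sorted(
        articles,
        key=lambda a: cosine(q, embed(a["abstract"])),
        reverse=True,
    )[:top_articles]
    # Step 2: rank passages, but only those from shortlisted articles.
    hits = [
        (a["id"], page, text)
        for a in shortlist
        for page, text in a["passages"]
    ]
    hits.sort(key=lambda h: cosine(q, embed(h[2])), reverse=True)
    return hits[:top_passages]


# Invented toy corpus: (page number, passage text) pairs per article.
articles = [
    {
        "id": "article-A",
        "abstract": "Sea breeze circulation along tropical coasts.",
        "passages": [
            (2, "The sea breeze circulation strengthens in the afternoon."),
            (5, "Instruments were calibrated before each campaign."),
        ],
    },
    {
        "id": "article-B",
        "abstract": "Rainfall variability over tropical forests.",
        "passages": [(3, "Rainfall totals vary between wet and dry seasons.")],
    },
]
hits = two_step_search("What drives sea breeze circulation?", articles)
```

Filtering on abstracts first keeps the second, more expensive passage-level comparison limited to a small set of articles, and each hit carries the article identifier and page number needed for a citation.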