Achievements
- Developed and deployed a pipeline that extracts financial data from clients’ balance sheets, using Optical Character Recognition (OCR) to read the documents and Large Language Models (LLMs) to interpret the extracted text.
- Deployed the solution on Amazon Web Services (AWS) and Microsoft Azure for scalable, cloud-based operation.
- Automated extraction, storage, and querying to cut processing time, reduce costs, and minimize human error in balance sheet analysis.
- Integrated the solution into the infrastructure of one of Brazil’s leading insurance companies, significantly improving operational efficiency.
Context
This project focused on extracting financial information from balance sheets for a major insurance company in Brazil. The system uses OCR and LLMs to process unstructured documents, extract the relevant financial details, and present them accurately for analysis.
The key components of the solution include:
- Data Extraction: Uses OCR to extract text from scanned balance sheets.
- Data Processing: Processes extracted data with LLMs for context-based interpretation and organization.
- Cloud Infrastructure: Deployed on AWS and Microsoft Azure for reliable and scalable infrastructure.
- Automation: Automates data extraction and analysis, reducing cost, time, and human error.
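
The flow through these components can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names in EXPECTED_FIELDS, the BalanceSheet type, and the fake_ocr and fake_llm stand-ins are all hypothetical, and a real deployment would pass in calls to whichever OCR engine and LLM provider it uses.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical field names; the real schema is not described in this document.
EXPECTED_FIELDS = ("total_assets", "total_liabilities", "equity")


@dataclass
class BalanceSheet:
    total_assets: float
    total_liabilities: float
    equity: float


def build_prompt(ocr_text: str) -> str:
    """Ask the model to reply with a JSON object holding only the expected fields."""
    fields = ", ".join(EXPECTED_FIELDS)
    return (
        "Extract the following fields from the balance sheet text and "
        f"reply with a JSON object containing only: {fields}.\n\n{ocr_text}"
    )


def parse_response(raw: str) -> BalanceSheet:
    """Validate the model's reply; raises ValueError on malformed JSON or missing fields."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in EXPECTED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return BalanceSheet(**{f: float(data[f]) for f in EXPECTED_FIELDS})


def extract_balance_sheet(
    document: bytes,
    ocr: Callable[[bytes], str],
    llm: Callable[[str], str],
) -> BalanceSheet:
    """Run OCR, prompt the LLM with the recognized text, and validate its answer."""
    text = ocr(document)
    reply = llm(build_prompt(text))
    return parse_response(reply)


# Stand-ins so the sketch runs without any external service (assumptions only).
def fake_ocr(document: bytes) -> str:
    return document.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return json.dumps({"total_assets": 1000, "total_liabilities": 600, "equity": 400})
```

Passing the OCR and LLM steps in as plain callables keeps the sketch independent of any particular provider, and validating the model's reply before use is one way to catch incomplete answers early.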
Technologies Used
- AWS & Microsoft Azure: For cloud hosting and integration.
- OCR: To extract textual data from scanned documents.
- LLMs: For processing and interpreting extracted financial data.
- Docker: To containerize the solution for deployment.
- Flask: For developing microservices that handle data extraction and querying.
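
As an illustration of the Flask layer, a microservice for extraction and querying might look something like the sketch below. The route paths, the in-memory RESULTS store, and the run_extraction placeholder are assumptions made for this example; the actual service's endpoints and storage backend are not described here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store standing in for the real persistence layer (assumption).
RESULTS = {}


def run_extraction(document: bytes) -> dict:
    """Hypothetical placeholder for the OCR + LLM pipeline."""
    return {"size_bytes": len(document)}


@app.route("/extractions/<doc_id>", methods=["POST"])
def create_extraction(doc_id):
    """Accept a raw document body, run extraction, and store the result."""
    document = request.get_data()
    if not document:
        return jsonify(error="empty document"), 400
    RESULTS[doc_id] = run_extraction(document)
    return jsonify(id=doc_id, result=RESULTS[doc_id]), 201


@app.route("/extractions/<doc_id>", methods=["GET"])
def get_extraction(doc_id):
    """Return a previously stored extraction result, or 404 if unknown."""
    result = RESULTS.get(doc_id)
    if result is None:
        return jsonify(error="not found"), 404
    return jsonify(id=doc_id, result=result)
```

Flask's built-in test client (app.test_client()) can exercise both routes without starting a server, which is convenient when the service is later packaged in a Docker image.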