In the world of Business Intelligence (BI), generating accurate and efficient SQL queries is a crucial step in obtaining insightful data reports. Automating this process can significantly streamline workflows and improve productivity. This technical story explores the development of a system that automates SQL generation using Oracle documentation and a Retrieval-Augmented Generation (RAG) model.
Step 1: Data Extraction from Oracle Documentation

The first step in our pipeline involves extracting data from Oracle's comprehensive documentation, which lists the details of all Oracle ERP tables. This is where our Python script, extractData.py, comes into play. By providing a URL to the Oracle documentation, the script automates the extraction of metadata, including table structures and summaries.
- Initialization: The user provides the URL of the Oracle documentation page.
- Extraction: extractData.py scrapes the page for relevant data on Oracle ERP tables, including column names, data types, constraints, and a brief description of each table's purpose.
- Output: The script organizes the extracted information into a structured format. Each table's details are stored in a separate folder named after the table. Inside each folder, a CSV file contains the table's schema, while another file provides a summary of the table's purpose and key characteristics.
This structured data serves as the foundation for the next steps, providing a rich dataset for model training.
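The article references extractData.py without showing its source. The sketch below illustrates what such a scraper could look like, assuming requests and BeautifulSoup and a simplified page layout (a heading with the table name, a summary paragraph, and one HTML table of column details); the real Oracle documentation pages would need their actual structure inspected and the selectors adjusted.

```python
# extraction_sketch.py -- a minimal sketch of what extractData.py might do.
# The selectors and page layout below are assumptions, not Oracle's real markup.
import csv
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def extract_table_metadata(url: str, output_dir: str = "tables") -> None:
    """Scrape one Oracle ERP table documentation page and save its schema and summary."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: the page title holds the table name and the first paragraph
    # holds a short description of the table's purpose.
    table_name = soup.find("h1").get_text(strip=True)
    summary = soup.find("p").get_text(strip=True)

    # Assumption: column details live in the first HTML table on the page,
    # one row per column (name, data type, nullable, description).
    rows = []
    for tr in soup.find("table").find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)

    # One folder per table, holding a CSV schema and a plain-text summary.
    folder = Path(output_dir) / table_name
    folder.mkdir(parents=True, exist_ok=True)

    with open(folder / "schema.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["column_name", "data_type", "nullable", "description"])
        writer.writerows(rows)

    (folder / "summary.txt").write_text(summary, encoding="utf-8")


if __name__ == "__main__":
    extract_table_metadata(input("Oracle documentation URL: "))
```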
Step 2: Building the Dataset for SQL Generation

With the extracted data organized, we create a custom dataset to train our SQL generation model. The key to this process is Retrieval-Augmented Generation (RAG), a technique that enhances the model's ability to generate accurate SQL queries by incorporating external knowledge.
The Process:
- Data Preparation: The extracted table structures and summaries are processed to create input-output pairs for training. For example, a table's schema might serve as the input, while the desired SQL query output could involve selecting specific columns or filtering based on certain criteria (see the sketch after this list).
- RAG Implementation: The RAG model leverages both the dataset and additional retrieval mechanisms. It retrieves relevant documents or snippets from the dataset as context and uses this information to generate SQL queries. This dual approach helps the model generate more accurate and contextually appropriate SQL.
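To make these two steps concrete, here is a hedged sketch of how the training pairs and the retrieval step could be wired together. The file layout mirrors the output of Step 1, while the placeholder target queries and the all-MiniLM-L6-v2 embedding model are illustrative assumptions; in practice the output SQL would come from hand-written queries or real BI report requirements.

```python
# dataset_rag_sketch.py -- a sketch of building (input, output) pairs and a
# simple embedding-based retrieval step; names and models are assumptions.
import json
from pathlib import Path

from sentence_transformers import SentenceTransformer, util


def build_training_pairs(tables_dir: str = "tables") -> list[dict]:
    """Turn each extracted table folder into an input-output training pair."""
    pairs = []
    for folder in Path(tables_dir).iterdir():
        if not folder.is_dir():
            continue
        schema = (folder / "schema.csv").read_text(encoding="utf-8")
        summary = (folder / "summary.txt").read_text(encoding="utf-8")
        pairs.append({
            "input": f"Table: {folder.name}\nSummary: {summary}\nSchema:\n{schema}",
            # Hypothetical target query; real pairs would use curated SQL.
            "output": f"SELECT * FROM {folder.name}",
        })
    return pairs


def retrieve_context(question: str, pairs: list[dict], top_k: int = 3) -> list[str]:
    """Return the table descriptions most relevant to a natural-language question."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    corpus = [p["input"] for p in pairs]
    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    query_emb = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=top_k)[0]
    return [corpus[h["corpus_id"]] for h in hits]


if __name__ == "__main__":
    pairs = build_training_pairs()
    # Persist the pairs so the training step can consume them.
    Path("train.jsonl").write_text(
        "\n".join(json.dumps(p) for p in pairs), encoding="utf-8"
    )
    print(retrieve_context("List open purchase orders by supplier", pairs))
```

At generation time, the retrieved table descriptions are prepended to the user's question as context, which is what lets the model ground its SQL in the actual Oracle ERP schema rather than guessing column names.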
Step 3: Training the RAG Model

The final step involves training the RAG model using the prepared dataset. The model learns the relationships between the inputs (table structures and summaries) and the outputs (SQL queries). This training process fine-tunes the model's parameters to minimize errors and optimize performance; a minimal sketch of such a training loop follows the list below.
The Process:
- Model Initialization: The RAG model is initialized with a pre-trained language model as its backbone.
- Training Loop: The model iteratively processes the dataset, adjusting its parameters based on the accuracy of the generated SQL queries compared to the expected outputs.
- Evaluation and Fine-Tuning: The model's performance is evaluated using a separate validation set. Fine-tuning adjustments are made to improve accuracy and generalization.
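The article does not name the backbone model or the training framework, so the following is a minimal fine-tuning sketch assuming a T5-style seq2seq model from Hugging Face Transformers and the train.jsonl pairs produced in Step 2; the hyperparameters and three-epoch loop are placeholders, and a real run would add a validation split for the evaluation step described above.

```python
# training_sketch.py -- a minimal seq2seq fine-tuning loop on the SQL pairs.
# Backbone, batch size, learning rate, and epoch count are assumptions.
import json
from pathlib import Path

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Load the (input, output) pairs written by the dataset-building step.
pairs = [json.loads(line)
         for line in Path("train.jsonl").read_text(encoding="utf-8").splitlines()]


def collate(batch):
    """Tokenize inputs and target SQL, masking padding tokens out of the loss."""
    enc = tokenizer([p["input"] for p in batch], padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    labels = tokenizer([p["output"] for p in batch], padding=True, truncation=True,
                       max_length=128, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignored by the loss
    enc["labels"] = labels
    return enc


loader = DataLoader(pairs, batch_size=4, shuffle=True, collate_fn=collate)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # cross-entropy against the target SQL
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

model.save_pretrained("sql-generator")
tokenizer.save_pretrained("sql-generator")
```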