Applied Gen AI for Analytics, Data Science & Business
𝗜𝗻𝘃𝗲𝘀𝘁𝗺𝗲𝗻𝘁 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 𝘄𝗶𝘁𝗵 𝗖𝗵𝗮𝘁𝗚𝗣𝗧 𝗣𝗿𝗼𝗺𝗽𝘁𝘀:
𝗠𝘂𝗹𝘁𝗶𝗽𝗹𝗲 𝗲𝘅𝗰𝗲𝗹 𝗳𝗶𝗹𝗲𝘀 : 𝗰𝗹𝗲𝗮𝗻𝘂𝗽, 𝗺𝗲𝗿𝗴𝗲, 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀
A very common analysis in the investment world is to track changes in equity portfolio allocations within a scheme over time. This often needs to be conducted monthly, and sometimes even weekly or fortnightly, across varying time periods.
This type of analysis requires merging multiple Excel files, typically print-formatted and may contain images, as was in this case.
Will ChatGPT be able to do it? With just prompts. In a replicable manner. Even if I provide files for different months / multiple months? That was the specific use case requested.
Amazingly: Yes
Had prepared a prompt for two files. That worked. Asked it to merge another 10 using same logic. And it did. In one shot. 30 sec. And for one of the files, the name format was also different...ChatGPT handled it seamlessly...!
See for yourself.
Prompt preparation secret: 𝗧𝗵𝗲 𝗺𝗮𝗶𝗻 𝗽𝗿𝗼𝗺𝗽𝘁 𝗺𝗶𝗴𝗵𝘁 𝗹𝗼𝗼𝗸 𝗳𝗮𝗻𝗰𝘆 𝗮𝗻𝗱 𝗶𝗻𝘁𝗶𝗺𝗶𝗱𝗮𝘁𝗶𝗻𝗴. 𝗜 𝗱𝗶𝗱 𝗻𝗼𝘁 𝘁𝘆𝗽𝗲 𝗼𝘂𝘁 𝗮 𝘀𝗶𝗻𝗴𝗹𝗲 𝗹𝗶𝗻𝗲 𝗼𝗳 𝘁𝗵𝗮𝘁. I was voice typing using Google docs when doing for the first time. It was iterative. Used to dump the prompt with uuh, aaah, spellings, and horrible formatting into ChatGPT. At the end just asked ChatGPT to prepare a consolidated prompt that I can use, and it will understand. And voila.. the magic prompt...!
This prompt will only work for HDFC Top 100 Monthly Portfolio disclosures. That was the specific use case I was working on. Next, would be trying to generalize it, so that it can work for monthly portfolio disclosure of any scheme of any fund house. And publish as a GPT or a web app so that there is no need to worry about the prompt. And a user can simply upload files and proceed.
Monthly Portfolio - HDFC Mutual Fund (hdfcfund.com)
While doing it first time had to iterate with ChatGPT to get it right. It took me around an hour and a half to iterate and get the main prompt out. And after that it was just copy paste.
We need to validate, same as we would do had we processed the data with Excel / Python. For instance, validating the totals against main files. I had also asked ChatGPT to prepare a validation summary to make this a bit easier. And also instructed ChatGPT to apply these validation checks on its end.
And if you would rather do it in Python, then you can ask ChatGPT to provide a base code such that you can pass the file names, time periods etc. as variables....offering flexibility on data engineering side.
Main Prompt
[for any two-month period. If you need for multiple months, then just add a line with your instructions. See example below]
Mutual Fund GPT Instruction
**Detailed Prompt for Importing and Merging HDFC Mutual Fund Data:**
1. **File Specifications:**
- Use HDFC Mutual Fund data files for specific months (e.g., any two distinct months like November and December, or any other combination).
- Format: Excel files.
2. **Header and Data Rows:**
- Header located in the 5th row of the Excel file.
- Actual data begins from the 9th row.
3. **Data Filtering:**
- Include only rows where 'ISIN' starts with 'IN'.
- Exclude rows without 'ISIN' or where 'ISIN' does not start with 'IN'.
4. **Columns of Interest:**
- For each month, include only the following columns:
- 'ISIN'
- 'Name Of the Instrument'
- 'Quantity'
- 'Market/ Fair Value (Rs. in Lacs.)'
- '% to NAV'
- Convert 'Quantity', 'Market/ Fair Value (Rs. in Lacs.)', and '% to NAV' to numeric format.
5. **Merging Data:**
- Perform a full outer join on the 'ISIN' field.
- Ensure records from both provided months are included, even if they don't match in the other month.
- Merged file columns should be side-by-side for each month, formatted as:
- 'ISIN'
- 'Name Of the Instrument_{MonthYear}'
- 'Quantity_{MonthYear}'
- 'Market/Fair Value (Rs. in Lacs.)_{MonthYear}'
- '% to NAV_{MonthYear}'
- Insert two blank columns between the data for the two months.
- Example: 'Name Of the Instrument_Nov2023', 'Quantity_Dec2023', etc.
6. **Output Files:**
- Merged Data File (CSV):
- Contains the consolidated data from the provided months, following the merged format.
- Include only 'ISIN', 'Name', 'Quantity', 'Market Value', and '% of NAV' fields for each month.
- Validation File (CSV):
- Separate file containing validation totals.
- Include totals for 'Quantity', 'Market/Fair Value', and '% to NAV' for each month before merging.
- Include totals for the same fields from the merged file.
- Ensure totals match before and after merging to validate accuracy.
If you need for more than 2 months. Then just provide instructions as per your requirements and upload relevant files
I had already merged Nov and Dec 2023. And wanted it to do for Jan to Oct. So this was the prompt:
Now i will share 10 files from Jan to October 2023. Can you merge them similarly. Append those to the existing Nov and Dec dataset please.
2nd Prompt to get the analysis
The prompt below is assuming only two files have been uploaded. If you have uploaded multiple months then just provide instructions accordingly (see example below)
Analysis of Major Changes:
Calculate the changes in 'Quantity', 'Market/ Fair Value', and '% to NAV' between the two months.
Identify major increases and decreases in '% to NAV', defining 'major' as changes more than a certain threshold (e.g., 0.49%).
Highlight new entries added in the second month and entries that were present in the first month but not in the second.
Reporting:
Present the findings in a clear, table format.
Include tables showing major increases and decreases in '% to NAV', new entries in the second month, and dropped entries from the first month.
I had already done the analysis for Nov and Dec 2023. After processing for Jan to October, this is the prompt i used
Can you carry out the same analysis like earlier. Just that this time compare Feb 2023 vs. Dec 2023
3. Last Prompt to get python code
Can you share the python code for the full 12-month merge and for the validation file