navyagarwal/corp-filings-analysis


Financial Analysis in Retail: A Dive into Pandemic Impact

Introduction

For this project, I focussed on four major US retail corporations: Target, Home Depot, Costco, and Lowe's.

The initial motivation was to extract information from SEC 10-K filings and analyze financial trends before, during, and after the pandemic.

However, the current state of the project may not fully reflect the initial intentions. Extracting financial metric values from the filings was difficult because of limited computational power and time.

Nevertheless, the project lays a solid groundwork for further work in this direction.

Why these insights?

Entire economies were caught off guard by the COVID-19 pandemic and faced serious financial consequences. It is reasonable to assume that pandemics will remain a risk.

Thus, studying financial trends in retail companies is particularly significant due to their direct connection to consumer behavior, economic trends, and market dynamics.

The performance of retail companies provides insights into both microeconomic factors affecting individual businesses and macroeconomic trends shaping the overall economy, making it a crucial indicator for investors, policymakers, and analysts.

Tools and Technologies Used:

  • Programming Language: Python is my preferred development language. It also suits this project because of its rich ecosystem of libraries for data analysis and visualization, and because it interfaces easily with various LLM APIs.
  • Deployment Platform: I chose Streamlit primarily because it allows very quick and easy deployment. I initially considered building a dashboard with Plotly Dash and hosting the project on a free cloud hosting service, but that would have required more time than was available.
  • API for 10-K Filing Retrieval: sec-edgar-downloader, as recommended, was my initial choice, but it proved computationally expensive because it downloaded all documents, some of which were hundreds of MB in size. Handling storage and deleting temporary files became complicated. I then tried SEC API, which seemed very promising at first, but I realized the next day that only the first 100 calls were free. Finally, I found sec-edgar-api (a wrapper on sec-edgar-downloader), which allowed storing files in memory instead of downloading them to disk. This proved very helpful, and it is the one I ultimately used.
  • LLM Inference API: Initially, I experimented with numerous Hugging Face models, including Facebook BART CNN, RoBERTa Base SQuAD 2, and DistilBART CNN, for summarization and question-answering tasks. However, the results were suboptimal, largely because the models weren't trained on finance-specific data. Other Hugging Face models that were pretrained on financial data either had a very small context window or were better suited to sentiment analysis (which wasn't my objective). Next, I attempted to use the OpenAI API, but encountered issues with my API key, likely because it required purchasing credits. Eventually, I turned to the Google Gemini API, which yielded good results, and I didn't look back.
  • Tools for Text Preprocessing: Text preprocessing was laborious, involving extensive regex, BeautifulSoup, and parsing to extract the required section items from filings and convert them from HTML to text.
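
To illustrate the kind of preprocessing described in the last bullet, here is a minimal sketch of converting filing HTML to plain text and pulling out one section item. It is not the project's actual code: it uses Python's built-in `html.parser` as a dependency-free stand-in for BeautifulSoup, and the helper names `html_to_text` and `extract_item` are hypothetical. The "longest span wins" rule is an assumed heuristic for skipping table-of-contents entries that repeat the item heading.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs (including non-breaking spaces) to one space.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Hypothetical helper: strip tags and return whitespace-normalized text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extract_item(text, item, next_item):
    """Return the text between an 'Item <item>.' heading and the next
    'Item <next_item>.' heading, or None if the heading never appears.

    The heading usually also appears in the table of contents, so every
    occurrence is tried and the longest candidate is kept (a heuristic).
    """
    start_pat = re.compile(rf"\bItem\s+{re.escape(item)}\s*\.", re.IGNORECASE)
    end_pat = re.compile(rf"\bItem\s+{re.escape(next_item)}\s*\.", re.IGNORECASE)
    best = None
    for start in start_pat.finditer(text):
        end = end_pat.search(text, start.end())
        stop = end.start() if end else len(text)
        candidate = text[start.end():stop].strip()
        if best is None or len(candidate) > len(best):
            best = candidate
    return best


if __name__ == "__main__":
    sample = (
        "<p>Item 7. ........ 30</p><p>Item 7A. ...... 45</p>"
        "<h2>Item&nbsp;7. Management's Discussion</h2><p>Sales grew.</p>"
        "<h2>Item 7A. Quantitative Disclosures</h2><p>Risk.</p>"
    )
    print(extract_item(html_to_text(sample), "7", "7A"))
```

Real filings vary widely in how headings are marked up, so a pattern like this would likely need per-company adjustments. Joining text chunks with spaces can also split a word that is broken across inline tags.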

Screen recording of the project

(The page takes around 3 to 4 minutes to load completely; this video has been sped up in certain places.)

corp-filings-analysis_screencast.mp4

Challenges Encountered:

  • Insight Extraction: Extracting useful information, especially financial metric figures and pandemic-related trends, proved to be a challenge. LLMs pretrained on relevant data, combined with better text preprocessing (such as extraction of tabular data), would give better results.
  • Computational Resource Limitations: Since I was relying on free credits and compute power, processing large datasets and running complex analysis was challenging. The page loading time could be reduced by making several API calls concurrently instead of one after another.
  • Time Constraints: The time I could dedicate to the project was limited, which affected the quality of the analysis.
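
The idea of making several API calls at once could be sketched with a thread pool, as below. This is an assumption about how it might be done, not part of the current project: `analyze_filing` is a hypothetical placeholder that only simulates the latency of one blocking LLM request, and `max_workers` would need to respect whatever rate limits the chosen API enforces.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_filing(job):
    """Hypothetical placeholder for one blocking LLM API call."""
    company, year = job
    time.sleep(0.2)  # simulated network latency
    return f"{company} {year}: summary"


def analyze_all(jobs, max_workers=8):
    """Run the calls concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_filing, jobs))


if __name__ == "__main__":
    companies = ("Target", "Home Depot", "Costco", "Lowe's")
    jobs = [(c, y) for c in companies for y in (2019, 2020, 2021)]
    start = time.perf_counter()
    results = analyze_all(jobs)
    print(f"{len(results)} results in {time.perf_counter() - start:.2f}s")
```

Threads fit here because the work is I/O-bound: each call spends most of its time waiting on the network, so the waits overlap even under Python's global interpreter lock.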

Future Work

Although the current state of the project does not fully reflect the initial objectives, there is significant potential for future work in this direction. With additional resources and time, further analysis could provide valuable insights into the financial performance of these retail giants across different periods, especially in response to significant events like the COVID-19 pandemic.
