Autonomous Econ
$0+

LLM-powered Text to Dataset Creator



Text-based datasets have a variety of applications in economics and finance: building sentiment indicators from news stories, pricing shares from investor reports, and interpreting the tone of Fed announcements.

However, creating structured text-based datasets from a large number of documents or web pages can be painstaking. You might need to design algorithms with complex rules and keyword lists, and even then such algorithms often fail to understand context.

The result is a poor-quality dataset for your indicator or model.

This guide provides step-by-step instructions on how you can leverage large language models (LLMs) like ChatGPT to extract information from documents and store the output in a structured CSV file.

[Image: the final output.]
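The core extraction idea can be sketched in a few lines of Python, the language the templates use. This is a minimal sketch under stated assumptions, not the templates' own code: `llm` stands for any function that sends a prompt to a model and returns its text reply (for example, a thin wrapper around an OpenAI chat call), and the column names (`date`, `institution`, `summary`) and the prompt wording are illustrative placeholders.

```python
import csv
import json
from typing import Callable, Dict, List

# Hypothetical: any function that takes a prompt string and returns the model's text reply.
LLM = Callable[[str], str]

# Illustrative columns only; the templates may extract different fields.
FIELDS = ["date", "institution", "summary"]

PROMPT_TEMPLATE = (
    "Extract the following fields from the document below and reply with a single JSON "
    "object using exactly these keys: {fields}. Use null for anything not stated.\n\n"
    "Document:\n{text}"
)


def extract_record(text: str, llm: LLM, fields: List[str] = FIELDS) -> Dict[str, object]:
    """Ask the LLM for one JSON object per document and keep only the expected keys."""
    prompt = PROMPT_TEMPLATE.format(fields=", ".join(fields), text=text)
    reply = llm(prompt)
    data = json.loads(reply)  # raises json.JSONDecodeError if the model strays from JSON
    return {key: data.get(key) for key in fields}  # missing keys become None


def texts_to_csv(texts: List[str], llm: LLM, path: str, fields: List[str] = FIELDS) -> List[Dict[str, object]]:
    """Run the extraction over every document and write one CSV row per document."""
    rows = [extract_record(t, llm, fields) for t in texts]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)  # csv writes None values as empty cells
    return rows
```

Asking for JSON with a fixed set of keys and discarding anything else keeps every row aligned with the CSV header, even when the model adds extra fields; a key the model omits simply becomes an empty cell.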

I will also show you how to add new features to the dataset simply by giving plain-English instructions. Using announcements from central banks as an example, I'll illustrate how to automate the classification of statements on a scale from very dovish (-1) to very hawkish (1).
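One way such a plain-English feature could work is sketched below: the instruction asks the model for a single number on the -1 to 1 scale, and the code parses and clamps the reply. This is a hedged sketch, not the guide's implementation: `llm` is again a hypothetical prompt-to-text function, the prompt wording is an assumption, and the five label buckets and their cut-offs are made up for illustration (the guide only fixes the endpoints, -1 for very dovish and 1 for very hawkish).

```python
import re
from typing import Callable

# Hypothetical: any function that sends a prompt to an LLM and returns its text reply.
LLM = Callable[[str], str]

STANCE_PROMPT = (
    "Rate the monetary policy stance of the central bank statement below on a scale "
    "from -1 (very dovish) to 1 (very hawkish), where 0 is neutral. "
    "Reply with the number only.\n\nStatement:\n{statement}"
)

# Matches signed decimals such as "-0.75", "+0.5", "1" or ".3".
_NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)")


def stance_score(statement: str, llm: LLM) -> float:
    """Ask the LLM for a score and coerce the reply into a float in [-1, 1]."""
    reply = llm(STANCE_PROMPT.format(statement=statement))
    match = _NUMBER.search(reply)
    if match is None:
        raise ValueError(f"No numeric score in model reply: {reply!r}")
    score = float(match.group())
    return max(-1.0, min(1.0, score))  # clamp in case the model drifts off the scale


def stance_label(score: float) -> str:
    """Map a score to a label; these bucket edges are hypothetical."""
    if score <= -0.6:
        return "very dovish"
    if score < -0.2:
        return "dovish"
    if score <= 0.2:
        return "neutral"
    if score < 0.6:
        return "hawkish"
    return "very hawkish"
```

Keeping the numeric score as its own column and deriving the label from it lets you change the buckets later without re-querying the model.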

Included in this guide are two notebook templates that you can follow interactively in your browser with Google Colab:

  • Text2Data webloader template: use an LLM to extract data from a list of URLs.
  • Text2Data PDF loader template: use an LLM to extract and analyze information from an entire directory of PDFs.
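For the PDF loader, the first step is turning a folder of files into text the model can read. The sketch below is an assumption about how that step might look, not the template's code: it uses the pypdf library's `PdfReader` for extraction, and it takes the loader as a parameter (`load_text`, a hypothetical name) so the PDF library can be swapped out or replaced with a fake in tests.

```python
from pathlib import Path
from typing import Callable, Iterator, Tuple


def pdf_text(path: Path) -> str:
    """Extract plain text from every page of a PDF using pypdf (an assumed dependency)."""
    from pypdf import PdfReader  # imported here so the directory walk works without pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def iter_pdf_texts(
    directory: str, load_text: Callable[[Path], str] = pdf_text
) -> Iterator[Tuple[str, str]]:
    """Yield (file name, text) for each PDF directly inside `directory`, in sorted order."""
    for path in sorted(Path(directory).iterdir()):
        # Compare the suffix case-insensitively so "report.PDF" is picked up too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path.name, load_text(path)
```

Each yielded text can then go through the LLM extraction step, with the file name kept as an identifier column so every CSV row can be traced back to its source document.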

Download the templates and start creating your own unique datasets today!


A Python template showing how to use the LangChain library together with OpenAI models to create structured datasets from text.

Also included: a quick-start guide to Google Colab for absolute beginners.