extract rules for using LLM, and use it for non-ai #448
Hi, can I extract information from a crawled source by providing a model (product_name, price, etc.), get the result, and also extract the CSS schema behind it, so that I can reuse the schema on other pages in a non-AI, CSS-based manner?
Replies: 1 comment 4 replies
@AzizNadirov Let me try to unpack your question here. You want to provide the field names (product_name, price, etc.) and have the model/algorithm map the HTML id/class names of the divs to these specific data fields?
@AzizNadirov To the best of my knowledge, we currently don't have a feature like this. We will certainly keep this use case in mind while planning our future roadmap.
However, you can get the raw HTML from the crawler result using `result.html`, then have a model (like ChatGPT or Claude) work out the mapping between the class/id/name attributes of the divs and the desired data fields. Then you can extract with the help of `JsonCssExtractionStrategy` or `JsonXPathExtractionStrategy`. You can find some useful examples here
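A rough sketch of that workflow, in case it helps. The helper names (`build_prompt`, `parse_schema`, `extract_with_schema`) and the prompt wording are made up for illustration and are not part of crawl4ai. The schema layout (`baseSelector` plus a list of `fields`) follows what `JsonCssExtractionStrategy` expects as far as I know, and the `arun(config=CrawlerRunConfig(...))` call assumes a recent crawl4ai version; older releases took `extraction_strategy` directly as an `arun` keyword, so adjust to whatever you have installed.

```python
import json


def build_prompt(html: str, fields: list[str]) -> str:
    """Hypothetical prompt asking an LLM to propose a CSS schema for `fields`."""
    return (
        "Given the HTML below, return only a JSON object with the keys "
        '"name", "baseSelector" and "fields". Each entry in "fields" needs '
        '"name", "selector" (a CSS selector relative to baseSelector) and "type".\n'
        f"Fields to locate: {', '.join(fields)}\n\nHTML:\n{html}"
    )


def parse_schema(llm_reply: str, fields: list[str]) -> dict:
    """Check the model's JSON reply before reusing it as an extraction schema."""
    schema = json.loads(llm_reply)
    if "baseSelector" not in schema:
        raise ValueError("schema has no baseSelector")
    # Only count fields that actually carry a selector.
    found = {f.get("name") for f in schema.get("fields", []) if f.get("selector")}
    missing = set(fields) - found
    if missing:
        raise ValueError(f"schema is missing selectors for: {sorted(missing)}")
    return schema


async def extract_with_schema(url: str, schema: dict) -> list:
    """Reuse a saved schema on another page; no LLM call happens here."""
    # Imported lazily so the helpers above also work without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    strategy = JsonCssExtractionStrategy(schema)
    async with AsyncWebCrawler() as crawler:
        # Assumption: config-based call style of recent crawl4ai releases.
        result = await crawler.arun(
            url=url, config=CrawlerRunConfig(extraction_strategy=strategy)
        )
    return json.loads(result.extracted_content)
```

The idea is to pay for the LLM once: crawl one representative page, send `build_prompt(result.html, ["product_name", "price"])` to the model of your choice, run the reply through `parse_schema`, and store the resulting dict (for example as a JSON file). Every later page with the same layout can then go through `extract_with_schema` alone. It is worth spot-checking the generated selectors by hand, since the model can pick classes that only exist on the sample page.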
Cc: @unclecode Interesting use case ☝🏼