Guide generation. #6

Open
lastrosade opened this issue Apr 25, 2024 · 4 comments

Comments

@lastrosade

When generating the URLs, generate a website description and use that description to guide the generation of the web page.
Consider using GBNF for this.
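A rough sketch of the two-step idea, in case it helps discussion. It assumes a hypothetical `llm` callable that takes a prompt string and returns the model's text; that callable, the function name and the prompt wording are placeholders, not part of this project.

```python
from typing import Callable


def generate_guided_page(url: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Two-step generation: describe the site first, then build the page from it."""
    # Step 1: ask for a short description of the fictional site.
    description = llm(
        f"In one sentence, describe the fictional website at '{url}'."
    ).strip()
    # Step 2: feed the description back in so the page stays on topic.
    page = llm(
        f"Generate an HTML page for the fictional website '{url}'. "
        f"The site is described as: {description}"
    )
    return description, page
```

The description could also be stored next to the URL so that later visits reuse it instead of asking the model again.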

@Zetaphor

In case anyone else, like me, isn't familiar with that acronym (GBNF is the grammar format used by llama.cpp):
https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md

@Kreevoz

Kreevoz commented Apr 28, 2024

I experimented with this by allowing for parameters to be part of all URLs, and then telling llama3 to append two of them. It works very well:

Put this into the prompt for page generation (and for search results too, or make it a system prompt).
(Of course, you will also have to alter the URL parsing a little so that the parameters are kept for later use and stripped out of the path!)

Generate a webpage from the fictional site of '{url}' at the resource path of '{path}' with parameters: '{params}'. Make sure all links generated either link to an external website, or if they link to another resource on the current website, they have the current url prepended ({url}) to them. Append the parameter '?&description=(short summary of the linked webpage here that describes the content or purpose)' to all generated URLs. Also append the parameter '&previous-webpage=(short summary of the current website that the link appears on)' as final parameter. These parameters help you to figure out what to generate, so you must generate them on each link. If there are other parameters needed, make sure to combine them. Here is an example of a finished link: '<a href=\"http://www.flower-website.com/?parameter1=cart&description=Shopping cart of the town's best flower shop website&previous-webpage=Merchant directory, flower shop subpage\">Link title here</a>' Update the previous-webpage parameter to match the currently generated webpage.

That way you also get pages generated that roughly match what the fake search engine spits out, and they are thematically grouped.

Probably only a band-aid and could be implemented in a more elegant way, but it is easy enough to do like this.
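For the URL parsing part, a minimal sketch using only the standard library might look like the following. The helper name `split_guidance` is made up for illustration, and how the result is fed back into the page prompt is left out. (Inside a Flask view, the same values are also available through `request.args`.)

```python
from urllib.parse import parse_qs, urlsplit


def split_guidance(link: str) -> tuple[str, dict[str, str]]:
    """Return (link without its query string, query parameters as a flat dict)."""
    parts = urlsplit(link)
    # parse_qs decodes percent-escapes and '+', and skips empty fields,
    # so the stray '?&' produced by the prompt above is harmless.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    clean = parts._replace(query="", fragment="").geturl()
    return clean, params


clean, params = split_guidance(
    "http://www.flower-website.com/?&description=Shopping%20cart&previous-webpage=Merchant+directory"
)
# clean is the bare URL; params holds 'description' and 'previous-webpage'
# and can be substituted for '{params}' in the prompt.
```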

@scalar27

Great idea. I added it and it helps a lot with coherence. However, I now see even more of the problem where the generated links on the following pages do not have 127.0.0.1:5000 prepended to them. Is there a way to make sure all links get that?

@Kreevoz

Kreevoz commented Apr 29, 2024

@scalar27 🤔 I just serve the thing on localhost on port 80, so no port is needed in the URLs and all links work without any hassle.
You can tell Flask which port to use in main.py, like so:

if __name__ == "__main__":
    app.run(host='127.0.0.1', port=80, debug=False)
    print(engine.export_internet())

Alternatively I suppose you could append the port to all hrefs by modifying the _format_page function in the ReaperEngine.py file to include that.
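A hedged sketch of that second option: a standalone helper that rewrites every href so it starts with the local address. This is not the project's actual `_format_page` code; `prefix_hrefs` and `LOCAL_BASE` are hypothetical names, and it assumes the local server accepts paths of the form `/<site>/<resource>`.

```python
import re

LOCAL_BASE = "http://127.0.0.1:5000"  # assumption: Flask's default host and port


def prefix_hrefs(html: str, base: str = LOCAL_BASE) -> str:
    """Prepend `base` to every href that does not already start with it."""

    def fix(match: re.Match) -> str:
        quote, target = match.group(1), match.group(2)
        if target.startswith(base) or target.startswith(("#", "mailto:")):
            return match.group(0)  # already local, or not a page link
        # Drop the scheme so 'http://site.com/a' becomes '<base>/site.com/a'.
        target = re.sub(r"^https?://", "", target).lstrip("/")
        return f"href={quote}{base}/{target}{quote}"

    # Match href="..." or href='...' and hand each one to fix().
    return re.sub(r"""href=(["'])(.*?)\1""", fix, html)
```

Site-relative links such as `/about` carry no host, so this sketch cannot tell which fictional site they belong to; it relies on the prompt already asking the model to prepend the current URL.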
