
Commit

introduction of numbered sections in all yamls.
general repo restructuring.
Matteo Mattiuzzi committed Dec 22, 2024
1 parent 4c855a0 commit 7dc84fe
Showing 13 changed files with 2,554 additions and 113 deletions.
13 changes: 13 additions & 0 deletions CLMS_documents.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
File renamed without changes.
@@ -7,6 +7,7 @@ sitemap: true
toc: true
toc-title: "Contents"
toc-depth: 3
number-sections: true
keywords: ["AI standards", "web crawlers", "AI training", "content formatting"]
format:
html: default
@@ -16,17 +17,17 @@ format:

This document serves as a quick reference guide to ensure content follows the structured formats essential for web crawlers and AI systems. Using Quarto Markdown to generate HTML and producing sitemaps are critical for efficient crawling, helping search engines and AI models quickly index and retrieve well-structured content.

# 1. Introduction
# Introduction

## 1.1. Importance of Structured Data for AI and Web Crawlers
## Importance of Structured Data for AI and Web Crawlers

Generative AI and chatbots rely heavily on structured data to provide meaningful and accurate responses. For these systems to operate efficiently, they need access to data that is easy to index, retrieve, and process. Properly formatted content enables web crawlers and AI models to efficiently access and retrieve data, improving the accuracy of results provided to users.

Web crawlers, also known as bots or spiders, index web content by following hyperlinks. They require well-structured content, often formatted in HTML, with clear metadata to ensure content is discoverable and up-to-date for search engines and AI systems.

------------------------------------------------------------------------

## 1.2. Goals of Content Standardization
## Goals of Content Standardization

- **Improved Data Access**: Ensuring web crawlers and AI models can easily access structured data.
- **Enhanced Search Engine Optimization (SEO)**: Well-formatted content improves visibility and accessibility across search engines.
@@ -35,16 +36,16 @@ Web crawlers, also known as bots or spiders, index web content by following hype

------------------------------------------------------------------------

## 1.3. Benefits of Sitemaps and Metadata
## Benefits of Sitemaps and Metadata

- **Sitemaps**: Provide a roadmap for web crawlers to discover all content. A well-structured sitemap enhances a crawler’s efficiency, ensuring that content is indexed properly.
- **Metadata**: Improves the discoverability and accuracy of content retrieval. Tags such as title, author, date, and description help crawlers and AI models understand the content’s structure and relevance.

------------------------------------------------------------------------

# 2. Content Standards for AI and Web Crawlers
# Content Standards for AI and Web Crawlers

## 2.1. Content Structuring in Quarto Markdown
## Content Structuring in Quarto Markdown

Quarto Markdown provides an efficient way to structure content for generative AI and web crawlers. Use clear headings, subheadings, and metadata to help web crawlers navigate the content.

@@ -62,7 +63,7 @@ sitemap: true
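Building on the front matter shown in the example above, a fuller sketch of the YAML header could look like the block below; all values are illustrative rather than the repository's actual metadata.

``` yaml
---
title: "Content Standards for AI and Web Crawlers"   # illustrative title
author: "CLMS documentation team"                    # hypothetical author string
date: 2024-12-22
description: "Formatting guidance for crawler- and AI-friendly content."
keywords: ["AI standards", "web crawlers", "AI training", "content formatting"]
toc: true
number-sections: true   # automatic heading numbers, as introduced in this commit
sitemap: true           # flag the page for inclusion in the generated sitemap
format:
  html: default
---
```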

------------------------------------------------------------------------

## 2.2. HTML Structuring for Web Crawlers
## HTML Structuring for Web Crawlers

Semantic HTML5 elements, such as `<article>`, `<section>`, and `<header>`, help web crawlers index and understand the content more efficiently.

@@ -81,7 +82,7 @@ Semantic HTML5 elements, such as `<article>`, `<section>`, and `<header>`, help
---
```

### 2.2.1. Microdata for Structured Content
### Microdata for Structured Content

``` yaml
---
@@ -96,7 +97,7 @@ Semantic HTML5 elements, such as `<article>`, `<section>`, and `<header>`, help

------------------------------------------------------------------------

## 2.3. PDF Structuring for AI Integration
## PDF Structuring for AI Integration

For documents in PDF format, ensure proper tagging of sections and headings to improve readability and indexing by crawlers and AI models. Add relevant metadata to the document properties.

@@ -110,7 +111,7 @@ keywords: ["AI", "web crawlers", "PDF"]
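In addition to tagged headings, the document properties can be declared in the front matter. The following is a hedged sketch with illustrative values; only the `keywords` line is taken from the example above.

``` yaml
---
title: "Content Standards for AI and Web Crawlers"   # stored in the PDF Title property
author: "CLMS documentation team"                    # hypothetical author string
keywords: ["AI", "web crawlers", "PDF"]
format:
  pdf:
    toc: true   # a table of contents and tagged headings aid navigation and indexing
---
```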

------------------------------------------------------------------------

## 2.4. HTML Structuring for AI Integration
## HTML Structuring for AI Integration

To optimize content for AI integration, HTML documents should include semantic elements, structured data formats like JSON-LD, and relevant metadata. This helps AI systems process and train on the content efficiently.

@@ -143,7 +144,7 @@ To optimize content for AI integration, HTML documents should include semantic e
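One way to supply JSON-LD from a Quarto source is to inject it through the HTML header options. The block below is a sketch assuming Quarto's `include-in-header` option, with an illustrative schema.org payload.

``` yaml
---
title: "Content Standards for AI and Web Crawlers"
format:
  html:
    include-in-header:
      - text: |
          <script type="application/ld+json">
            {
              "@context": "https://schema.org",
              "@type": "TechArticle",
              "headline": "Content Standards for AI and Web Crawlers",
              "datePublished": "2024-12-22"
            }
          </script>
---
```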

------------------------------------------------------------------------

# 3. Importance of Sitemap Indexing in HTML Documents
# Importance of Sitemap Indexing in HTML Documents

Sitemaps are essential for enhancing the discoverability and accessibility of web content for both web crawlers and AI systems. As an XML file, a sitemap provides a structured roadmap of a website, listing URLs, metadata, and details such as last-modified dates and update frequency. This helps crawlers index content efficiently and enables generative AI models to train on well-structured data, improving processing and retrieval accuracy. The key benefits of sitemap indexing for web crawling and AI training are (a configuration sketch appears at the end of this section):

@@ -172,7 +173,7 @@ Submit your sitemap to search engines via tools like Google Search Console to en
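For Quarto website projects, sitemap generation is normally driven by the project configuration. A minimal `_quarto.yml` sketch is shown below; the site title and URL are placeholders.

``` yaml
# _quarto.yml -- project configuration sketch with placeholder values
project:
  type: website
website:
  title: "CLMS Documents"                                # hypothetical site title
  site-url: "https://example.github.io/CLMS_documents/"  # placeholder; a site-url is required for sitemap.xml generation
```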

------------------------------------------------------------------------

# 4. Best Practices for Information Formatting
# Best Practices for Information Formatting

- **Consistent Metadata:** Use uniform metadata (title, author, description, keywords) across all documents (a shared-metadata sketch follows this list).

@@ -184,7 +185,7 @@ Submit your sitemap to search engines via tools like Google Search Console to en
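One way to keep metadata uniform across documents in a Quarto project is a directory-level `_metadata.yml`, whose options are merged into every document in that directory. The values below are illustrative.

``` yaml
# _metadata.yml -- shared options applied to all documents in the directory (illustrative values)
author: "CLMS documentation team"   # hypothetical author string
number-sections: true
toc: true
keywords: ["AI standards", "web crawlers"]
format:
  html: default
```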

------------------------------------------------------------------------

# 5. Quarto Markdown Editors
# Quarto Markdown Editors

To work with Quarto Markdown (.qmd) files and have them generated automatically, we can use several editors that integrate well with Quarto: VS Code (Visual Studio Code), RStudio, JupyterLab with Quarto integration, and Atom with the Quarto plugin are popular choices.

@@ -210,7 +211,7 @@ R-Studio is lightweight, easy-to-use and integrates with Quarto and provides too

------------------------------------------------------------------------

# 6. Automation with GitHub Deployment
# Automation with GitHub Deployment

Automation is crucial for ensuring efficiency and consistency when deploying content structured for AI integration and web crawlers. Automating the rendering of Quarto Markdown, Markdown, and Jupyter Notebook files into HTML, the generation of a sitemap, and the deployment of the output to GitHub Pages makes the process seamless and repeatable with minimal human intervention. Any change to the content is then reflected on the website immediately, keeping it discoverable and up-to-date for web crawlers and AI systems. The steps in the automation pipeline are as follows (a minimal workflow sketch appears after the list):

@@ -231,7 +232,7 @@ g. **Deploy to GitHub Pages**:
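As a sketch of one possible pipeline (the workflow file name, branch, and action versions are assumptions; the `quarto-dev/quarto-actions` actions are one common choice), a GitHub Actions workflow along these lines renders the project and publishes it to GitHub Pages:

``` yaml
# .github/workflows/publish.yml -- hypothetical workflow file name
name: Render and deploy
on:
  push:
    branches: [main]   # assumed default branch
permissions:
  contents: write      # lets the publish step push to the gh-pages branch
jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: quarto-dev/quarto-actions/setup@v2     # installs the Quarto CLI
      - uses: quarto-dev/quarto-actions/publish@v2   # renders the site and pushes the output to GitHub Pages
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

With a workflow of this kind in place, every push to the default branch re-renders the site and refreshes the sitemap without manual intervention.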

------------------------------------------------------------------------

# 6. Conclusion
# Conclusion

Standardizing content formatting using Quarto Markdown, HTML5, and sitemaps is essential for enabling effective web crawling and AI training. Structured data ensures improved discoverability, faster indexing, and better accessibility, supporting the development of more accurate and responsive AI models.
