Awesome Stars

A curated list of my GitHub stars! Generated by starred.

C

JoeDog/siege - Siege is an http load tester and benchmarking utility
mean00/avidemux2 - Avidemux2, simple video editor
dericed/american-archive-kaldi - This repo houses open-source models for Kaldi speech-to-text software that have been trained on public media content.
kaltura/nginx-vod-module - NGINX-based MP4 Repackager
gpac/gpac - GPAC Ultramedia OSS for Video Streaming & Next-Gen Multimedia Transcoding, Packaging & Delivery
x42/libtimecode - deal with A/V timecode and framerates
x42/ltc-tools - tools to deal with linear-timecode (LTC)
x42/libltc - Linear/Longitudinal Time Code (LTC) Library
sandflow/ffmpeg-imf - Adds an IMF demuxer to FFMPEG (https://github.com/sandflow/ffmpeg-imf/blob/develop/README-IMF.md)
ggreer/the_silver_searcher - A code-searching tool similar to ack, but faster.
sebastiencs/ls-icons - ls command with file icons
setmind/sacd-ripper - Improved sacd_extract
FFmpeg/FFmpeg - Mirror of https://git.ffmpeg.org/ffmpeg.git

C#

SubtitleEdit/subtitleedit - the subtitle editor :)
mlichtenberg/hocrimagemapper - Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).

C++

yandex/perforator - Perforator is a cluster-wide continuous profiling tool designed for large data centers
spotify/annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
aous72/OpenJPH - Open-source implementation of JPEG2000 Part-15 (or JPH or HTJ2K)
microsoft/BitNet - Official inference framework for 1-bit LLMs
intel/neural-speed - An innovative library for efficient LLM inference via low-bit quantization
google/gemma.cpp - lightweight, standalone C++ inference engine for Google's Gemma models.
RWKV/rwkv.cpp - INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
aarnphm/whispercpp - Pybind11 bindings for Whisper.cpp
ml-explore/mlx - MLX: An array framework for Apple silicon
ggerganov/llama.cpp - LLM inference in C/C++
ggerganov/whisper.cpp - Port of OpenAI's Whisper model in C/C++
Mozilla-Ocho/llamafile - Distribute and run LLMs with a single file.
shundhammer/qdirstat - QDirStat - Qt-based directory statistics (KDirStat without any KDE - from the original KDirStat author)
3ximus/md5-collisions - MD5 collision testing
IMFTool/IMFTool - A tool for editing IMF CPLs and creating new versions of an existing IMF (Interoperable Master Format) package
mkrufky/node-dvbtee - MPEG2 transport stream parser for Node.js with support for television broadcast PSIP tables and descriptors
mkrufky/libdvbtee - dvbtee: a digital television streamer / parser / service information aggregator supporting various interfaces including telnet CLI & http control
mipops/dvrescue - Archivist-made software that supports data migration from DV tapes into digital files suitable for long-term preservation. Snapshot daily builds are at https://mediaarea.net/download/snapshots/binary/
MediaArea/RAWcooked - Encodes RAW audio-visual data into the Matroska container (MKV), using the video codec FFV1 for the image and audio codec FLAC for the sound.

CSS

logankilpatrick/gemini-api-quickstart - Get up and running in under 5 minutes with the Google AI Gemini API (in Python)
timpaul/form-extractor-prototype - A prototype of a tool that generates web forms from document forms
aravindputrevu/app-search-flask-app - This is an example of a Python Flask app with Elasticsearch/Elastic App Search with respective Python Clients
IIIF/cookbook-recipes - For working on the recipes

Cython

explosion/thinc-apple-ops - 🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Dockerfile

nytimes/nginx-vod-module-docker - Docker image for nginx with Kaltura's VoD module used by The New York Times

Go

1Panel-dev/1Panel - 🔥 Top-Rated Web-Based Linux Server Management Tool. 1Panel features an intuitive web interface that seamlessly integrates server management and monitoring, container management, database administratio
mostlygeek/llama-swap - transparent proxy server for llama.cpp's server to provide automatic model swapping
kgretzky/evilginx2 - Standalone man-in-the-middle attack framework used for phishing login credentials along with session cookies, allowing for the bypass of 2-factor authentication
MightyMoud/sidekick - Bare metal to production ready in mins; your own fly server on your VPS.
gotenberg/gotenberg - A developer-friendly API for converting numerous document formats into PDF files, and more!
ollama/ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
mikefarah/yq - yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
hillu/local-log4j-vuln-scanner - Simple local scanner for vulnerable log4j instances
anchore/syft - CLI tool and library for generating a Software Bill of Materials from container images and filesystems
anchore/grype - A vulnerability scanner for container images and filesystems
johnkerl/miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

HCL

eyelevelai/groundx-on-prem - A Kubernetes deployable instance of GroundX for document parsing, storage, and search.

HTML

wjbmattingly/flask-annoy -
microsoft/markitdown - Python tool for converting files and office documents to Markdown.
swyxio/ai-notes - notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under
CatalogueLegacies/antconc.github.io - Computational Analysis of Catalogue Data
internetarchive/Zeno - State-of-the-art web crawler 🔱
simonw/tools - Assorted tools
Unstructured-IO/unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
pdf2htmlEX/pdf2htmlEX - Convert PDF to HTML without losing text or format.
coolwanglu/pdf2htmlEX - Convert PDF to HTML without losing text or format.
kmurmur/embARC -
tesseract-ocr/tessdoc - Tesseract documentation
GLAM-Workbench/glam-workbench.github.io -
LibraryOfCongress/embARC - embARC (“metadata embedded for archival content”) manages internal file metadata including embedding and validation. Created by FADGI (Federal Agencies Digital Guidelines Initiative), in conjunction w
bitmovin/bitmovin-player-web-samples - Showcases built around the Bitmovin Adaptive Streaming Player, demonstrating usage and capabilities of the HTML5 based HLS and MPEG-DASH player, as well as the Flash based Fallback.
ColorlabMD/DPX_Metadata_Editor - View, Edit and Modify DPX file headers
bfidatadigipres/bfidatadigipres.github.io -
bfi-prog-notes/bfi-prog-notes.github.io -
KBNLresearch/iromlab - Loader software for automated imaging of optical media with Nimbie disc robot
IIIF-Commons/biiif-cli - A CLI to Build Static IIIF Collections
TheScienceMuseum/collection-chrome-extension - Museum in a Tab: A Chrome Browser extension showing objects from the Science Museum Group Collection
krzemienski/awesome-video - A curated list of awesome streaming video tools, frameworks, libraries, and learning resources.
kba/hocrjs - Working with hOCR in Javascript
algorythmik/python-hocr - HOCR parsing
archival-IIIF/archival-iiif.github.io - Website

Haskell

facebook/duckling - Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Java

OCR4all/OCR4all - Provides OCR (Optical Character Recognition) services through web applications
Stirling-Tools/Stirling-PDF - #1 Locally hosted web application that allows you to perform various operations on PDF files
kestra-io/kestra - ⚡ Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
Netflix/maestro - Maestro: Netflix’s Workflow Orchestrator
kermitt2/grisp - Knowledge Base stuff
kermitt2/grobid - A machine learning software for extracting information from scholarly documents
kermitt2/entity-fishing - A machine learning tool for fishing entities
stanfordnlp/CoreNLP - CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
DSRCorporation/imf-conversion - NF IMF media conversion utility that handles flat file creation from a specified CPL within the IMF package
Netflix/photon - Photon is a Java implementation of the Interoperable Master Format (IMF) standard. IMF is a SMPTE standard whose core constraints are defined in the specification st2067-2:2013
DSpace/DSpace - (Official) The DSpace digital asset management system that powers your Institutional Repository
LibraryOfCongress/bagger - The Bagger application packages data files according to the BagIt specification.
usnationalarchives/File-Analyzer - NARA File Analyzer and Metadata Harvester
Georgetown-University-Libraries/File-Analyzer - A Data Parsing/Data Manipulation Tool Supporting Digitization Projects and Other Data Analysis Projects
atduskgreg/opencv-processing - OpenCV for Processing. A creative coding computer vision library based on the official OpenCV Java API
archivist-liz/jhove - File validation and characterisation.
apache/incubator-stormcrawler - A scalable, mature and versatile web crawler based on Apache Storm

JavaScript

lucide-icons/lucide - Beautiful & consistent icon toolkit made by the community. Open-source project and a fork of Feather Icons.
elasticsearch-dump/elasticsearch-dump - Import and export tools for elasticsearch & opensearch
ToolJet/ToolJet - Low-code platform for building business applications. Connect to databases, cloud storages, GraphQL, API endpoints, Airtable, Google sheets, OpenAI, etc and build apps using drag and drop application
jhuckaby/performa-satellite - Remote data collector for Performa.
edsu/whisper-transcript - A Lit web-component for viewing a Whisper JSON transcript file
NginxProxyManager/nginx-proxy-manager - Docker container for managing Nginx proxy hosts with a simple, powerful interface
RahulSChand/gpu_poor - Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
gchq/CyberChef - The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
jhuckaby/performa - A multi-server monitoring system with a web based UI.
alexpinel/Dot - Text-To-Speech, RAG, and LLMs. All local!
Mintplex-Labs/anything-llm - The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
datasette/datasette-extract - Import unstructured data (text and images) into structured tables
HumanSignal/label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format
fchollet/ARC-AGI - The Abstraction and Reasoning Corpus
marco-bertelli/medium-rag-frontend - RAG chatbot React and TypeScript base boilerplate
open-webui/open-webui - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
louislam/uptime-kuma - A fancy self-hosted monitoring tool
britishlibrary/peripleo-lanc -
eslawski/react-iiif-viewer - A React component for displaying high resolution IIIF images with deep zooming capabilities on mobile and desktop.
appbaseio/reactivesearch - Search UI components for React and Vue
betagouv/react-elasticsearch - 🛁 React + Elasticsearch - UI components for building data-driven search experiences
bradtraversy/feedback-app - React feedback app from React course
varunshenoy/GraphGPT - Extrapolating knowledge graphs from unstructured text using GPT-3 🕵️‍♂️
digipres/awesome-digital-preservation - Carefully curated list of awesome digital preservation resources.
thiagopnts/clappr-video360 - 360 video plugin for Clappr
tjenkinson/clappr-thumbnails-plugin - A plugin for clappr which will display thumbnails when hovering over the scrub bar. Thumbnails can either be individual images or a sprite sheet.
clappr/clappr - 🎬 An extensible media player for the web.
transitive-bullshit/ffmpeg-extract-frames - Extracts frames from a video using ffmpeg.
transitive-bullshit/ffmpeg-generate-video-preview - Generates an attractive image strip or GIF preview from a video.
cookpete/react-player - A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion
samvera-labs/ramp - Interactive, IIIF powered audio/video media player React components library. Styleguidist Docs: https://samvera-labs.github.io/ramp/
digirati-co-uk/canvas-panel - Prototype covering the specification of Canvas Panel, and supporting components for composing bespoke IIIF viewers and lightweight experiences, conforming to the IIIF Presentation 3 specification.
digirati-co-uk/timeliner - IIIF Timeliner
amnh-sciviz/collectionscope -
elastic/app-search-reference-ui-react - A generic UI for use with any App Search Engine
art-institute-of-chicago/aic-mirador-ui - A Mirador plugin for UI customizations
glenrobson/SimpleAnnotationServer - A simple IIIF and Mirador compatible Annotation Server
atomotic/iiif-annotation-studio - Mirador IIIF Viewer packaged as a desktop app with an embedded annotation endpoint
ProjectMirador/mirador-desktop - A desktop wrapper for Mirador and its environment, allowing use of local images.
o19s/pdf-discovery-demo - Demonstration of searching PDF document with Solr, Tika, and Tesseract
mozilla/pdf.js - PDF Reader in JavaScript
EIDR-ID/reshuffle-prod-runtime - Reshuffle Enterprise Production-Only (no studio sync) Runtime Environment
greenstick/interactor - Front-End Code for Tracking Interactions and Conversions on Websites.
phivk/nonlinearvideo - Non-Linear Video in HTML5 Workshop
cpietsch/vikus-viewer - Explore cultural collections along time, texture and themes
rwhscott/uv-hello-world - Fork of UniversalViewer/uv-hello-world that incorporates the manifest selection functionality from UniversalViewer/examples.
SatadruBhattacharjee/react-tv-epg - An HTML5 Canvas based EPG (TV Guide) React Component for TV and Set-top box
TheScienceMuseum/entities-search-engine - Scripts and microservice to feed an ElasticSearch with Wikidata and Inventaire entities, and keep those up-to-date
europeana/media-player - Media player developed under the Europeana Media Generic Services Project
mejackreed/mirador-plugin-example -
ProjectMirador/mirador-annotations - a Mirador 3 plugin that adds annotation creation tools to the user interface
dbmdz/mirador-textoverlay - Text Overlay plugin for Mirador 3
ProjectMirador/mirador - An open-source, web-based 'multi-up' viewer that supports zoom-pan-rotate functionality, ability to display/compare simple images, and images with annotations.
UB-Mannheim/ocr-fileformat - Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
internetarchive/bookreader - The Internet Archive BookReader
aeschylus/IIIFBookReader - A plugin for the Internet Archive BookReader that enables easy book viewing on top of a IIIF-compatible back end.

Jupyter Notebook

wjbmattingly/youtube-shakespeare -
wjbmattingly/youtube-spacy-ml -
wjbmattingly/keyword-spacy - Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity.
wjbmattingly/spacy-whisper - A spaCy pipeline designed to work with whisper outputs.
wjbmattingly/spcay-aligner - A component to identify and align person entities.
wjbmattingly/ocr_python_textbook -
wjbmattingly/youtube-florence-table - Table detection with Florence.
wjbmattingly/python_for_dh -
wjbmattingly/youtube-spacy-layout - A quick tutorial for using spaCy Layout.
DAMO-NLP-SG/VideoLLaMA3 - Frontier Multimodal Foundation Models for Image and Video Understanding
congruence-engine/catalogues-as-data - Repository for the 'Museums online catalogue-as-data' investigation
congruence-engine/retrieval-augmented-generation-with-circulars - Repo for code and data related to the CE investigation into creating a searchable repository from digitised GPO circulars
congruence-engine/experimenting-with-optical-character-recognition - Repository on a series of Experimentations with Optical Character Recognition
congruence-engine/universal-ner-with-gliner - Repository of the "Universal NER with GLiNER" investigation
patchy631/ai-engineering-hub -
ALucek/multimodal-rag -
AI4LAM/fastai4GLAMS - A study group for v4 of the fastai introduction to deep learning course with a focus on applications in GLAM settings
huggingface/smol-course - A course on aligning smol models.
meta-llama/llama-cookbook - Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama model f
merveenoyan/siglip - Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
deepfates/memery - Search over large image datasets with natural language and computer vision!
microsoft/generative-ai-for-beginners - 21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
DataTalksClub/llm-zoomcamp - LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.
iyaja/llama-fs - A self-organizing file system with llama 3
alexfazio/crewAI-quickstart - A collection of notebooks, cookbooks, and recipes showcasing fun and effective ways to use CrewAI's agentic workflow implementations and tools.
google-gemini/cookbook - Examples and guides for using the Gemini API
WhisperSpeech/WhisperSpeech - An Open Source text-to-speech system built by inverting Whisper.
mlabonne/llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
mshumer/ai-journalist -
MikeChan-HK/Gemini-agent-example - Example code to make langchain agents without an OpenAI API key (Google Gemini). Completely free, unlimited and open source, run it yourself on website. Ready to support ollama.... (Update when i am f
Jl16ExA/Surya-OCR-Hardware-Benchmarking - Surya-OCR-Hardware-Benchmarking is a repository dedicated to evaluating and analyzing the performance of the Surya OCR model across different hardware configurations. It provides tools and scripts for
nateraw/openai-vision-api-for-videos - Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦
poloclub/unitable - UniTable: Towards a Unified Table Foundation Model
google-research/vision_transformer -
weaviate/recipes - This repository shares end-to-end notebooks on how to use various Weaviate features and integrations!
anthropics/anthropic-cookbook - A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
LearnToCode180/Entity-Fishing-Tutorial - Entity Linking of text mentions with Wikidata entries using a tool called Entity Fishing.
yandexdataschool/nlp_course - YSDA course in Natural Language Processing
NousResearch/Hermes-Function-Calling -
MahmoudAshraf97/whisper-diarization - Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
video-db/PromptClip - Instantly create video clips from LLM prompts
brevdev/notebooks - Collection of notebook guides created by the Brev.dev team!
aigeek0x0/rag-with-langchain-colbert-and-ragatouille - Build a Streamlit Chatbot using Langchain, ColBERT, Ragatouille, and ChromaDB
rohan-paul/LLM-FineTuning-Large-Language-Models - LLM (Large Language Model) FineTuning
snexus/llm-search - Querying local documents, powered by LLM
distant-viewing/dvt - Distant Viewing Toolkit for the Analysis of Visual Culture
Macuyiko/royal-navy-ship-identification - This repository contains the source code accompanying the paper "Explainable Deep Learning to Classify Royal Navy Ships"
Vaibhavs10/how-to-whisper -
philschmid/document-ai-transformers -
sanchit-gandhi/whisper-jax - JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
NielsRogge/Transformers-Tutorials - This repository contains demos I made with the Transformers library by HuggingFace.
salesforce/LAVIS - LAVIS - A One-stop Library for Language-Vision Intelligence
facebookresearch/seamless_communication - Foundational Models for State-of-the-Art Speech and Text Translation
run-llama/llama-hub - A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
sroecker/LLM_AppDev-HandsOn - Repository and hands-on workshop on how to develop applications with local LLMs
Vaibhavs10/insanely-fast-whisper -
langchain-ai/langchain - 🦜🔗 Build context-aware reasoning applications
leandromoreira/digital_video_introduction - A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸
GLAM-Workbench/facial-detection -
shawngraham/Identifying-Similar-Images-with-TensorFlow-notebooks -
cpietsch/vikus-viewer-script - Scripts to generate sprite sheets and textures for VIKUS Viewer
TheScienceMuseum/heritage-connector - Heritage Connector: Transforming text into data to extract meaning and make connections

Kotlin

bfidatadigipres/bfi-iiif-logging - Solution for BFI National Archive Universal Viewer deployment, to log users in, track their interactions with the IIIF resources in UV, and output to a log.
digirati-co-uk/bfi-discovery - Prototyping, discovery and documentation for the BFI viewer project.

Lua

awesomeWM/awesome - awesome window manager
Kong/kong - 🦍 The Cloud-Native API Gateway and AI Gateway.

Others

congruence-engine/experimenting-wikidata - Repository for the "Linking Textile Machine terms via Wikidata" investigation
congruence-engine/transforming-researcher-notes -
ml-tooling/best-of-ml-python - 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
AI4LAM/full-stack-deep-learning-4-glams - A study group for the Full Stack Deep Learning Course with a focus on using ML in GLAM settings
AI4LAM/TeachingAndLearning - A repository to organize materials from the AI4LAM Teaching and Learning Working Group
fr0gger/Awesome-GPT-Agents - A curated list of GPT agents for cybersecurity
meta-llama/llama-stack-apps - Agentic components of the Llama Stack APIs
arpitingle/gpu-alpha - High Quality Resources on GPU Programming/Architecture
BradyFU/Video-MME - ✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
RManLuo/Awesome-LLM-KG - Awesome papers about unifying LLMs and KGs
hrishioa/tough-llm-tests - Some tough questions to test new models.
watson/awesome-computer-history - An Awesome List of computer history videos, documentaries and related folklore
archivematica/Issues - Issues repository for the Archivematica project
LvanWissen/starred -
linexjlin/GPTs - leaked prompts of GPTs
1mrat/gpt-stats - Stats for Custom Chat GPTs not created by OpenAI
travistangvh/ChatGPT-Data-Science-Prompts - A repository of 60 useful data science prompts for ChatGPT
brillout/awesome-react-components - Curated List of React Components & Libraries.
kdeldycke/awesome-falsehood - 😱 Falsehoods Programmers Believe in
transitive-bullshit/awesome-ffmpeg - 👻 A curated list of awesome FFmpeg resources.
alebcay/awesome-shell - A curated list of awesome command-line frameworks, toolkits, guides and gizmos. Inspired by awesome-php.
herrbischoff/awesome-command-line-apps - 🐚 Use your terminal shell to do awesome things.
gchq/BoilingFrogs - GCHQ's internal Boiling Frogs research paper on software development and organisational change in the face of disruption #boilingfrogs
nationalarchives/tdr-dev-documentation - Documentation for developers for the TDR project
IIIF/iiif-av - The International Image Interoperability Framework (IIIF) Audio/Visual (A/V) Technical Specification Group aims to extend to A/V the benefits of interoperability and the growing ecosystem of clients a
keshavbhatt/WonderWall-Packaging - Wonderwall Wallpaper manager, releases for Linux and Windows 10
usnationalarchives/digital-preservation - NARA digital preservation file format risk analysis and preservation plans
EIDR-ID/php - EIDR applications and source code examples written in PHP.
EIDR-ID/python - EIDR applications and source code examples written in Python.
ProjectMirador/mirador-awesome - An awesome list for Mirador's projects and plugins.
IIIF/awesome-iiif - Awesome IIIF-related resources
ncarboni/awesome-GLAM-semweb - A curated list of various semantic web and linked data resources for heritage, humanities and art history practitioners.
kba/awesome-ocr - Links to awesome OCR projects
MeMAD-project/mmca - MeMAD multimodal content analysis and machine translation: collection of tools and libraries
MeMAD-project/interchange-formats - MeMAD Metadata Interchange Formats
bnb/awesome-hyper - 🖥 Delightful Hyper plugins, themes, and resources
exponential-decay/pronom-archive-and-skeleton-test-suite - Release repository for The Skeleton Test Suite. Contains an Archive of PRONOM, and skeleton files for testing DROID from The National Archives, UK.
ross-spencer/brainscape-digital-preservation - An open source set of decks for learning about digital preservation.

PHP

passbolt/passbolt_api - Passbolt Community Edition (CE) API. The JSON API for the open source password manager for teams!
exponential-decay/the-format-registry - A mirror of the PRONOM file format registry in Linked Open Data format. The Format Registry is a linked (open) data file format repository. The work is the result of a four-day hack during November 20

Pascal

double-commander/doublecmd - Double Commander, a twin panel (side by side) cross platform open source file manager

Perl

get-iplayer/get_iplayer - A utility for downloading TV and radio programmes from BBC iPlayer and BBC Sounds

Python

epuerta9/deep-research-py - save 200 a month and use deep research right in your terminal. - port of https://github.com/dzhng/deep-research but in python
OpenGVLab/InternVideo - [ECCV2024] Video Foundation Models & Data for Multimodal Understanding
lm-sys/FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Textualize/rich - Rich is a Python library for rich text and beautiful formatting in the terminal.
zauberzeug/nicegui - Create web-based user interfaces with Python. The nice way.
roboflow/maestro - streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
D4Vinci/Scrapling - 🕷️ Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
wjbmattingly/streamlit_lessons_youtube -
wjbmattingly/spacy-annoy - A package for doing semantic search with spaCy docs.
wjbmattingly/bagpipes-spacy - Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.
hiyouga/LLaMA-Factory - Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
wjbmattingly/qwen2-vl-finetune-huggingface - This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
explosion/spacy-layout - 📚 Process PDFs, Word documents and more with spaCy
congruence-engine/visualizing-oral-history - Repository for the 'Visualizing Oral History' investigation
MaartenGr/KeyBERT - Minimal keyword extraction with BERT
congruence-engine/entity-relationship-extraction - Repository for code and data related to the CE investigation into extracting structured information from unstructured text
JaidedAI/EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Calamari-OCR/calamari - Line based ATR Engine based on OCRopy
mittagessen/kraken - OCR engine for all the languages
QwenLM/Qwen-VL - The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
urchade/GLiNER - Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
huggingface/text-generation-inference - Large Language Model Text Generation Inference
vllm-project/vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
ALucek/QuicKB - Optimize Document Retrieval with Fine-Tuned KnowledgeBases
ALucek/NeedleInAVidStack - Extract, timestamp, and analyze specific content from video collections using LLM-powered audio/video processing.
lowerquality/gentle - gentle forced aligner
AV-EFI/efi-conv - Home for converter scripts developed as part of the AVefi project.
AV-EFI/av-efi-schema -
AV-EFI/sdk-adlib-exporter - Code used at Stiftung Deutsche Kinemathek to register AVefi compliant PIDs for collections on record in their local Adlib database
ictnlp/LLaVA-Mini - LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
CatchTheTornado/text-extract-api - Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO
nationalarchives/dp-research-googletransferPOC - POC scripts to test potential transfer processes for material to TNA stored in Google Drive
intuitem/ciso-assistant-community - CISO Assistant is a one-stop-shop for GRC, covering Risk, AppSec and Audit Management and supporting +70 frameworks worldwide with auto-mapping: NIST CSF, ISO 27001, SOC2, CIS, PCI DSS, NIS2, CMMC, PS
unclecode/crawl4ai - 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Lightricks/ComfyUI-LTXVideo - LTX-Video Support for ComfyUI
Lightricks/LTX-Video - Official repository for LTX-Video
uscdr-mediapres/sous-chef - An easy-to-use application that encodes DPX sequences into MKV video streams, primarily for archival storage
slhck/ffmpeg-normalize - Audio Normalization for Python/ffmpeg
DS4SD/docling - Get your documents ready for gen AI
katanaml/sparrow - Data processing with ML, LLM and Vision LLM
DAI-Lab/RivaGAN - Robust video watermarking with non-differentiable adversaries.
NatLibFi/Annif - Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
chartbeat-labs/textacy - NLP, before and after spaCy
boudinfl/pke - Python Keyphrase Extraction module
LIAAD/yake - Single-document unsupervised keyword extraction
Cinnamon/kotaemon - An open-source RAG-based tool for chatting with your documents.
ServerlessLLM/ServerlessLLM - Serverless LLM Serving for Everyone.
ucbepic/docetl - A system for agentic LLM-powered data processing and ETL
usefulsensors/moonshine - Fast and accurate automatic speech recognition (ASR) for edge devices
stanford-oval/storm - An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
gsu-library/whisper-scribe - An audio/video transcriber with diarization and transcription editing.
JosefAlbers/whisper-turbo-mlx - Blazing fast whisper turbo for ASR (speech-to-text) tasks
tenable/pyTenable - Python Library for interfacing into Tenable's platform APIs
Shubhamsaboo/awesome-llm-apps - Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
revdotcom/reverb - Open source inference code for Rev's model
microsoft/presidio - Context aware, pluggable and customizable data protection and de-identification SDK for text and images
EdyVision/pii-codex - A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII)
LLaVA-VL/LLaVA-NeXT -
deekshaaneja/Qwen2-VL -
exo-explore/exo - Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
OpenBMB/MiniCPM-o - MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
aiola-lab/whisper-medusa - Whisper with Medusa heads
black-forest-labs/flux - Official inference repo for FLUX.1 models
ACMILabs/collection-chat - Uses LangChain and GPT-4 to chat with the ACMI Public API collection.
X-PLUG/mPLUG-Owl - mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
X-PLUG/mPLUG-DocOwl - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
akashmjn/tinydiarize - Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens
Avaiga/taipy - Turns Data and AI algorithms into production-ready web applications in no time.
freedmand/semantra - Multi-tool for semantic search
JSCU-NL/COATHANGER - IOCs and detection script for COATHANGER malware
Doriandarko/gemini-ui-to-code - A Streamlit application to generate code from images
microsoft/unilm - Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
huggingface/datatrove - Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
huggingface/optimum-nvidia -
THUDM/CogVLM2 - GPT4V-level open-source multi-modal model based on Llama3-8B
kadirnar/whisper-plus - WhisperPlus: Faster, Smarter, and More Capable 🚀
m-bain/whisperX - WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
JaesungHuh/SimpleDiarization - Simple Diarization model
ScrapeGraphAI/Scrapegraph-ai - Python scraper based on AI
HyperGAI/HPT - HPT - Open Multimodal LLMs from HyperGAI
ollama/ollama-python - Ollama Python library
Maximilian-Winter/llama-cpp-agent - The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls and get structured ou
stanfordnlp/dspy - DSPy: The framework for programming—not prompting—language models
OpenGVLab/Ask-Anything - [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
InternLM/xtuner - An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
vikhyat/moondream - tiny vision language model
BAAI-DCAI/Bunny - A family of lightweight multimodal models.
magic-research/PLLaVA - Official repository for the paper PLLaVA
mem0ai/mem0 - The Memory layer for AI Agents
artefactual-labs/amclient - Archivematica API client module
microsoft/LLMLingua - [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
thunlp/LLaVA-UHD - LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
Blaizzy/mlx-vlm - MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
meta-llama/llama3 - The official Meta Llama 3 GitHub site
armbues/SiLLM - SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
stanford-futuredata/ColBERT - ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
nateraw/audiocraft - Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable mu
arun-art06/trocr-large - Learn how to effortlessly convert handwritten text into editable digital text using the power of the Microsoft/Trocr-Large-Handwritten model from Hugging Face. With the help of Gradio, a user-friendly
VikParuchuri/marker - Convert PDF to markdown + JSON quickly with high accuracy
jina-ai/serve - ☁️ Build multimodal AI applications with cloud-native stack
apple/ml-mobileclip - This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024
simonw/click-app - Cookiecutter template for creating new Click command-line tools
simonw/files-to-prompt - Concatenate a directory full of files into a single prompt for use with LLMs
theirstory/gliner-spacy - A spaCy wrapper for GliNER
yoheinakajima/mindgraph - proof of concept prototype for generating and querying against an ever-expanding knowledge graph with ai
yoheinakajima/instagraph - Converts text input or URL into knowledge graph and displays
instructor-ai/instructor - structured outputs for llms
mustafaaljadery/lightning-whisper-mlx - An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
huggingface/pytorch-image-models - The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT)
lucidrains/vit-pytorch - Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
zhongyy/Face-Transformer - Face Transformer for Recognition
anguyen8/face-vit -
cohere-ai/BinaryVectorDB - Efficient vector database for hundred millions of embeddings.
hirmeos/entity-fishing-client-python - Repository hosting the common code for the entity-fishing clients
flairNLP/flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
izuna385/Wikia-and-Wikipedia-EL-Dataset-Creator - You can create datasets from Wikia/Wikipedia that can be used for entity recognition and Entity Linking. Dumps for ja-wiki and VTuber-wiki are available!
ml-explore/mlx-examples - Examples in the MLX framework
mindee/doctr - docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
stitionai/devika - Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. D
facebookresearch/BELA - Bi-encoder entity linking architecture
explosion/weasel - 🦦 weasel: A small and easy workflow system
explosion/spacy-stanza - 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
explosion/spacy-curated-transformers - spaCy entry points for Curated Transformers
explosion/curated-transformers - 🤖 A PyTorch library of curated Transformer models and their composable components
explosion/spacy-huggingface-pipelines - 💥 Use Hugging Face text and token classification pipelines directly in spaCy
explosion/spacy-transformers - 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
mustafaaljadery/mlxserver - Start a server from the MLX library.
explosion/spacy-llm - 🦙 Integrating LLMs into structured NLP pipelines
IBM/zshot - Zero and Few shot named entity & relationships recognition
haotian-liu/LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
HumanSignal/label-studio-ml-backend - Configs and boilerplates for Label Studio's Machine Learning backend
diicellman/dspy-rag-fastapi - FastAPI wrapper around DSPy
SYSTRAN/faster-whisper - Faster Whisper transcription with CTranslate2
qnguyen3/chat-with-mlx - An all-in-one LLMs Chat UI for Apple Silicon Mac using MLX Framework.
charlax/professional-programming - A collection of learning resources for curious software engineers
keirf/greaseweazle - Tools for accessing a floppy drive at the raw flux level
agno-agi/agno - Agno is a lightweight framework for building multi-modal Agents
run-llama/llama_cloud_services - Knowledge Agents and Management in the Cloud
artefactual/automation-tools - Tools to aid automation of Archivematica and AtoM.
allenai/OLMo - Modeling, training, eval, and inference code for OLMo
argmaxinc/whisperkittools - Python tools for WhisperKit: Model conversion, optimization and evaluation
instillai/extract-audio-from-video-gpu - Extracting audio from video using GPU-accelerated FFMPEG
KarelDO/xmc.dspy - In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
BlinkDL/RWKV-LM - RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN a
Peter-obi/Video_summarization_mlx - Transcribe and summarize videos using whisper and llms on apple mlx framework
AnswerDotAI/RAGatouille - Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
taketwo/llm-ollama - LLM plugin providing access to models running on an Ollama server
vegaluisjose/mlx-rag - Explore a simple example of utilizing MLX for RAG application running locally on your Apple Silicon device.
video-db/StreamRAG - Video Search and Streaming Agent 🕵️‍♂️
letta-ai/letta - Letta (formerly MemGPT) is a framework for creating LLM services with memory.
bertramlyons/DPXdpxDPX - DPX header editing gizmo
da-z/mlx-ui - A simple UI / Web / Frontend for MLX mlx-lm using Streamlit.
alphasecio/llama-index - A collection of apps powered by the LlamaIndex LLM framework.
maguowei/starred - creating your own Awesome List by GitHub stars!
bigscience-workshop/petals - 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
qurator-spk/dinglehopper - An OCR evaluation tool
rsommerfeld/trocr - Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models".
facebookresearch/ImageBind - ImageBind One Embedding Space to Bind Them All
marimo-team/marimo - A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
h2oai/h2ogpt - Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
VikParuchuri/surya - OCR, layout analysis, reading order, table recognition in 90+ languages
lllyasviel/Fooocus - Focus on prompting and generating
abetlen/llama-cpp-python - Python bindings for llama.cpp
riccardomusmeci/mlx-llm - Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX.
Vaibhavs10/on-device-llm-playground - A repo with scripts to test and play around with Facebook's recent llama models! 🤗
NVIDIA/NeMo - A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
impira/docquery - An easy way to extract information from documents
oobabooga/text-generation-webui - A Gradio web UI for Large Language Models with support for multiple inference backends.
AntonOsika/gpt-engineer - Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app
mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation
simonw/llm-mistral - LLM plugin providing access to Mistral models using the Mistral API
Vishnunkumar/craft_hw_ocr - Recognition of handwritten text using CRAFT text detection and TrOCR
fcakyon/craft-text-detector - Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector
simonw/llm - Access large language models from the command-line
Yuliang-Liu/MultimodalOCR - On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
SALT-NLP/LLaVAR - Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
mapluisch/LLaVA-CLI-with-multiple-images - LLaVA inference with multiple images at once for cross-image analysis.
LLaVA-VL/LLaVA-Interactive-Demo - LLaVA-Interactive-Demo
ise-uiuc/magicoder - [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
stanford-oval/WikiChat - WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.
axolotl-ai-cloud/axolotl - Go ahead and axolotl questions
simonw/llm-llama-cpp - LLM plugin for running models using llama.cpp
ablwr/lc-sdf-data-exploration -
zylon-ai/private-gpt - Interact with your documents using the power of GPT, 100% privately, no data leaks
AudiovisualMetadataPlatform/whisper - Wrapper for the Whisper speech-to-text tool
AudiovisualMetadataPlatform/amp_bootstrap - AMP system management
run-llama/rags - Build ChatGPT over your data, all with natural language
microsoft/autogen - A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
ochen1/insanely-fast-whisper-cli - The fastest Whisper optimization for automatic speech recognition as a command-line interface ⚡️
m-bain/CondensedMovies-chall - Condensed Movies Challenge 2021
m-bain/frozen-in-time - Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
simonw/webvtt-to-json - Convert WebVTT to JSON, optionally removing duplicate lines
glut23/webvtt-py - Read, write, convert and segment WebVTT caption files in Python.
facebookresearch/nougat - Implementation of Nougat Neural Optical Understanding for Academic Documents
clovaai/donut - Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
jhodges10/fioctl -
Frameio/python-frameio-client - Python SDK for interacting with the Frame.io API. Documentation here - https://frameio.github.io/python-frameio-client/
pipinstallyp/minigpt4-batch - Use miniGPT-4 batch to generate captions for a lot of images! You should be able to create the best captions you always wanted!
theovercomer8/captionr - GIT/BLIP/CLIP Caption tool
simonw/blip-caption - Generate captions for images with Salesforce BLIP
OpenInterpreter/open-interpreter - A natural language interface for computers
adbar/trafilatura - Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
facebookresearch/GENRE - Autoregressive Entity Retrieval
facebookresearch/BLINK - Entity Linker solution
kuo77122/deep-face-detector -
yiminglin-ai/imdb-clean - A cleaned version of IMDB-WIKI dataset for facial age estimation.
divya21raj/Actor-Recognition-In-Movies - Recognizing actors in a movie clip or image, using OpenCV, DeepLearning and Python.
ageitgey/face_recognition - The world's simplest facial recognition api for Python and the command line
kermitt2/delft - a Deep Learning Framework for Text https://delft.readthedocs.io/
Lucaterre/spacyfishing - A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata
davidberenstein1957/spacy-dbpedia-spotlight - A spaCy wrapper for DBpedia Spotlight
davidberenstein1957/classy-classification - This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.
davidberenstein1957/concise-concepts - This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.
UB-Mannheim/spacyopentapioca - A spaCy wrapper of OpenTapioca for named entity linking on Wikidata
SapienzaNLP/extend - Entity Disambiguation as text extraction (ACL 2022)
egerber/spaCy-entity-linker - spaCy module for linking text to Wikidata items
openeventdata/es-geonames - Create a Geonames gazetteer index in Elasticsearch
openeventdata/mordecai - Full text geoparsing as a Python library
ina-foss/inaSpeechSegmenter - CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.
lead-ratings/gender-guesser - Guess gender from first name in Python 2 and 3
Alialmanea/age-gender-detection-using-opencv-with-python - age & gender detection-using-opencv-with-python
torbjornbp/video-ocr2srt - A simple script to extract text elements from video files
openai/openai-python - The official Python library for the OpenAI API
unifiedstreaming/streaming-load-testing - Load generation tool for evaluation of DASH and HLS video streaming setups
alexwlchan/library-lookup - Finding books that are available in nearby branches of my public lending library
UAlbanyArchives/mailbagit - A tool for creating and managing Mailbags, a package for preserving email using multiple preservation formats
flavioribeiro/video-thumbnail-generator - 📷 Generate thumbnail sprites from videos.
bfidatadigipres/STORA - Off-air TV recording system. Open source Python3 and bash shell code
ozmartian/vidcutter - A modern yet simple multi-platform video cutter and joiner.
athento/hocr-parser - HOCR Specification Python Parser
jlieth/hocr-parser - Python parser for hOCR files using lxml
lucaswarwick02/HOCkeR - Python package for combining .hocr files and images into searchable PDFs
imdeepmind/hocrox - Hocrox: An image preprocessing and augmentation library with Keras like interface.
explosion/spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
clamsproject/apps - A repository to keep record of CLAMS apps
clamsproject/clams-python - CLAMS SDK for python
keighrim/concatrim - Python program to trim-and-join A/V media files using ffmpeg
clamsproject/app-barsdetection -
KenjiTakahashi/mpdecimate_trim - trim video clips based on mpdecimate output, keep audio synced
nielstenboom/recurring-content-detector - Unsupervised detection of opening / closing credits, recaps, and previews in video files 🎥🍿🎬
openai/whisper - Robust Speech Recognition via Large-Scale Weak Supervision
artefactual/archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
CarnegieHall/quality-control - Carnegie Hall Archives maintains a series of small, portable scripts to expedite batch processes for quality control on our Digital Collections.
FilmColors/VIAN -
simonw/s3-ocr - Tools for running OCR against files stored in S3
pyscript/pyscript - PyScript is an open source platform for Python in the browser. Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2
bfidatadigipres/transcoding - Open source automated transcoding scripts used at the BFI National Archive
iiif-prezi/iiif-prezi3 - IIIF Presentation API 3 Python Library
carevealed/md5tool - Python script to generate or check md5 checksums recursively for files in a directory tree.
bitmovin/bitmovin-api-sdk-python - Python API SDK which enables you to seamlessly integrate the Bitmovin API into your projects
toddbirchard/flasklogin-tutorial - 👨‍💻 🔑 Build Flask apps with user creation and log-in functionality.
pytube/pytube - A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.
sudowork/fix_m1_rgb - Script that attempts to force M1 macs into RGB mode when used with monitors that are defaulting to YPbPr.
SpectraLogic/ds3_python_sdk -
alexwlchan/concurrently - A snippet for running multiple, concurrent invocations of a Python function
parallelencode/PyParallelEncode -
BlinkenOSA/workflows - Blinken OSA AV Preservation workflows implemented with Airflow (https://airflow.apache.org)
cs-afm/co-dot-py - A little cli tool for moving things around
SpectraLogic/ds3_python3_sdk -
NCSC-NL/log4shell - Operational information regarding the log4shell vulnerabilities in the Log4j logging library.
fullhunt/log4j-scan - A fully automated, accurate, and extensive scanner for finding log4j RCE CVE-2021-44228
giacomomarchioro/pyIIIFpres - Python module for easing the construction of JSON manifests compliant with IIIF API 3.0.
KBNLresearch/tapeimgr - Simple tape imaging and extraction tool
LibraryOfCongress/bagit-python - Work with BagIt packages from Python.
boto/boto3 - AWS SDK for Python
AdminTurnedDevOps/DevOps-The-Hard-Way-AWS - This repository contains free labs for setting up an entire workflow and DevOps environment from a real-world perspective in AWS
mbennett-uoe/whiiif - Simple IIIF Search service for OCRed texts
bfidatadigipres/dpx_encoding - BFI National Archive automated dpx preservation scripts written in BASH and Python for use with Media Area RAWcooked and other open source programmes.
bfidatadigipres/title_article_split - Python script to split multiple language articles from full title.
kfrn/ffmpeg-things - Scripts & notes about ffmpeg
IIIF/presentation-validator - Validator for the Presentation API
IIIF/prezi-2-to-3 - Libraries to upgrade Presentation API v2 to v3 automatically
tomcrane/bbctextav -
alexwlchan/lazyreader - Lazy reading of file objects for efficient batch processing
alexwlchan/clipatron - A script to automate video clipping using ffmpeg ✂️ 📼 ✂️
iiif-prezi/iiif-prezi - IIIF Presentation API implementation in Python
corkami/collisions - Hash collisions and exploitations
bfidatadigipres/checksum_scripts - Checksum speed test scripts using Python2 and Python3 MD5 and CRC32 algorithms.
andersbll/neural_artistic_style - Neural Artistic Style in Python
c0decracker/video-splitter - Simple Python script to split video into equal length chunks or chunks of equal size, duration, etc.
nlnzcollservices/harvester_manager - Mostly Automated Social-media Harvester
Digital-Preservation-Finland/fido - Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is designed for simple integration into automated work-flows.
TheScienceMuseum/elastic-wikidata - CLI for loading Wikidata subsets (or all of it) into Elasticsearch
liiight/notifiers - The easy way to send notifications
britishlibrary/mpt - A utility for staging files, calculating and validating file checksums, and comparing checksum values between storage locations.
bodleian/iiif_manifest_server - Bodleian IIIF Manifest Microservice
benjaminp/six - Python 2 and 3 compatibility library
ryanfb/HocrConverter - Create PDFs and plain text from hOCR documents
jbaiter/hocrviewer-mirador - View HOCR files with Mirador
ocropus/hocr-tools - Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
MeMAD-project/AudioTagger - Program for recognizing audio contents of sound files and videos.
MeMAD-project/rdf-converter - MeMAD metadata converter that transforms legacy metadata from INA and Yle into RDF using the MeMAD and EBU Core ontologies
pypa/pipenv - Python Development Workflow for Humans.
ali1234/vhs-teletext - Software to recover teletext data from VHS recordings.
MrS0m30n3/youtube-dl-gui - A cross platform front-end GUI of the popular youtube-dl written in wxPython.
ytdl-org/youtube-dl - Command-line program to download videos from YouTube.com and other video sites
Digital-Preservation-Finland/file-scraper - File detector, metadata collector and well-formedness checker tool
Digital-Preservation-Finland/ffmpeg-python - Python bindings for FFmpeg - with complex filtering support
Digital-Preservation-Finland/dpx-validator - DPX file format validator
tw4l/brunnhilde - Siegfried-based characterization tool for directories and disk images
Ymagis/ClairMeta - Clairmeta is a python package for Digital Cinema Package (DCP) probing and checking.
kieranjol/IFIscripts - Detailed documentation is available here: http://ifiscripts.readthedocs.io/en/latest/index.html

R

ahalterman/phoxy - R tools to download, ingest, and analyze the Phoenix dataset from the Open Event Data Alliance

Ruby

nypublicradio/transcript-editor - Web-based tool for correcting speech-to-text generated transcripts.
WGBH-MLA/transcript-editor - Web-based tool for correcting speech-to-text generated transcripts.
guerilla-di/depix - Read and write DPX file headers
avalonmediasystem/avalon - Avalon Media System – Samvera Application
athityakumar/colorls - A Ruby gem that beautifies the terminal's ls command, with color and font-awesome icons. 🎉

Rust

lumina-ai-inc/chunkr - Vision model based document ingestion
BurntSushi/ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore
bionic-gpt/bionic-gpt - BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality
huggingface/candle - Minimalist ML framework for Rust
awslabs/mountpoint-s3 - A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
acdha/mountstatus - MountStatusMonitor: paranoid monitor for POSIX filesystem mounts (Linux, OS X, FreeBSD)
ruffle-rs/ruffle - A Flash Player emulator written in Rust

SCSS

AI4LAM/awesome-ai4lam - A list of awesome AI in libraries, archives, and museum collections from around the world 🕶️

Scala

JohnSnowLabs/spark-nlp - State of the Art Natural Language Processing

Shell

Tygrain/bash-ffmpeg-completion -
healthyhost/audit-vps-script - Run a security scan on your server and identify common gaps. Get your VPS ready for production.
Shuffle/Shuffle - Shuffle: A general purpose security automation platform. Our focus is on collaboration and resource sharing.
mhasan49/package-manager - Installer script tailored for Debian/Ubuntu systems to install necessary packages.
artefactual/archivematica-docs - Archivematica documentation
agarrharr/awesome-cli-apps - 🖥 📊 🕹 🛠 A curated list of command line apps
dericed/dpxderiver - shell script for converting DPX+wav input to specific outputs of DNxHD, lossless h264 at 4:2:2 YUV 10 bit, and a streamable h264
adi1090x/dynamic-wallpaper - A simple bash script to set wallpapers according to current time, using cron job scheduler.
kfrn/rainbow-video - A script that takes a video and creates a hue-ordered mosaic of frame captures.
bfidatadigipres/bfi-iiif-load-balancer - BFI's IIIF NGINX based load balancer application, to proxy user facing requests to backend applications.
danielgrant/server-scripts - A collection of scripts for server management, health checking and reporting
dericed/framemd5cmp - Present a comparison between framemd5 output of two video files.
eddycolloton/INPT - These shell scripts are intended to automate several steps frequently performed by media conservators at the Hirshhorn Museum and Sculpture Garden (HMSG).
ohmyzsh/ohmyzsh - 🙃 A delightful community-driven (with 2,400+ contributors) framework for managing your zsh configuration. Includes 300+ optional plugins (rails, git, macOS, hub, docker, homebrew, node, php, python,
antespi/s3md5 - Bash script to calculate Etag/S3 MD5 sum for very big files uploaded using multipart S3 API

Swift

freedmand/textra - A command-line application to convert images, PDFs, and audio files to text using Apple's APIs
argmaxinc/WhisperKit - On-device Speech Recognition for Apple Silicon
gluonfield/enchanted - Enchanted is an iOS and macOS app for chatting with private self-hosted language models such as Llama2, Mistral or Vicuna using Ollama.
preternatural-explore/mlx-swift-chat - A multi-platform SwiftUI frontend for running local LLMs with Apple's MLX framework.
deployradiant/pajama - A UI for Ollama on Mac
exelban/stats - macOS system monitor in your menu bar
vincentneo/LosslessSwitcher - Automated Apple Music Lossless Sample Rate Switching for Audio Devices on Macs.

Tcl

cs-afm/Check-Sammy - Python GUI for calculating and monitoring md5 checksums

TypeScript

dzhng/deep-research - An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the simp
piotrkulpinski/openalternative - A community driven list of open source alternatives to proprietary software and applications.
supermemoryai/supermemory - Build your own second brain with supermemory. It's a ChatGPT for your bookmarks. Import tweets or save websites and content using the chrome extension.
AykutSarac/jsoncrack.com - ✨ Innovative and open-source visualization application that transforms various data formats, such as JSON, YAML, XML, CSV and more, into interactive graphs.
RooVetGit/Roo-Code - Roo Code (prev. Roo Cline) is a VS Code plugin that enhances coding with AI-powered automation, multi-model support, and experimental features
ggml-org/llama.vscode - VS Code extension for LLM-assisted code/text completion
upscayl/upscayl - 🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, MacOS and Windows.
hoarder-app/hoarder - A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
siyuan-note/siyuan - A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
ahmedkhaleel2004/gitdiagram - Replace 'hub' with 'diagram' in any GitHub url to instantly visualize the codebase as an interactive diagram
gristlabs/grist-core - Grist is the evolution of spreadsheets.
nocodb/nocodb - 🔥 🔥 🔥 Open Source Airtable Alternative
twentyhq/twenty - Building a modern alternative to Salesforce, powered by the community.
immich-app/immich - High performance self-hosted photo and video management solution.
yamadashy/repomix - 📦 Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or o
microsoft/data-formulator - 🪄 Create rich visualizations with AI
n8n-io/n8n - Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
getomni-ai/zerox - PDF to Markdown with vision models
Sh4yy/personal-ai -
ax-llm/ax - The unofficial DSPy framework. Build LLM powered Agents and "Agentic workflows" based on the Stanford DSP paper.
enricoros/big-AGI - AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlight
run-llama/create-llama - The easiest way to get started with LlamaIndex
n4ze3m/page-assist - Use your locally running AI models to assist you in your web browsing
ai-ng/2txt - Image to text, fast.
jina-ai/reader - Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
huggingface/llm-vscode - LLM powered development for VSCode
hrishioa/lumentis - AI powered one-click comprehensive docs from transcripts and text.
da-z/llamazing - A simple Web / UI / App / Frontend to Ollama.
run-llama/sec-insights - A real world full-stack application using LlamaIndex
wsxiaoys/bobtail.dev - Poor man's phind.com/perplexity.ai
leptonai/search_with_lepton - Building a quick conversation-based search demo with Lepton AI.
Nutlope/pdftochat - Chat with your PDFs with AI
osmoscraft/osmosmemo - Turn GitHub into a bookmark manager
janhq/jan - Jan is an open source alternative to ChatGPT that runs 100% offline on your computer
hrishioa/wishful-search - Natural language search for complex JSON arrays, with AI Quickstart.
fromsmash/smash-downloader-js - Official JavaScript library to download transfers using the Smash API & SDK 🚀
karolkozer/planby -
elastic/kibana - Your window into the Elastic Stack
mifi/ezshare - Easily share files, folders and clipboard over LAN - Like Google Drive but without internet
mifi/lossless-cut - The swiss army knife of lossless video/audio editing
directus/directus - The flexible backend for all your projects 🐰 Turn your DB into a headless CMS, admin panels, or apps with a custom UI, instant APIs, auth & more.
digirati-co-uk/iiif-manifest-editor - Create new IIIF Manifests. Modify existing manifests. Tell stories with IIIF.
IIIF-Commons/parser - IIIF Presentation 2 + 3 parser
muxinc/media-chrome - Custom elements (web components) for making audio and video player controls that look great in your website or app.
gTile/gTile - A window tiling extension for Gnome.
mifi/editly - Slick, declarative command line video editing & API
archival-IIIF/test-server -
archival-IIIF/demo -
SocialGouv/archifiltre-docs - Visualisez et améliorez vos arborescences de fichiers !
archival-IIIF/viewer - IIIF compatible viewer for digital born file storages
UniversalViewer/universalviewer - A community-developed open source project on a mission to help you share your 📚📜📰📽️📻🗿 with the 🌎
freeCodeCamp/freeCodeCamp - freeCodeCamp.org's open-source codebase and curriculum. Learn to code for free.

Vue

0xJacky/nginx-ui - Yet another WebUI for Nginx
beeldengeluid/open-images-browser - MediaScape project researching the utility of Generous Interfaces for audiovisual archives

XSLT

preservica/automated-preservation-recommendations - This repository contains a Wiki of information related to recommendation preservation actions in support of Preservica Automated-Preservation functionality, as well as some basic tools for working wit
dericed/xsl4metadata - various xsl to do this or that

Zig

lightpanda-io/browser - Lightpanda: the headless browser designed for AI and automation

License

To the extent possible under law, stephenmcconnachie has waived all copyright and related or neighboring rights to this work.
