Updates

timestamp	comments
2023-09-14	the first draft was created

why, the benefits
how, the literature review
what, the introduction to open-creator
when, the roadmap and checklist

2023-09-17	add @Pablo Vazquez comments, see: https://discord.com/channels/1146610656779440188/1149056926773166240/1152942317745999883

update the skill extractor agent's prompt and schema; also tested on open-interpreter, see: https://github.com/KillianLucas/open-interpreter/pull/399

notice: the implementation may differ from this blog; the ideas here will be updated iteratively

2023-09-18	to give a bigger picture,

supplement for skill object/interface:

a skill can be created in various ways, such as from a user request, messages (conversation history), a json file, a json object, a code file in a repository, an API doc, and even a database
a skill can be transformed into various objects/formats, such as a code file, json, a huggingface tool object, the open ai function call format, a langchain chain object, etc.
a skill can be anything durable / persistable and solution-oriented: an agent, a group of agents, a tool, an api call, an http/https request, natural language tips / procedures, or a template
skills should be able to be tested, evaluated, refactored, combined, iterated on, and derived from
skills can be structured into courses and stacked using a curriculum learning approach
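
To make the points above concrete, here is a minimal sketch of a skill object that can be created from JSON and transformed into a JSON string, a code file, and the OpenAI function call format. The class name `Skill`, its fields, and its method names are assumptions for illustration only; the actual open-creator schema may look different.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Skill:
    """Hypothetical, minimal skill record (not the real open-creator schema)."""

    name: str
    description: str
    code: str
    parameters: dict = field(default_factory=dict)  # JSON Schema "properties"
    required: list = field(default_factory=list)    # names of required parameters
    metadata: dict = field(default_factory=dict)    # free-form extra fields

    @classmethod
    def from_json(cls, text: str) -> "Skill":
        """Create a skill from a JSON string (e.g. the content of a json file)."""
        return cls(**json.loads(text))

    def to_json(self) -> str:
        """Serialize the skill so it can be persisted or shared."""
        return json.dumps(asdict(self), indent=2)

    def to_openai_function(self) -> dict:
        """Describe the skill in the OpenAI function call format."""
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": self.parameters,
                "required": self.required,
            },
        }

    def to_code_file(self) -> str:
        """Render the skill as the text of a standalone code file."""
        return f'"""{self.description}"""\n\n{self.code}\n'


example = Skill(
    name="add",
    description="Add two integers.",
    code="def add(a, b):\n    return a + b",
    parameters={"a": {"type": "integer"}, "b": {"type": "integer"}},
    required=["a", "b"],
)
restored = Skill.from_json(example.to_json())  # JSON round trip
```

Keeping every conversion as a method on one object is only one possible design; separate converter functions per target format would work equally well.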

about cli:

support various interfaces and integrations
python import
bash commands
http/https server
as an agent server that supports running APIs through natural language
All the code in open-creator can be instantiated as skill objects, so you can even create open-creator itself, essentially enabling it to bootstrap and recreate itself.

about test environment

conda env ?
docker container?

2023-09-20	Development Progress Update for Open-Creator

Here's a brief update on the development progress of open-creator:

Completed:
Skill schema
Messages skill extractor agent
Creator creation from messages (conversation history), message json path, and skill_path
Creator save
Streamlining output for function call parameters
Dependency checks

Pending:
Support for code file creation, expected by 9/19
Test writer agent and interpreter agent enhancements, expected by 9/20 (Wednesday)
Support for request creation, expected by 9/20
API doc creation support, projected for 9/20 (Wednesday)
Conversion to HuggingFace tool, upload to HuggingFace & download from HuggingFace, expected by 9/21
Conversion to Langchain tool, upload to Langchain Hub & download from Langchain Hub, projected for 9/21
Semantic search support, projected for 9/21 (Thursday)
Comprehensive documentation, testing, and technical report writing from Friday to Saturday
Pre-release scheduled for the weekend

Discussing Extensibility:
✅ Custom agent compatibility, custom input, and skill creation
✅ Schema information extensibility via metadata
❓ Compatibility with different database caching, community contributions encouraged
❓ Compatibility with various local large models, community contributions welcomed
❓ Enhanced configuration, open for community contributions
❓ Different semantic search strategies, community input invited
❓ Secure local environment for code execution, seeking community input
(Under discussion) Multi-skill stacking mechanism

One approach to achieve better extensibility is to streamline open-creator, retaining only the core create and save logic. Functions in utils would become skill objects, and interpreter functions would also morph into skill objects. Even indexing and search in vector databases would be made into skill objects. This enables plug-and-play extensibility through user-defined skills.

Also:
Unified configuration settings, support for .env, ~/.bashrc export, and creator.config.xxx =

Regarding environmental considerations, I had another idea:

Running in Docker
Creating a separate conda environment for each skill (since conda caches can be reused)

This ensures security and resolves dependency conflicts. However, Docker isolates local directories, and users need to install conda. This can make it less lightweight. If a virtual environment is solely needed, venv can also be an option. We can let users choose between conda and venv later on, but Docker might not be feasible. The earlier Docker solution discussion was about having a virtual partition, like with GPT-4, where files reside. This topic can be open for discussion and added to the roadmap for developer contributions. A dedicated channel on Discord would be perfect.

Regarding Langchain's dependency, the reason I wanted to eliminate it is that some developers dislike using Langchain. We previously felt that Langchain was too heavily wrapped. It's great for direct project use, but as a library, it might deter some users. Why not implement the core functionalities of Langchain ourselves? But then the workload would increase considerably. This might make our code less clear, and we'd end up wrapping it like Langchain. Such legacy code is hard for people to study, leading to pull requests and issues. I believe the first version should be as lightweight as possible.

HuggingFace Standards:

Unified naming for HuggingFace: user/open-creator-skill-library
Within this directory, files should be named based on the skill
Two methods for pull:

Use list_files_info and then choose skill
Use hf_hub_download specifying the skill name
Ensure there's a skill.json file within user/open-creator-skill-library/<skill_name>

Other Thoughts and Reminders:
Support for HuggingFace, Git, and Langchain hubs
Provide three usage examples in the documentation
Multiple ways to create skills, with support for community hub imports and uploads
Persistent skills in an interpreter context, with support for saving, importing, and searching
More generalized use-cases, converting classic Langchain agents, HuggingFace models, and frequent tasks (like file handling, data analysis, plotting, web scraping) into skills. Viewing open-creator itself as a skill.
Best practices for executing code skills: automatic dependency checks and installations during skill installation, input type verification using Pydantic, and AST package for execution.
Clear prompts about standards: doc strings, production-grade code, readability, low complexity, and adhering to the best practices of the current language. The code should resemble the formality of the Transformers library.
Add an additional args dictionary in metadata for new fields and annotations.
For remote libraries, auto-generate READMEs and usage instructions. Allow users to copy open-creator code with a single click.
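
The dependency-check idea from the list above can be sketched with the standard library alone. This is an illustration under assumptions, not open-creator's implementation: the function names `imported_modules` and `missing_dependencies` are hypothetical, `ast` is used here only to inspect the code statically before it runs, and the Pydantic input validation step is left out to keep the example dependency-free.

```python
import ast
import importlib.util


def imported_modules(source: str) -> list:
    """Return the top-level module names imported by a piece of skill code.

    ast.parse also acts as a syntax check: it raises SyntaxError on invalid code.
    """
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return sorted(names)


def missing_dependencies(modules) -> list:
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


skill_code = "import json\nfrom os import path\nimport some_missing_pkg_xyz\n"
needed = imported_modules(skill_code)
to_install = missing_dependencies(needed)
```

A skill installer could pass `to_install` to its package manager of choice before the skill is run for the first time. Note that import names do not always match package names on PyPI, so a real check would need a mapping or an explicit dependency list.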

Technology Report Name: `open creator: filling the gap between code interpreter and skill library`

2023-09-21	Current Progress

We are setting up a repository on Space named skill-library. Inside it, we will store the following:

A Gradio app to display the skill-library readme, filtering options (tag, repo_id, skill_name, version), as well as search functionality.
Specific skill directories will look as follows:

```bash
(base) ➜  skill_library pwd
/Users/gongjunmin/.cache/open_creator/skill_library
(base) ➜  skill_library tree extract_pdf_section
extract_pdf_section
├── conversation_history.json
├── embedding_text.txt
├── function_call.json
├── install_dependencies.sh
├── skill.json
├── skill_code.py
└── skill_doc.md

1 directory, 7 files
```

When a user filters or selects a particular skill-library, a script will automatically generate a readme which includes the skill's name, description, usage instructions, and how to import it through the creator.create interface.

```python
import creator

skill = creator.create(
    huggingface_repo_id="Timedomain-tech/skill-library",
    huggingface_skill_path="ChuxiJ/skill-library/get_whether",
    version="1.0.0",
)
```

User Upload Process for their Skill Library
The following script is used for saving:
If no repo exists, it will automatically clone our template Timedomain-tech/open-creator-skill-library and create one.
Finally, the skill's JSON files and related content are uploaded.

```python
import creator

creator.save(skill, huggingface_repo_id="ChuxiJ/open-creator-skill-library")
```

Mechanism for User Skill Submissions
After upload, a pull request will automatically be created in our skill-library git repository, aiming to edit the skill-library.md file.
We will have an automated bot script that will evaluate the submitted skill and conduct an automatic review.
If it passes, it will be automatically merged into the repository.
For the skill-library on Hugging Face Space, updates will be pulled from the skill-library.md file regularly, for instance, daily.
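
As a rough idea of what the automated review step could check before merging, the sketch below validates a submitted skill directory. Everything in it is an assumption for illustration: the function name `review_skill_dir`, the choice of required files (taken from the directory listing shown earlier), and the hypothetical field names `skill_name` and `skill_description` in skill.json.

```python
import json
from pathlib import Path

# Assumption: these three files are the minimum a submission must contain.
REQUIRED_FILES = ("skill.json", "skill_code.py", "skill_doc.md")
# Assumption: hypothetical field names; the real skill.json schema may differ.
REQUIRED_FIELDS = ("skill_name", "skill_description")


def review_skill_dir(skill_dir) -> list:
    """Return a list of problems found; an empty list means the review passed."""
    root = Path(skill_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]

    manifest = root / "skill.json"
    if manifest.is_file():
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("skill.json is not valid JSON")
        else:
            if not isinstance(data, dict):
                problems.append("skill.json must contain a JSON object")
            else:
                for key in REQUIRED_FIELDS:
                    if not data.get(key):
                        problems.append(f"missing field: {key}")
    return problems
```

A bot could run such a check on each pull request and merge automatically only when the returned list is empty; evaluating the skill's behavior would need additional steps, such as running its tests in an isolated environment.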

--

Normally, we want to create a skill object after an open-interpreter session (see pull request 399). Thanks to the skill_extractor_agent, we can convert our conversation history into a formatted skill object with the %save_skill command. But now, we can also create a skill object from a request alone.

How did this happen? We forked the code of open-interpreter, then refactored and re-implemented it as a minimal version. A code interpreter that works in only 45 lines has been integrated into open-creator:

```python
import subprocess
import traceback


def get_persistent_process(start_cmd: str):
    # Start the interpreter process with pipes for stdin, stdout and stderr.
    process = subprocess.Popen(
        args=start_cmd.split(),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=0
    )
    return process


class BaseInterpreter:
    """A tool for running base code in a terminal."""

    name: str = "base_interpreter"
    description: str = (
        "A base shell command tool. Use this to execute bash commands. "
        "It can also be used to execute any language with interactive mode"
    )

    def __init__(self):
        self.process = None

    def run(self, query: str, is_start: bool = False) -> dict:
        # On the first call (or when is_start is set), query is the start command.
        if is_start or self.process is None:
            try:
                self.process = get_persistent_process(query)
                return {"status": "success", "stdout": "", "stderr": ""}
            except Exception:
                traceback_string = traceback.format_exc()
                return {"status": "error", "stdout": "", "stderr": traceback_string}
        stdout, stderr = "", ""
        try:
            # communicate() sends the input, closes stdin and waits for the process to exit.
            stdout, stderr = self.process.communicate(input=query)
        except BrokenPipeError:
            stderr = traceback.format_exc()
        return {"status": "success", "stdout": stdout, "stderr": stderr}
```

```bash
(open_creator_dev) ➜  code_interpreter git:(main) ✗ tree .
.
├── R.py
├── __init__.py
├── applescript.py
├── base.py
├── html.py
├── javascript.py
├── julia.py
├── python.py
└── shell.py
```

Like ChatDev, GPT-Engineer, GPT-team, etc., we can use autonomous agents. There will be agents that not only help you write code and convert code into skills, but also handle testing and refactoring.


Why the Need?

Scalability

GPT-4 naturally leans towards providing detailed steps and explanations. However, in situations with limited context, addressing intricate scenarios requires abstract skill libraries. Users must laboriously assemble these domain-specific operations. A pitfall here is that these meticulously organized skill sets aren't easily reusable. This makes tasks like constructing intricate, layered projects a significant challenge.

Cost-Effective

Often, achieving the right script or code necessitates multiple iterations. These very scripts or codes are then frequently invoked in our daily routines, say, checking upcoming birthdays. What we need is a skill_library mechanism that allows us to consolidate and archive the refined versions of these codes, turning them into readily usable skill sets. So, the next time around, we can just call upon this package instead of iterating all over again to derive the correct script. This not only saves us time but also conserves tokens.

Community Wisdom Untapped

A significant shortcoming of lacking a cohesive "skill_library" is the missed opportunity to harness the collective intelligence of the global community. All over the world, ingenious developers and users continuously discover optimized solutions to challenges. Without a centralized platform to archive and share these insights, the wheel gets reinvented repeatedly. A skill library could serve as a repository where community members contribute, refine, and validate diverse solutions, amplifying shared knowledge's potential.

Consistency and Enhanced Robustness

A dedicated "skill_library" ensures users experience consistency when tackling challenges. Accessing well-curated and polished knowledge not only offers reliable solutions but also promises uniform outcomes. This uniformity becomes vital when reflecting on the frustrations tied to replicating another person's successful process. Erratic or unpredictable experiences can be exasperating. A standardized skill library offers robust solutions, eliminating the inconsistencies often associated with problem-solving.

How to Implement It?

In the vast seascape of problem-solving mechanisms, we have an array of potential solutions. Some are already in play, while others remain as budding concepts, ripe for exploration.