Biotope¶

CLI integration for BioCypher ecosystem packages

Biotope is still under development

Biotope is still under development and the API is subject to change. The package is currently only meant for developer use and prototyping.

The Biotope CLI integration is our attempt to integrate our open source ecosystem packages into an accessible suite for scientific knowledge management. We are first approaching the project from a CLI perspective, as this is the most basic technology for prototyping automated workflows. We aim to extend this towards other user-interfaces, such as web apps, in the future.

Biotope contains various modules for different tasks, some of which are straightfoward applications of existing ecosystem packages, while others are prototypes for new features. See more information in the API documentation.

biotope init: Initialize a new project
biotope status/add/commit/log/push/pull: Manage metadata in a git-like fashion
biotope get: Download files from a URL and stage them for annotation and version control
biotope annotate: Annotate your data with consistent metadata in Croissant ML
biotope check-data: Perform consistency checks for file integrity
biotope mv: Move tracked files and update metadata automatically
biotope config: Manage project configuration and metadata settings
biotope build: Build a BioCypher knowledge representation
biotope chat: Chat with a project using BioChatter
biotope agent: Plan agentic workflows using BioChatter
biotope read: Extract information from unstructured modalities (BioGather)
biotope view: Use visual analysis tools to interpret your data and metadata

Git Integration for Metadata Version Control¶

Biotope uses a Git-on-Top strategy for metadata version control, providing:

Version control for all metadata changes using Git
Collaboration through standard Git workflows
Data integrity through checksum verification
Familiar tooling - all Git tools work seamlessly

Core Git-Integrated Commands¶

biotope add: Stage data files for metadata creation
biotope mv: Move tracked data files and update metadata
biotope get: Download remote files and stage them for metadata creation
biotope status: Show current project status
biotope commit: Commit metadata changes using Git
biotope log: View commit history
biotope push/pull: Share metadata with remote repositories
biotope check-data: Verify data integrity against checksums

Basic Workflow¶

# Initialize project (with Git, .gitignore, and optional project metadata)
biotope init

# Add local data files (creates metadata, ignored by Git by default)
biotope add data/raw/experiment.csv

# Or add new files at once, recursively
biotope add -r data

# Or download and stage remote files (calls `add` once finished)
biotope get https://example.com/data/experiment.csv

# Check status (shows metadata changes, not data files)
biotope status

# Create metadata for staged files (with project metadata pre-fill)
biotope annotate interactive --staged

# Or complete incomplete annotations
biotope annotate interactive --incomplete

# Commit changes (metadata only, data files excluded via .gitignore)
biotope commit -m "Add experiment dataset"

# View history
biotope log --oneline

Note: Data files are automatically excluded from Git tracking via .gitignore. Only metadata is version controlled, keeping repositories small and focused.

Project-Level Metadata¶

Biotope supports project-level metadata collection during initialization that can be used to pre-fill annotation fields:

Description: Project description and purpose
URL: Project homepage or repository
Creator: Project maintainer information
License: Data usage license
Citation: How to cite the project

This metadata is stored in .biotope/config/biotope.yaml and automatically pre-fills fields when using biotope annotate interactive.

Documentation¶

Git Integration for Users: Learn how to use biotope's Git integration, leveraging your existing Git knowledge
Git Integration for Developers: Understand the technical implementation and architecture
Cluster Compliance: How to enforce and check metadata validation policies across clusters

Metadata annotation using Croissant, short guide¶

The biotope package features a metadata annotation assistant using the recently introduced Croissant schema. It is available as the biotope annotate module. Usage:

pip install biotope
biotope annotate interactive

You can also use the biotope get command to download files and stage them for annotation and version control:

biotope get https://example.com/data/file.txt
biotope status
biotope annotate interactive --staged
biotope commit -m "Add new dataset from URL"

This will download the file, stage it for annotation, and fit into the same workflow as local files.

Project Metadata Pre-fill: If you've set up project-level metadata during biotope init, the annotation form will be pre-filled with this information, making the annotation process faster and more consistent.

After creation, biotope can also be used to validate the JSON-LD (CAVE: being a prototype, biotope does not yet implement all croissant fields):

biotope annotate validate –jsonld <file_name.json>

biotope also has the method biotope annotate create to create metadata files from CLI parameters (no interactive mode) and biotope annotate load to load an existing record (the use of this is not well-defined yet). Further improvements would be the integration of LLMs for the automation of metadata annotations from file contents (using the biochatter module of biotope).

Unit tests to inform about further functions and details can be found at https://github.com/biocypher/biotope/blob/main/tests/commands/test_annotate.py and https://github.com/biocypher/biotope/blob/main/tests/commands/test_get.py

Copyright¶

Free software distributed under the MIT License.