Adding Files to Your Biotope Project¶

The biotope add command is the first step in managing data files with biotope: it prepares your files for metadata creation and version control. This tutorial shows you how to use it effectively.

Prerequisites¶

Before you start, make sure you have:

An initialized biotope project: run biotope init if you haven't already
A Git repository: your project should be a Git repository (biotope init can set this up)
Data files: some files you want to track

Basic Usage¶

Adding a Single File¶

The simplest way to add a file is to provide its path:

biotope add data/raw/experiment.csv

This will:

- Calculate a SHA256 checksum for data integrity
- Create a basic metadata file in .biotope/datasets/
- Stage the metadata changes in Git
- Show you what happened
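To see what the checksum in that list represents, you can compute a SHA256 digest yourself with standard tools. The sketch below is independent of biotope: the helper name sha256_of is made up for illustration, and it assumes either sha256sum (GNU coreutils) or shasum is installed. How biotope computes or stores the digest internally is not covered here.

```shell
# Hypothetical helper (not part of biotope): print the SHA256 hex digest of a file.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        # Read from stdin so the output format does not depend on the file name.
        sha256sum < "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 < "$1" | cut -d ' ' -f 1
    fi
}

# Example usage: compare the first characters with the prefix biotope printed.
# sha256_of data/raw/experiment.csv | cut -c 1-8
```

If the prefix differs from the one biotope reported, the file has probably changed since it was added.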

Adding Remote Files¶

If you want to add files from a URL, use biotope get instead:

biotope get https://example.com/data/experiment.csv

This downloads the file and stages it for metadata creation, just like biotope add. The workflow after downloading is the same: check status, annotate, and commit.

Adding Multiple Files¶

You can add several files at once:

biotope add data/raw/experiment1.csv data/raw/experiment2.csv data/raw/experiment3.csv

Adding Entire Directories¶

To add all files in a directory, use the --recursive flag:

biotope add data/raw/ --recursive

This will add all files in data/raw/ and any subdirectories.
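Before running a recursive add on a large directory, it can help to preview which regular files live under it. The following is a minimal sketch using find; list_files_under is an illustrative name, and whether biotope applies extra filters (for example to hidden files or symlinks) during --recursive is not stated here, so treat the list as an approximation.

```shell
# Hypothetical helper: list every regular file below a directory, sorted.
list_files_under() {
    find "$1" -type f | sort
}

# Example usage:
# list_files_under data/raw/            # show the candidate files
# list_files_under data/raw/ | wc -l    # count them
```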

Understanding the Output¶

When you run biotope add, you'll see output like this:

📁 Added data/raw/experiment.csv (SHA256: e471e5fc...)

✅ Added 1 file(s) to biotope project:
  + data/raw/experiment.csv

💡 Next steps:
  1. Run 'biotope status' to see staged files
  2. Run 'biotope annotate interactive --staged' to create metadata
  3. Run 'biotope commit -m "message"' to save changes

💡 For incomplete annotations:
  1. Run 'biotope status' to see which files need annotation
  2. Run 'biotope annotate interactive --incomplete' to complete them

This tells you:

- Which files were successfully added
- Their checksums for data integrity
- What to do next in your workflow

Important: The data file itself is not added to Git; only the metadata is tracked. The data/ directory is excluded from Git via .gitignore to keep repositories small and focused on metadata.

Working with Different Path Types¶

Relative Paths (Recommended)¶

Relative paths are preferred for better portability:

# From your project root
biotope add data/raw/experiment.csv
biotope add ./data/raw/experiment.csv

# From a subdirectory
cd data/raw/
biotope add experiment.csv

Absolute Paths¶

You can also use absolute paths:

biotope add /Users/username/project/data/raw/experiment.csv

Paths with Spaces¶

For files with spaces in their names, use quotes:

biotope add "data/raw/my experiment data.csv"
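The quotes matter because the shell splits unquoted text on whitespace before biotope ever sees the arguments. The sketch below uses a made-up count_args function to show the difference. It assumes a POSIX-style shell such as bash or sh (zsh does not split unquoted variables by default).

```shell
# Hypothetical helper: report how many arguments the shell actually passed.
count_args() {
    echo "$#"
}

file="data/raw/my experiment data.csv"

count_args "$file"   # quoted: the path arrives as one argument, prints 1
count_args $file     # unquoted: split on spaces into three arguments, prints 3
```

The same rule applies when the path is stored in a variable inside a script: always expand it as "$file".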

Handling Common Scenarios¶

Adding Already Tracked Files¶

If you try to add a file that's already tracked, you'll see:

⚠️  File 'data/raw/experiment.csv' already tracked (use --force to override)

To add it anyway, for example because the file has changed, use --force:

biotope add data/raw/experiment.csv --force

Adding Directories Without Recursive Flag¶

If you try to add a directory without --recursive:

⚠️  Skipping directory 'data/raw/' (use --recursive to add contents)

To add the directory contents:

biotope add data/raw/ --recursive

Mixed Results¶

When adding multiple files, some may be added while others are skipped:

📁 Added data/raw/experiment1.csv (SHA256: abc123...)
⚠️  File 'data/raw/experiment2.csv' already tracked (use --force to override)

✅ Added 1 file(s) to biotope project:
  + data/raw/experiment1.csv

⚠️  Skipped 1 file(s):
  - data/raw/experiment2.csv

Organizing Your Data¶

Recommended Directory Structure¶

your-project/
├── data/
│   ├── raw/           # Original data files
│   │   ├── experiment1/
│   │   └── experiment2/
│   └── processed/     # Processed data files
├── .biotope/          # Metadata (auto-created)
└── .git/              # Git repository

Adding Different Data Types¶

# Add raw data
biotope add data/raw/ --recursive

# Add processed data
biotope add data/processed/ --recursive

# Add specific file types
biotope add data/raw/*.csv
biotope add data/raw/*.fasta
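Patterns such as *.csv are expanded by your shell, not by biotope, so in POSIX-style shells a pattern with no matches is typically passed through as literal text. A small guard like the one below can catch that before calling biotope. has_matches is an illustrative name, not a biotope feature.

```shell
# Hypothetical helper: succeed only if at least one path matches DIR/*.EXT,
# e.g. has_matches data/raw csv
has_matches() {
    # Replace the positional parameters with the expanded pattern.
    set -- "$1"/*."$2"
    # If nothing matched, "$1" is still the literal pattern and does not exist.
    [ -e "$1" ]
}

# Example usage:
# if has_matches data/raw csv; then biotope add data/raw/*.csv; fi
```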

Integration with Other Commands¶

Check What Was Added¶

After adding files, check their status:

biotope status

This shows you which metadata files are staged for commit.

Create Detailed Metadata¶

The metadata created by biotope add is minimal. To enrich it, run:

biotope annotate interactive --staged

This opens an interactive session to add detailed metadata.

Commit Your Changes¶

Once you're satisfied with the metadata:

biotope commit -m "Add experiment dataset with 24 samples"

Verify Data Integrity¶

Later, you can verify your files haven't been corrupted:

biotope check-data
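Conceptually, an integrity check recomputes a file's SHA256 digest and compares it with a previously recorded value. The following is a rough sketch of that idea with standard tools, not a description of how biotope check-data is implemented. The name matches_checksum and the way the expected digest is supplied are assumptions for illustration, and the sketch expects sha256sum or shasum to be available.

```shell
# Hypothetical helper: succeed if the file's current SHA256 digest equals
# the expected hex digest given as the second argument.
matches_checksum() {
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum < "$1" | cut -d ' ' -f 1)
    else
        actual=$(shasum -a 256 < "$1" | cut -d ' ' -f 1)
    fi
    [ "$actual" = "$2" ]
}

# Example usage (the digest would come from wherever you recorded it):
# matches_checksum data/raw/experiment.csv "$recorded_digest" || echo "file changed"
```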

Git and Data Files¶

Understanding the Separation¶

Biotope separates data files from Git tracking:

Data files: Stored in data/ directory, excluded from Git via .gitignore
Metadata: Stored in .biotope/datasets/, tracked by Git
Checksums: Embedded in metadata to ensure data integrity
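To confirm this separation in your own project, Git can report whether a path is ignored. The following is a minimal sketch, assuming Git is installed and that your .gitignore contains an entry covering data/ as described above. is_ignored is an illustrative wrapper, not a biotope command.

```shell
# Hypothetical helper: succeed if Git ignores the given path in the current repo.
is_ignored() {
    git check-ignore -q -- "$1"
}

# Example usage, run from the project root:
# is_ignored data/raw/experiment.csv && echo "ignored by Git"
# git check-ignore -v data/raw/experiment.csv   # show the matching .gitignore rule
```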

Benefits of This Approach¶

# Clean Git status (no data files cluttering output)
git status

# Only metadata changes appear in history
git log --oneline

# Small repository size (no large data files)
du -sh .git

# Easy collaboration (share metadata, not data)
git push origin main

Working with Data Files¶

Even though data files aren't in Git, biotope still tracks them:

# Add a file (creates metadata, doesn't add to Git)
biotope add data/raw/experiment.csv

# Check what's tracked (metadata only)
biotope status

# Verify data integrity
biotope check-data

# See all tracked metadata files
git ls-files .biotope/

Best Practices¶

1. Use Relative Paths¶

Relative paths make your project more portable:

# Good
biotope add data/raw/experiment.csv

# Avoid
biotope add /absolute/path/to/experiment.csv

2. Organize Your Data¶

Keep your data organized in logical directories:

data/
├── raw/
│   ├── experiment_2024_01/
│   └── experiment_2024_02/
└── processed/
    └── combined_results/

3. Add Files Incrementally¶

Add files as you work with them rather than all at once:

# Add files as you create them
biotope add data/raw/new_experiment.csv
biotope annotate interactive --staged
biotope commit -m "Add new experiment data"

4. Use Descriptive Commit Messages¶

When you commit after adding files, describe what the dataset contains:

# Good
biotope commit -m "Add RNA-seq dataset: 24 samples, 3 conditions"

# Better
biotope commit -m "Add RNA-seq dataset: 24 samples, 3 conditions, QC passed, ready for analysis"

Troubleshooting¶

"Not in a biotope project"¶

❌ Not in a biotope project. Run 'biotope init' first.

Solution: Run biotope init to initialize a biotope project.

"Not in a Git repository"¶

❌ Not in a Git repository. Initialize Git first with 'git init'.

Solution: Initialize Git in your project directory:

git init
git config user.name "Your Name"
git config user.email "your.email@example.com"

"File already tracked"¶

⚠️  File 'data/raw/experiment.csv' already tracked (use --force to override)

Solution: Use --force if you want to update the file's metadata:

biotope add data/raw/experiment.csv --force

"Path does not exist"¶

❌ Path 'data/raw/experiment.csv' does not exist.

Solution: Check the file path and make sure the file exists.
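The checks behind the messages in this section can be approximated with a small preflight function that you run before calling biotope add. This is a sketch under assumptions: it treats the presence of a .biotope/ directory in the current directory as the sign of an initialized project (the actual detection logic is not documented here), and preflight is a made-up function name.

```shell
# Hypothetical preflight check, meant to be run from the project root.
# Prints the first problem found and returns non-zero, or returns 0 if all pass.
preflight() {
    if [ ! -d .biotope ]; then
        echo "no .biotope/ directory here: try 'biotope init'"
        return 1
    fi
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "not inside a Git work tree: try 'git init'"
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "path does not exist: $1"
        return 1
    fi
    return 0
}

# Example usage:
# preflight data/raw/experiment.csv && biotope add data/raw/experiment.csv
```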

Related tutorials:

Downloading Files: Learn how to download and stage files from URLs
Annotating Data: Learn how to create detailed metadata for your data
Project Status: Learn how to check your project status and manage metadata

Getting Help¶

For additional help, use:

biotope add --help

This will show all available options and usage examples.