Downloading and Staging Files with get
The get command in Biotope provides a convenient way to download files from a URL and immediately stage them for metadata creation and version control. It integrates seamlessly with Biotope's git-on-top workflow. See also the annotation tutorial for more information on annotating your data.
Basic Usage
The simplest way to use the get command is to provide a URL:
biotope get https://raw.githubusercontent.com/biocypher/biotope/refs/heads/main/tests/example_gene_expression.csv
or
biotope get https://raw.githubusercontent.com/biocypher/biotope/refs/heads/main/tests/example_protein_sequences.fasta
This will:
1. Download the file to the data/raw directory (or a custom location)
2. Add the file to your biotope project and stage it for metadata creation (using the same mechanism as biotope add)
3. Show you the next steps: annotate and commit
Note: Annotation is a separate, explicit step. After downloading, run biotope annotate interactive --staged to create or complete the metadata, then commit your changes.
Important: The downloaded data file is excluded from Git tracking via .gitignore. Only the metadata is version controlled, keeping repositories small and focused.
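To illustrate the idea of keeping data files out of Git while the metadata stays versioned, here is a minimal Python sketch that appends a file's relative path to .gitignore if it is not already listed. The helper name ensure_ignored is hypothetical and this is not Biotope's actual implementation; it only shows one way such an exclusion could work.

```python
from pathlib import Path


def ensure_ignored(project_root: Path, relative_path: str) -> bool:
    """Add relative_path to .gitignore; return True if a line was added.

    Hypothetical helper for illustration, not part of Biotope's API.
    """
    gitignore = project_root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    if relative_path in existing:
        return False  # already excluded, nothing to do
    existing.append(relative_path)
    gitignore.write_text("\n".join(existing) + "\n")
    return True
```

Calling the helper twice with the same path leaves a single entry in .gitignore, so repeated downloads of the same file would not clutter it.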
Command Options
The get command supports the following options:
Available Options
- --output-dir, -o: Specify a custom directory for downloaded files (default: data/raw)
- --no-add: Download the file without adding it to the biotope project (advanced use)
Download Locations
Default Location
By default, files are downloaded to a data/raw directory in your current working directory:
your-project/
├── data/
│ ├── raw/ # Default download location
│ │ ├── file1.csv
│ │ └── file2.fasta
│ └── processed/
├── .biotope/
└── .git/
This aligns with the recommended project structure and makes it easy to organize your data files.
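As a rough illustration of how a URL could map to a local path under the default or a custom output directory, consider the Python sketch below. The function destination_for is hypothetical, and taking the file name from the last URL path segment is an assumption; Biotope may derive the name differently.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def destination_for(url: str, output_dir: str = "data/raw") -> Path:
    """Return the local path a downloaded file would be written to.

    Hypothetical helper: uses the last path segment of the URL as the
    file name and places it under output_dir.
    """
    filename = Path(unquote(urlparse(url).path)).name
    if not filename:
        raise ValueError(f"URL has no file name component: {url}")
    return Path(output_dir) / filename


# Default location, then a custom one (mirrors --output-dir)
print(destination_for("https://example.com/data/file.csv").as_posix())
print(destination_for("https://example.com/data/file.csv", "./data/processed").as_posix())
```

A URL without a file name (for example, one ending in a slash) raises a ValueError in this sketch rather than guessing a name.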
Custom Location
You can specify a custom download location using the --output-dir option:
# Download to a specific directory
biotope get https://example.com/data/file.csv --output-dir ./data/processed
# Download to an absolute path
biotope get https://example.com/data/file.csv --output-dir /Users/username/project/data
Recommended Organization
For better project organization, consider downloading files to appropriate subdirectories:
# Download to raw data directory (default)
biotope get https://example.com/data/experiment.csv
# Download to processed data directory
biotope get https://example.com/data/results.csv --output-dir ./data/processed
# Download to specific experiment directory
biotope get https://example.com/data/experiment.csv --output-dir ./data/raw/experiment_2024_01
File Tracking and Moves
Biotope tracks files by their relative path from the project root. This means:
How File Tracking Works
- Files are tracked using their relative path (e.g., data/raw/experiment.csv)
- The metadata stores this relative path in the contentUrl field
- Biotope can find files regardless of where you run commands from within the project
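The relative-path idea can be sketched in Python as follows. Both function names are hypothetical, and locating the project root by searching upward for a .biotope directory is an assumption based on the project layout shown earlier, not a description of Biotope's internals.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upward from start until a directory containing .biotope is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".biotope").is_dir():
            return candidate
    return None  # not inside a project


def tracked_path(file_path: Path, project_root: Path) -> str:
    """Return file_path relative to project_root, using forward slashes."""
    return file_path.resolve().relative_to(project_root.resolve()).as_posix()
```

Because the stored path is relative to the root rather than to the current directory, the same file resolves to the same key no matter which subdirectory a command is run from.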
Moving Files After Download
If you download a file and later want to reorganize your project structure:
1. Move the file manually.
2. Update the metadata by re-adding the file at its new location with biotope add --force.
3. Commit the changes with biotope commit.
Checking File Integrity
Use biotope check-data to verify that all tracked files are still accessible:
# Check all files
biotope check-data
# Check specific file
biotope check-data -f data/raw/experiment.csv
This will report:
- Valid: File exists and checksum matches
- Missing: File not found at recorded location
- Corrupted: File exists but checksum doesn't match
- Untracked: File not tracked in biotope
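The four statuses can be expressed as a small decision procedure. The Python sketch below is illustrative only: classify and sha256_of are hypothetical names, and the recorded dictionary is a stand-in for whatever Biotope actually stores in its metadata.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(project_root: Path, relative_path: str, recorded: Dict[str, str]) -> str:
    """Return 'untracked', 'missing', 'corrupted', or 'valid'.

    recorded maps tracked relative paths to their expected SHA256 digests
    (a hypothetical stand-in for the stored metadata).
    """
    if relative_path not in recorded:
        return "untracked"
    file_path = project_root / relative_path
    if not file_path.is_file():
        return "missing"
    if sha256_of(file_path) != recorded[relative_path]:
        return "corrupted"
    return "valid"
```

The order of the checks matters: a file can only be missing or corrupted if it was tracked in the first place, so the tracking lookup comes first.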
Automatic Metadata Generation and Staging
When downloading a file, the get command automatically generates initial metadata in Croissant ML format and stages it in git. This includes:
- File identification (name, path, SHA256 hash)
- File type detection
- Source URL
- Creator information
- Creation date
The generated metadata follows the schema.org and Croissant ML standards, making it compatible with the rest of the Biotope ecosystem. Metadata is created in .biotope/datasets/ and staged for commit.
Example Generated Metadata
{
  "name": "Dataset_file.txt",
  "description": "Dataset containing file downloaded from https://example.com/data/file.txt",
  "url": "https://example.com/data/file.txt",
  "creator": {
    "@type": "Person",
    "name": "username"
  },
  "dateCreated": "2024-03-21",
  "distribution": [
    {
      "@type": "sc:FileObject",
      "@id": "file_sha256hash",
      "name": "file.txt",
      "contentUrl": "data/raw/file.txt",
      "encodingFormat": "text/plain",
      "sha256": "sha256hash"
    }
  ]
}
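For readers who want to see how a record of this shape could be assembled, here is a minimal Python sketch. The function build_metadata is hypothetical; the field layout is copied from the example above, while the way each value is derived (hashing the file contents, guessing the MIME type from the file name, using today's date) is an assumption rather than Biotope's documented behavior.

```python
import hashlib
import mimetypes
from datetime import date
from pathlib import Path


def build_metadata(file_path: Path, relative_path: str, source_url: str, creator: str) -> dict:
    """Assemble a metadata record shaped like the example above (illustrative only)."""
    sha256 = hashlib.sha256(file_path.read_bytes()).hexdigest()
    # guess_type returns (type, encoding); fall back to a generic type if unknown
    encoding_format, _ = mimetypes.guess_type(file_path.name)
    return {
        "name": f"Dataset_{file_path.name}",
        "description": f"Dataset containing file downloaded from {source_url}",
        "url": source_url,
        "creator": {"@type": "Person", "name": creator},
        "dateCreated": date.today().isoformat(),
        "distribution": [
            {
                "@type": "sc:FileObject",
                "@id": f"file_{sha256}",
                "name": file_path.name,
                "contentUrl": relative_path,  # relative to the project root
                "encodingFormat": encoding_format or "application/octet-stream",
                "sha256": sha256,
            }
        ],
    }
```

The record can be serialized with json.dumps(record, indent=2) to obtain output in the same layout as the example.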
Next Steps: Annotate and Commit
After downloading and staging the file, continue with the standard git-on-top workflow:
1. Check status with biotope status. This shows the staged file and its metadata status.
2. Annotate the file with biotope annotate interactive --staged. This opens an interactive session to complete the metadata.
3. Commit your changes with biotope commit.
Examples
Download and Stage a CSV File
biotope get https://example.com/data/expression.csv
biotope status
biotope annotate interactive --staged
biotope commit -m "Add expression dataset"
Download to a Specific Directory
biotope get https://example.com/data/expression.csv --output-dir ./data/processed
Download Without Adding to Project
biotope get https://example.com/data/expression.csv --no-add
# Later, manually add the file
biotope add data/raw/expression.csv
Integration with Other Commands
The get command integrates with the full git-on-top workflow:
- Use biotope status to see staged files and their annotation status
- Use biotope annotate interactive --staged to annotate all newly downloaded files
- Use biotope commit to save your changes
- Use biotope check-data to verify data integrity
Troubleshooting
Common Issues
- Download Fails
  - Check your internet connection
  - Verify the URL is accessible
  - Ensure you have write permissions in the output directory
- Metadata Not Created
  - Make sure you are in a biotope project and a git repository
  - Check for error messages in the output
- Annotation Fails
  - Check if the file is corrupted
  - Verify you have sufficient disk space
  - Ensure you have the required permissions
- Metadata Issues
  - Use biotope annotate validate to check metadata validity
  - Review the pre-filled metadata carefully
  - Make sure all required fields are filled
- File Not Found After Move
  - Use biotope check-data to identify missing files
  - Re-add files in their new locations with biotope add --force
  - Commit the changes to update metadata
Getting Help
For additional help, use:
biotope get --help
This will show all available options and usage examples.