Configuration Reference
The SolarHub system is configured through a centralized YAML file located at configs/system_config.yaml. This file defines everything from our task types to our machine learning platforms.
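As a quick sanity check, a script could confirm that a parsed config contains the top-level sections described in this reference. The sketch below assumes the file has already been parsed into a plain dict (for example with PyYAML's yaml.safe_load, which is an assumption about tooling); the helper name missing_sections is hypothetical and not part of SolarHub.

```python
# Top-level sections described in this reference.
EXPECTED_SECTIONS = ("system", "ml", "huggingface", "data", "annotations")


def missing_sections(config: dict) -> list[str]:
    """Return the expected top-level sections that are absent from a parsed config."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


# A deliberately incomplete config, standing in for the parsed contents of
# configs/system_config.yaml.
partial = {"system": {"name": "SolarHub"}, "data": {"task_types": ["sunspot"]}}
print(missing_sections(partial))
```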
1. System Metadata
system:
  name: SolarHub
  codename: Aurora
  version: 1.0
name: The primary name of the platform.
codename: The project's internal name (Aurora).
2. Platform Integrations
Machine Learning
ml:
  training_platform: kaggle
  dataset_platform: huggingface
training_platform: Defines where our ML kernels run. Currently, we use Kaggle for its GPU acceleration and free compute hours.
dataset_platform: Where our long-term datasets are stored (HuggingFace).
3. HuggingFace Settings
huggingface:
  dataset_repo_prefix: SpaceGen/solarhub-
  model_repo_prefix: SpaceGen/solarhub-model-
  token_secret: HF_TOKEN
dataset_repo_prefix: The common prefix for all dataset repositories.
model_repo_prefix: The common prefix for all model weight repositories.
token_secret: The name of the GitHub Actions secret that contains the HuggingFace write token.
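To illustrate how these three settings might be consumed, here is a minimal sketch. It assumes that a repository name is formed by appending a task type to the prefix, and that the workflow exposes the secret as an environment variable with the same name as token_secret. Neither detail is confirmed by this reference, and the helpers repo_ids and read_token are hypothetical.

```python
import os


def repo_ids(hf_config: dict, task_type: str) -> tuple[str, str]:
    """Build (dataset_repo, model_repo) by appending the task type to each prefix.

    Assumption: repositories are named prefix + task type.
    """
    return (
        hf_config["dataset_repo_prefix"] + task_type,
        hf_config["model_repo_prefix"] + task_type,
    )


def read_token(hf_config: dict, env=None) -> str:
    """Look up the token in the environment variable named by token_secret.

    Assumption: the CI workflow maps the secret to an env var of the same name.
    """
    env = os.environ if env is None else env
    name = hf_config["token_secret"]
    token = env.get(name)
    if not token:
        raise RuntimeError(f"environment variable {name} is not set")
    return token


hf = {
    "dataset_repo_prefix": "SpaceGen/solarhub-",
    "model_repo_prefix": "SpaceGen/solarhub-model-",
    "token_secret": "HF_TOKEN",
}
print(repo_ids(hf, "sunspot"))
```

Storing only the secret's name in the config keeps the token itself out of the repository; the value is resolved at run time.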
4. Data Task Types
The data.task_types section defines which solar features we currently support.
data:
  task_types:
    - sunspot
    - solar_flare
    - magnetogram
    - coronal_hole
    - prominence
    - active_region
    - cme
These task types are used by:
pull_new_urls.py: To know which data to crawl.
parse_issue_annotation.py: To validate user-submitted labels.
merge_annotations_to_hf.py: To target the correct HuggingFace repositories.
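Label validation against this list could look roughly like the sketch below. It is not the actual logic of parse_issue_annotation.py; the function name validate_task_type and the normalization step (trimming, lowercasing, spaces to underscores) are assumptions made for illustration.

```python
TASK_TYPES = [
    "sunspot",
    "solar_flare",
    "magnetogram",
    "coronal_hole",
    "prominence",
    "active_region",
    "cme",
]


def validate_task_type(label: str, task_types: list[str]) -> str:
    """Normalize a user-submitted label and check it against the configured list."""
    # Assumed normalization: "Solar Flare" and " solar_flare " both map to "solar_flare".
    normalized = label.strip().lower().replace(" ", "_")
    if normalized not in task_types:
        raise ValueError(
            f"unsupported task type {label!r}; expected one of {task_types}"
        )
    return normalized


print(validate_task_type(" Solar Flare ", TASK_TYPES))
```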
5. Source APIs
The data.source_apis section lists the official solar data sources we crawl.
data:
  source_apis:
    - name: NASA SDAC
    - name: SOHO LASCO
    - name: SDO HMI
    ...
6. Annotation Strategy
annotations:
  merge_strategy: append
  clear_after_merge: true
merge_strategy: Currently set to append, meaning new user labels are added to our datasets without overwriting historical ones.
clear_after_merge: (Boolean) If true, the local annotations/ directory is reset after a successful push to HuggingFace.
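The two settings can be pictured with the following sketch. It is an illustration under stated assumptions rather than the code in merge_annotations_to_hf.py: annotations are modeled as lists of dicts, "reset" is interpreted as deleting the files inside the directory while keeping the directory itself, and both helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_annotations(existing: list, new: list, strategy: str = "append") -> list:
    """Return a merged list; 'append' keeps every historical record and adds new ones."""
    if strategy != "append":
        raise ValueError(f"unsupported merge_strategy: {strategy!r}")
    # Build a new list so the historical records are never modified in place.
    return existing + new


def clear_annotations(directory: Path) -> None:
    """Delete the files in the directory but keep the directory itself (assumed meaning of 'reset')."""
    for path in directory.iterdir():
        if path.is_file():
            path.unlink()


existing = [{"id": 1, "label": "sunspot"}]
new = [{"id": 2, "label": "cme"}]
merged = merge_annotations(existing, new)

# Simulate clear_after_merge: true using a temporary stand-in for annotations/.
with tempfile.TemporaryDirectory() as tmp:
    annotations_dir = Path(tmp)
    (annotations_dir / "issue_2.json").write_text("{}")
    clear_annotations(annotations_dir)
    remaining = list(annotations_dir.iterdir())
```

In a real pipeline the clearing step would run only after the push has succeeded, so that a failed upload does not discard unmerged labels.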