Running Tasks

LlamaFactory Adapter Plugin

Configuration flow

Before running a training task, configure these environment variables in bash:

Environment variable    Description
ECO_GRPC_ADDR           Server address
ECO_CLIENT_ID           Unique user identifier [ID]
ECO_API_KEY             API authentication value [PASSWORD]
ECO_TLS_ROOT_CA         TLS root certificate path

Example:

export ECO_GRPC_ADDR="121.41.XXX.XX:80"
export ECO_CLIENT_ID="User_XX"
export ECO_API_KEY="XXXX"
export ECO_TLS_ROOT_CA="xx/rootCA.pem"
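
A quick pre-flight check can catch a missing variable, stray whitespace, or an unreadable certificate before a long training run starts. The sketch below is only an illustration: `check_eco_env` is a hypothetical helper, not part of the plugin, and it assumes that each of the four variables must be non-empty and free of whitespace.

```bash
# Hypothetical helper (not shipped with the plugin): validate the ECO_* variables.
# The body runs in a subshell, so its temporary variables do not leak.
check_eco_env() (
  status=0
  for name in ECO_GRPC_ADDR ECO_CLIENT_ID ECO_API_KEY ECO_TLS_ROOT_CA; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name-}"
    case "$value" in
      "")            echo "missing: $name" >&2; status=1 ;;
      *[[:space:]]*) echo "contains whitespace: $name" >&2; status=1 ;;
    esac
  done
  # The root certificate must exist and be readable.
  if [ -n "${ECO_TLS_ROOT_CA-}" ] && [ ! -r "$ECO_TLS_ROOT_CA" ]; then
    echo "certificate not readable: $ECO_TLS_ROOT_CA" >&2
    status=1
  fi
  exit "$status"
)
```

After exporting the variables, run `check_eco_env && echo "environment OK"`. The function returns a non-zero status and prints one line per problem to stderr if anything is wrong. It does not verify that the certificate is valid, only that the file can be read.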

Usage example

Replace these values based on your environment:

  • Server address
  • Certificate path
  • Data path
  • Model path
  • Configuration file path

Example command:

export ECO_GRPC_ADDR="121.41.XXX.XX:80" && \
export ECO_CLIENT_ID="User_XX" && \
export ECO_API_KEY="XXXX" && \
export ECO_TLS_ROOT_CA="/root/rootCA.pem" && \
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file fsdp_config.yaml \
--main_process_port 29501 src/train.py emotion_rec_sft_full_eco.yaml

Plugin log reference

(1) Plugin imported and initialized successfully

The log includes:

[EcoPhase] EcoMonitor initialized.

(2) Plugin enabled

The log includes:

[EcoPhase] API is enabled.

(3) Plugin inactive

The log includes:

[EcoPhase] API is disabled.
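
If the training output is captured to a file, the three messages above can be checked with `grep`. The following is a minimal sketch: `eco_plugin_state` is a hypothetical helper, and it assumes the log contains the messages exactly as quoted above.

```bash
# Hypothetical helper: report the plugin state found in a training log file.
# -F matches the message as a fixed string, -q suppresses grep's own output.
eco_plugin_state() {
  if grep -qF '[EcoPhase] API is enabled.' "$1"; then
    echo enabled
  elif grep -qF '[EcoPhase] API is disabled.' "$1"; then
    echo disabled
  elif grep -qF '[EcoPhase] EcoMonitor initialized.' "$1"; then
    echo initialized
  else
    echo not-found
  fi
}
```

For example, `eco_plugin_state train.log` (where `train.log` is a placeholder for your own log file) prints `not-found` when none of the messages appear, which suggests the plugin was never imported.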

(4) Early stop triggered

The system automatically saves the model and prints a training summary, for example:

Task early stopped at step 200/2000. Reduction: 90.0%. Saved GPU-hours: 1.03.

This means:

Field                   Meaning
200/2000                The task stopped early at step 200 of the planned 2000 steps
Reduction: 90.0%        Training steps were reduced by about 90.0%
Saved GPU-hours: 1.03   An estimated 1.03 GPU-hours were saved
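
The reduction figure in the example is consistent with 1 - (stopped step / planned steps): 1 - 200/2000 = 0.90. The sketch below recomputes it from a summary line. `eco_reduction` is a hypothetical helper, and the formula is inferred from the example above rather than taken from the plugin's source.

```bash
# Hypothetical helper: recompute the step reduction from an early-stop summary line.
eco_reduction() {
  printf '%s\n' "$1" |
    # Extract "<stopped> <planned>" from "... at step 200/2000. ..."
    sed -n 's|.*at step \([0-9][0-9]*\)/\([0-9][0-9]*\)\..*|\1 \2|p' |
    # Reduction in percent, guarded against a zero planned-step count.
    awk '$2 > 0 { printf "%.1f\n", (1 - $1 / $2) * 100 }'
}

eco_reduction 'Task early stopped at step 200/2000. Reduction: 90.0%. Saved GPU-hours: 1.03.'
# prints 90.0
```

The function prints nothing if the line does not match the expected pattern.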

Notes

  1. Environment variables must be configured correctly and must not contain extra spaces.
  2. The root certificate must exist and be valid.
  3. The plugin initialization code must be inserted at the correct position in trainer.py.
  4. Confirm this setting before running:
enabled=True
  5. When troubleshooting, check the logs for these states first:
API is enabled
API is disabled
  6. The plugin currently supports data parallelism only. Other parallel modes are not supported yet.