TestSbert
This module contains tests for the sbert.py script, which generates embeddings for anime and manga datasets using the Sentence-BERT model.
The tests verify:
- Successful execution of the script with valid command-line arguments
- Creation and validation of embedding files for both anime and manga datasets
- Proper saving and structure of evaluation results
- Correct dimensionality of generated embeddings
- Consistency between model parameters and evaluation data
run_sbert_command_and_verify
Run the SBERT command-line script and verify the generated embeddings and evaluation results.
This function:
- Executes the SBERT script with specified parameters
- Verifies script execution success
- Checks for creation of expected embedding files
- Validates embedding dimensions
- Verifies evaluation results structure and content
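The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/test_sbert.py: the helper name `run_and_verify`, the JSON file format, the `model_name` key in the results, and the stand-in script are all assumptions made so the example runs without sbert.py or a downloaded model.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_verify(command, output_dir, expected_files, results_path, model_name):
    """Run a command, then check its embedding files and evaluation results.

    Hypothetical helper: the JSON file format and the "model_name" key are
    assumptions made for this sketch, not the project's actual layout.
    """
    # 1. The script must exit successfully.
    completed = subprocess.run(command, capture_output=True, text=True)
    assert completed.returncode == 0, f"script failed: {completed.stderr}"

    # 2. Every expected embedding file exists and holds equal-length vectors.
    for name in expected_files:
        path = Path(output_dir) / name
        assert path.exists(), f"missing embedding file: {name}"
        vectors = json.loads(path.read_text())
        assert vectors, f"no embeddings in {name}"
        dim = len(vectors[0])
        assert dim > 0 and all(len(v) == dim for v in vectors), (
            f"bad dimensions in {name}"
        )

    # 3. Evaluation results exist and record the model that was requested.
    results = json.loads(Path(results_path).read_text())
    assert results.get("model_name") == model_name, "model name mismatch"
    return results


# Stand-in for the real script so the sketch runs without sbert.py or a model.
FAKE_SCRIPT = """
import json, sys
from pathlib import Path
out = Path(sys.argv[1])
(out / "embeddings.json").write_text(json.dumps([[0.1, 0.2], [0.3, 0.4]]))
(out / "results.json").write_text(json.dumps({"model_name": sys.argv[2]}))
"""

with tempfile.TemporaryDirectory() as tmp:
    demo_results = run_and_verify(
        [sys.executable, "-c", FAKE_SCRIPT, tmp, "demo-model"],
        output_dir=tmp,
        expected_files=["embeddings.json"],
        results_path=Path(tmp) / "results.json",
        model_name="demo-model",
    )
```

Each failed check raises `AssertionError` with a message naming the step that failed, which keeps a pytest failure report readable.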
PARAMETER | DESCRIPTION |
---|---|
model_name | The name of the model to be used (e.g., 'sentence-transformers/all-mpnet-base-v2') |
dataset_type | The type of dataset ('anime' or 'manga') |
expected_files | List of expected embedding file names to be generated |
RAISES | DESCRIPTION |
---|---|
AssertionError | If any of the following occurs: script execution fails; expected embedding files are not created; embeddings have invalid dimensions; evaluation results are missing or malformed; model parameters don't match input parameters |
Source code in tests/test_sbert.py
test_run_sbert_command_line
Test the SBERT command line script by running it with different dataset types and verifying the outputs.
This test:
- Covers both anime and manga datasets
- Verifies generation of dataset-specific embedding files
- Validates embedding file structure and content
- Checks evaluation results for each dataset type
PARAMETER | DESCRIPTION |
---|---|
model_name | The name of the model to be tested, provided by pytest fixture |
dataset_type | The type of dataset being tested ('anime' or 'manga') |
expected_files | List of expected embedding files for the dataset type |
The test is parameterized to run separately for anime and manga datasets, with different expected output files for each type.
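As a rough illustration of that parameterization, the sketch below combines a pytest fixture with `pytest.mark.parametrize`. The file names in `DATASET_CASES`, the fixture body, and `verify_stub` (a stand-in for `run_sbert_command_and_verify`) are hypothetical; the source does not list the real expected files or show how the fixture is defined.

```python
import pytest

# Hypothetical file names: the source does not list the real expected files.
DATASET_CASES = [
    ("anime", ["anime_embeddings_example.npy"]),
    ("manga", ["manga_embeddings_example.npy"]),
]


@pytest.fixture
def model_name():
    # Assumed fixture body; the name is the example from the parameter table.
    return "sentence-transformers/all-mpnet-base-v2"


def verify_stub(model_name, dataset_type, expected_files):
    """Stand-in for run_sbert_command_and_verify; checks only its inputs."""
    assert dataset_type in ("anime", "manga")
    assert expected_files, "each dataset type needs at least one expected file"
    return (model_name, dataset_type, tuple(expected_files))


@pytest.mark.parametrize("dataset_type, expected_files", DATASET_CASES)
def test_run_sbert_command_line(model_name, dataset_type, expected_files):
    # pytest calls this once per entry in DATASET_CASES.
    verify_stub(model_name, dataset_type, expected_files)
```

With this layout pytest reports the anime and manga runs as separate test cases, so a failure in one dataset type does not hide the result of the other.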