TestMergeDatasets
This module contains unit tests for the functions in the src.merge_datasets module.
The tests cover:
- Preprocessing of names to ensure correct formatting (test_preprocess_name)
- Cleaning of synopsis data by removing unwanted phrases (test_clean_synopsis)
- Consolidation of multiple title columns into a single title column (test_consolidate_titles)
- Removal of duplicate synopses or descriptions (test_remove_duplicate_infos)
- Addition of supplementary synopsis information to the merged DataFrame (test_add_additional_info)
- Handling of missing matches when adding additional info (test_add_additional_info_no_match)
- Processing of partial title information (test_add_additional_info_partial_titles)
- Handling of missing title data (test_add_additional_info_all_titles_na)
- Handling of whitespace and case variations (test_add_additional_info_whitespace_case)
test_add_additional_info
Test the add_additional_info function for basic functionality.
Tests
- Adding additional synopsis information when titles match
- Creating a new synopsis column in the output DataFrame
- Correct use of the mocked find_additional_info function
- Handling English and Japanese titles
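A minimal sketch of how such a test can be arranged, not the actual test code. The add_additional_info defined here is a hypothetical stand-in that takes the lookup as a parameter; the real signature in src.merge_datasets may differ, and the column names and sample data are assumptions.

```python
from unittest.mock import Mock

import pandas as pd


def add_additional_info(merged, title_columns, new_column, find_additional_info):
    """Hypothetical stand-in; the real function in src.merge_datasets may differ."""
    result = merged.copy()
    # Look up each row by its available titles and store the returned synopsis
    result[new_column] = [
        find_additional_info(row[title_columns].dropna().tolist())
        for _, row in merged.iterrows()
    ]
    return result


merged = pd.DataFrame(
    {
        "title_english": ["Naruto", "Bleach"],
        "title_japanese": ["ナルト", "ブリーチ"],
    }
)
# The mock replaces the real lookup, so the test controls what is "found"
mock_find = Mock(side_effect=["A ninja story.", "A soul reaper story."])

result = add_additional_info(
    merged, ["title_english", "title_japanese"], "additional_synopsis", mock_find
)
```

Because the lookup is mocked, the assertions can check both the new column and the arguments the lookup received, including the Japanese titles.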
Source code in tests/test_merge_datasets.py
test_add_additional_info_all_titles_na
Test the add_additional_info function with completely missing title information.
Tests
- Handling rows where all title columns are NA
- Skipping processing for rows with no valid titles
- Maintaining data integrity for rows with all NA titles
- Correctly processing mixed rows (some with all NA titles, some with valid titles)
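One way to identify rows with no usable title is a boolean mask over the title columns. The sketch below assumes three title columns with hypothetical names; it illustrates the skip condition only and is not taken from src.merge_datasets.

```python
import pandas as pd

# Hypothetical column names and sample data
title_columns = ["title", "title_english", "title_japanese"]
df = pd.DataFrame(
    {
        "title": [None, "Bleach"],
        "title_english": [None, None],
        "title_japanese": [None, "ブリーチ"],
        "synopsis": ["Kept as-is.", "A soul reaper story."],
    }
)

# True for rows in which every title column is NA
no_titles = df[title_columns].isna().all(axis=1)

# Only rows with at least one usable title would be passed to the lookup
to_process = df.loc[~no_titles]
skipped = df.loc[no_titles]
```

Rows selected by the mask are left untouched, which is what the data-integrity checks above verify.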
test_add_additional_info_no_match
Test the add_additional_info function when no matches are found.
Tests
- Handling cases where no matching additional info exists
- Proper handling of NA values for non-matches
- Processing multiple rows with varying match conditions
- Maintaining data integrity for non-matching rows
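The non-match behaviour can be illustrated with a plain dictionary lookup. This is a sketch under assumed names (the extra dictionary and the additional_synopsis column are hypothetical), not the matching logic of the real module.

```python
import pandas as pd

merged = pd.DataFrame({"title": ["Naruto", "Unknown Show", "Bleach"]})
# Hypothetical lookup table standing in for the additional dataset
extra = {"Naruto": "A ninja story."}

result = merged.copy()
# map() leaves NaN wherever the title has no entry in the lookup
result["additional_synopsis"] = result["title"].map(extra)
```

Rows without a match keep their original values and receive NA in the new column.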
test_add_additional_info_partial_titles
Test the add_additional_info function with partial title information.
Tests
- Processing rows with some NA title columns but at least one valid title
- Matching based on available title information
- Handling mixed NA and non-NA title columns
- Correct synopsis assignment when matching on partial information
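Matching on whichever titles are present can be sketched as follows. The helper first_match, the column names, and the lookup dictionary are all hypothetical; the real function may match differently.

```python
import pandas as pd


def first_match(row, title_columns, lookup):
    """Hypothetical helper: return the synopsis for the first title that matches."""
    # dropna() skips the NA title cells, so only available titles are tried
    for title in row[title_columns].dropna():
        if title in lookup:
            return lookup[title]
    return pd.NA


title_columns = ["title", "title_english", "title_japanese"]
merged = pd.DataFrame(
    {
        "title": [None, "Bleach"],
        "title_english": ["Naruto", None],
        "title_japanese": [None, None],
    }
)
lookup = {"Naruto": "A ninja story.", "Bleach": "A soul reaper story."}

synopses = [first_match(row, title_columns, lookup) for _, row in merged.iterrows()]
```

Each row is matched through its single available title, even though the other title columns are NA.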
test_add_additional_info_whitespace_case
Test the add_additional_info function's handling of whitespace and case variations.
Tests
- Processing titles with leading/trailing whitespace
- Handling different case variations (uppercase, lowercase, mixed)
- Correct matching despite whitespace/case differences
- Maintaining original data while normalizing for comparison
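A sketch of normalizing only for comparison, assuming titles are matched on a stripped, lowercased key. The DataFrames and the additional_synopsis column name are hypothetical.

```python
import pandas as pd

merged = pd.DataFrame({"title": ["  NARUTO ", "bleach", "One Piece"]})
extra = pd.DataFrame(
    {
        "title": ["Naruto", "Bleach  ", "one piece"],
        "synopsis": ["A ninja story.", "A soul reaper story.", "A pirate story."],
    }
)

# Normalized keys are used only for comparison; the title column stays unchanged
merged_key = merged["title"].str.strip().str.lower()
lookup = dict(zip(extra["title"].str.strip().str.lower(), extra["synopsis"]))

result = merged.copy()
result["additional_synopsis"] = merged_key.map(lookup)
```

The original titles, including their whitespace and casing, are still present in the result.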
test_clean_synopsis
Test the clean_synopsis function to ensure it correctly cleans the synopsis column.
Tests
- Preserving valid synopses
- Removing specified unwanted phrases
- Replacing unwanted phrases with empty strings
- Handling multiple occurrences of unwanted phrases
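The checks above could look roughly like this. The clean_synopsis defined here is a hypothetical stand-in that assumes phrases are removed as literal substrings and the remainder is stripped; the real function's signature and phrase list are not shown in this page.

```python
import pandas as pd


def clean_synopsis(df, column, unwanted_phrases):
    """Hypothetical stand-in: strip each unwanted phrase from one column."""
    cleaned = df.copy()
    for phrase in unwanted_phrases:
        # regex=False treats the phrase as literal text, not a pattern
        cleaned[column] = cleaned[column].str.replace(phrase, "", regex=False)
    # Remove whitespace left behind after phrases are deleted
    cleaned[column] = cleaned[column].str.strip()
    return cleaned


df = pd.DataFrame(
    {
        "synopsis": [
            "A young ninja seeks recognition.",
            "No synopsis available.",
            "No synopsis available. No synopsis available.",
        ]
    }
)
result = clean_synopsis(df, "synopsis", ["No synopsis available."])
```

The valid synopsis passes through unchanged, while rows consisting only of the unwanted phrase, once or repeated, end up as empty strings.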
test_consolidate_titles
Test the consolidate_titles function to ensure it correctly consolidates multiple title columns.
Tests
- Prioritizing the main 'title' column values
- Filling missing values from alternate title columns in order
- Handling multiple NA values across columns
- Preserving existing valid titles
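The fill order described above can be sketched with Series.combine_first. The consolidate_titles below is a hypothetical stand-in, and the alternate column names are assumptions.

```python
import pandas as pd


def consolidate_titles(df, alternate_columns):
    """Hypothetical stand-in: first non-NA title per row, main 'title' first."""
    merged = df["title"]
    for col in alternate_columns:
        # combine_first keeps existing values and fills only the NA slots
        merged = merged.combine_first(df[col])
    return merged


df = pd.DataFrame(
    {
        "title": ["Naruto", None, None, None],
        "title_english": ["Naruto (EN)", "Bleach", None, None],
        "title_japanese": ["ナルト", "ブリーチ", "ワンピース", None],
    }
)
titles = consolidate_titles(df, ["title_english", "title_japanese"])
```

The first row keeps its main title, the second and third fall back to the alternates in order, and the last row stays NA because every column is missing.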
test_preprocess_name
Test the preprocess_name function to ensure it correctly preprocesses names.
Tests
- Converting strings to lowercase
- Stripping leading/trailing whitespace
- Handling None values (returns empty string)
- Handling numeric values (converts to string)
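The four behaviours listed above fit in a few lines. This is a sketch written from the description, not the implementation in src.merge_datasets; the sample inputs are made up.

```python
def preprocess_name(name):
    """Hypothetical stand-in for src.merge_datasets.preprocess_name."""
    if name is None:
        return ""  # None maps to an empty string
    # Numbers and other non-strings are converted with str() first
    return str(name).strip().lower()


def test_preprocess_name():
    assert preprocess_name("  Naruto  ") == "naruto"
    assert preprocess_name("ONE PIECE") == "one piece"
    assert preprocess_name(None) == ""
    assert preprocess_name(12345) == "12345"
```

Each assertion corresponds to one bullet in the list above.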
test_remove_duplicate_infos
Test the remove_duplicate_infos function to ensure it correctly handles duplicate synopses.
Tests
- Identifying and removing duplicate synopses across columns
- Preserving unique synopses
- Handling NA values
- Maintaining original data structure and column order
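A rough sketch of the behaviour being tested, assuming duplicates are detected by exact text equality within a row and that later duplicates are replaced with NA. The stand-in function and column names are hypothetical; the real function may compare normalized text instead.

```python
import pandas as pd


def remove_duplicate_infos(df, info_columns):
    """Hypothetical stand-in: blank out repeated synopses within each row."""
    result = df.copy()
    for idx, row in df.iterrows():
        seen = set()
        for col in info_columns:
            value = row[col]
            if pd.isna(value):
                continue  # NA cells are left untouched
            if value in seen:
                result.at[idx, col] = pd.NA  # later duplicate in the same row
            else:
                seen.add(value)
    return result


df = pd.DataFrame(
    {
        "title": ["Naruto", "Bleach"],
        "synopsis": ["A ninja story.", "A soul reaper story."],
        "synopsis_extra": ["A ninja story.", None],
        "description": ["A different blurb.", "A soul reaper story."],
    }
)
result = remove_duplicate_infos(df, ["synopsis", "synopsis_extra", "description"])
```

The first occurrence of each synopsis is kept, unique text is preserved, and the shape and column order of the DataFrame are unchanged.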