
Models API

This page documents the search model components of AniSearch Model.

Overview

The models package contains the classes responsible for loading datasets, initializing cross-encoder models, and performing semantic search operations.

The package is structured as follows:

  • BaseSearchModel: Abstract base class providing common functionality
  • AnimeSearchModel: Implementation for anime search
  • MangaSearchModel: Implementation for manga search (with optional light novel support)

Model Search Workflow

The search process works by comparing a user query against all entries in the dataset:

flowchart TD
    A[User Query] --> B[Search Model]
    B --> C[Pair Query with Each Synopsis]
    C --> D[Batch Process Pairs]
    E[(Dataset)] --> D
    D --> F[Calculate Relevance Scores]
    F --> G[Sort Results]
    G --> H[Return Top-K Results]

    style A fill:#e1f5fe,stroke:#0288d1
    style E fill:#fff3e0,stroke:#ff9800
    style H fill:#e8f5e9,stroke:#4caf50

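Because the cross-encoder reads the query and each synopsis together, the ranking reflects semantic relevance rather than simple keyword matching. The trade-off is cost: every search scores the query against every entry in the dataset, so search time grows linearly with the number of entries.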
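The workflow above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the project's implementation: `rank_entries` and `overlap_scorer` are hypothetical names, and the word-overlap scorer only stands in for the cross-encoder's `predict` call so the example runs without downloading a model. It does not measure semantic similarity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def overlap_scorer(pairs: Sequence[Pair]) -> List[float]:
    # Hypothetical stand-in for CrossEncoder.predict: one float per (query, text)
    # pair, here just the number of shared lowercase words
    scores = []
    for query, text in pairs:
        shared = set(query.lower().split()) & set(text.lower().split())
        scores.append(float(len(shared)))
    return scores


def rank_entries(
    query: str,
    entries: List[Dict[str, str]],
    scorer: Callable[[Sequence[Pair]], List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    # 1. Pair the query with every synopsis in the dataset
    pairs = [(query, entry["synopsis"]) for entry in entries]
    # 2. Score every pair
    scores = scorer(pairs)
    # 3. Order entry positions by score, highest first, and keep the top-k
    order = sorted(range(len(entries)), key=lambda i: scores[i], reverse=True)
    return [(entries[i]["title"], scores[i]) for i in order[:top_k]]


entries = [
    {"title": "A", "synopsis": "a cooking contest at a high school"},
    {"title": "B", "synopsis": "students travel through time to change history"},
    {"title": "C", "synopsis": "a detective solves crimes"},
]
print(rank_entries("travel through time", entries, overlap_scorer, top_k=2))
```

Swapping `overlap_scorer` for a real model's scoring function leaves the pairing, sorting, and truncation steps unchanged, which is the structure the search methods below follow.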

BaseSearchModel

The foundation class with core functionality:

src.models.base_search_model.BaseSearchModel

BaseSearchModel(dataset_path: str, id_column: str, model_name: str = MODEL_NAME, device: Optional[str] = None, dataset_type: str = 'base')

Base class for cross-encoder powered semantic search models.

This class provides the foundation for building specialized search models that use cross-encoder architectures to compute semantic similarity between user queries and content descriptions (synopses). It handles the common functionality such as dataset loading, model initialization, and search computation.

The class is designed to be extended by specialized search models for different content types (e.g., anime, manga) that can implement additional domain-specific functionality while reusing the core search capabilities.

ATTRIBUTE DESCRIPTION
  • model_name (str): Name or path of the cross-encoder model being used
  • dataset_path (str): Path to the dataset file
  • id_col (str): Name of the ID column in the dataset
  • dataset_type (str): Type of dataset, such as "anime" or "manga" ("base" when not set by a subclass)
  • device (str): Device being used for computation ('cpu', 'cuda', etc.)
  • model (CrossEncoder): The loaded cross-encoder model
  • df (DataFrame): The loaded and preprocessed dataset
  • synopsis_cols (List[str]): List of column names containing synopsis text
  • normalize_scores (bool): Whether model scores need normalization

This constructor sets up the search model by:

  1. Initializing configuration parameters
  2. Detecting or setting the compute device (CPU/CUDA)
  3. Loading the cross-encoder model
  4. Loading and preprocessing the dataset

PARAMETER DESCRIPTION
  • dataset_path (str): Path to the dataset CSV file containing entries to search. The file should contain at minimum ID, title, and synopsis columns.
  • id_column (str): Name of the column containing unique identifiers in the dataset. This will be used to reference specific entries in search results.
  • model_name (str, default MODEL_NAME): Name or path of the cross-encoder model to use. Can be a Hugging Face model name or local path to a fine-tuned model. Defaults to the value specified in constants.MODEL_NAME.
  • device (Optional[str], default None): Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If None, uses "cuda" when torch.cuda.is_available() is True and "cpu" otherwise.
  • dataset_type (str, default 'base'): Type of dataset being loaded, used for logging. Common values are "anime" or "manga".

RAISES DESCRIPTION
  • FileNotFoundError: If the dataset file cannot be found
  • ValueError: If the model_name is invalid or the model cannot be loaded

Example
# Create a basic search model with default settings
basic_search = BaseSearchModel(
    dataset_path="data/merged_anime_dataset.csv",
    id_column="anime_id",
    dataset_type="anime"
)

# Create a search model with custom model and device
custom_search = BaseSearchModel(
    dataset_path="data/merged_manga_dataset.csv",
    id_column="manga_id",
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    device="cuda:0",
    dataset_type="manga"
)
Source code in src/models/base_search_model.py
def __init__(  # pylint: disable=too-many-arguments, too-many-positional-arguments
    self,
    dataset_path: str,
    id_column: str,
    model_name: str = MODEL_NAME,
    device: Optional[str] = None,
    dataset_type: str = "base",
):
    """
    Initialize the base search model with dataset and model configuration.

    This constructor sets up the search model by:

    1. Initializing configuration parameters
    2. Detecting or setting the compute device (CPU/CUDA)
    3. Loading the cross-encoder model
    4. Loading and preprocessing the dataset

    Args:
        dataset_path: Path to the dataset CSV file containing entries to search.
            The file should contain at minimum ID, title, and synopsis columns.
        id_column: Name of the column containing unique identifiers in the dataset.
            This will be used to reference specific entries in search results.
        model_name: Name or path of the cross-encoder model to use.
            Can be a Hugging Face model name or local path to a fine-tuned model.
            Defaults to the value specified in constants.MODEL_NAME.
        device: Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.).
            If None, automatically selects the best available device.
        dataset_type: Type of dataset being loaded, used for logging.
            Common values are "anime" or "manga".

    Raises:
        FileNotFoundError: If the dataset file cannot be found
        ValueError: If the model_name is invalid or the model cannot be loaded

    Example:
        ```python
        # Create a basic search model with default settings
        basic_search = BaseSearchModel(
            dataset_path="data/merged_anime_dataset.csv",
            id_column="anime_id",
            dataset_type="anime"
        )

        # Create a search model with custom model and device
        custom_search = BaseSearchModel(
            dataset_path="data/merged_manga_dataset.csv",
            id_column="manga_id",
            model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
            device="cuda:0",
            dataset_type="manga"
        )
        ```
    """
    # Store the model name and dataset info for later use
    self.model_name = model_name
    self.dataset_path = dataset_path
    self.id_col = id_column
    self.dataset_type = dataset_type

    # Auto-detect device if not specified
    if device is None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
    else:
        self.device = device

    # Load the cross-encoder model
    logger.info("Loading cross-encoder model: %s", model_name)
    self._load_model()

    # Load the dataset
    self._load_dataset()
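The device branch in the constructor above reduces to a small pure function. The sketch below is hypothetical (`resolve_device` is not part of the project) and takes CUDA availability as an argument instead of calling torch, so it runs anywhere. It makes one consequence of the listed code explicit: only "cuda" or "cpu" is ever auto-selected, so other accelerators (for example Apple's MPS backend) must be requested explicitly.

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    # An explicit device string is stored as-is; the constructor above does not validate it
    if requested is not None:
        return requested
    # Otherwise prefer CUDA when available, else fall back to CPU
    return "cuda" if cuda_available else "cpu"


# In the real constructor, cuda_available corresponds to torch.cuda.is_available()
print(resolve_device(None, cuda_available=False))      # cpu
print(resolve_device(None, cuda_available=True))       # cuda
print(resolve_device("cuda:1", cuda_available=False))  # cuda:1
```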

dataset_path instance-attribute

dataset_path = dataset_path

dataset_type instance-attribute

dataset_type = dataset_type

device instance-attribute

device = device if device is not None else ('cuda' if torch.cuda.is_available() else 'cpu')

id_col instance-attribute

id_col = id_column

model_name instance-attribute

model_name = model_name

list_available_models staticmethod

list_available_models() -> Mapping[str, Dict[str, str]]

List available pre-trained cross-encoder models categorized by type.

This static method returns a dictionary of model categories and their corresponding model recommendations that can be used with the search system. These models are defined in the ALTERNATIVE_MODELS constant.

RETURNS DESCRIPTION
  • Mapping[str, Dict[str, str]]: A dictionary whose keys are model categories (e.g., "Semantic Search", "Question Answering") and whose values are dictionaries mapping model names to descriptions

Example
# Get a dictionary of available models by category
available_models = BaseSearchModel.list_available_models()

# Print model categories and models
for category, models in available_models.items():
    print(f"\n{category}:")
    for model_name, description in models.items():
        print(f"  - {model_name}: {description}")

Source code in src/models/base_search_model.py
@staticmethod
@handle_exceptions(log_exceptions=True, include_exc_info=True)
def list_available_models() -> Mapping[str, Dict[str, str]]:
    """
    List available pre-trained cross-encoder models categorized by type.

    This static method returns a dictionary of model categories and their
    corresponding model recommendations that can be used with the search system.
    These models are defined in the ALTERNATIVE_MODELS constant.

    Returns:
        Mapping[str, Dict[str, str]]: A dictionary where:
            - Keys are model categories (e.g., "Semantic Search", "Question Answering")
            - Values are dictionaries mapping model names to descriptions

    Example:
        ```python
        # Get a dictionary of available models by category
        available_models = BaseSearchModel.list_available_models()

        # Print model categories and models
        for category, models in available_models.items():
            print(f"\n{category}:")
            for model_name, description in models.items():
                print(f"  - {model_name}: {description}")
        ```
    """
    return ALTERNATIVE_MODELS
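Because the return value is nested (category, then model name, then description), a common next step is flattening it into a plain list of model names, for instance to check a candidate model_name before constructing a search model. This is a sketch under assumptions: `EXAMPLE_MODELS` is a hypothetical stand-in for ALTERNATIVE_MODELS (its category and descriptions are invented; the two model names are the ones used in this page's examples), and `all_model_names` is a hypothetical helper.

```python
from typing import Dict, List, Mapping

# Hypothetical stand-in for ALTERNATIVE_MODELS; the real categories,
# names, and descriptions come from the project's constants module
EXAMPLE_MODELS: Mapping[str, Dict[str, str]] = {
    "Semantic Search": {
        "cross-encoder/ms-marco-MiniLM-L-6-v2": "smaller and faster",
        "cross-encoder/ms-marco-MiniLM-L-12-v2": "larger and slower",
    },
}


def all_model_names(models: Mapping[str, Dict[str, str]]) -> List[str]:
    # Flatten {category: {model_name: description}} into a sorted list of names
    return sorted(name for entries in models.values() for name in entries)


names = all_model_names(EXAMPLE_MODELS)
print(names)
print("cross-encoder/ms-marco-MiniLM-L-6-v2" in names)  # True
```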

list_fine_tuned_models staticmethod

list_fine_tuned_models() -> Dict[str, str]

List locally available fine-tuned models that can be used for search.

This static method scans the fine-tuned model directory to find models that have been fine-tuned specifically for anime/manga search. It identifies valid models by checking for the presence of a config.json file.

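RETURNS DESCRIPTION
  • Dict[str, str]: A dictionary mapping model directory names (keys) to the paths of those directories (values). The paths are built with os.path.join("model/fine-tuned", name), so they are relative to the current working directory rather than absolute.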

Notes
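  • Searches the "model/fine-tuned" directory, which is hard-coded and resolved relative to the current working directory
  • Only directories containing a config.json file are included
  • Returns an empty dictionary if no fine-tuned models are found (and logs a warning if the directory itself does not exist)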
Example
# Get a dictionary of available fine-tuned models
fine_tuned_models = BaseSearchModel.list_fine_tuned_models()

if fine_tuned_models:
    print("Available fine-tuned models:")
    for name, path in fine_tuned_models.items():
        print(f"- {name}: {path}")
else:
    print("No fine-tuned models found.")
Source code in src/models/base_search_model.py
@staticmethod
@handle_exceptions(log_exceptions=True, include_exc_info=True)
def list_fine_tuned_models() -> Dict[str, str]:
    """
    List locally available fine-tuned models that can be used for search.

    This static method scans the fine-tuned model directory to find models
    that have been fine-tuned specifically for anime/manga search. It identifies
    valid models by checking for the presence of a config.json file.

    Returns:
        Dict[str, str]: A dictionary mapping:
            - Keys: Model directory names
            - Values: Full paths to the model directories

    Notes:
        - Searches in the "model/fine-tuned" directory by default
        - Only directories containing a config.json file are included
        - Returns an empty dictionary if no fine-tuned models are found

    Example:
        ```python
        # Get a dictionary of available fine-tuned models
        fine_tuned_models = BaseSearchModel.list_fine_tuned_models()

        if fine_tuned_models:
            print("Available fine-tuned models:")
            for name, path in fine_tuned_models.items():
                print(f"- {name}: {path}")
        else:
            print("No fine-tuned models found.")
        ```
    """
    fine_tuned_models: Dict[str, str] = {}
    model_dir = "model/fine-tuned"

    if not os.path.exists(model_dir):
        logger.warning("Fine-tuned model directory not found: %s", model_dir)
        return fine_tuned_models

    for model_name in os.listdir(model_dir):
        model_path = os.path.join(model_dir, model_name)
        if os.path.isdir(model_path) and os.path.exists(
            os.path.join(model_path, "config.json")
        ):
            fine_tuned_models[model_name] = model_path

    return fine_tuned_models
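The discovery rule (a subdirectory counts as a model only if it contains config.json) can be checked against a throwaway directory. The sketch below is a pathlib rewrite of the listing above with one hypothetical change: `find_fine_tuned` takes the directory as a parameter instead of hard-coding "model/fine-tuned", and it omits the warning log.

```python
import tempfile
from pathlib import Path
from typing import Dict


def find_fine_tuned(model_dir: Path) -> Dict[str, str]:
    # A missing directory yields an empty result, as in list_fine_tuned_models
    if not model_dir.exists():
        return {}
    # Keep only subdirectories that contain a config.json file
    return {
        child.name: str(child)
        for child in model_dir.iterdir()
        if child.is_dir() and (child / "config.json").exists()
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "fine-tuned"
    (root / "anime-v1").mkdir(parents=True)
    (root / "anime-v1" / "config.json").write_text("{}")
    (root / "scratch").mkdir()            # no config.json, so skipped
    (root / "notes.txt").write_text("x")  # plain file, so skipped
    found = find_fine_tuned(root)
    print(sorted(found))  # ['anime-v1']
```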

search

search(query: str, num_results: int = NUM_RESULTS, batch_size: int = DEFAULT_BATCH_SIZE) -> List[Dict[str, Any]]

Search for entries matching the provided description or query.

This method performs semantic search across the dataset by computing similarity scores between the user query and all synopses in the dataset. It returns the top matches sorted by relevance score.

The search process includes:

  1. Creating sentence pairs between the query and all synopses
  2. Computing relevance scores using the cross-encoder model in batches
  3. Sorting results by score and returning the top matches

PARAMETER DESCRIPTION
  • query (str): The search query or description to match against synopses. This should be a descriptive text that captures the content the user is looking for.
  • num_results (int, default NUM_RESULTS): Number of top matches to return, sorted by relevance score. Defaults to the value specified in constants.NUM_RESULTS. The value is not validated; note that 0 returns every entry, because argsort()[-0:] selects the whole array.
  • batch_size (int, default DEFAULT_BATCH_SIZE): Number of sentence pairs to process at once with the model. Using batches helps manage memory usage with large datasets. Defaults to the value specified in constants.DEFAULT_BATCH_SIZE; values of 0 or less also fall back to this default, with a logged warning.
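The batch loop in search() slices the list of (query, synopsis) pairs and calls the model once per slice. The sketch below isolates that loop so its behavior can be checked without a model; `score_in_batches` and `fake_predict` are hypothetical names, and `DEFAULT_BATCH_SIZE = 32` is an assumed value, not the project's constant. Note that in the listed code the full list of pairs is still built up front; batching bounds how many pairs each model call processes, not the size of that list.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
DEFAULT_BATCH_SIZE = 32  # assumed value; the real one comes from constants


def score_in_batches(
    pairs: Sequence[Pair],
    predict: Callable[[Sequence[Pair]], List[float]],
    batch_size: int,
) -> List[float]:
    # Same fallback as search(): non-positive batch sizes revert to the default
    if batch_size <= 0:
        batch_size = DEFAULT_BATCH_SIZE
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start : start + batch_size]
        scores.extend(predict(batch))  # one model call per slice
    return scores


calls: List[int] = []


def fake_predict(batch: Sequence[Pair]) -> List[float]:
    # Hypothetical model stand-in: records batch sizes, scores by text length
    calls.append(len(batch))
    return [float(len(text)) for _, text in batch]


pairs = [("q", "x" * n) for n in range(10)]
scores = score_in_batches(pairs, fake_predict, batch_size=4)
print(calls)  # [4, 4, 2]
```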

RETURNS DESCRIPTION
  • List[Dict[str, Any]]: A list of dictionaries, sorted by score in descending order, each containing:
      - id: The entry ID from the id_column specified during initialization
      - title: The entry title
      - score: The relevance score (higher is better)
      - synopsis: The entry's combined synopsis, truncated to its first 500 characters with "..." appended when it is longer
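Result assembly comes down to two operations: picking the positions of the highest scores and trimming each synopsis to a preview. The sketch below swaps in heapq.nlargest for numpy's argsort (a deliberate substitution, not the project's code) and adds an explicit guard for non-positive k; `top_k_indices` and `preview` are hypothetical helper names.

```python
import heapq
from typing import List

PREVIEW_CHARS = 500  # the preview length described above


def top_k_indices(scores: List[float], k: int) -> List[int]:
    # Treat non-positive k as "no results"; search() as listed has no such guard
    if k <= 0:
        return []
    # Positions of the k highest scores, best first
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def preview(text: str) -> str:
    # Keep the first 500 characters and mark the cut with "..."
    return text[:PREVIEW_CHARS] + "..." if len(text) > PREVIEW_CHARS else text


scores = [0.1, 2.5, -0.3, 1.7]
print(top_k_indices(scores, 2))  # [1, 3]
print(top_k_indices(scores, 0))  # []
print(len(preview("a" * 600)))   # 503
```

For large datasets, argsort sorts every score while nlargest keeps only k candidates in a heap; either way, every pair must first be scored by the model, which dominates the cost.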

RAISES DESCRIPTION
  • ValueError: If the query is empty or consists only of whitespace

Example
# Initialize a search model
search_model = BaseSearchModel(
    dataset_path="data/merged_anime_dataset.csv",
    id_column="anime_id"
)

# Search for content about time travel
results = search_model.search(
    query="A story where characters travel through time and change history",
    num_results=5,
    batch_size=64
)

# Process the top results
for result in results:
    print(f"{result['title']} (Score: {result['score']:.2f})")
    print(f"Synopsis: {result['synopsis'][:100]}...")
Source code in src/models/base_search_model.py
@handle_exceptions(log_exceptions=True, include_exc_info=True)
def search(
    self,
    query: str,
    num_results: int = NUM_RESULTS,
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> List[Dict[str, Any]]:
    """
    Search for entries matching the provided description or query.

    This method performs semantic search across the dataset by computing similarity
    scores between the user query and all synopses in the dataset. It returns the
    top matches sorted by relevance score.

    The search process includes:

    1. Creating sentence pairs between the query and all synopses
    2. Computing relevance scores using the cross-encoder model in batches
    3. Sorting results by score and returning the top matches

    Args:
        query: The search query or description to match against synopses.
            This should be a descriptive text that captures the content
            the user is looking for.
        num_results: Number of top matches to return, sorted by relevance score.
            Defaults to the value specified in constants.NUM_RESULTS.
        batch_size: Number of sentence pairs to process at once with the model.
            Using batches helps manage memory usage with large datasets.
            Defaults to the value specified in constants.DEFAULT_BATCH_SIZE.

    Returns:
        List[Dict[str, Any]]: A list of dictionaries, each containing:
            - id: The entry ID from the id_column specified during initialization
            - title: The entry title
            - score: The relevance score (higher is better)
            - synopsis: A preview of the entry synopsis (truncated to 500 chars)

            The list is sorted by score in descending order.

    Raises:
        ValueError: If the query is empty or consists only of whitespace

    Example:
        ```python
        # Initialize a search model
        search_model = BaseSearchModel(
            dataset_path="data/merged_anime_dataset.csv",
            id_column="anime_id"
        )

        # Search for content about time travel
        results = search_model.search(
            query="A story where characters travel through time and change history",
            num_results=5,
            batch_size=64
        )

        # Process the top results
        for result in results:
            print(f"{result['title']} (Score: {result['score']:.2f})")
            print(f"Synopsis: {result['synopsis'][:100]}...")
        ```
    """
    if not query.strip():
        raise ValueError("Search query cannot be empty")

    logger.info("Searching for: %s", query)

    # Prepare pairs for cross-encoder scoring
    all_synopses = self.df["combined_synopsis"].tolist()
    sentence_pairs = [(query, text) for text in all_synopses]

    # Calculate total number of pairs and batches
    total_pairs = len(sentence_pairs)
    if batch_size <= 0:
        batch_size = DEFAULT_BATCH_SIZE
        logger.warning(
            "Invalid batch size provided, using default: %d", DEFAULT_BATCH_SIZE
        )

    # Compute relevance scores in batches
    logger.info(
        "Computing relevance scores with cross-encoder (batch size: %d)", batch_size
    )
    scores: List[Any] = []

    # Use tqdm to display progress for all datasets
    with tqdm(
        total=total_pairs,
        desc="Scoring",
        disable=False,
    ) as pbar:
        for i in range(0, total_pairs, batch_size):
            batch = sentence_pairs[i : i + batch_size]
            # Disable progress_bar in the model's predict method to avoid multiple progress bars
            batch_scores = self.model.predict(batch, show_progress_bar=False)

            if isinstance(batch_scores, np.ndarray):
                scores.extend(batch_scores.tolist())
            else:
                scores.extend(batch_scores)

            pbar.update(len(batch))

    # Convert scores to numpy array
    scores_array = np.array(scores)

    # Get indices of top scores
    top_indices = scores_array.argsort()[-num_results:][::-1]

    # Prepare results
    results = []
    for idx in top_indices:
        entry = self.df.iloc[idx]
        synopsis = entry["combined_synopsis"]
        results.append(
            {
                "id": entry[self.id_col],
                "title": entry["title"],
                "score": float(scores_array[idx]),
                "synopsis": (
                    synopsis[:500] + "..." if len(synopsis) > 500 else synopsis
                ),
            }
        )

    logger.info("Found %d matches", len(results))
    return results

AnimeSearchModel

Specialized model for anime search:

src.models.anime_search_model.AnimeSearchModel

AnimeSearchModel(model_name: str = MODEL_NAME, device: Optional[str] = None)

Bases: BaseSearchModel

A specialized search model for finding anime based on textual descriptions.

This class extends BaseSearchModel to provide anime-specific search functionality. It loads a comprehensive dataset of anime entries and uses a cross-encoder model to compute semantic similarity between user queries and anime synopses, returning the most relevant matches.

The model loads the merged anime dataset from ANIME_DATASET_PATH, which provides titles, metadata, and synopses for a wide range of anime.

ATTRIBUTE DESCRIPTION
  • df (DataFrame): The loaded anime dataset
  • id_col (str): Column name for the anime ID in the dataset
  • model (CrossEncoder): The cross-encoder model used for scoring
  • device (str): The device being used ('cpu', 'cuda', etc.)

This constructor sets up the anime search model by loading the anime dataset and initializing the cross-encoder model.

PARAMETER DESCRIPTION
  • model_name (str, default MODEL_NAME): Name or path of the cross-encoder model to use. Defaults to the value specified in constants.MODEL_NAME.
  • device (Optional[str], default None): Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If None, uses "cuda" when torch.cuda.is_available() is True and "cpu" otherwise.

RAISES DESCRIPTION
  • FileNotFoundError: If the anime dataset cannot be found
  • ValueError: If the model_name is invalid or the model cannot be loaded

Example
# Basic initialization with default settings
anime_model = AnimeSearchModel()

# Initialize with custom model and specific device
custom_model = AnimeSearchModel(
    model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
    device="cuda"
)
Source code in src/models/anime_search_model.py
def __init__(
    self,
    model_name: str = MODEL_NAME,
    device: Optional[str] = None,
):
    """
    Initialize the anime search model with the specified parameters.

    This constructor sets up the anime search model by loading the anime dataset
    and initializing the cross-encoder model.

    Args:
        model_name: Name or path of the cross-encoder model to use.
            Defaults to the value specified in constants.MODEL_NAME.
        device: Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.).
            If None, automatically selects the best available device.

    Raises:
        FileNotFoundError: If the anime dataset cannot be found
        ValueError: If the model_name is invalid or the model cannot be loaded

    Example:
        ```python
        # Basic initialization with default settings
        anime_model = AnimeSearchModel()

        # Initialize with custom model and specific device
        custom_model = AnimeSearchModel(
            model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
            device="cuda"
        )
        ```
    """
    logger.info("Initializing AnimeSearchModel")
    super().__init__(
        dataset_path=ANIME_DATASET_PATH,
        id_column="anime_id",
        model_name=model_name,
        device=device,
        dataset_type="anime",
    )
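AnimeSearchModel adds no logic of its own: it pins the dataset-specific arguments and forwards model_name and device to the base class. The sketch below reproduces that wiring against a hypothetical `StubBaseSearchModel` that only records configuration (no model or dataset is loaded). The dataset path is borrowed from the BaseSearchModel example as a stand-in for ANIME_DATASET_PATH, and the default model name stands in for MODEL_NAME; neither is confirmed to be the project's actual value.

```python
from typing import Optional

DEFAULT_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # stand-in for MODEL_NAME


class StubBaseSearchModel:
    # Hypothetical stand-in for BaseSearchModel: records configuration only
    def __init__(
        self,
        dataset_path: str,
        id_column: str,
        model_name: str = DEFAULT_MODEL,
        device: Optional[str] = None,
        dataset_type: str = "base",
    ) -> None:
        self.dataset_path = dataset_path
        self.id_col = id_column
        self.model_name = model_name
        # Simplified: the real class checks torch.cuda.is_available() here
        self.device = device if device is not None else "cpu"
        self.dataset_type = dataset_type


class AnimeSearchModelSketch(StubBaseSearchModel):
    # Mirrors AnimeSearchModel: callers choose only model_name and device
    def __init__(self, model_name: str = DEFAULT_MODEL, device: Optional[str] = None) -> None:
        super().__init__(
            dataset_path="data/merged_anime_dataset.csv",  # stand-in for ANIME_DATASET_PATH
            id_column="anime_id",
            model_name=model_name,
            device=device,
            dataset_type="anime",
        )


model = AnimeSearchModelSketch(device="cuda")
print(model.id_col, model.dataset_type, model.device)  # anime_id anime cuda
```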

MangaSearchModel

Specialized model for manga search:

src.models.manga_search_model.MangaSearchModel

MangaSearchModel(model_name: str = MODEL_NAME, device: Optional[str] = None, include_light_novels: bool = False)

Bases: BaseSearchModel

A specialized search model for finding manga based on textual descriptions.

This class extends BaseSearchModel to provide manga-specific search functionality. It loads a comprehensive dataset of manga entries and uses a cross-encoder model to compute semantic similarity between user queries and manga synopses, returning the most relevant matches.

The model adds one capability over the base class: it can exclude light novels. Its search() override scores only the rows returned by _get_search_dataframe(), which performs the optional filtering, and otherwise reuses the same batched scoring, tqdm progress bar, and result format as BaseSearchModel.search().

ATTRIBUTE DESCRIPTION
  • include_light_novels (bool): Whether to include light novels in search results
  • df (DataFrame): The loaded manga dataset
  • id_col (str): Column name for the manga ID in the dataset
  • model (CrossEncoder): The cross-encoder model used for scoring
  • device (str): The device being used ('cpu', 'cuda', etc.)

This constructor sets up the manga search model by loading the manga dataset and initializing the cross-encoder model. It also configures whether light novels should be included in search results.

PARAMETER DESCRIPTION
  • model_name (str, default MODEL_NAME): Name or path of the cross-encoder model to use. Defaults to the value specified in constants.MODEL_NAME.
  • device (Optional[str], default None): Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.). If None, uses "cuda" when torch.cuda.is_available() is True and "cpu" otherwise.
  • include_light_novels (bool, default False): Whether to include light novels in search results. When False, entries with type 'light_novel' will be filtered out. Defaults to False.

RAISES DESCRIPTION
  • FileNotFoundError: If the manga dataset cannot be found
  • ValueError: If the model_name is invalid or the model cannot be loaded

Example
# Basic initialization with default settings
manga_model = MangaSearchModel()

# Initialize with custom model and including light novels
custom_model = MangaSearchModel(
    model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
    device="cuda",
    include_light_novels=True
)
Source code in src/models/manga_search_model.py
def __init__(
    self,
    model_name: str = MODEL_NAME,
    device: Optional[str] = None,
    include_light_novels: bool = False,
):
    """
    Initialize the manga search model with the specified parameters.

    This constructor sets up the manga search model by loading the manga dataset
    and initializing the cross-encoder model. It also configures whether light
    novels should be included in search results.

    Args:
        model_name: Name or path of the cross-encoder model to use.
            Defaults to the value specified in constants.MODEL_NAME.
        device: Device to run the model on ('cpu', 'cuda', 'cuda:0', etc.).
            If None, automatically selects the best available device.
        include_light_novels: Whether to include light novels in search results.
            When False, entries with type 'light_novel' will be filtered out.
            Defaults to False.

    Raises:
        FileNotFoundError: If the manga dataset cannot be found
        ValueError: If the model_name is invalid or the model cannot be loaded

    Example:
        ```python
        # Basic initialization with default settings
        manga_model = MangaSearchModel()

        # Initialize with custom model and including light novels
        custom_model = MangaSearchModel(
            model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
            device="cuda",
            include_light_novels=True
        )
        ```
    """
    logger.info("Initializing MangaSearchModel")
    super().__init__(
        dataset_path=MANGA_DATASET_PATH,
        id_column="manga_id",
        model_name=model_name,
        device=device,
        dataset_type="manga",
    )
    self.include_light_novels = include_light_novels
    logger.info(
        "Light novels will %sbe included in search results",
        "" if include_light_novels else "not ",
    )

include_light_novels instance-attribute

include_light_novels = include_light_novels
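_get_search_dataframe() is not listed on this page, so the following is only a sketch of the filtering it is described as performing, written over a plain list of rows instead of a DataFrame. The "type" field and the "light_novel" value are assumptions based on the include_light_novels description; with pandas, the equivalent would be a boolean mask such as df[df["type"] != "light_novel"]. The sketch also shows why MangaSearchModel.search reads results through search_df.iloc rather than self.df.iloc: score positions refer to the filtered rows.

```python
from typing import Dict, List

Row = Dict[str, str]


def filter_light_novels(rows: List[Row], include_light_novels: bool) -> List[Row]:
    # Keep every row when light novels are wanted; otherwise drop rows whose
    # "type" field is "light_novel" (field name and value are assumptions)
    if include_light_novels:
        return rows
    return [row for row in rows if row.get("type") != "light_novel"]


rows = [
    {"title": "Manga A", "type": "manga"},
    {"title": "Novel B", "type": "light_novel"},
    {"title": "Manga C", "type": "manga"},
]
filtered = filter_light_novels(rows, include_light_novels=False)
print([row["title"] for row in filtered])  # ['Manga A', 'Manga C']

# Score positions refer to the filtered list: position 1 is "Manga C" there,
# but "Novel B" in the unfiltered rows
print(filtered[1]["title"])  # Manga C
```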

search

search(query: str, num_results: int = NUM_RESULTS, batch_size: int = DEFAULT_BATCH_SIZE) -> List[Dict[str, Any]]

Search for manga entries matching the provided description or query.

This method computes semantic similarity scores between the provided query and all manga synopses in the dataset (after optional filtering), returning the top matches sorted by relevance.

The search process includes:

  1. Optional filtering of the dataset (e.g., removing light novels)
  2. Creating sentence pairs between the query and all manga synopses
  3. Computing relevance scores using the cross-encoder model in batches
  4. Sorting results by score and returning the top matches

PARAMETER DESCRIPTION
  • query (str): The search query or description to match against manga synopses. This should be a descriptive text that captures the manga content the user is looking for.
  • num_results (int, default NUM_RESULTS): Number of top matches to return, sorted by relevance score. Defaults to the value specified in constants.NUM_RESULTS. The value is not validated; note that 0 returns every entry, because argsort()[-0:] selects the whole array.
  • batch_size (int, default DEFAULT_BATCH_SIZE): Number of sentence pairs to process at once with the model. Using batches helps manage memory usage with large datasets. Defaults to the value specified in constants.DEFAULT_BATCH_SIZE; values of 0 or less also fall back to this default, with a logged warning.

RETURNS DESCRIPTION
  • List[Dict[str, Any]]: A list of dictionaries, sorted by score in descending order, each containing:
      - id (int): The manga ID
      - title (str): The manga title
      - score (float): The relevance score (higher is better)
      - synopsis (str): The manga's combined synopsis, truncated to its first 500 characters with "..." appended when it is longer

RAISES DESCRIPTION
  • ValueError: If the query is empty or consists only of whitespace

Example
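# Initialize a manga search model (light novels excluded by default)
manga_model = MangaSearchModel()

# Search for manga about time travel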
results = manga_model.search(
    query="A story about characters who can travel through time",
    num_results=3,
    batch_size=64
)

# Process the top results
for result in results:
    print(f"{result['title']} (Score: {result['score']:.2f})")
Source code in src/models/manga_search_model.py
def search(
    self,
    query: str,
    num_results: int = NUM_RESULTS,
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> List[Dict[str, Any]]:
    """
    Search for manga entries matching the provided description or query.

    This method computes semantic similarity scores between the provided query
    and all manga synopses in the dataset (after optional filtering), returning
    the top matches sorted by relevance.

    The search process includes:

    1. Optional filtering of the dataset (e.g., removing light novels)
    2. Creating sentence pairs between the query and all manga synopses
    3. Computing relevance scores using the cross-encoder model in batches
    4. Sorting results by score and returning the top matches

    Args:
        query: The search query or description to match against manga synopses.
            This should be a descriptive text that captures the manga content
            the user is looking for.
        num_results: Number of top matches to return, sorted by relevance score.
            Defaults to the value specified in constants.NUM_RESULTS.
        batch_size: Number of sentence pairs to process at once with the model.
            Using batches helps manage memory usage with large datasets.
            Defaults to the value specified in constants.DEFAULT_BATCH_SIZE.

    Returns:
        List[Dict[str, Any]]: A list of dictionaries, each containing:
            - id (int): The manga ID
            - title (str): The manga title
            - score (float): The relevance score (higher is better)
            - synopsis (str): A preview of the manga synopsis (truncated to 500 chars)

            The list is sorted by score in descending order.

    Raises:
        ValueError: If the query is empty or consists only of whitespace

    Example:
        ```python
        # Search for manga about time travel
        results = manga_model.search(
            query="A story about characters who can travel through time",
            num_results=3,
            batch_size=64
        )

        # Process the top results
        for result in results:
            print(f"{result['title']} (Score: {result['score']:.2f})")
        ```
    """
    if not query.strip():
        raise ValueError("Search query cannot be empty")

    logger.info("Searching for: %s", query)

    # Get the appropriate dataframe for search
    search_df = self._get_search_dataframe()

    # Prepare pairs for cross-encoder scoring
    all_synopses = search_df["combined_synopsis"].tolist()
    sentence_pairs = [(query, text) for text in all_synopses]

    # Calculate total number of pairs and batches
    total_pairs = len(sentence_pairs)
    if batch_size <= 0:
        batch_size = DEFAULT_BATCH_SIZE
        logger.warning(
            "Invalid batch size provided, using default: %d", DEFAULT_BATCH_SIZE
        )

    # Compute relevance scores in batches
    logger.info(
        "Computing relevance scores with cross-encoder (batch size: %d)", batch_size
    )
    scores: List[Any] = []

    with tqdm(
        total=total_pairs,
        desc="Scoring",
        disable=False,
    ) as pbar:
        for i in range(0, total_pairs, batch_size):
            batch = sentence_pairs[i : i + batch_size]
            # Disable progress_bar in the model's predict method to avoid multiple progress bars
            batch_scores = self.model.predict(batch, show_progress_bar=False)

            if isinstance(batch_scores, np.ndarray):
                scores.extend(batch_scores.tolist())
            else:
                scores.extend(batch_scores)

            pbar.update(len(batch))

    scores_array = np.array(scores)

    # Get indices of top scores
    top_indices = scores_array.argsort()[-num_results:][::-1]

    # Prepare results
    results = []
    for idx in top_indices:
        entry = search_df.iloc[idx]
        synopsis = entry["combined_synopsis"]
        results.append(
            {
                "id": entry[self.id_col],
                "title": entry["title"],
                "score": float(scores_array[idx]),
                "synopsis": (
                    synopsis[:500] + "..." if len(synopsis) > 500 else synopsis
                ),
            }
        )

    logger.info("Found %d matches", len(results))
    return results