hezar.models.image2text.vit_gpt2.vit_gpt2_image2text module

class hezar.models.image2text.vit_gpt2.vit_gpt2_image2text.ViTGPT2Image2Text(config: ViTGPT2Image2TextConfig, **kwargs)

Bases: Model

ViT image encoder with a GPT-2 text decoder for image-to-text generation (image captioning).

compute_loss(logits: Tensor, labels: Tensor) → Tensor

Compute the loss of the model outputs against the given labels.

Parameters:
  • logits – Output logits of the model to compute the loss on

  • labels – Target labels

Returns:

Loss tensor
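Since loss_func_name is 'cross_entropy', the loss presumably amounts to token-level cross-entropy between the decoder logits and the label ids. The pure-Python sketch here shows that computation under stated assumptions: logits shaped [batch][seq_len][vocab_size], labels shaped [batch][seq_len], and padding positions marked with -100 (a common PyTorch convention, assumed rather than confirmed for hezar). token_cross_entropy is a hypothetical helper, not part of the hezar API.

```python
import math
from typing import Sequence

# Assumption: label value marking padding positions (common PyTorch convention).
IGNORE_INDEX = -100


def token_cross_entropy(
    logits: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[Sequence[int]],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Mean cross-entropy over all non-ignored target tokens (hypothetical helper).

    logits: [batch][seq_len][vocab_size] unnormalized scores
    labels: [batch][seq_len] target token ids
    """
    total, count = 0.0, 0
    for seq_logits, seq_labels in zip(logits, labels):
        for scores, target in zip(seq_logits, seq_labels):
            if target == ignore_index:
                continue  # padding positions do not contribute to the loss
            # log of the softmax normalizer via log-sum-exp for numerical stability
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[target]  # equals -log softmax(scores)[target]
            count += 1
    if count == 0:
        raise ValueError("all labels are ignored")
    return total / count
```

With uniform logits over a vocabulary of 4, every target has probability 1/4, so the loss is ln 4 ≈ 1.386.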

forward(pixel_values, decoder_input_ids=None, decoder_attention_mask=None, encoder_outputs=None, past_key_values=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, **kwargs)

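Forward the inputs through the model and return its outputs (logits, etc.).

Parameters:
  • pixel_values – Preprocessed image tensor, as prepared by preprocess()

  • decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, decoder_inputs_embeds, use_cache, output_attentions, output_hidden_states – Optional decoder inputs, cached states, and output flags

Returns:

A dict of outputs such as logits and loss.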
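During training, encoder-decoder captioning models commonly build decoder_input_ids from the target labels by shifting them one position to the right (teacher forcing), so that the decoder predicts token t from tokens before t. Whether this class does so internally is not stated in this reference; the sketch only illustrates the convention. shift_right, decoder_start_id, and pad_id are hypothetical names, and the -100 padding marker is an assumption.

```python
from typing import List


def shift_right(
    labels: List[List[int]],
    decoder_start_id: int,
    pad_id: int,
    ignore_index: int = -100,
) -> List[List[int]]:
    """Build teacher-forcing decoder inputs from labels (hypothetical helper).

    Prepends the decoder start token and drops the last label, so position t
    of the decoder input holds label t-1.
    """
    shifted = []
    for row in labels:
        new_row = [decoder_start_id] + row[:-1]
        # Ignored label positions are not valid token ids; replace them with padding.
        shifted.append([pad_id if t == ignore_index else t for t in new_row])
    return shifted
```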

generate(pixel_values, generation_config=None, **kwargs)

Generate output token ids (captions) for the given images. This method is available on generative models, i.e. those with the is_generative attribute set to True; its behavior is usually controlled by the generation section of the model’s config.

Parameters:
  • pixel_values – Image tensor, usually the same input as passed to forward()

  • generation_config – Optional generation configuration

  • **kwargs – Generation kwargs

Returns:

Generated output tensor
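The generation config may select strategies such as sampling or beam search; the simplest one, greedy decoding, appends the highest-scoring token at each step until an end-of-sequence token or a length limit is reached. This is a sketch of greedy decoding only, with a hypothetical toy_model scorer standing in for the ViT-GPT2 decoder; greedy_decode, start_id, eos_id, and max_length are illustrative names, not hezar parameters.

```python
from typing import Callable, List


def greedy_decode(
    next_token_logits: Callable[[List[int]], List[float]],
    start_id: int,
    eos_id: int,
    max_length: int = 20,
) -> List[int]:
    """Append the argmax token at each step until EOS or max_length (sketch)."""
    ids = [start_id]
    while len(ids) < max_length:
        scores = next_token_logits(ids)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax over vocab
        ids.append(best)
        if best == eos_id:
            break
    return ids


def toy_model(ids: List[int]) -> List[float]:
    # Hypothetical scorer over a 4-token vocabulary: favors last_id + 1, capped at 3 (EOS).
    nxt = min(ids[-1] + 1, 3)
    return [1.0 if i == nxt else 0.0 for i in range(4)]


print(greedy_decode(toy_model, start_id=0, eos_id=3))  # [0, 1, 2, 3]
```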

image_processor = 'image_processor'
is_generative: bool = True
loss_func_name: str | LossType = 'cross_entropy'
post_process(model_outputs, **kwargs)

Process model outputs and return human-readable results. Called in self.predict().

Parameters:
  • model_outputs – model outputs to process

  • **kwargs – extra arguments specific to the derived class

Returns:

Processed model outputs, converted to human-readable results
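Turning generated ids into caption text is the tokenizer's job (tokenizer_name is 'bpe_tokenizer'). Assuming a GPT-2 style byte-level BPE, where each byte is stored as a printable Unicode character (the space byte becomes "Ġ"), decoding joins the token strings, maps characters back to bytes, and decodes UTF-8. This sketch reproduces the GPT-2 byte-to-character table; decode_tokens and the "<|endoftext|>" special token are assumptions, not confirmed hezar internals.

```python
from typing import Dict, Iterable


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2 byte-level BPE table mapping each byte 0..255 to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:  # non-printable bytes are shifted above U+00FF
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_tokens(tokens: Iterable[str], special_tokens: Iterable[str] = ("<|endoftext|>",)) -> str:
    """Join byte-level BPE tokens into text, dropping special tokens (hypothetical helper)."""
    specials = set(special_tokens)
    text = "".join(t for t in tokens if t not in specials)
    raw = bytes(_BYTE_DECODER[ch] for ch in text)
    # Strip the leading space that "Ġ"-prefixed first tokens would otherwise leave.
    return raw.decode("utf-8", errors="replace").strip()
```

Because the table covers all 256 bytes, non-Latin captions (for example Persian text) round-trip through the same mapping.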

preprocess(inputs: List[str] | List[np.ndarray] | List['Image'] | List[torch.Tensor], **kwargs)

Preprocess raw inputs and prepare them for the model’s forward().

Parameters:
  • inputs – Raw images: a list of str (e.g. image file paths), NumPy arrays, PIL images, or torch tensors

  • **kwargs – Extra model-specific keyword arguments

Returns:

A dict of inputs for model forward
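A typical ViT image-processing step rescales 8-bit pixel values to [0, 1], normalizes each channel with a mean and standard deviation, and reorders the array from height × width × channels to channels × height × width. This pure-Python sketch shows those steps for one image; the mean and std of 0.5 (mapping pixels to [-1, 1]) are assumed ViT-style defaults, resizing is omitted, and to_pixel_values is a hypothetical helper rather than the configured image_processor.

```python
from typing import List, Sequence


def to_pixel_values(
    image: Sequence[Sequence[Sequence[int]]],
    mean: Sequence[float] = (0.5, 0.5, 0.5),  # assumption: ViT-style defaults
    std: Sequence[float] = (0.5, 0.5, 0.5),
) -> List[List[List[float]]]:
    """Convert an HxWxC uint8 image (nested lists) into a normalized CxHxW array."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [
            # rescale to [0, 1], then normalize per channel
            [(image[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]
```

A batch would then be a list of such arrays, e.g. [to_pixel_values(img) for img in images].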

required_backends: List[Backends | str] = [Backends.TRANSFORMERS, Backends.TOKENIZERS, Backends.PILLOW]
tokenizer_name = 'bpe_tokenizer'
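The required_backends list names the packages this model needs at runtime. A minimal way to check them up front is importlib.util.find_spec, which returns None for modules that are not installed. The mapping from backend names to import names (for example Pillow is imported as PIL) and the missing_backends helper are assumptions for illustration, not hezar's own verification logic.

```python
import importlib.util
from typing import Dict, List

# Assumption: import names for the backends listed in required_backends.
BACKEND_MODULES: Dict[str, str] = {
    "transformers": "transformers",
    "tokenizers": "tokenizers",
    "pillow": "PIL",
}


def missing_backends(required: List[str]) -> List[str]:
    """Return the backend names whose import module cannot be found (hypothetical helper)."""
    return [
        name
        for name in required
        if importlib.util.find_spec(BACKEND_MODULES.get(name, name)) is None
    ]


print(missing_backends(["transformers", "tokenizers", "pillow"]))
```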