Unbabel/WMT24-QE-task2-baseline · Example usage not working

13 days ago

Is the example usage wrong? It uses the model ID "Unbabel/wmt22-cometkiwi-da", which I am told is not a valid model ID, and I also don't understand why it's even there, since it differs from this model's ID.

When I replace model_path = download_model("Unbabel/wmt22-cometkiwi-da") with model_path = download_model("Unbabel/WMT24-QE-task2-baseline"), I get the following error. Is the usage correct, or is it just copied and pasted from another COMET model's usage? How might I use this model in Python?

The error I encounter is KeyError: 'unite_metric_multi_task', which is thrown when I run the line model = load_from_checkpoint(model_path).

patrick-wilken

1 day ago

Same problem here. It seems the code implementing this model is just not published, neither in the latest unbabel-comet==2.2.4 nor anywhere on GitHub.
(git grep unite_metric_multi_task $(git rev-list --all) returns nothing.)
Was anyone able to run this model, that is, to get word labels/confidences as output?

patrick-wilken

1 day ago

Ah sorry, after applying RTFM ("Using this model requires installing an older fork of unbabel-comet") I got it to run. So, as described, use https://github.com/ricardorei/wmt22-comet-legacy. I had some trouble installing it; what worked in the end was downgrading to Python 3.7 via pyenv.
The model.predict() call in comet/cli/score.py does return the word-level OK/BAD tags (output.tags), but they are not written to the output, so I had to write a few lines of code myself to do that.
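
For anyone who wants to dump the tags themselves, here is a minimal sketch of the kind of helper that could do it. It is not the code from the post above, and it does not import COMET. It assumes the tags arrive as one list of "OK"/"BAD" strings per segment; the actual structure of output.tags in the legacy fork is not confirmed here. The function name write_word_tags and the TSV layout are made up for illustration.

```python
import csv


def write_word_tags(path, translations, tags):
    """Write one TSV row per segment: the translation and its tags.

    Assumption (not confirmed against the legacy fork): `tags` holds one
    list of "OK"/"BAD" strings per segment, in the same order as
    `translations`.
    """
    translations = list(translations)
    tags = list(tags)
    if len(translations) != len(tags):
        raise ValueError("translations and tags must have the same length")

    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["mt", "tags"])  # header row
        for mt, seg_tags in zip(translations, tags):
            # Join the per-word labels with spaces so each segment stays on one line.
            writer.writerow([mt, " ".join(seg_tags)])
    return len(translations)
```

If the assumption holds, it could be called right after prediction, for example as write_word_tags("tags.tsv", mt_segments, output.tags), where mt_segments is a hypothetical list of the translated segments that were scored.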