HumanEval (2021)

The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz These researchers.

To get a quick overview of the problems with jq:

jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl

The first two problems are:

==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """

so we understand that it takes as input an empty function with a docstring and you have to fill the function body.

The paper also shows that there can be other defined functions besides the one you have to implement.

HumanEval (2021)

 Ancestors (9)