musif.extract.extract module

class musif.extract.extract.FeaturesExtractor(*args, **kwargs)[source]

Bases: object

Extracts features for a score or a list of scores, according to the parameters established in the configuration files. It extracts musical features from .xml and .mscx files based on the configuration and stores them in a dictionary (score features) that is returned as a DataFrame at the end. Features correspond to modules placed in the musif/features directory and are computed in the order given by the configuration. Some features may depend on previous ones, so the order matters.
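The ordering constraint can be illustrated with a minimal sketch. It assumes each feature module is a plain function that reads from and writes to a shared dictionary. The function names and dictionary keys below are hypothetical and are not musif's actual modules.

```python
# Hypothetical feature modules: each one updates a shared score-features dict.
def count_notes(features: dict) -> None:
    features["note_count"] = len(features["notes"])


def note_density(features: dict) -> None:
    # Depends on "note_count", so count_notes must run first.
    features["note_density"] = features["note_count"] / features["measures"]


def run_in_order(modules, features: dict) -> dict:
    # Run each module in the configured order, accumulating results.
    for module in modules:
        module(features)
    return features


score = {"notes": ["C4", "E4", "G4", "C5"], "measures": 2}
result = run_in_order([count_notes, note_density], dict(score))
print(result["note_density"])

# Reversing the order fails because "note_count" has not been computed yet.
order_error = None
try:
    run_in_order([note_density, count_notes], dict(score))
except KeyError as err:
    order_error = err
    print("missing dependency:", err)
```

In this sketch the wrong order surfaces as a KeyError, which matches the kind of failure listed under Raises for extract().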

extract() pandas.DataFrame[source]

Extracts the features given in the configuration from a file, a directory, or several file paths, and returns a DataFrame containing the musical features.

Return type

Score DataFrame with the extracted features of the given scores. For a single score, a DataFrame with only one row is returned.

Raises
  • ParseFileError – If the musicxml file can’t be parsed for any reason.

  • KeyError – If features aren’t loaded in the correct order or their dependencies are missing.

extract_modules(packages: list, data: dict, parts_data: dict, basic: bool)[source]

musif.extract.extract.find_files(extension: str, obj: Union[str, List[Union[str, PurePath]]], limit_files: Optional[List[str]] = None, check_file: Optional[str] = None) List[PurePath][source]

Extracts the paths to files with a given extension.

Given a file path, a directory path, or a list of them, returns a list of paths to the musicxml files found, in alphabetical order. Raises a TypeError if the argument is neither a string nor a list of strings, and a ValueError if the path doesn’t exist.

Parameters
  • extension (str) – A string representing the extension that will be looked for

  • obj (Union[str, Iterable[str]]) – A path or directory, or a list of paths or directories

Returns

resp – The list of musicxml files found in the provided arguments. This list is returned in alphabetical order.

Return type

List[PurePath]

Raises
  • TypeError – If the argument type is not the expected one (str or List[str]).

  • ValueError – If the provided string is neither a directory nor a file path.
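The documented behaviour can be approximated with a short sketch built on pathlib. This is not musif's implementation: the name find_files_sketch is hypothetical, the limit_files and check_file parameters are left out, and the non-recursive directory search is an assumption.

```python
import tempfile
from pathlib import Path, PurePath
from typing import List, Union

PathLike = Union[str, PurePath]


def find_files_sketch(extension: str, obj: Union[PathLike, List[PathLike]]) -> List[Path]:
    """Hypothetical stand-in for find_files; not the library's code."""
    if isinstance(obj, list):
        # Collect matches from every entry, then sort the combined result.
        found: List[Path] = []
        for item in obj:
            found.extend(find_files_sketch(extension, item))
        return sorted(found)
    if not isinstance(obj, (str, PurePath)):
        raise TypeError(f"Unexpected argument type: {type(obj).__name__}")
    path = Path(obj)
    if path.is_dir():
        # Assumption: only the top level of the directory is searched.
        return sorted(path.glob(f"*{extension}"))
    if path.is_file():
        return [path] if path.suffix == extension else []
    raise ValueError(f"{obj} is neither a directory nor a file path")


# Usage: create a throwaway directory with two matching files and one other.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.xml", "a.xml", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [p.name for p in find_files_sketch(".xml", tmp)]
    print(names)
```

Sorting the Path objects gives the alphabetical order described above, and the two error branches mirror the TypeError and ValueError cases.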

musif.extract.extract.parse_filename(file_path: str, split_keywords: List[str], expand_repeats: bool = False, export_dfs_to: Optional[Union[str, PurePath]] = None) music21.stream.Score[source]

This function parses a musicxml file and returns a music21 Score object. If the file has already been parsed, it is loaded from cache instead of being processed again. It splits a part into several parts if its instrument family is in the split_keywords argument, and expands repeats if requested.

Parameters
  • file_path (str) – A path to a musicxml file.

  • split_keywords (List[str]) – A list of keywords, based on music21 instrument sound names, used to split a part into different parts.

  • expand_repeats (bool) – Determines whether to expand the repetitions. Default value is False.

  • export_dfs_to (Union[str, PurePath]) – Path to a directory where dataframes containing the score data are exported. If None, no score is exported. Default value is None.

Returns

resp – The score saved in cache or the new score parsed with the necessary parts split.

Return type

Score

Raises

ParseFileError – If the xml file can’t be parsed for any reason.
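The "load from cache instead of processing again" behaviour can be sketched as follows. This assumes an in-memory dictionary keyed by file path; the names parse_with_cache, fake_parser, and _cache are hypothetical, and the library's actual cache (including its key and storage) may differ.

```python
from typing import Callable, Dict

# Assumption: a module-level, in-memory cache keyed by the file path.
_cache: Dict[str, object] = {}


def parse_with_cache(file_path: str, parser: Callable[[str], object]) -> object:
    # Parse only on the first request; later calls reuse the stored result.
    if file_path not in _cache:
        _cache[file_path] = parser(file_path)
    return _cache[file_path]


calls = []


def fake_parser(path: str) -> dict:
    # Stand-in for an expensive parse; records every real invocation.
    calls.append(path)
    return {"source": path}


first = parse_with_cache("score.xml", fake_parser)
second = parse_with_cache("score.xml", fake_parser)
print(first is second, len(calls))
```

The second call returns the same object without invoking the parser again, which is the effect the description above attributes to the cache.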

musif.extract.extract.parse_musescore_file(file_path: str, expand_repeats: bool = False) pandas.DataFrame[source]

This function parses a musescore file and returns a pandas dataframe. If the file has already been parsed, it will be loaded from cache instead of processing it again.

Parameters
  • file_path (str) – A path to a MuseScore (.mscx) file.

  • expand_repeats (bool) – Determines whether to expand the repetitions. Default value is False.

Returns

resp – The score saved in cache or the new score parsed in the form of a dataframe.

Return type

pd.DataFrame

Raises

ParseFileError – If the musescore file can’t be parsed for any reason.
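Raising a single error type "for any reason" is commonly done by wrapping whatever the underlying parser raises. The sketch below shows that pattern under stated assumptions: ParseFileErrorSketch, parse_or_raise, and broken_parser are hypothetical names, and the message format is not taken from musif.

```python
from typing import Callable


class ParseFileErrorSketch(Exception):
    """Hypothetical stand-in for the library's ParseFileError."""

    def __init__(self, file_path: str, reason: str):
        super().__init__(f"Parse error in file '{file_path}': {reason}")
        self.file_path = file_path


def parse_or_raise(file_path: str, parser: Callable[[str], object]) -> object:
    try:
        return parser(file_path)
    except Exception as err:
        # Wrap any failure so callers only need to handle one error type.
        raise ParseFileErrorSketch(file_path, str(err)) from err


def broken_parser(path: str) -> object:
    # Simulates a parser that fails on a malformed file.
    raise ValueError("unexpected end of file")


message = ""
try:
    parse_or_raise("score.mscx", broken_parser)
except ParseFileErrorSketch as err:
    message = str(err)
    print(message)
```

Chaining with "from err" keeps the original exception available in the traceback while exposing one error type to the caller.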