musif.process.processor module

class musif.process.processor.DataProcessor(info: Union[str, pandas.DataFrame], *args, **kwargs)[source]

Bases: object

Processor class that treats columns and information of a DataFrame

This operator processes information from a DataFrame or a .csv file. It deletes columns that are not useful for analysis, keeps the important ones, and saves the data to several .csv files. The main method, .process(), returns a DataFrame and saves data. It requires a labels file in the ./internal_data directory containing the label assigned to each score. …

data

DataFrame extracted with FeaturesExtractor containing all info.

Type

DataFrame

info

Path to a .csv file, or a DataFrame, containing the information from FeaturesExtractor

Type

str or DataFrame

process_info(info: Union[str, DataFrame])

Reads info and returns a DataFrame
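As a rough illustration of accepting either a path or a DataFrame, the following sketch uses only pandas. The helper name `read_info` is hypothetical and is not part of musif; the actual `process_info` may behave differently.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def read_info(info: Union[str, Path, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: return a DataFrame from a path or a DataFrame."""
    if isinstance(info, pd.DataFrame):
        # Work on a copy so the caller's DataFrame is left untouched.
        return info.copy()
    if isinstance(info, (str, Path)):
        # Assumption: the file is a plain comma-separated .csv file.
        return pd.read_csv(info)
    raise TypeError(f"Expected a path or a DataFrame, got {type(info).__name__}")
```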

process()[source]

Processes all the DataFrame information

group_columns()[source]

Groups the columns related to Keys, Key_Modulatory and Degree for aggregated analysis

unbundle_instrumentation()[source]

Separates ‘Instrumentation’ column into several Presence_ columns for every instrument present in Instrumentation.

delete_undesired_columns(**kwargs)[source]

Deletes all columns that are not needed according to config.yml file

save(dest_path: str)[source]

Saves final information to various csv files, splitting data, metadata and features

delete_files_without_harmony()[source]

Deletes files (actually rows in the DataFrame) that did not have a proper harmonic analysis and, therefore, got a value of 0 in the ‘Harmony_Available’ column
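The row filtering described here can be sketched with a boolean mask in pandas. The function name and the sample data are made up for illustration; only the ‘Harmony_Available’ column name comes from the description above.

```python
import pandas as pd


def drop_rows_without_harmony(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose Harmony_Available is not 0."""
    return df[df["Harmony_Available"] != 0]


# Made-up sample data; the FileName column is an assumption for illustration.
scores = pd.DataFrame(
    {
        "FileName": ["aria_a.xml", "aria_b.xml", "aria_c.xml"],
        "Harmony_Available": [1, 0, 1],
    }
)
with_harmony = drop_rows_without_harmony(scores)
```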

delete_undesired_columns(**kwargs) → None[source]

Deletes columns that are not necessary for statistical analysis.

If keyword arguments are passed in, they override the values found in the configuration file.

Parameters

**kwargs (str, optional) – Any value from config.yml can be overwritten by passing arguments to the method

Raises

KeyError – If any of the columns to be deleted is not found in the original DataFrame.
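A minimal sketch of the override-then-delete behaviour described above, assuming the configuration is available as a dictionary. The function name and the `columns_to_delete` key are hypothetical and are not taken from musif's config.yml.

```python
import pandas as pd


def drop_configured_columns(df: pd.DataFrame, config: dict, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: drop the columns named in the merged settings."""
    # Keyword arguments take precedence over values from the configuration.
    settings = {**config, **kwargs}
    # 'columns_to_delete' is an assumed key name used only for this sketch.
    to_drop = list(settings.get("columns_to_delete", []))
    missing = [col for col in to_drop if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return df.drop(columns=to_drop)


frame = pd.DataFrame({"Id": [1], "Tempo": [90], "Notes": ["x"]})
file_config = {"columns_to_delete": ["Notes"]}
from_config = drop_configured_columns(frame, file_config)
overridden = drop_configured_columns(frame, file_config, columns_to_delete=["Tempo"])
```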

group_columns() → None[source]

Groups the Key_*_PercentageMeasures, Key_Modulatory and Degrees columns into bigger groups for aggregated analysis, keeping the previous ones. Also deletes columns that are unnecessary for analysis.

process() → pandas.DataFrame[source]

Main method of the class. Removes NaN values, deletes columns that are not useful, and merges those that are needed according to the config.yml file.

Return type

DataFrame object

replace_nans() → None[source]

save(dest_path: Union[str, PurePath], ft='csv') → None[source]

Saves the current information to files whose names are derived from dest_path.

To load one of those files, remember to set the index to musif.extract.constant.ID and, if windows are used, to musif.extract.constant.WINDOW_ID:

`df = pd.read_csv('window_alldata.csv').set_index(['Id', 'WindowId'])`

Parameters
  • dest_path (str or Path) – Path to directory where the file will be stored; a suffix like _metadata.csv will be added.

  • ft (str) – File type used for saving. It must be supported by a pandas writer method, e.g. to_csv, to_feather, to_parquet, etc.
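One way to read the ft parameter is as the tail of a pandas writer method name. The sketch below follows that reading and is not the library's implementation; `save_frame` and its `suffix` argument are hypothetical names introduced for illustration.

```python
from pathlib import Path, PurePath
from typing import Union

import pandas as pd


def save_frame(
    df: pd.DataFrame, dest_path: Union[str, PurePath], suffix: str, ft: str = "csv"
) -> Path:
    """Hypothetical helper: write df to '<dest_path><suffix>.<ft>'."""
    # Look up the pandas writer by name, e.g. ft='csv' selects DataFrame.to_csv.
    writer = getattr(df, f"to_{ft}", None)
    if writer is None:
        raise ValueError(f"pandas has no writer method 'to_{ft}'")
    target = Path(f"{dest_path}{suffix}.{ft}")
    writer(target)
    return target
```

A file written this way with ft='csv' keeps its index as an ordinary column, which is why the index has to be set again after `pd.read_csv`.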

unbundle_instrumentation() → None[source]

Separates the Instrumentation column into one column per instrument found in it; for every row (aria), each of these columns is 1 if the instrument is present and 0 otherwise.
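The presence encoding described above can be sketched with `Series.str.get_dummies`. This assumes that Instrumentation holds comma-separated instrument abbreviations without spaces, which the text does not confirm; the function name, the sample values and the choice to keep the original column are also assumptions. The Presence_ prefix follows the naming used in the method summary.

```python
import pandas as pd


def unbundle(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one Presence_<instrument> column per instrument."""
    # Assumption: values look like 'vnI,vnII,va' (comma-separated, no spaces).
    presence = df["Instrumentation"].str.get_dummies(sep=",").add_prefix("Presence_")
    # The original column is kept here; whether it should be dropped is not stated.
    return pd.concat([df, presence], axis=1)


arias = pd.DataFrame({"Instrumentation": ["vnI,vnII,va", "vnI,ob"]})
unbundled = unbundle(arias)
```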