macro_eeg_model.data_prep.data_preparator

Classes

DataPreparator

A class to prepare and process data from directories containing CSV files with connectivity data across subjects.

Module Contents

class macro_eeg_model.data_prep.data_preparator.DataPreparator

A class to prepare and process data from directories containing CSV files with connectivity data across subjects. The processed data is saved as a NumPy array after averaging across multiple subjects.

prep_and_save(directory_name, included_word, delimiter, name)

Handles the prerequisites for preparing and saving data from a specified directory within the Julich data path (see src.utils.paths.Paths), then performs the actual preparation and saving via _prep_and_save_data().

This method filters the files in the directory based on an included word in their filenames, processes them into NumPy arrays, calculates an average array, and saves it to a specified path.

Parameters:
  • directory_name (str) – The name of the directory containing the subject folders.

  • included_word (str) – The word that should be included in the CSV filenames to be processed.

  • delimiter (str) – The delimiter used in the CSV files.

  • name (str) – The name to use when saving the final averaged array.

_prep_and_save_data(directory, subjects, included_word, delimiter, name)

Extracts relevant CSV files based on the included word using _extract_csv_files(), converts them to NumPy arrays using _get_arrays_from_files(), computes an average array using _calculate_avg_array(), and saves it as a .npy file.

Parameters:
  • directory (str or pathlib.Path) – The path to the directory containing the subject folders.

  • subjects (list) – The list of subject folder names.

  • included_word (str) – The word that should be included in the CSV filenames to be processed.

  • delimiter (str) – The delimiter used in the CSV files.

  • name (str) – The name to use when saving the final averaged array.

static _extract_csv_files(directory, subjects, included_word)

Extracts the names of CSV files that include a specific word in their filenames. Searches through the directory of each subject for CSV files that contain the specified word in their name.

Parameters:
  • directory (str or pathlib.Path) – The path to the directory containing the subject folders.

  • subjects (list) – The list of subject folder names.

  • included_word (str) – The word that must be included in the filenames.

Returns:

A list of filenames that match the criteria.

Return type:

list
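The filename filter described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `extract_csv_files` is a hypothetical stand-in, and it assumes each subject folder sits directly under `directory` and holds its CSV files at the top level. Whether the real method removes duplicate names is not stated; this sketch keeps them.

```python
from pathlib import Path


def extract_csv_files(directory, subjects, included_word):
    """Collect names of CSV files whose filename contains `included_word`.

    Hypothetical stand-in for the documented method; the folder layout
    (directory/subject/*.csv) is an assumption.
    """
    matches = []
    for subject in subjects:
        subject_dir = Path(directory) / subject
        # sorted() keeps the result order deterministic across platforms
        for path in sorted(subject_dir.glob("*.csv")):
            if included_word in path.name:
                matches.append(path.name)
    return matches
```

Only the file name is tested against `included_word`, so a subject folder whose own name contains the word does not cause unrelated files to match.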

_get_arrays_from_files(directory, subjects, files, delimiter=',')

Retrieves and converts the relevant CSV files into NumPy arrays using _convert_csv_file_to_numpy_array().

For each subject in the directory, this method identifies the files to be processed, converts them into NumPy arrays, and collects them for further processing.

Parameters:
  • directory (str or pathlib.Path) – The path to the directory containing the subject folders.

  • subjects (list) – The list of subject folder names.

  • files (list) – The list of filenames to be processed.

  • delimiter (str, optional) – The delimiter used in the CSV files (default is ',').

Returns:

A list of NumPy arrays corresponding to the processed CSV files.

Return type:

list
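One way the per-subject collection step could look is sketched below. The name `get_arrays_from_files` is a hypothetical stand-in, and the sketch assumes a file belongs to a subject when it exists inside that subject's folder; the real method may match files to subjects differently.

```python
from pathlib import Path

import numpy as np


def get_arrays_from_files(directory, subjects, files, delimiter=","):
    """Load every listed file found in each subject folder as a NumPy array.

    Hypothetical stand-in for the documented method.
    """
    arrays = []
    for subject in subjects:
        subject_dir = Path(directory) / subject
        for file_name in files:
            file_path = subject_dir / file_name
            # Assumption: a file belongs to the subject whose folder contains it
            if file_path.is_file():
                arrays.append(np.loadtxt(file_path, delimiter=delimiter))
    return arrays
```

The returned list follows the order of `subjects`, then the order of `files`, which keeps the result reproducible when the same inputs are passed again.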

static _convert_csv_file_to_numpy_array(file_path, delimiter)

Converts a CSV file into a NumPy array.

Parameters:
  • file_path (str or pathlib.Path) – The full path to the CSV file.

  • delimiter (str) – The delimiter used in the CSV file.

Returns:

A NumPy array representing the data from the CSV file.

Return type:

numpy.ndarray
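A minimal sketch of such a conversion, assuming the CSV files are purely numeric and have no header row. The function name is a hypothetical stand-in, and the choice of `numpy.loadtxt` is an assumption, since the documentation does not say which reader the method uses.

```python
import numpy as np


def convert_csv_file_to_numpy_array(file_path, delimiter):
    """Read a numeric, header-free CSV file into a 2-D NumPy array.

    Hypothetical stand-in; the real method may use a different reader.
    """
    # ndmin=2 keeps the result two-dimensional even for a single row or column
    return np.loadtxt(file_path, delimiter=delimiter, ndmin=2)
```

For files with a header row or missing values, `numpy.genfromtxt` with `skip_header` would be a closer fit than `numpy.loadtxt`.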

static _calculate_avg_array(numpy_arrays)

Computes the average of each element across multiple NumPy arrays, excluding the highest and lowest 20% of values (to reduce the impact of outliers), and returns the resulting array.

Parameters:

numpy_arrays (list) – A list of NumPy arrays to average.

Returns:

A NumPy array containing the average values.

Return type:

numpy.ndarray
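The averaging described here resembles an element-wise trimmed mean. The sketch below illustrates that idea under stated assumptions: all arrays share the same shape, "20%" is read as 20% of the subjects at each end, and the number of dropped values is rounded down. The name `calculate_avg_array` and the `trim_fraction` parameter are hypothetical; the actual method may round or trim differently.

```python
import numpy as np


def calculate_avg_array(numpy_arrays, trim_fraction=0.2):
    """Element-wise mean across arrays after dropping the extremes.

    Hypothetical stand-in: for every element position, the lowest and
    highest `trim_fraction` of values across subjects are discarded.
    """
    stacked = np.stack(numpy_arrays)      # shape: (n_subjects, ...)
    n = stacked.shape[0]
    k = int(n * trim_fraction)            # values dropped at each end
    ordered = np.sort(stacked, axis=0)    # sort each element across subjects
    kept = ordered[k : n - k]             # discard k lowest and k highest
    return kept.mean(axis=0)
```

With five subjects, one value is dropped at each end, so a single extreme connectivity value at a given position no longer shifts the average. `scipy.stats.trim_mean` with `axis=0` offers comparable behaviour if SciPy is available.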