Patch extraction pool classes

dplabtools includes a set of pool classes that provide a parallel execution interface for the classes described in Patch Extraction. Using pool classes to extract patches allows multiple WSIs to be processed at the same time, with the results combined either in memory or by saving to disk. Patch extraction pool classes are designed to accept Patch locations/sampling pool classes as their input; however, they can also extract patches whose locations were calculated by other sources.

Warning

Currently MemPatchExtractorPool and MultiResMemPatchExtractorPool suffer from a performance degradation related to the default inter-process communication present in Python. While both classes deliver correct results, their use in large scale experiments is not recommended. This part of the package is currently pending a rewrite using a shared memory model.
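To give a rough sense of the overhead described in this warning, the following sketch measures how many bytes must be serialized when a worker process returns in-memory patches to its parent. Python's multiprocessing machinery passes results between processes by pickling them. The patch size, the use of raw bytes as a stand-in for image data, and the helper names are assumptions made for illustration and are not part of dplabtools.

```python
import pickle

PATCH_SIZE = 256  # hypothetical patch edge length in pixels
CHANNELS = 3      # RGB
RAW_PATCH_BYTES = PATCH_SIZE * PATCH_SIZE * CHANNELS


def make_patch():
    """Stand-in for an extracted patch: raw RGB bytes (all zeros)."""
    return bytes(RAW_PATCH_BYTES)


def ipc_payload_bytes(num_patches):
    """Size of the pickled payload for a list of num_patches patches."""
    patches = [make_patch() for _ in range(num_patches)]
    return len(pickle.dumps(patches, protocol=pickle.HIGHEST_PROTOCOL))


for n in (1, 10, 50):
    megabytes = ipc_payload_bytes(n) / 1e6
    print(f"{n:>3} patches -> {megabytes:.1f} MB to serialize and copy")
```

The payload grows linearly with the number of patches, and every byte is copied at least once between processes, which is the kind of cost a shared memory model would avoid.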

In memory patch extraction pool

MemPatchExtractorPool is a class for parallel extraction of patches which will reside in memory.

Basic usage

Assuming that patches_pool represents an object from one of the Patch locations/sampling pool classes, the following example creates a list of in-memory patches from multiple WSIs and prints them to the screen:

from dplabtools.slides.patches import MemPatchExtractorPool

extractor_pool = MemPatchExtractorPool(
    patches_pool=patches_pool,
    thread_num_workers=2,
    proc_num_workers=3,
)

for patch in extractor_pool.patch_list:
    print(patch)

Class details

class dplabtools.slides.patches.MemPatchExtractorPool(...)

Extractor pool implementation for MemPatchExtractor.

property patch_count

Return the number of extracted patches.

property patch_list

Return the extracted patches stored in memory.

property pids

Return the IDs of the executed processes.

In memory patch extraction pool (MRP)

MultiResMemPatchExtractorPool is a class for parallel extraction of multi resolution patches which will reside in memory.

Basic usage

Assuming that patches_pool represents an object from one of the Patch locations/sampling pool classes, the following example creates a list of in-memory multi resolution patches from multiple WSIs and prints them to the screen:

from dplabtools.slides.patches import MultiResMemPatchExtractorPool

extractor_pool = MultiResMemPatchExtractorPool(
    patches_pool=patches_pool,
    levels_or_mpps=[2, 1, 0],
    thread_num_workers=2,
    proc_num_workers=3,
)

for multires_patch in extractor_pool.patch_list:
    for patch in multires_patch:
        print(patch)

Class details

class dplabtools.slides.patches.MultiResMemPatchExtractorPool(...)

Extractor pool implementation for MultiResMemPatchExtractor.

property patch_count

Return the number of extracted patches.

property patch_list

Return the extracted patches stored in memory.

property patchset_count

Return the number of patch sets created during the extraction.

property pids

Return the IDs of the executed processes.
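The relationship between patch_list, patch_count and patchset_count can be sketched with stand-in data. This assumes that patch_list is a list of patch sets (as the nested loop in the basic usage example suggests), that each set holds one patch per requested resolution, and that patch_count counts individual patches. The strings below are placeholders, not real patch objects.

```python
# Stand-in data: two patch sets, each holding one patch per requested
# resolution (three resolutions, as in levels_or_mpps=[2, 1, 0]).
patch_list = [
    ["set1_level2", "set1_level1", "set1_level0"],
    ["set2_level2", "set2_level1", "set2_level0"],
]

# Assumed meaning of the two counters.
patchset_count = len(patch_list)
patch_count = sum(len(patch_set) for patch_set in patch_list)

print(patchset_count, patch_count)
```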

Parameters specific to MultiResMemPatchExtractorPool:

Parameters:

  • levels_or_mpps (list of level_or_mpp values) – Int or Float numbers representing WSI levels or MPP values for multi resolution patches.
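One possible reading of this parameter is that integer entries denote WSI levels and floating point entries denote MPP values. The sketch below classifies entries under that assumption. The helper describe_resolution is hypothetical and does not exist in dplabtools.

```python
from numbers import Integral, Real


def describe_resolution(value):
    """Classify one entry (assumed rule: int = level, float = MPP)."""
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("bool is not a valid level or MPP value")
    if isinstance(value, Integral):
        return ("level", int(value))
    if isinstance(value, Real):
        return ("mpp", float(value))
    raise TypeError(f"unsupported resolution value: {value!r}")


by_level = [describe_resolution(v) for v in [2, 1, 0]]
by_mpp = [describe_resolution(v) for v in [0.25, 0.5, 1.0]]
print(by_level)
print(by_mpp)
```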

To disk patch extraction pool

DiskPatchExtractorPool is a class for parallel extraction of patches which will be saved to disk.

Basic usage

Assuming that patches_pool represents an object from one of the Patch locations/sampling pool classes, the following example saves extracted patches from multiple WSIs into the /tmp directory:

from dplabtools.slides.patches import DiskPatchExtractorPool

extractor_pool = DiskPatchExtractorPool(
    patches_pool=patches_pool,
    output_dir="/tmp",
    image_type="png",
    thread_num_workers=2,
    proc_num_workers=3,
)

Class details

class dplabtools.slides.patches.DiskPatchExtractorPool(...)

Extractor pool implementation for DiskPatchExtractor.

property manifest_ids

Return the IDs of the created manifests.

property patch_count

Return the number of extracted patches.

property pids

Return the IDs of the executed processes.

To disk patch extraction pool (MRP)

MultiResDiskPatchExtractorPool is a class for parallel extraction of multi resolution patches which will be saved to disk.

Basic usage

Assuming that patches_pool represents an object from one of the Patch locations/sampling pool classes, the following example saves sets of extracted patches from multiple WSIs into the /tmp directory:

from dplabtools.slides.patches import MultiResDiskPatchExtractorPool

extractor_pool = MultiResDiskPatchExtractorPool(
    patches_pool=patches_pool,
    levels_or_mpps=[0, 1],
    output_dir="/tmp",
    image_type="png",
    thread_num_workers=2,
    proc_num_workers=3,
)

Class details

class dplabtools.slides.patches.MultiResDiskPatchExtractorPool(...)

Extractor pool implementation for MultiResDiskPatchExtractor.

property manifest_ids

Return the IDs of the created manifests.

property patch_count

Return the number of extracted patches.

property patchset_count

Return the number of patch sets created during the extraction.

property pids

Return the IDs of the executed processes.

Parameters specific to MultiResDiskPatchExtractorPool:

Parameters:
  • levels_or_mpps (list of level_or_mpp values) – Int or Float numbers representing WSI levels or MPP values for multi resolution patches.

  • global_counter (int, default=1) – Initial counter value for enumerating patch set directories (set1, set2, set3, …) for an entire collection of WSIs. Setting this value to None will cause the patch set counter to be reset for each WSI (wsi1_set1, wsi1_set2, … wsi2_set1, wsi2_set2, …).
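The two numbering modes of global_counter can be sketched as follows. The function patch_set_dirs is a hypothetical helper written for illustration only. It assumes that the per-WSI mode prefixes each directory with the WSI name and restarts counting at 1, which matches the example names given above but is not confirmed in detail.

```python
def patch_set_dirs(sets_per_wsi, global_counter=1):
    """Return patch set directory names.

    sets_per_wsi maps a WSI name to its number of patch sets.
    """
    names = []
    counter = global_counter
    for wsi_name, num_sets in sets_per_wsi.items():
        if global_counter is None:
            # Counter restarts for every WSI (assumed to start at 1).
            names.extend(f"{wsi_name}_set{i}" for i in range(1, num_sets + 1))
        else:
            # One running counter across the whole collection of WSIs.
            for _ in range(num_sets):
                names.append(f"set{counter}")
                counter += 1
    return names


collection = {"wsi1": 2, "wsi2": 2}
print(patch_set_dirs(collection))
print(patch_set_dirs(collection, global_counter=None))
```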

Parameters common to disk patch extraction pool classes

Parameters:
  • output_dir (str) – Directory name or path for saving the extracted patches.

  • image_type (str) – Image type of the saved files (PNG, JPG, etc.).

  • filename_comment (str, optional) – Comment to be added to the saved file names.

  • filename_separator (str, default="_") – Separator used in the saved file names.

  • create_subdirs (bool, default=False) – Whether to create label specific subdirectories inside output_dir or not.
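How these parameters might combine into an output path can be sketched as below. The actual file naming scheme used by dplabtools is not specified here, so the field order (slide name, patch index, optional comment) and the helper patch_path are invented purely to illustrate the role of each parameter.

```python
from pathlib import PurePosixPath


def patch_path(output_dir, image_type, slide_name, index, label=None,
               filename_comment=None, filename_separator="_",
               create_subdirs=False):
    """Build a hypothetical output path for one saved patch."""
    parts = [slide_name, str(index)]
    if filename_comment:
        parts.append(filename_comment)
    directory = PurePosixPath(output_dir)
    if create_subdirs and label is not None:
        # Label specific subdirectory inside output_dir.
        directory = directory / label
    return directory / f"{filename_separator.join(parts)}.{image_type}"


print(patch_path("/tmp", "png", "slide_a", 7))
print(patch_path("/tmp", "png", "slide_a", 7, label="tumor",
                 filename_comment="run1", filename_separator="-",
                 create_subdirs=True))
```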

Parameters common to all patch extraction pool classes

Parameters:
  • patches_pool (object) – Object representing one of the patch location pool classes.

  • proc_num_workers (int) – Number of processes in the pool. This value corresponds directly to the number of WSIs to be processed simultaneously.

  • thread_num_workers (int) – Number of threads per one worker process. This value indicates how many threads will be used to extract patches from a single WSI.

  • proc_mp_chunksize (int, default=1) – Data chunk size used in process-level parallelization.

  • thread_mp_chunksize (int, default=1) – Data chunk size used in thread-level parallelization.

  • resampling_mode (str, optional) – One of the two supported down/up-sampling methods: wsi or tile.

  • included_labels (list of str, optional) – Polygon labels included in patch extraction, all other labels will be ignored.

  • excluded_labels (list of str, optional) – Polygon labels excluded from patch extraction.
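The two-level worker layout described by proc_num_workers and thread_num_workers can be sketched with the standard library. This is a minimal illustration, not the dplabtools implementation: the function names are hypothetical, patches are replaced by strings, and the outer pool defaults to threads so that the sketch runs in any environment. Passing concurrent.futures.ProcessPoolExecutor as outer_pool_cls would correspond more closely to one process per WSI.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def extract_patch(slide_name, location):
    """Stand-in for reading one patch; returns a label instead of pixels."""
    x, y = location
    return f"{slide_name}@({x},{y})"


def extract_slide(item, thread_num_workers):
    """Inner level: threads share the patch locations of a single slide."""
    slide_name, locations = item
    with ThreadPoolExecutor(max_workers=thread_num_workers) as threads:
        return list(threads.map(partial(extract_patch, slide_name), locations))


def extract_all(slides, proc_num_workers, thread_num_workers,
                outer_pool_cls=ThreadPoolExecutor):
    """Outer level: up to proc_num_workers slides are handled at once."""
    worker = partial(extract_slide, thread_num_workers=thread_num_workers)
    with outer_pool_cls(max_workers=proc_num_workers) as pool:
        per_slide = list(pool.map(worker, slides.items()))
    # Combine the per-slide results into one flat list.
    return [patch for patches in per_slide for patch in patches]


slides = {
    "slide_a": [(0, 0), (0, 256)],
    "slide_b": [(512, 512)],
}
patches = extract_all(slides, proc_num_workers=3, thread_num_workers=2)
print(patches)
```

With this layout the upper bound on concurrently running extraction threads is proc_num_workers multiplied by thread_num_workers, which is worth keeping in mind when choosing both values for a given machine.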