Patch extraction pool classes

dplabtools includes a set of pool classes that provide a parallel execution interface for the classes described in Patch Extraction. Using pool classes for patch extraction allows multiple WSIs to be processed at the same time, with the results either combined in memory or saved to disk. Patch extraction pool classes have been designed to accept Patch locations/sampling pool classes as their input; however, they can also extract patches whose locations were calculated by other sources.

Warning

Currently, MemPatchExtractorPool and MultiResMemPatchExtractorPool suffer from performance degradation caused by Python's default inter-process communication mechanism. While both classes deliver correct results, their use in large-scale experiments is not recommended. This part of the package is pending a rewrite that uses a shared memory model.
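To illustrate why the default mechanism is costly: Python's multiprocessing sends each result from a worker process back to the parent by pickling it, so every in-memory patch is serialized and copied at least once. The sketch below estimates that payload. It assumes 256×256 8-bit RGB patches and assumes the pool returns raw pixel data through this default pickling path; `pickled_size` is a hypothetical helper, and the patch dimensions are illustrative rather than taken from dplabtools.

```python
import pickle


def pickled_size(num_patches, height=256, width=256, channels=3):
    """Bytes produced when pickling num_patches raw 8-bit patches (hypothetical helper)."""
    # Stand-in for decoded pixel data: one byte per channel per pixel.
    patches = [bytes(height * width * channels) for _ in range(num_patches)]
    # multiprocessing pickles worker results like this before sending them through a pipe.
    return len(pickle.dumps(patches, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    size = pickled_size(1000)
    print(f"{size / 2**20:.1f} MiB pickled for 1000 patches")
```

For 1,000 such patches the payload is just over 187 MiB, all of which must be written to and read from a pipe. A shared memory model (for example `multiprocessing.shared_memory`) avoids pickling the pixel data by letting workers write into a buffer that the parent reads directly.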

In memory patch extraction pool

MemPatchExtractorPool is a class for parallel extraction of patches which will reside in memory.

Basic usage

Assuming that patches_pool is an object of one of the Patch locations/sampling pool classes, the following example creates a list of in-memory patches from multiple WSIs and prints them:

from dplabtools.slides.patches import MemPatchExtractorPool

extractor_pool = MemPatchExtractorPool(
    patches_pool=patches_pool,
    thread_num_workers=2,
    proc_num_workers=3,
)

for patch in extractor_pool.patch_list:
    print(patch)
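The worker parameters in the example above describe two levels of parallelism: up to proc_num_workers processes, each handling one WSI, and thread_num_workers threads inside each process extracting that WSI's patches. The sketch below reproduces that layout with the standard library's `multiprocessing.Pool` and `multiprocessing.pool.ThreadPool`, passing the chunk sizes straight to their `map` calls. Whether dplabtools uses these exact primitives is an assumption; `WSI_LOCATIONS`, `extract_patch`, `process_wsi`, and `run_pool` are hypothetical names, and `extract_patch` returns a text tag instead of reading pixels.

```python
import os
from functools import partial
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

# Hypothetical input: WSI file name -> patch locations (x, y).
WSI_LOCATIONS = {
    "slide_a.svs": [(0, 0), (256, 0), (512, 0)],
    "slide_b.svs": [(0, 0), (0, 256)],
    "slide_c.svs": [(128, 128)],
}


def extract_patch(task):
    """Placeholder for reading one region from a WSI; returns a text tag, not pixels."""
    wsi_name, (x, y) = task
    return f"{wsi_name}@{x},{y}"


def process_wsi(item, thread_num_workers=2, thread_mp_chunksize=1):
    """Runs in one worker process: a thread pool extracts all patches of a single WSI."""
    wsi_name, locations = item
    tasks = [(wsi_name, location) for location in locations]
    with ThreadPool(thread_num_workers) as threads:
        patches = threads.map(extract_patch, tasks, chunksize=thread_mp_chunksize)
    return os.getpid(), patches


def run_pool(wsi_locations, proc_num_workers=3, thread_num_workers=2,
             proc_mp_chunksize=1, thread_mp_chunksize=1):
    """Process up to proc_num_workers WSIs at once and merge the results in memory."""
    worker = partial(process_wsi, thread_num_workers=thread_num_workers,
                     thread_mp_chunksize=thread_mp_chunksize)
    with Pool(proc_num_workers) as procs:
        results = procs.map(worker, list(wsi_locations.items()),
                            chunksize=proc_mp_chunksize)
    pids = sorted({pid for pid, _ in results})
    patch_list = [patch for _, patches in results for patch in patches]
    return pids, patch_list


if __name__ == "__main__":
    pids, patch_list = run_pool(WSI_LOCATIONS)
    print(f"{len(patch_list)} patches extracted by processes {pids}")
```

The returned `pids` list plays the same role as the pids property documented below. Every patch in `patch_list` crossed a process boundary via pickling, which is the cost described in the warning at the top of this page.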

Class details

class dplabtools.slides.patches.MemPatchExtractorPool(...)

Extractor pool implementation for MemPatchExtractor.

property patch_count

Return the number of extracted patches.

property patch_list

Return the extracted patches stored in memory.

property pids

Return the IDs of the executed processes.

In memory patch extraction pool (MRP)

MultiResMemPatchExtractorPool is a class for parallel extraction of multi-resolution patches which will reside in memory.

Basic usage

Assuming that patches_pool is an object of one of the Patch locations/sampling pool classes, the following example creates a list of in-memory multi-resolution patches from multiple WSIs and prints them:

from dplabtools.slides.patches import MultiResMemPatchExtractorPool

extractor_pool = MultiResMemPatchExtractorPool(
    patches_pool=patches_pool,
    levels_or_mpps=[2, 1, 0],
    thread_num_workers=2,
    proc_num_workers=3,
)

for multires_patch in extractor_pool.patch_list:
    for patch in multires_patch:
        print(patch)
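The nested loop above treats each element of patch_list as a set of patches. Assuming each set holds one patch per entry in levels_or_mpps, patch_count would equal patchset_count × len(levels_or_mpps). The hypothetical helper `count_multires` below shows that relationship on a stand-in list of strings rather than real patch objects.

```python
def count_multires(patch_list):
    """Return (patchset_count, patch_count) for a nested list of patch sets."""
    patchset_count = len(patch_list)
    patch_count = sum(len(patch_set) for patch_set in patch_list)
    return patchset_count, patch_count


# Hypothetical stand-in: 4 patch sets, one entry per value in levels_or_mpps.
levels_or_mpps = [2, 1, 0]
fake_patch_list = [[f"set{i}_level{lvl}" for lvl in levels_or_mpps] for i in range(1, 5)]

print(count_multires(fake_patch_list))  # (4, 12)
```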

Class details

class dplabtools.slides.patches.MultiResMemPatchExtractorPool(...)

Extractor pool implementation for MultiResMemPatchExtractor.

property patch_count

Return the number of extracted patches.

property patch_list

Return the extracted patches stored in memory.

property patchset_count

Return the number of patch sets created during the extraction.

property pids

Return the IDs of the executed processes.

Parameters specific to MultiResMemPatchExtractorPool:

Parameters:

  • levels_or_mpps (list of level_or_mpp values) – Int or Float numbers representing WSI levels or MPP values for multi resolution patches.
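The parameter description suggests that integers are read as WSI level indices and floats as MPP (microns per pixel) values. Under that assumption, `1` and `1.0` request different resolutions. The sketch below maps either kind of value to a pyramid level, given the slide's level-0 MPP and per-level downsample factors. `resolve_level` is a hypothetical helper, and nearest-level matching is one plausible strategy, not necessarily what dplabtools does.

```python
def resolve_level(level_or_mpp, base_mpp, level_downsamples):
    """Return a pyramid level index for an int level or a float MPP value.

    base_mpp: microns per pixel at level 0.
    level_downsamples: downsample factor of each level relative to level 0.
    """
    if isinstance(level_or_mpp, bool) or not isinstance(level_or_mpp, (int, float)):
        raise TypeError(f"expected int or float, got {type(level_or_mpp).__name__}")
    if isinstance(level_or_mpp, int):
        if not 0 <= level_or_mpp < len(level_downsamples):
            raise ValueError(f"level {level_or_mpp} does not exist")
        return level_or_mpp  # assumption: ints are used directly as level indices
    # Assumption: floats are MPP values; pick the level with the closest downsample.
    target_downsample = level_or_mpp / base_mpp
    return min(
        range(len(level_downsamples)),
        key=lambda level: abs(level_downsamples[level] - target_downsample),
    )


downsamples = [1.0, 4.0, 16.0]  # example 3-level pyramid
print(resolve_level(1, 0.5, downsamples))    # 1 (int: level 1)
print(resolve_level(1.0, 0.5, downsamples))  # 0 (1.0 um/px is a 2x downsample)
print(resolve_level(8.0, 0.5, downsamples))  # 2
```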

To disk patch extraction pool

DiskPatchExtractorPool is a class for parallel extraction of patches which will be saved to disk.

Basic usage

Assuming that patches_pool is an object of one of the Patch locations/sampling pool classes, the following example saves the patches extracted from multiple WSIs into the /tmp directory as PNG files:

from dplabtools.slides.patches import DiskPatchExtractorPool

extractor_pool = DiskPatchExtractorPool(
    patches_pool=patches_pool,
    output_dir="/tmp",
    image_type="png",
    thread_num_workers=2,
    proc_num_workers=3,
)
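Once extraction has finished, a quick sanity check is to count the saved image files and compare the result with the pool's patch_count. The hypothetical helper `count_saved_patches` below searches recursively, so it also finds files placed in label subdirectories when create_subdirs is enabled. It assumes that each saved patch is one file whose extension matches image_type.

```python
import tempfile
from pathlib import Path


def count_saved_patches(output_dir, image_type="png"):
    """Count files under output_dir (recursively) whose extension matches image_type."""
    suffix = "." + image_type.lower().lstrip(".")
    return sum(
        1
        for path in Path(output_dir).rglob("*")
        if path.is_file() and path.suffix.lower() == suffix
    )


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory instead of a real extraction run.
    with tempfile.TemporaryDirectory() as demo_dir:
        Path(demo_dir, "patch_1.png").touch()
        Path(demo_dir, "tumor").mkdir()
        Path(demo_dir, "tumor", "patch_2.PNG").touch()
        Path(demo_dir, "notes.txt").touch()
        print(count_saved_patches(demo_dir, "png"))  # 2
```

With the example above, `count_saved_patches("/tmp", "png")` can be compared with `extractor_pool.patch_count`. Because /tmp often contains unrelated files, a dedicated empty output_dir makes that comparison meaningful.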

Class details

class dplabtools.slides.patches.DiskPatchExtractorPool(...)

Extractor pool implementation for DiskPatchExtractor.

property manifest_ids

Return the IDs of the created manifests.

property patch_count

Return the number of extracted patches.

property pids

Return the IDs of the executed processes.

To disk patch extraction pool (MRP)

MultiResDiskPatchExtractorPool is a class for parallel extraction of multi-resolution patches which will be saved to disk.

Basic usage

Assuming that patches_pool is an object of one of the Patch locations/sampling pool classes, the following example saves sets of multi-resolution patches extracted from multiple WSIs into the /tmp directory:

from dplabtools.slides.patches import MultiResDiskPatchExtractorPool

extractor_pool = MultiResDiskPatchExtractorPool(
    patches_pool=patches_pool,
    levels_or_mpps=[0, 1],
    output_dir="/tmp",
    image_type="png",
    thread_num_workers=2,
    proc_num_workers=3,
)

Class details

class dplabtools.slides.patches.MultiResDiskPatchExtractorPool(...)

Extractor pool implementation for MultiResDiskPatchExtractor.

property manifest_ids

Return the IDs of the created manifests.

property patch_count

Return the number of extracted patches.

property patchset_count

Return the number of patch sets created during the extraction.

property pids

Return the IDs of the executed processes.

Parameters specific to MultiResDiskPatchExtractorPool:

Parameters:
  • levels_or_mpps (list of level_or_mpp values) – Int or Float numbers representing WSI levels or MPP values for multi resolution patches.

  • global_counter (int, default=1) – Initial counter value for enumerating patch set directories (set1, set2, set3, …) for an entire collection of WSIs. Setting this value to None will cause the patch set counter to be reset for each WSI (wsi1_set1, wsi1_set2, … wsi2_set1, wsi2_set2, …).
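The two naming schemes described for global_counter can be sketched as follows, assuming WSIs are handled in a fixed order. `patchset_dir_names` is a hypothetical helper that only reproduces the patterns listed above; the `wsi1`/`wsi2` prefixes stand in for whatever per-WSI identifier the library actually uses.

```python
def patchset_dir_names(sets_per_wsi, global_counter=1):
    """Return patch set directory names for WSIs processed in order.

    sets_per_wsi: number of patch sets produced by each WSI, in processing order.
    """
    names = []
    counter = global_counter
    for wsi_index, num_sets in enumerate(sets_per_wsi, start=1):
        for set_index in range(1, num_sets + 1):
            if global_counter is None:
                # Counter restarts for every WSI.
                names.append(f"wsi{wsi_index}_set{set_index}")
            else:
                # One running counter across the whole collection.
                names.append(f"set{counter}")
                counter += 1
    return names


print(patchset_dir_names([2, 2]))        # ['set1', 'set2', 'set3', 'set4']
print(patchset_dir_names([2, 2], None))  # ['wsi1_set1', 'wsi1_set2', 'wsi2_set1', 'wsi2_set2']
print(patchset_dir_names([2, 1], 10))    # ['set10', 'set11', 'set12']
```

A starting value other than 1 is useful when appending patch sets from a new batch of WSIs to a directory that already holds earlier sets.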

Parameters common to disk patch extraction pool classes

Parameters:
  • output_dir (str) – Directory name or path for saving the extracted patches.

  • image_type (str) – Image type of the saved files (PNG, JPG, etc.).

  • filename_comment (str, optional) – Comment to be added to the saved file names.

  • filename_separator (str, default="_") – Separator used in the saved file names.

  • create_subdirs (bool, default=False) – Whether to create label-specific subdirectories inside output_dir.
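To show how these parameters could combine, the sketch below builds a save path for one patch. The field order (WSI name, label, x, y, comment) is a hypothetical naming scheme chosen for illustration and is not taken from dplabtools. Only the roles of filename_comment, filename_separator, image_type, and create_subdirs follow the descriptions above. `patch_path` is a hypothetical helper.

```python
from pathlib import Path


def patch_path(output_dir, wsi_name, label, x, y, image_type="png",
               filename_comment=None, filename_separator="_", create_subdirs=False):
    """Build a save path for one patch using a hypothetical naming scheme."""
    # Hypothetical field order: WSI stem, label, x, y, optional comment.
    parts = [Path(wsi_name).stem, label, str(x), str(y)]
    if filename_comment:
        parts.append(filename_comment)
    filename = filename_separator.join(parts) + "." + image_type.lower()
    # With create_subdirs, patches are grouped into one directory per label.
    directory = Path(output_dir) / label if create_subdirs else Path(output_dir)
    return directory / filename


print(patch_path("/tmp", "slide_a.svs", "tumor", 0, 256))
print(patch_path("/tmp", "slide_a.svs", "tumor", 0, 256, filename_comment="v1",
                 filename_separator="-", create_subdirs=True))
```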

Parameters common to all patch extraction pool classes

Parameters:
  • patches_pool (object) – Object representing one of the patch location pool classes.

  • proc_num_workers (int) – Number of processes in the pool. This value corresponds directly to the number of WSIs to be processed simultaneously.

  • thread_num_workers (int) – Number of threads per worker process. This value indicates how many threads will be used to extract patches from a single WSI.

  • proc_mp_chunksize (int, default=1) – Chunk size used when distributing data across worker processes.

  • thread_mp_chunksize (int, default=1) – Chunk size used when distributing data across threads within a worker process.

  • resampling_mode (str, optional) – One of two supported down/up-sampling methods: wsi or tile.

  • included_labels (list of str, optional) – Polygon labels included in patch extraction; all other labels will be ignored.

  • excluded_labels (list of str, optional) – Polygon labels excluded from patch extraction.
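The two label parameters act as an include list and an exclude list. The sketch below applies both to (location, label) pairs. `filter_by_label` is a hypothetical helper, and the handling of a label that appears in both lists is an assumption: this page does not say which list wins, so the sketch lets exclusion take precedence.

```python
def filter_by_label(patches, included_labels=None, excluded_labels=None):
    """Keep (location, label) pairs whose label passes the include/exclude lists."""
    kept = []
    for location, label in patches:
        if included_labels is not None and label not in included_labels:
            continue  # label is not on the include list
        if excluded_labels is not None and label in excluded_labels:
            continue  # label is explicitly excluded (assumed to override inclusion)
        kept.append((location, label))
    return kept


patches = [((0, 0), "tumor"), ((256, 0), "stroma"), ((512, 0), "background"), ((768, 0), "tumor")]
print(filter_by_label(patches, included_labels=["tumor", "stroma"]))
print(filter_by_label(patches, excluded_labels=["background"]))
```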