Patch Extraction

dplabtools provides a set of patch extraction classes that integrate with the Patch locations/sampling classes. Extracted patches can either be saved to disk or kept in memory.

Other class features include:

  • Parallel patch extraction from a single WSI using multi-threading.

  • Support for multi-resolution patches (MRP).

  • Patch filtering using arbitrary labels.

  • For patches saved to disk: automated creation of manifest files and built-in file count checks.

In-memory patch extraction

MemPatchExtractor extracts patches and keeps them in memory.

Basic usage

Assuming that patches represents an object of one of the Patch locations/sampling classes, the following example will create a stream of in-memory patches and will print them on the screen:

from dplabtools.slides.patches import MemPatchExtractor

extractor = MemPatchExtractor(
    patches=patches,
    num_workers=4,
)

for patch in extractor.patch_stream:
    print(patch)
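The num_workers and mp_chunksize parameters (listed under the parameters common to all patch extraction classes) point to a worker-pool design. The sketch below shows one way an ordered, lazily evaluated patch stream can be built on a thread pool. It is an illustration only: it does not call dplabtools, and the toy SLIDE grid, read_patch and patch_stream are hypothetical stand-ins for real WSI reads and for the library's internals.

```python
from multiprocessing.pool import ThreadPool
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a WSI: a 64 x 64 grid of pixel values.
SLIDE = [[(x + y) % 256 for x in range(64)] for y in range(64)]

Location = Tuple[int, int]  # (x, y) of a patch's top-left corner
Patch = List[List[int]]


def read_patch(location: Location, size: int = 8) -> Patch:
    """Hypothetical reader: crop a size x size patch from the toy slide."""
    x, y = location
    return [row[x:x + size] for row in SLIDE[y:y + size]]


def patch_stream(locations: List[Location], num_workers: int = 4,
                 chunksize: int = 1) -> Iterator[Patch]:
    """Lazily yield patches in input order while worker threads read them."""
    with ThreadPool(processes=num_workers) as pool:
        # imap preserves input order; chunksize sets how many locations
        # are handed to a worker thread per task.
        yield from pool.imap(read_patch, locations, chunksize=chunksize)


for patch in patch_stream([(0, 0), (8, 8)], num_workers=2):
    print(len(patch), len(patch[0]))  # prints "8 8" for each patch
```

Because imap hands back results in order as they become available, a consumer can start processing the first patch before the remaining ones have been read.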

Class details

class dplabtools.slides.patches.MemPatchExtractor(...)

Class for extracting in-memory patches.

property patch_count

Return the number of extracted patches.

property patch_data

Return the patch data used in the patch extraction process.

property patch_labels

Return the distinct patch labels used in the patch extraction process.

property patch_stream

Return a stream of in-memory images (an iterable object).

In-memory patch extraction (MRP)

MultiResMemPatchExtractor extracts multi-resolution patches and keeps them in memory.

Basic usage

Assuming that patches represents an object of one of the Patch locations/sampling classes, the following example will create a stream of sets of in-memory patches and will print them on the screen:

from dplabtools.slides.patches import MultiResMemPatchExtractor

extractor = MultiResMemPatchExtractor(
    patches=patches,
    levels_or_mpps=[0, 1],
    num_workers=4,
)

for multires_patch in extractor.patch_stream:
    for patch in multires_patch:
        print(patch)
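Conceptually, a multi-resolution patch set reads the same tissue location at several levels. The sketch below computes one read region per level, assuming (as an illustration, not as documented dplabtools behavior) that all patches in a set share a common center and that each level's downsample factor relative to level 0 is known. Locations are expressed in level-0 coordinates, the convention used by OpenSlide's read_region; multires_regions is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, Tuple[int, int], int]  # (level, level-0 top-left, size in pixels)


def multires_regions(center_l0: Tuple[int, int], patch_size: int,
                     downsamples: Dict[int, float], levels: List[int]) -> List[Region]:
    """Return one centered read region per level for a multi-resolution patch set."""
    cx, cy = center_l0
    regions = []
    for level in levels:
        # A patch_size-pixel patch at this level spans patch_size * downsample
        # pixels at level 0, so half of that span is the offset from the center.
        half_span = patch_size * downsamples[level] / 2
        top_left = (int(round(cx - half_span)), int(round(cy - half_span)))
        regions.append((level, top_left, patch_size))
    return regions


print(multires_regions((1024, 1024), 256, {0: 1.0, 1: 4.0}, [0, 1]))
# [(0, (896, 896), 256), (1, (512, 512), 256)]
```

Under these assumptions the level-1 patch has the same pixel size as the level-0 patch but covers a field of view four times wider around the same center.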

Class details

class dplabtools.slides.patches.MultiResMemPatchExtractor(...)

Class for extracting in-memory multi-resolution patches.

property patch_count

Return the number of extracted patches.

property patch_data

Return the patch data used in the patch extraction process.

property patch_labels

Return the distinct patch labels used in the patch extraction process.

property patch_stream

Return a stream of sets of in-memory images (an iterable object); each set holds the patches extracted at the requested resolutions.

Parameters specific to MultiResMemPatchExtractor:

Parameters:

levels_or_mpps (list of level_or_mpp values) – Int or float values representing WSI levels or MPP values for multi-resolution patches.

See also

level_or_mpp
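Because levels_or_mpps mixes two kinds of values, the Python type of each entry matters: 1 and 1.0 are not interchangeable. The hypothetical helper below sketches one way to interpret a single level_or_mpp value, assuming an int selects a WSI level directly and a float is an MPP mapped to the level with the closest native MPP. This is a simplification: dplabtools may instead resample to the exact MPP (see resampling_mode), and the level_mpps list is an assumed input.

```python
from typing import List, Union


def resolve_level(level_or_mpp: Union[int, float], level_mpps: List[float]) -> int:
    """Hypothetical helper: map a level (int) or MPP (float) to a WSI level index."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(level_or_mpp, bool):
        raise TypeError("level_or_mpp must be an int or a float, not bool")
    if isinstance(level_or_mpp, int):
        if not 0 <= level_or_mpp < len(level_mpps):
            raise ValueError(f"level {level_or_mpp} does not exist")
        return level_or_mpp
    if isinstance(level_or_mpp, float):
        # Choose the level whose native MPP is closest to the requested MPP.
        return min(range(len(level_mpps)),
                   key=lambda i: abs(level_mpps[i] - level_or_mpp))
    raise TypeError("level_or_mpp must be an int or a float")


mpps = [0.25, 0.5, 1.0, 2.0]  # assumed native MPP of levels 0..3
print([resolve_level(v, mpps) for v in [0, 1, 1.0, 0.3]])  # [0, 1, 2, 0]
```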

Patch extraction to disk

DiskPatchExtractor extracts patches and saves them to disk.

Basic usage

Assuming that patches represents an object of one of the Patch locations/sampling classes, the following example will save extracted patches into the /tmp directory:

from dplabtools.slides.patches import DiskPatchExtractor

extractor = DiskPatchExtractor(
    patches=patches,
    output_dir="/tmp",
    image_type="png",
    num_workers=4,
)
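The class features listed above include built-in file count checks for patches saved to disk. A similar check can be run independently once extraction has finished: the sketch below counts saved image files under an output directory, including any label subdirectories, and compares the total with an expected value such as the extractor's patch_count. count_saved_patches and check_patch_files are hypothetical helpers, and the check is only meaningful for a dedicated output directory, since a shared location such as /tmp may already contain unrelated images.

```python
from pathlib import Path


def count_saved_patches(output_dir: str, image_type: str = "png") -> int:
    """Count files with the given image extension under output_dir (recursively)."""
    suffix = "." + image_type.lower().lstrip(".")
    return sum(1 for path in Path(output_dir).rglob("*")
               if path.is_file() and path.suffix.lower() == suffix)


def check_patch_files(output_dir: str, image_type: str, expected: int) -> None:
    """Raise if the number of saved image files differs from the expected count."""
    found = count_saved_patches(output_dir, image_type)
    if found != expected:
        raise RuntimeError(
            f"expected {expected} {image_type} files in {output_dir}, found {found}")
```

For example, check_patch_files("/data/patches", "png", extractor.patch_count) would compare the files in a hypothetical dedicated directory /data/patches with the number of extracted patches.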

Class details

class dplabtools.slides.patches.DiskPatchExtractor(...)

Class for extracting patches to disk.

property manifest_id

Return the current manifest ID.

property patch_count

Return the number of extracted patches.

property patch_data

Return the patch data used in the patch extraction process.

property patch_labels

Return the distinct patch labels used in the patch extraction process.

Patch extraction to disk (MRP)

MultiResDiskPatchExtractor extracts multi-resolution patches and saves them to disk.

Note

Each set of patches will be saved into a dedicated subdirectory.

Basic usage

Assuming that patches represents an object of one of the Patch locations/sampling classes, the following example will save sets of extracted patches into the /tmp directory:

from dplabtools.slides.patches import MultiResDiskPatchExtractor

extractor = MultiResDiskPatchExtractor(
    patches=patches,
    levels_or_mpps=[0, 1],
    output_dir="/tmp",
    image_type="png",
    num_workers=4,
)

Class details

class dplabtools.slides.patches.MultiResDiskPatchExtractor(...)

Class for extracting multi-resolution patches to disk.

property manifest_id

Return the current manifest ID.

property patch_count

Return the number of extracted patches.

property patch_data

Return the patch data used in the patch extraction process.

property patch_labels

Return the distinct patch labels used in the patch extraction process.

property patchset_counter

Return the last used value of the global patch set counter.

Parameters specific to MultiResDiskPatchExtractor:

Parameters:
  • levels_or_mpps (list of level_or_mpp values) – Int or float values representing WSI levels or MPP values for multi-resolution patches.

  • global_counter (int, default=1) – Initial counter value for enumerating patch set directories (set1, set2, set3, …) across an entire collection of WSIs. Setting this value to None resets the patch set counter for each WSI (wsi1_set1, wsi1_set2, …, wsi2_set1, wsi2_set2, …).
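The two numbering schemes described for global_counter can be illustrated with a short hypothetical helper. It takes the number of patch sets produced for each WSI and returns directory names following the patterns given in the parameter description; the wsi1, wsi2 prefixes are copied from that description and are placeholders (real names presumably identify the slide). The helper also returns the last counter value used, mirroring what the patchset_counter property is documented to report.

```python
from typing import List, Optional, Tuple


def patchset_dirnames(sets_per_wsi: List[int],
                      global_counter: Optional[int] = 1) -> Tuple[List[str], Optional[int]]:
    """Return patch set directory names and the last global counter value used."""
    names: List[str] = []
    if global_counter is None:
        # Per-WSI numbering: the set counter restarts at 1 for every slide.
        for wsi_idx, n_sets in enumerate(sets_per_wsi, start=1):
            names.extend(f"wsi{wsi_idx}_set{i}" for i in range(1, n_sets + 1))
        return names, None
    # Global numbering: one counter runs across the whole WSI collection.
    counter = global_counter
    last_used = None
    for n_sets in sets_per_wsi:
        for _ in range(n_sets):
            names.append(f"set{counter}")
            last_used = counter
            counter += 1
    return names, last_used


print(patchset_dirnames([2, 2]))
# (['set1', 'set2', 'set3', 'set4'], 4)
print(patchset_dirnames([2, 2], global_counter=None))
# (['wsi1_set1', 'wsi1_set2', 'wsi2_set1', 'wsi2_set2'], None)
```

If slides are processed by separate extractor instances, starting each new one at the previous patchset_counter value plus one would keep the numbering continuous; this is an inference from the property description, not documented behavior.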

Parameters common to disk patch extraction classes

Parameters:
  • output_dir (str) – Directory name or path for saving the extracted patches.

  • image_type (str) – Image type of the saved files (PNG, JPG, etc.).

  • filename_comment (str, optional) – Comment to be added to the saved file names.

  • filename_separator (str, default="_") – Separator used in the saved file names.

  • create_subdirs (bool, default=False) – Whether to create label-specific subdirectories inside output_dir.

  • pool_mode (bool, default=False) – Internal flag for integration with the pool classes; not intended to be set by users.
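The sketch below shows how the naming-related parameters above could combine into an output path. The filename layout (slide name, optional comment, label, x, y) and the helper patch_path are hypothetical; this page does not specify the actual dplabtools naming scheme. The sketch lower-cases image_type so that, for example, "PNG" and "png" yield the same extension.

```python
from pathlib import Path
from typing import Optional


def patch_path(output_dir: str, slide_name: str, label: str, x: int, y: int,
               image_type: str = "png", filename_comment: Optional[str] = None,
               filename_separator: str = "_", create_subdirs: bool = False) -> Path:
    """Build a hypothetical output path for one saved patch."""
    parts = [slide_name]
    if filename_comment:
        parts.append(filename_comment)  # optional comment follows the slide name
    parts += [label, str(x), str(y)]
    filename = filename_separator.join(parts) + "." + image_type.lower()
    # With create_subdirs, each label gets its own subdirectory of output_dir.
    directory = Path(output_dir) / label if create_subdirs else Path(output_dir)
    return directory / filename


print(patch_path("/data/patches", "slide01", "tumor", 512, 1024).as_posix())
# /data/patches/slide01_tumor_512_1024.png
print(patch_path("/data/patches", "slide01", "tumor", 512, 1024, image_type="PNG",
                 filename_comment="v2", filename_separator="-",
                 create_subdirs=True).as_posix())
# /data/patches/tumor/slide01-v2-tumor-512-1024.png
```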

Parameters common to all patch extraction classes

Parameters:
  • patches (object) – An object of one of the Patch locations/sampling classes.

  • num_workers (int) – Number of worker threads used for parallel processing.

  • mp_chunksize (int, default=1) – Data chunk size used in parallel processing.

  • resampling_mode (str, optional) – Down/up-sampling method; one of the two supported values: wsi or tile.

  • included_labels (list of str, optional) – Polygon labels to include in patch extraction; all other labels are ignored.

  • excluded_labels (list of str, optional) – Polygon labels excluded from patch extraction.
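The effect of included_labels and excluded_labels can be sketched on a list of hypothetical (x, y, label) patch records. The sketch keeps only included labels when included_labels is given and then drops excluded labels; how dplabtools combines the two when both are set is not stated on this page, so that ordering is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

PatchRecord = Tuple[int, int, str]  # hypothetical (x, y, label) record


def filter_by_labels(records: Iterable[PatchRecord],
                     included_labels: Optional[List[str]] = None,
                     excluded_labels: Optional[List[str]] = None) -> List[PatchRecord]:
    """Keep records whose label passes the include and exclude filters."""
    included = set(included_labels) if included_labels is not None else None
    excluded = set(excluded_labels or [])
    return [
        (x, y, label) for x, y, label in records
        # No include list means every label is eligible.
        if (included is None or label in included) and label not in excluded
    ]


records = [(0, 0, "tumor"), (256, 0, "stroma"), (512, 0, "background")]
print(filter_by_labels(records, included_labels=["tumor", "stroma"]))
# [(0, 0, 'tumor'), (256, 0, 'stroma')]
print(filter_by_labels(records, excluded_labels=["background"]))
# [(0, 0, 'tumor'), (256, 0, 'stroma')]
```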