diff --git a/docs/usage/notebooks.md b/docs/usage/notebooks.md index dd344875516..091ef6aad8c 100644 --- a/docs/usage/notebooks.md +++ b/docs/usage/notebooks.md @@ -5,12 +5,11 @@ in a more interactive manner, calling individual steps. This is useful as it all and with rapid feedback the parameters that may need adjusting in order to process a batch of scans. The notebooks can be found in the `notebook/` directory after cloning the GitHub repository. -| Notebook | Description | -| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | -| `00-Walkthrough-minicircle.ipynb` | Step-by-step walkthrough of processing `minicircle.spm` from the `tests/resources/` directory. | -| `01-Walthrhgouh-interactive.ipynb` | **Work in Progress** As above but uploading a single scan. Will be deployed in Google Colab/Binder for interactive use. | -| `02-Summary-statistics-and-plots.ipynb` | Plotting statistics interactively. | -| `03-Plotting-scans.ipynb` | Plotting NumPy arrays of scans from different stages of processing. | +| Notebook | Description | +| --------------------------------------- | ---------------------------------------------------------------------------------------------- | +| `01-Walkthrough-minicircle.ipynb` | Step-by-step walkthrough of processing `minicircle.spm` from the `tests/resources/` directory. | +| `02-Summary-statistics-and-plots.ipynb` | Plotting statistics interactively. | +| `03-Plotting-scans.ipynb` | Plotting NumPy arrays of scans from different stages of processing. 
| ## Installation diff --git a/notebooks/01-Walkthrough-minicircle.ipynb b/notebooks/01-Walkthrough-minicircle.ipynb new file mode 100644 index 00000000000..e95665f7ab4 --- /dev/null +++ b/notebooks/01-Walkthrough-minicircle.ipynb @@ -0,0 +1,839 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# TopoStats - Minicircle Walk-Through\n", + "\n", + "Welcome, this [Jupyter Notebook](https://jupyter.org/) will take you through processing `minicircle.spm` - a nanoscale AFM height image of DNA atop a flat mica surface." + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": {}, + "source": [ + "## Installing TopoStats\n", + "\n", + "There are several different ways of installing TopoStats depending on what you want to do. The simplest is to install\n", + "from GitHub under a virtual environment.\n", + "\n", + "```bash\n", + "pip install git+https://github.com/AFM-SPM/TopoStats.git@main\n", + "```\n", + "\n", + "For more information on the different ways of installing TopoStats and setting up Virtual Environments please refer to\n", + "[installation](https://afm-spm.github.io/TopoStats/installation.html).\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": {}, + "source": [ + "## Getting Started\n" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "### Loading Libraries and Modules\n", + "\n", + "TopoStats is written as a series of modules with various classes and functions. In order to use these interactively we\n", + "need to `import` them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [], + "source": [ + "import copy\n", + "import json\n", + "from pathlib import Path\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from IPython.display import display\n", + "from PIL import Image\n", + "\n", + "from topostats.filters import Filters\n", + "from topostats.grains import Grains\n", + "from topostats.grainstats import GrainStats\n", + "from topostats.io import LoadScans, find_files, read_yaml\n", + "from topostats.plottingfuncs import Images\n", + "from topostats.processing import get_out_paths, run_grains\n", + "from topostats.tracing.disordered_tracing import trace_image_disordered" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "## Finding Files\n", + "\n", + "When run from the command line TopoStats needs to find files to process and the `find_files()` function helps here. It\n", + "takes as arguments the directory path that should be searched and the file extension to look for (this example uses `.spm`\n", + "files) and returns a list of all files in the specified directory which have that file extension.\n", + "We can use that functionality in this Notebook if you place your files in the same directory as these\n", + "Notebooks and execute the next cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [ + "# Set the base directory to be the current working directory of the Notebook\n", + "BASE_DIR = Path().cwd()\n", + "# Alternatively, if you know where your files are, comment out the above line, uncomment the line below and adjust it for your use.\n", + "# BASE_DIR = Path(\"/path/to/where/my/files/are\")\n", + "# Adjust the file extension appropriately.\n", + "FILE_EXT = \".spm\"\n", + "# Search for *.spm files in tests/resources, one directory level up from the notebooks directory\n", + "image_files = find_files(base_dir=BASE_DIR.parent / \"tests\" / \"resources\", file_ext=FILE_EXT)" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "`image_files` is a list of images that match and we can look at that list." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": {}, + "outputs": [], + "source": [ + "image_files" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "## Loading a Configuration\n", + "\n", + "You can specify all options explicitly by hand when instantiating classes or calling methods/functions. However when run\n", + "at the command line in batch mode TopoStats loads these options from a [YAML](https://yaml.org/) configuration file and it is worth\n", + "understanding the structure of this file and how it is used.\n", + "\n", + "A trimmed version is shown below. The words that come before the colon `:` are the option, e.g. `base_dir:` is the base\n", + "directory that is searched for files; what comes after is the value, in this case `./`\n", + "\n", + "\n", + "```yaml\n", + "base_dir: ./ # Directory in which to search for data files\n", + "output_dir: ./output # Directory to output results to\n", + "log_level: info # Verbosity of output. 
Options: warning, error, info, debug\n", + "cores: 2 # Number of CPU cores to utilise for processing multiple files simultaneously.\n", + "file_ext: .spm # File extension of the data files.\n", + "loading:\n", + " channel: Height # Channel to pull data from in the data files.\n", + "filter:\n", + " run: true # Options : true, false\n", + " row_alignment_quantile: 0.5 # lower values may improve flattening of larger features\n", + " threshold_method: std_dev # Options : otsu, std_dev, absolute\n", + " otsu_threshold_multiplier: 1.0\n", + " threshold_std_dev:\n", + " below: 10.0 # Threshold for data below the image background\n", + " above: 1.0 # Threshold for data above the image background\n", + " threshold_absolute:\n", + " below: -1.0 # Threshold for data below the image background\n", + " above: 1.0 # Threshold for data above the image background\n", + " gaussian_size: 1.0121397464510862 # Gaussian blur intensity in px\n", + " gaussian_mode: nearest\n", + " # Scar removal parameters. Be careful with editing these as making the algorithm too sensitive may\n", + " # result in ruining legitimate data.\n", + " remove_scars:\n", + " run: true\n", + " removal_iterations: 2 # Number of times to run scar removal.\n", + " threshold_low: 0.250 # lower values make scar removal more sensitive\n", + " threshold_high: 0.666 # lower values make scar removal more sensitive\n", + " max_scar_width: 4 # Maximum thickness of scars in pixels.\n", + " min_scar_length: 16 # Minimum length of scars in pixels.\n", + "grains:\n", + " run: true # Options : true, false\n", + " # Thresholding by height\n", + " threshold_method: std_dev # Options : std_dev, otsu, absolute\n", + " otsu_threshold_multiplier: 1.0\n", + " threshold_std_dev:\n", + " below: 10.0 # Threshold for grains below the image background\n", + " above: 1.0 # Threshold for grains above the image background\n", + " threshold_absolute:\n", + " below: -1.0 # Threshold for grains below the image background\n", + " above: 1.0 # 
Threshold for grains above the image background\n", + " # Thresholding by area\n", + " smallest_grain_size_nm2: 50 # Size in nm^2 of tiny grains/blobs (noise) to remove, must be > 0.0\n", + " absolute_area_threshold:\n", + " above: [ 300, 3000 ] # above surface [Low, High] in nm^2 (also takes null)\n", + " below: [ null, null ] # below surface [Low, High] in nm^2 (also takes null)\n", + "grainstats:\n", + " run: true # Options : true, false\n", + " edge_detection_method: binary_erosion # Options: canny, binary erosion. Do not change this unless you are sure of what this will do.\n", + " cropped_size: 40.0 # Length (in nm) of square cropped images (can take -1 for grain-sized box)\n", + "disordered_tracing:\n", + " run: true # Options : true, false\n", + " min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.\n", + " skeletonisation_method: topostats # Options : zhang | lee | thin | topostats\n", + " pad_width: 1 # Cells to pad grains by when tracing\n", + "# cores: 1 # Number of cores to use for parallel processing\n", + "plotting:\n", + " run: true # Options : true, false\n", + " save_format: png # Options : see https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html\n", + " pixel_interpolation: null # Options : https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html\n", + " image_set: core # Options : all, core\n", + " zrange: [null, null] # low and high height range for core images (can take [null, null]). 
low <= high\n", + " colorbar: true # Options : true, false\n", + " axes: true # Options : true, false (due to off being a bool when parsed)\n", + " cmap: nanoscope # Options : nanoscope, afmhot, gwyddion\n", + " mask_cmap: blu # Options : blu, jet_r and any in matplotlib\n", + " histogram_log_axis: false # Options : true, false\n", + " histogram_bins: 200 # Number of bins for histogram plots to use\n", + " dpi: 100 # Dots Per Inch used in figures, if set to \"figure\" will use Matplotlib default\n", + "summary_stats:\n", + " run: true # Whether to make summary plots for output data\n", + " config: null\n", + "```\n", + "\n", + "To load the configuration file into Python we use the `read_yaml()` function. This saves the options as a dictionary and\n", + "we can access values by the keys. The example below prints out the top-level keys and then the keys for the `filter`\n", + "configuration.\n", + "\n", + "There is also a separate configuration file called `plotting_dictionary.yaml` which contains parameters for which data to plot and in what format.\n", + "\n", + "**NB** Python dictionaries have keys which can be considered as the parameter that is to be configured and each key has\n", + "an associated value which is the value you wish to set the parameter to." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [ + "config = read_yaml(BASE_DIR.parent / \"topostats\" / \"default_config.yaml\")\n", + "plotting_dictionary = read_yaml(BASE_DIR.parent / \"topostats\" / \"plotting_dictionary.yaml\")\n", + "print(f\"Top level keys of config.yaml : \\n\\n {config.keys()}\\n\")\n", + "print(f\"Configuration options for Filter : \\n\\n {config['filter'].keys()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, + "source": [ + "You can look at all of the options using the `json` package to \"pretty\" print the dictionary which makes it easier to\n", + "read. 
Here we print the `filter` section. You can see the options map to those of the `Filters()` class with an\n", + "additional `\"run\": true` which is used when running TopoStats at the command line." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": {}, + "outputs": [], + "source": [ + "print(json.dumps(config[\"filter\"], indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "13", + "metadata": {}, + "source": [ + "We will use the configuration options we have loaded in processing the `minicircle.spm` image. For convenience we save\n", + "each set of options to their own dictionary and remove the `run` entry as this is not required when running TopoStats\n", + "interactively.\n", + "\n", + "We also set the `plotting_config[\"image_set\"]` to `all` so that all images can be plotted (there are some internal controls that determine whether images are plotted and returned).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": {}, + "outputs": [], + "source": [ + "loading_config = config[\"loading\"]\n", + "filter_config = config[\"filter\"]\n", + "filter_config.pop(\"run\")\n", + "grain_config = config[\"grains\"]\n", + "grainstats_config = config[\"grainstats\"]\n", + "grainstats_config.pop(\"run\")\n", + "disordered_config = config[\"disordered_tracing\"]\n", + "disordered_config.pop(\"run\")\n", + "plotting_config = config[\"plotting\"]\n", + "plotting_config.pop(\"run\")\n", + "plotting_config[\"image_set\"] = \"all\"" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": {}, + "source": [ + "## Load Scans\n", + "\n", + "The first step before processing is to load a scan; this extracts the image data to a NumPy array along with the\n", + "filepath and the pixel to nanometer scaling parameter which is used to correctly scale the pixels to images. 
These are\n", + "stored in nested dictionaries with one top-level entry for each image that is found.\n", + "\n", + "One of the key fields you may wish to change is the `channel`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": {}, + "outputs": [], + "source": [ + "all_scan_data = LoadScans(image_files, config=config)\n", + "all_scan_data.get_data()\n", + "\n", + "# Plot the loaded scan in its raw format\n", + "fig, ax = plt.subplots(figsize=(8, 8))\n", + "plt.imshow(all_scan_data.image, cmap=\"afmhot\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": {}, + "source": [ + "Now that we have loaded the data we can start to process it. The first step is filtering the image." + ] + }, + { + "cell_type": "markdown", + "id": "18", + "metadata": {}, + "source": [ + "## TopoStats Object\n", + "\n", + "We create a `topostats_object` to assign an image's data to. All data and stats collected on the image after this point will also be stored in here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": {}, + "outputs": [], + "source": [ + "topostats_object = all_scan_data.img_dict[\"minicircle.spm\"]" + ] + }, + { + "cell_type": "markdown", + "id": "20", + "metadata": {}, + "source": [ + "## Get Output Paths\n", + "\n", + "To ensure consistency throughout the program output paths are defined once and passed through the different processes.\n", + "\n", + "The `topostats_object` is also defined here which holds all the data related to a specific image and the statistics collected from it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": {}, + "outputs": [], + "source": [ + "core_out_path, filter_out_path, grain_out_path, tracing_out_path = get_out_paths(\n", + " image_path=topostats_object.img_path,\n", + " base_dir=all_scan_data.config[\"base_dir\"],\n", + " output_dir=all_scan_data.config[\"output_dir\"],\n", + " filename=topostats_object.filename,\n", + " plotting_config=plotting_config,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "## Filter Image\n", + "\n", + "Now that we have found some images the first step in processing is to filter them to remove some of the noise. This is\n", + "achieved using the `Filters()` class. There are a number of options that we need to specify which are described in the\n", + "table below and also in the [documentation](https://topostats.readthedocs.io/en/dev/topostats.filters.html). \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Once we setup a `Filters` object we can call the different methods that are available for it. There are lots of\n", + "different methods that carry out the different steps but for convenience the `filter_image()` method runs all these.\n", + "\n", + "The following section instantiates (\"sets up\") an object called `filtered_image` of type `Filters` using the first file\n", + "found (`image_files[0]`) and the various options from the `filter_config` dictionary.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": {}, + "outputs": [], + "source": [ + "filtered_image = Filters(topostats_object=topostats_object, **filter_config)\n", + "filtered_image.filter_image()" + ] + }, + { + "cell_type": "markdown", + "id": "24", + "metadata": {}, + "source": [ + "The `filtered_image` now has a number of NumPy arrays saved in the `.images` dictionary that can be accessed and plotted. 
To view\n", + "the names of the images (technically the dictionary keys) you can print them with `filtered_image.images.keys()`..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"Available NumPy arrays to plot in filtered_image.images dictionary :\\n\\n{filtered_image.images.keys()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": {}, + "source": [ + "To plot one of these arrays you can use Matplotlib's `imshow()` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(8, 8))\n", + "plt.imshow(filtered_image.images[\"gaussian_filtered\"], cmap=\"afmhot\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "28", + "metadata": {}, + "source": [ + "TopoStats includes a custom plotting class `Images` which formats plots in a more familiar manner.\n", + "\n", + "It has a number of options, please refer to the official documentation on\n", + "[configuration](https://afm-spm.github.io/TopoStats/configuration.html) under the `plotting` entry for what these values\n", + "are or the [API\n", + "reference](https://afm-spm.github.io/TopoStats/topostats.plottingfuncs.html#module-topostats.plottingfuncs).\n", + "\n", + "The class requires a NumPy array, which we have just generated many of during the various filtering stages, and a number\n", + "of options. 
Again for convenience we use the `**plotting_config` notation to unpack the key/value pairs stored in the\n", + "`plotting_config` dictionary.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = Images(\n", + " data=filtered_image.images[\"gaussian_filtered\"],\n", + " filename=filtered_image.filename,\n", + " output_dir=\"img/\",\n", + " save=True,\n", + " **plotting_config,\n", + ").plot_and_save()\n", + "display(fig)" + ] + }, + { + "cell_type": "markdown", + "id": "30", + "metadata": {}, + "source": [ + "Here we plot the image after processing and zero-averaging the background but with the `viridis` palette and\n", + "constraining the `zrange` to be between 0 and 3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": {}, + "outputs": [], + "source": [ + "# First remove the current values for cmap and zrange from the plotting_config dictionary, otherwise an error occurs\n", + "# because the same arguments will have been specified twice.\n", + "current_cmap = plotting_config.pop(\"cmap\")\n", + "current_zrange = plotting_config.pop(\"zrange\")\n", + "fig, ax = Images(\n", + " data=filtered_image.images[\"gaussian_filtered\"],\n", + " filename=filtered_image.filename,\n", + " output_dir=\"img/\",\n", + " cmap=\"viridis\",\n", + " zrange=[0, 3],\n", + " save=True,\n", + " **plotting_config,\n", + ").plot_and_save()\n", + "# Restore the values for cmap and zrange to the dictionary.\n", + "plotting_config[\"cmap\"] = current_cmap\n", + "plotting_config[\"zrange\"] = current_zrange\n", + "fig" + ] + }, + { + "cell_type": "markdown", + "id": "32", + "metadata": {}, + "source": [ + "## Finding Grains\n", + "\n", + "The next step in processing the image is to find grains - a.k.a. the molecules we want to analyse. This is done using the `Grains` class and we have saved the\n", + "configuration to the `grain_config` dictionary. 
For details of the arguments and their values please refer to the\n", + "[configuration](https://afm-spm.github.io/TopoStats/configuration.html) and the [API\n", + "reference](https://afm-spm.github.io/TopoStats/topostats.grains.html).\n", + "\n", + "The most important thing required for grain finding is the resulting image from the Filtering stage, however several\n", + "other key variables are required. Again there is a one-to-one mapping between the options to the `Grains()` class and\n", + "their values in the configuration file.\n", + "\n", + "As with `Filters` the `Grains` class has a number of methods that carry out the grain finding, but there is a convenience method\n", + "`find_grains()` which calls all these in the correct order." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": {}, + "outputs": [], + "source": [ + "plotting_config[\"run\"] = True\n", + "plotting_config[\"plot_dict\"] = plotting_dictionary\n", + "\n", + "run_grains(\n", + " topostats_object=topostats_object,\n", + " grain_out_path=grain_out_path,\n", + " core_out_path=core_out_path,\n", + " plotting_config=plotting_config,\n", + " grains_config=grain_config,\n", + ")\n", + "\n", + "plotting_config.pop(\"run\")\n", + "\n", + "grains = Grains(topostats_object=topostats_object, **grain_config)\n", + "grains.find_grains()\n", + "\n", + "plotting_config.pop(\"plot_dict\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "34", + "metadata": {}, + "source": [ + "The `grains` object now also contains a series of images that we can plot." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"Available NumPy arrays to plot in grains:\\n\\n{len(topostats_object.grain_crops)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "36", + "metadata": {}, + "source": [ + "And we can again use the `plot_and_save()` function to plot these." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": {}, + "outputs": [], + "source": [ + "plotting_config[\"colorbar\"] = False\n", + "fig, ax = Images(\n", + " data=grains.image,\n", + " filename=filtered_image.filename,\n", + " output_dir=\"img/\",\n", + " save=True,\n", + " **plotting_config,\n", + ").plot_and_save()\n", + "fig" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "38", + "metadata": {}, + "source": [ + "### Thresholds\n", + "\n", + "The thresholds can be used in different ways based on the direction you want to detect grains. Typically for molecular\n", + "imaging where the DNA or protein is raised above the background you will want to look for objects above the surface, using the `above`\n", + "threshold. However, when imaging silicon, you may be interested in objects below the surface, using the `below` threshold. For convenience it is\n", + "possible to look for grains that are both above the `above` threshold and below the `below` threshold.\n", + "\n", + "If you want to change the option you can update the `config[\"grains\"]` dictionary as we do below.\n", + "\n", + "So far the thresholding method used has been `threshold_method=\"std_dev\"` defined in the configuration file we\n", + "loaded. This calculates the mean and standard deviation of height across the whole image and then determines the\n", + "threshold by scaling the standard deviation by a given factor (defined by `threshold_std_dev`) and adding it to the mean\n", + "to give the `above` threshold and/or subtracting it from the mean to give the `below` threshold.\n", + "\n", + "An alternative method is to use the `threshold_method=\"absolute\"` and explicitly state the\n", + "`below` and `above` thresholds (although since we are only looking for objects above a given threshold only the `above`\n", + "value will be used). If you wish to specify values for both they are shown below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39", + "metadata": {}, + "outputs": [], + "source": [ + "plotting_config[\"run\"] = True\n", + "plotting_config[\"plot_dict\"] = plotting_dictionary\n", + "grain_config[\"run\"] = True\n", + "\n", + "temp_topostats_object = copy.deepcopy(topostats_object)\n", + "\n", + "grain_config[\"threshold_method\"] = \"absolute\"\n", + "grain_config[\"threshold_absolute\"][\"above\"] = 0.01 # Change just the above threshold\n", + "grain_config[\"threshold_absolute\"][\"below\"] = [] # Change just the below threshold\n", + "grain_config[\"threshold_absolute\"] = {\n", + " \"below\": [],\n", + " \"above\": 1.2,\n", + "} # Change both the below and above threshold\n", + "\n", + "temp_topostats_object.image = filtered_image.images[\"final_zero_average_background\"]\n", + "temp_topostats_object.filename = filtered_image.filename\n", + "temp_topostats_object.pixel_to_nm_scaling = filtered_image.pixel_to_nm_scaling\n", + "\n", + "\n", + "run_grains(\n", + " topostats_object=temp_topostats_object,\n", + " grain_out_path=grain_out_path,\n", + " core_out_path=core_out_path,\n", + " plotting_config=plotting_config,\n", + " grains_config=grain_config,\n", + ")\n", + "\n", + "grains_absolute = Grains(topostats_object=temp_topostats_object, **grain_config)\n", + "grains_absolute.find_grains()" + ] + }, + { + "cell_type": "markdown", + "id": "40", + "metadata": {}, + "source": [ + "This is important because you need to know where the resulting images are stored within the `Grains.direction`\n", + "dictionary. This will have entries corresponding to the `direction` that grains have been searched for." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"Grains available in original 'grains' (std_dev, both) : {len(grains.grain_crops)}\")\n", + "print(f\"Grains available in absolute (absolute, above) : {len(grains_absolute.grain_crops)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "42", + "metadata": {}, + "source": [ + "## Grain Statistics\n", + "\n", + "Now that the grains have been found we can calculate statistics for each. This is done using the `GrainStats()`\n", + "class. Again the configuration options from the YAML file map to those of the class and there is a convenience method\n", + "`calculate_stats()` which will run all the steps for calculating grain statistics. However, because the class is processing results that we\n", + "have generated we have to explicitly pass in more values.\n", + "\n", + "For details of what the arguments are please refer to the [API\n", + "reference](https://afm-spm.github.io/TopoStats/topostats.grainstats.html).\n", + "\n", + "The `GrainStats` class returns a Pandas `pd.DataFrame` of calculated statistics. We therefore instantiate (\"set-up\") the `grain_stats` dictionary\n", + "to hold these results.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": {}, + "outputs": [], + "source": [ + "grainstats = GrainStats(topostats_object=topostats_object, base_output_dir=\"grains\")\n", + "grainstats.calculate_stats()\n", + "grain_stats = {crop_id: crop.stats for crop_id, crop in grainstats.grain_crops.items()}" + ] + }, + { + "cell_type": "markdown", + "id": "44", + "metadata": {}, + "source": [ + "`grain_stats` is a dictionary. We can print this out as\n", + "shown below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45", + "metadata": {}, + "outputs": [], + "source": [ + "grain_stats" + ] + }, + { + "cell_type": "markdown", + "id": "46", + "metadata": {}, + "source": [ + "### Plotting Individual Grains\n", + "\n", + "It is possible to plot the individual grains in the same way that whole images are plotted. These are created when running `Grains()` earlier in this notebook and are stored in `[output_dir]/processed/[filename]/` after a successful run of TopoStats. Different types of plots can be found in this directory; here we will access an ordered trace of a single grain (stored in `[output_dir]/processed/[filename]/dnatracing/ordered/`).\n", + "\n", + "The naming convention for the saved `.png` files is `[filename]_grain_[grain_id]_ordered_traces`; here we will view the first grain (id 0). We use an f-string to insert the image's filename into the file name automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47", + "metadata": {}, + "outputs": [], + "source": [ + "img_file = (\n", + " Path(BASE_DIR.parent)\n", + " / config[\"output_dir\"]\n", + " / \"processed\"\n", + " / topostats_object.filename\n", + " / \"dnatracing\"\n", + " / \"ordered\"\n", + " / f\"{topostats_object.filename}_grain_0_ordered_traces.png\"\n", + ")\n", + "ordered_trace_img = Image.open(img_file)\n", + "fig, ax = plt.subplots()\n", + "ax.imshow(ordered_trace_img)\n", + "ax.axis(\"off\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "48", + "metadata": {}, + "source": [ + "## DNA Tracing\n", + "\n", + "When working with molecules it is possible to calculate DNA Tracing Statistics using the `disordered_tracing.py`'s `disorderedTrace.trace_dna()` function which takes an image and grain masks, and returns statistics about the DNA.\n", + "\n", + "Here we select a single grain and print the statistics for it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49", + "metadata": {}, + "outputs": [], + "source": [ + "trace_image_disordered(\n", + " topostats_object=grains.topostats_object,\n", + " **disordered_config,\n", + ")\n", + "\n", + "tracing_results = topostats_object.grain_crops[2].disordered_trace.stats\n", + "\n", + "print(tracing_results)" + ] + }, + { + "cell_type": "markdown", + "id": "50", + "metadata": {}, + "source": [ + "These statistics can now be plotted to show the distribution of the different metrics. Please see the Jupyter Notebook\n", + "`notebooks/02-Summary-statistics-and-plots.ipynb` for examples of how to plot these statistics." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + }, + "name": "00-Walkthrough-minicircle.ipynb" + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/02-Summary-statistics-and-plots.ipynb b/notebooks/02-Summary-statistics-and-plots.ipynb new file mode 100644 index 00000000000..0f64b20917b --- /dev/null +++ b/notebooks/02-Summary-statistics-and-plots.ipynb @@ -0,0 +1,428 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# Summarising and Plotting Statistics\n", + "\n", + "After a successful run of `topostats process` you will have a `molecule_statistics.csv` file that contains a summary of various\n", + "statistics about the detected molecules across all image files that were processed. 
There is a class\n", + "`topostats.plotting.TopoSum` that uses this file to generate plots automatically and a convenience command\n", + "`toposum` which provides an entry point to re-run the plotting at the command line.\n", + "\n", + "Inevitably though there will be a point where you want to tweak plots for publication or otherwise in some way that\n", + "is not conducive to scripting in this manner because making every single option from\n", + "[Seaborn](https://seaborn.pydata.org/) and [Matplotlib](https://matplotlib.org/) accessible via this class is a\n", + "considerable amount of work writing [boilerplate code](https://en.wikipedia.org/wiki/Boilerplate_code). Instead the\n", + "plots should be generated and tweaked interactively in a notebook. This Notebook serves as a sample showing how to use the\n", + "`TopoSum` class and some examples of creating plots directly using [Pandas](https://pandas.pydata.org/).\n", + "\n", + "If you are unfamiliar with these packages it is recommended that you read the documentation. It is worth bearing in mind\n", + "that both Pandas and Seaborn build on the basic functionality that Matplotlib provides, providing easier methods for\n", + "generating plots. 
If you are stuck doing something with either of these, refer to Matplotlib for how to achieve what you\n", + "are trying to do.\n", + "\n", + "* [Pandas](https://pandas.pydata.org/docs/)\n", + "* [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html)\n", + "* [Chart visualization — pandas](https://pandas.pydata.org/docs/user_guide/visualization.html?highlight=plotting)\n", + "* [seaborn: statistical data visualization](https://seaborn.pydata.org/index.html)\n", + "* [An introduction to seaborn](https://seaborn.pydata.org/tutorial/introduction.html)\n", + "* [Matplotlib — Visualization with Python](https://matplotlib.org/)\n", + "* [Tutorials — Matplotlib](https://matplotlib.org/stable/tutorials/index)\n", + "\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1", + "metadata": {}, + "source": [ + "## Load Libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "## Load `molecule_statistics.csv`\n", + "\n", + "You need to load your data to be able to work with it. This is best achieved by importing it using\n", + "[Pandas](https://pandas.pydata.org/). Here we use the `tests/resources/minicircle_default_molecule_statistics.csv` that is\n", + "part of the TopoStats repository and load it into the object called `df` (short for \"Data Frame\"). You will need to\n", + "change this path to reflect your output. 
\n", + "\n", + "Because `molecule_number` is only unique within an `image` and `grain_number`, we set a multi-level index of these three." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_csv(\"../tests/resources/minicircle_default_molecule_statistics.csv\")\n", + "df.set_index([\"image\", \"grain_number\", \"molecule_number\"], inplace=True)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "## Data Manipulation\n", + "\n", + "Sometimes it is desirable to extract further information from the CSV, for example sub-folder names. Pandas is an\n", + "excellent tool for doing this, but it can be a bit overwhelming working out where to start as there are so many\n", + "options. This section contains some simple recipes for manipulating the data." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "6", + "metadata": {}, + "source": [ + "### Splitting `basename`\n", + "\n", + "The `basename` variable contains the directory paths and at times it may be desirable to group distribution plots across\n", + "images based on the directory from which they originate. The specific directory name is part of the longer string\n", + "`basename` and so this needs splitting to access the directory components.\n", + "\n", + "**NB** The value for `pat` (the pattern on which the string is split) may vary depending on the operating system the\n", + "images were processed on." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": {}, + "outputs": [], + "source": [ + "# Split and expand `basename` into a new dataframe\n", + "basename_components_df = df[\"basename\"].str.split(\"\\\\\", expand=True)\n", + "basename_components_df.head()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "8", + "metadata": {}, + "source": [ + "You can now select which elements of `basename_components_df` to merge back into the original `df`. To include both\n", + "components of the split `basename`, name the new columns and merge on the index:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": {}, + "outputs": [], + "source": [ + "basename_components_df.columns = [\"basename1\", \"basename2\"]\n", + "\n", + "df = df.merge(basename_components_df, left_index=True, right_index=True)\n", + "df.head()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "10", + "metadata": {}, + "source": [ + "## Plotting with Pandas" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "11", + "metadata": {}, + "source": [ + "### Plotting Contour Lengths" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": {}, + "outputs": [], + "source": [ + "df[\"contour_length\"].plot.hist(figsize=(16, 9), bins=20, title=\"Contour Lengths\", alpha=0.5)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "13", + "metadata": {}, + "source": [ + "### Plotting End to End Distance of non-Circular grains\n", + "\n", + "Circular grains are excluded since their end-to-end length is 0.0." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": {}, + "outputs": [], + "source": [ + "df[df[\"circular\"] == False][\"end_to_end_distance\"].plot.hist( # noqa: E712\n", + " figsize=(16, 9), bins=20, title=\"End to End Distance\", alpha=0.5\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "15", + "metadata": {}, + "source": [ + "### Multiple Images\n", + "\n", + "Often you will have processed multiple images and you will want to plot the distributions of metrics for each image\n", + "separately.\n", + "\n", + "For this example we duplicate the data and append it, adjusting the values slightly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": {}, + "outputs": [], + "source": [ + "def scale_df(df: pd.DataFrame, scale: float, image: str) -> pd.DataFrame:\n", + " \"\"\"Scale the numerical values of a data frame. Retains string variables and the index.\n", + "\n", + " Parameters\n", + " ----------\n", + " df: pd.DataFrame\n", + " Pandas Dataframe to scale.\n", + " scale: float\n", + " Factor by which to scale the data.\n", + " image: str\n", + " Name for new (dummy) image.\n", + "\n", + " Returns\n", + " -------\n", + " pd.DataFrame\n", + " Scaled data frame\n", + " \"\"\"\n", + " _df = df[df.select_dtypes(include=[\"number\"]).columns] * scale\n", + " _df[\"circular\"] = df[\"circular\"].values\n", + " _df[\"basename\"] = df[\"basename\"].values\n", + " _df.reset_index(inplace=True)\n", + " _df[\"image\"] = image\n", + " # _df.set_index(df.index.names, inplace=True)\n", + "\n", + " # _df = pd.concat([_df, df[[\"circular\", \"basename\"]]], axis=1)\n", + " _df.set_index([\"image\"], inplace=True)\n", + " return _df\n", + "\n", + "\n", + "smaller = scale_df(df, scale=0.4, image=\"smaller\")\n", + "larger = scale_df(df, scale=1.5, image=\"larger\")\n", + "original = scale_df(df, scale=1.0, image=\"original\")\n", + "df_three_images = pd.concat([smaller, 
original, larger])\n", + "df_three_images.head()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "17", + "metadata": {}, + "source": [ + "### Contour Length from Three Processed Images" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "df_three_images[\"contour_length\"].groupby(level=\"image\").plot.hist(\n", + " figsize=(16, 9),\n", + " bins=20,\n", + " title=\"Contour Lengths\",\n", + " alpha=0.5,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "19", + "metadata": {}, + "source": [ + "The bin width in the above figure varies for each \"image\" (`smaller`, `larger` and `original`). This is because each\n", + "image's data is plotted separately (but overlaid on the same graph) and its bins are determined dynamically from the range of that data.\n", + "This is a known shortcoming of Pandas (see [ENH: groupby.hist bins don't match\n", + "#22222](https://github.com/pandas-dev/pandas/issues/22222)). To get around this you can specify the `bins` edges\n", + "explicitly based on the range of _all_ observed data (i.e. `min` to `max`) using `np.linspace()` (from the NumPy\n", + "package) along with the number of bins across the _total_ space (`bins`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "min_length = df_three_images[\"contour_length\"].min() # avoid shadowing the built-in min()\n", + "max_length = df_three_images[\"contour_length\"].max() # avoid shadowing the built-in max()\n", + "bins = 20\n", + "df_three_images[\"contour_length\"].groupby(\"image\").plot.hist(\n", + " figsize=(16, 9),\n", + " bins=np.linspace(min_length, max_length, num=bins + 1), # bins + 1 shared edges give `bins` equal-width bins\n", + " title=\"Contour Lengths\",\n", + " alpha=0.5,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "21", + "metadata": {}, + "source": [ + "### Ignoring Image\n", + "\n", + "It is possible to plot the distribution of summary statistics without regard to the image from which they are\n", + "derived. Simply omit the `.groupby(\"image\")` from the plotting command.\n", + "\n", + "We also manually set the font size." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": {}, + "outputs": [], + "source": [ + "matplotlib.rcParams.update({\"font.size\": 20})\n", + "df_three_images[\"contour_length\"].plot.hist(figsize=(16, 9), bins=20, title=\"Contour Lengths\", alpha=0.5)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "23", + "metadata": {}, + "source": [ + "### Violin Plot of `curvature_min` using Seaborn\n", + "\n", + "Pandas does not have built-in support for Violin Plots so we switch to using Seaborn. Here the `fig` and `ax` objects\n", + "are created first and we use the `ax.text()` method to add a string (`text_str`) in a box to the plot." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": {}, + "outputs": [], + "source": [ + "# Reset dataframe index to make `image` readily available\n", + "df_three_images.reset_index(inplace=True)\n", + "fig, ax = plt.subplots(1, 1, figsize=(16, 9))\n", + "sns.violinplot(data=df_three_images, x=\"image\", y=\"curvature_min\", hue=\"image\", alpha=0.5)\n", + "plt.title(\"Minimum Curvature\")\n", + "plt.ylabel(\"Minimum Curvature / nm\")\n", + "# Define text for the string to go in a blue text box.\n", + "text_str = \"\\n\".join(\n", + " [\"Sodium Concentration : 0.001mM\", \"Scan Size : 200x200\", \"More useful information : :-)\"]\n", + ")\n", + "props = dict(boxstyle=\"round\", alpha=0.5)\n", + "ax.text(\n", + " 0.5,\n", + " 0.85,\n", + " text_str,\n", + " transform=ax.transAxes,\n", + " fontsize=12, # verticalalignment=\"top\",\n", + " horizontalalignment=\"center\",\n", + " bbox=props,\n", + ")\n", + "# Return the index\n", + "df_three_images.set_index([\"image\"], inplace=True)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "25", + "metadata": {}, + "source": [ + "### Joint Plot\n", + "[Joint Plots](https://seaborn.pydata.org/generated/seaborn.jointplot.html) showing the relationship between two variables can be plotted easily." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26", + "metadata": {}, + "outputs": [], + "source": [ + "df.columns\n", + "sns.jointplot(data=df, x=\"curvature_min\", y=\"curvature_max\", hue=\"circular\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + }, + "name": "02-Summary-statistics-and-plots.ipynb" + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/03-Plotting-scans.ipynb b/notebooks/03-Plotting-scans.ipynb new file mode 100644 index 00000000000..0cebde70a29 --- /dev/null +++ b/notebooks/03-Plotting-scans.ipynb @@ -0,0 +1,572 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# Plotting Scans\n", + "\n", + "This Notebook demonstrates how to plot cleaned scans that have been produced by `topostats process`. There are a large\n", + "number of options available when plotting, too many to cover in this Notebook, but the aim is to demonstrate some\n", + "basics...\n", + "\n", + "* Loading NumPy Arrays\n", + "* Plotting using the TopoStats `plot_and_save()` function.\n", + "* Selecting a subset of a scan and plotting that.\n", + "* Applying different colour maps.\n", + "* Adding custom headings and axis labels.\n", + "* Saving images in a range of publication quality formats.\n", + "\n", + "The [NumPy](https://numpy.org/) arrays are plotted using [Matplotlib](https://matplotlib.org/) which has excellent\n", + "documentation. If you want to learn more then the [Tutorials and\n", + "Examples](https://matplotlib.org/stable/users/index.html#tutorials-and-examples) are a good place to start learning from." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": {}, + "source": [ + "# Setup\n", + "\n", + "The first step required is to import some Python libraries to load and plot the data. You should run this Notebook\n", + "within a Conda/Virtual Environment into which you have installed TopoStats, ideally with the necessary Notebook\n", + "extensions. The following command will install TopoStats from [PyPI](https://pypi.org/project/topostats/) with the\n", + "requirements for running Notebooks.\n", + "\n", + "```bash\n", + "pip install topostats[notebooks]\n", + "```\n", + "\n", + "You should have successfully processed images using `topostats process` at least once. This will have saved processed scans\n", + "to disk that we will load." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2", + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "import matplotlib.colors as mcolors\n", + "import matplotlib.pyplot as plt\n", + "from ipywidgets import widgets\n", + "\n", + "from topostats.io import LoadScans, read_yaml\n", + "from topostats.plottingfuncs import Images\n", + "from topostats.theme import Colormap\n", + "\n", + "\n", + "def on_file_upload(change):\n", + " # Get the uploaded file contents as a bytes object\n", + " print(f\"change['owner'].value : {change['owner'].value}\")\n", + " uploaded_file = change[\"owner\"].value\n", + " filename = list(uploaded_file.keys())[0]\n", + " # print(uploaded_file.items())\n", + " print(f\"filename : {filename}\")\n", + " print(f\"uploaded_file[filename]['metadata'] : {uploaded_file[filename]['metadata']}\")\n", + " # print(f\"uploaded_file : {str(uploaded_file.keys()[0])}\")\n", + " content = uploaded_file[filename][\"content\"] # noqa: F841\n", + "\n", + "\n", + "upload_button = widgets.FileUpload(accept=\".npy\", multiple=False)\n", + "display(upload_button) # noqa: F821\n", + "# select_file_upload.observe(on_file_upload, names=\"value\")" + ] + }, + 
{ + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "# Load\n", + "\n", + "Before we can plot data we need to load the data. You need to know where this file is located and this will depend on\n", + "the configuration you used when using `topostats process`. It will be located in the `processed` directory of your output\n", + "(but remember that it reflects the directory structure your files were stored in originally)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [], + "source": [ + "outpath = Path(\"../output/processed/\")\n", + "BASE_DIR = Path().cwd()\n", + "config = read_yaml(BASE_DIR.parent / \"topostats\" / \"default_config.yaml\")\n", + "\n", + "scan_data = LoadScans(img_paths=[outpath / \"minicircle.topostats\"], config=config)\n", + "scan_data.get_data()\n", + "image_array = scan_data.img_dict[\"minicircle.topostats\"].image" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "## Configuration\n", + "\n", + "A configuration saved as a Python Dictionary is the easiest way to work with plotting and saves a lot of repetitive\n", + "typing of options. 
A sample is provided below and is stored in the object `plotting_config`.\n", + "\n", + "We set the output directory (`outpath`) in the cell below. If you wish to save images somewhere different then you\n", + "should modify the cell to something like\n", + "\n", + "```\n", + "outpath = Path(\"/path/you/want/to/save/images/to/\")\n", + "```\n", + "\n", + "In the cell below the `outpath` is set to the location from which we load the array data.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [ + "outpath = Path(\"../output/processed/\")\n", + "plotting_config = {\n", + " \"savefig_format\": \"png\", # Options : see https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html\n", + " \"image_set\": \"core\", # Options : all, core\n", + " \"zrange\": [None, None], # low and high height range for core images (can take [null, null])\n", + " \"colorbar\": True, # Options : true, false\n", + " \"axes\": True, # Options : true, false (due to off being a bool when parsed)\n", + " \"cmap\": \"nanoscope\", # Options : nanoscope, afmhot, gwyddion\n", + " \"mask_cmap\": \"grey\", # Options : blu, jet_r and any in matplotlib\n", + " \"histogram_log_axis\": False, # Options : true, false\n", + " \"histogram_bins\": 200, # Number of bins for histogram plots to use\n", + " \"core_set\": True,\n", + " \"title\": \"Height Thresholded\",\n", + " \"image_type\": \"non-binary\",\n", + " \"save\": True,\n", + " \"output_dir\": outpath,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "# Plotting with TopoStats\n", + "\n", + "TopoStats includes a class `Images` which makes plotting easy. 
It requires a few arguments though: the array that is to\n", + "be plotted (`image_array`), the image name (`test_image`) and a dictionary of options which we have defined above.\n", + "\n", + "This last argument, the dictionary of options, is prefixed with `**`, which is known as keyword-argument unpacking. It means that the\n", + "dictionary is \"unpacked\" and we have set up the dictionary so that every key is an argument to the `Images` class and the\n", + "values of the dictionary are passed into `Images`. If interested in finding out more about this see the following\n", + "articles...\n", + "\n", + "* [Dictionaries in Python – Real Python](https://realpython.com/python-dicts/)\n", + "* [Python args and kwargs: Demystified – Real Python](https://realpython.com/python-kwargs-and-args/)\n", + "\n", + "The cell below \"instantiates\" an object (`image_plot`) of the class `Images`. It _won't_ produce any output....yet!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": {}, + "outputs": [], + "source": [ + "image_plot = Images(image_array, filename=\"test_image\", **plotting_config)" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "Classes such as `Images` have \"methods\" associated with them. These do the hard work and produce output. This\n", + "means the instance of `Images` that is `image_plot` has a method called `.plot_and_save()` which plots and saves the\n", + "file. The method returns two objects, a `figure` which is the overall canvas and an `axes` which is the region within\n", + "the `figure` into which the image is drawn. 
If we call it now we are told the image is saved and we can then display the figure in the\n", + "Notebook by using the returned `figure`.\n", + "\n", + "In this example we have included all of the options from the dictionary relevant to this type of plot, such as\n", + "`colorbar=True` and the `cmap=\"nanoscope\"` (`cmap` is short for \"colormap\" and defines the colours used for plotting)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [ + "figure, axes = image_plot.plot_and_save()\n", + "figure" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, + "source": [ + "### Changing Properties\n", + "\n", + "If we want to change the properties we can either define a new dictionary, or we can modify the properties of the\n", + "instantiated `Images` object `image_plot`. For example to change the colour map (`cmap`) and _not_ plot the `colorbar`\n", + "we can set those values to `viridis` and `False` respectively. And if we want to change the title we can change the\n", + "`title` property." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": {}, + "outputs": [], + "source": [ + "image_plot.cmap = \"viridis\"\n", + "image_plot.colorbar = False\n", + "image_plot.title = \"Minicircle : Height Thresholded...in Viridis!\"\n", + "figure, axes = image_plot.plot_and_save()\n", + "figure" + ] + }, + { + "cell_type": "markdown", + "id": "13", + "metadata": {}, + "source": [ + "### Colormaps\n", + "\n", + "Another colormap (`cmap`) that is available is `afmhot`. We plot the same `minicircle` image using this colormap and\n", + "reinstate the colorbar, giving a unique title." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": {}, + "outputs": [], + "source": [ + "image_plot.cmap = \"afmhot\"\n", + "image_plot.colorbar = True\n", + "image_plot.title = \"Hot Minicircles!\"\n", + "figure, axes = image_plot.plot_and_save()\n", + "figure" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": {}, + "source": [ + "Internally `Images()` is using the colormap palette defined in the `topostats.theme.Colormap` class that has been\n", + "imported, which defines the range of colours for the `nanoscope`, `gwyddion` and `blu` custom colormaps. We will use\n", + "these later." + ] + }, + { + "cell_type": "markdown", + "id": "16", + "metadata": {}, + "source": [ + "## Plotting a Region\n", + "\n", + "We may be interested in plotting just a region, say the bottom right-hand corner with the cluster of five molecules. To\n", + "do so we need to subset the original array. This requires a little understanding of how to index [Numpy\n", + "arrays](https://numpy.org/doc/stable/user/basics.indexing.html).\n", + "\n", + "A Numpy array holding a TopoStats image is a 2-Dimensional array and each cell can be referenced by its `row` position\n", + "(`y`) first and then its `col` (`x`). Indexing in Python (and most programming languages) starts at zero (`0`) so to\n", + "get the contents of the very first cell you would use `image_array[0,0]` as shown below which shows you the height\n", + "measurement of that cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": {}, + "outputs": [], + "source": [ + "image_array[0, 0]" + ] + }, + { + "cell_type": "markdown", + "id": "18", + "metadata": {}, + "source": [ + "However, we want to plot a range of rows and columns corresponding to the bottom right-hand corner. We can refer to a\n", + "range of values using the notation `start:end` and we can do so both for the `x` dimension and the `y` dimension. 
To get\n", + "every row and column from index `701` onwards we would therefore use `[701:,701:]`. We don't need to specify the end\n", + "location; Python will slice up to the end of the rows and columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": {}, + "outputs": [], + "source": [ + "image_array[701:, 701:]" + ] + }, + { + "cell_type": "markdown", + "id": "20", + "metadata": {}, + "source": [ + "We can now plot the subset by instantiating a new object which we call `small_plot` of the class `Images`. Instead of\n", + "passing in the full `image_array` though we take the subset of rows and columns after index `700`. We\n", + "specify a new, unique filename `test_image_small` and reuse the `plotting_config` dictionary." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": {}, + "outputs": [], + "source": [ + "small_plot = Images(image_array[701:, 701:], filename=\"test_image_small\", **plotting_config)\n", + "figure, axes = small_plot.plot_and_save()\n", + "figure" + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "You may notice the colours are brighter in this cropped image than the region as it appears in the full image plot. Read\n", + "on for how to handle this so that they match the whole image." + ] + }, + { + "cell_type": "markdown", + "id": "23", + "metadata": {}, + "source": [ + "## Plot just the image\n", + "\n", + "It's possible that you may want _just_ the image, without a colorbar or title. This can be done without recourse to the `Images`\n", + "class using Matplotlib directly. We first need to set up a `figure` and `axes` to hold one image. This is done using\n", + "`plt.subplots()` from Matplotlib.\n", + "\n", + "We use the `Colormap(\"nanoscope\").get_cmap()` class and method to use the `nanoscope` colour map." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": {}, + "outputs": [], + "source": [ + "figure, axes = plt.subplots(1, 1, figsize=(8, 8))\n", + "plt.imshow(image_array, cmap=Colormap(\"nanoscope\").get_cmap())" + ] + }, + { + "cell_type": "markdown", + "id": "25", + "metadata": {}, + "source": [ + "If you want to save the image then use `plt.imsave()` with the same arguments, but give a filename as the first argument." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imsave(outpath / \"image_without_scale_or_title.png\", image_array, cmap=Colormap(\"nanoscope\").get_cmap())" + ] + }, + { + "cell_type": "markdown", + "id": "27", + "metadata": {}, + "source": [ + "## Images and Regions \n", + "\n", + "Here we setup a `figure` and `axes` with `nrows=1` and `ncols=2`, this makes `axes` essentially an array with length of\n", + "2, starting with an index of 0 and so we reference `axes[0]` for the first image, and `axes[1]` for the second and we\n", + "can combine our two images.\n", + "\n", + "We use `plt.savefig()` to save the image to a unique filename under `outpath` location (which we set further back)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28", + "metadata": {}, + "outputs": [], + "source": [ + "figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))\n", + "\n", + "axes[0].set_title(\"Full Image\")\n", + "axes[0].imshow(image_array, cmap=Colormap(\"nanoscope\").get_cmap())\n", + "axes[1].set_title(\"Cropped Region\")\n", + "axes[1].imshow(image_array[700:, 700:], cmap=Colormap(\"nanoscope\").get_cmap())\n", + "\n", + "plt.savefig(outpath / \"double_image.png\")" + ] + }, + { + "cell_type": "markdown", + "id": "29", + "metadata": {}, + "source": [ + "You may notice that the colormap is _not_ the same across the two images, in the _Cropped Region_ the heights are now\n", + "much brighter. 
In order to make these consistent there are two solutions...\n", + "\n", + "a) Obtain the minimum and maximum values from the full image.\n", + "b) Obtain a normalised range from the full image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30", + "metadata": {}, + "outputs": [], + "source": [ + "# Get vmin/vmax values directly\n", + "vmin = image_array.min()\n", + "vmax = image_array.max()\n", + "\n", + "figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))\n", + "\n", + "axes[0].set_title(\"Full Image\")\n", + "axes[0].imshow(image_array, cmap=Colormap(\"nanoscope\").get_cmap(), vmin=vmin, vmax=vmax)\n", + "axes[1].set_title(\"Cropped Region\")\n", + "axes[1].imshow(image_array[700:, 700:], cmap=Colormap(\"nanoscope\").get_cmap(), vmin=vmin, vmax=vmax)\n", + "\n", + "\n", + "plt.savefig(outpath / \"double_image_standardised_colour.png\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": {}, + "outputs": [], + "source": [ + "# Normalize the colour range\n", + "norm = mcolors.Normalize(vmin=image_array.min(), vmax=image_array.max())\n", + "\n", + "figure, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))\n", + "\n", + "axes[0].set_title(\"Full Image\")\n", + "axes[0].imshow(image_array, cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "axes[1].set_title(\"Cropped Region\")\n", + "axes[1].imshow(image_array[700:, 700:], cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "\n", + "plt.savefig(outpath / \"double_image_normalised_colour.png\")" + ] + }, + { + "cell_type": "markdown", + "id": "32", + "metadata": {}, + "source": [ + "And of course you can extend this to plot more regions, here we set up a 2x2 grid by virtue of `nrows=2` and\n", + "`ncols=2`. 
Because `axes` is now a 2-d array, as with Numpy arrays we need to index both dimensions. This is done with\n", + "`axes[0,0]` for the first row and column, `axes[0,1]` for the first row and second column, then the second row has\n", + "`axes[1,0]` for the first column and `axes[1,1]` for the second column.\n", + "\n", + "We select different regions for each cell and again normalise the colour scale.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": {}, + "outputs": [], + "source": [ + "# Normalize the colour range\n", + "norm = mcolors.Normalize(vmin=image_array.min(), vmax=image_array.max())\n", + "\n", + "\n", + "figure, axes = plt.subplots(nrows=2, ncols=2, figsize=(16, 16))\n", + "\n", + "axes[0, 0].set_title(\"Full Image\")\n", + "axes[0, 0].imshow(image_array, cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "axes[0, 1].set_title(\"Cropped Region 1\")\n", + "axes[0, 1].imshow(image_array[700:, 700:], cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "axes[1, 0].set_title(\"Cropped Region 2\")\n", + "axes[1, 0].imshow(image_array[:250, :250], cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "axes[1, 1].set_title(\"Cropped Region 3\")\n", + "axes[1, 1].imshow(image_array[350:550, 600:800], cmap=Colormap(\"nanoscope\").get_cmap(), norm=norm)\n", + "\n", + "# Set the axes labels on every subplot\n", + "for ax in axes.flat:\n", + " ax.set(xlabel=\"Nanometres\", ylabel=\"Nanometres\")\n", + "\n", + "# Use a distinct filename so the two-panel figure saved earlier is not overwritten\n", + "plt.savefig(outpath / \"quad_image_normalised_colour.png\")" + ] + }, + { + "cell_type": "markdown", + "id": "34", + "metadata": {}, + "source": [ + "# Going Further\n", + "\n", + "This Notebook has been a short introduction to the vast array of options that are available for plotting your image scan\n", + "data. 
There are a _lot_ of options and it is not practical to translate all of these options into configuration options\n", + "to TopoStats, nor is repeatedly running scripts to generate the exact image you want.\n", + "\n", + "Hopefully the examples introduced above are useful to get you started. More documentation on plotting with Matplotlib\n", + "is available at the following links. \n", + "\n", + "* [Matplotlib — Visualization with Python](https://matplotlib.org/)\n", + "* [Image tutorial — Matplotlib\n", + " documentation](https://matplotlib.org/stable/tutorials/introductory/images.html#sphx-glr-tutorials-introductory-images-py)\n", + "* [Creating multiple subplots using plt.subplots — Matplotlib\n", + " documentation](https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html)\n", + "* [StackOverflow - Matplotlib](https://stackoverflow.com/questions/tagged/matplotlib) A Q&A forum where a lot of\n", + " questions about using Matplotlib have been asked.\n", + " \n", + "If you have questions please feel free to ask in the [Plotting\n", + "Discussions](https://github.com/AFM-SPM/TopoStats/discussions/categories/plotting) section on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + }, + "name": "03-Plotting-scans.ipynb" + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tests/resources/minicircle_default_molecule_statistics.csv b/tests/resources/minicircle_default_molecule_statistics.csv new file mode 100644 index 00000000000..5b0f5eaf9e4 --- /dev/null +++ b/tests/resources/minicircle_default_molecule_statistics.csv @@ -0,0 +1,25 @@ 
+image,grain_number,molecule_number,circular,topology,topology_flip,contour_length,end_to_end_distance,curvature_num_turns,curvature_mean,curvature_max,curvature_min,curvature_std,curvature_var,curvature_total,curvature_median,curvature_iqr,curvature_90th,basename +minicircle.spm,0,0,True,0_1,0_1,8.679884088825495e-08,0.0,0.0,0.116191,0.234495,0.001343,0.056736,0.003219,20.217315,0.123934,0.276639,0.181902,tests\resources +minicircle.spm,1,0,True,0_1,0_1,7.731107566170978e-08,0.0,0.0,0.137903,0.565107,0.003315,0.124851,0.015588,21.374962,0.104374,0.192943,0.315998,tests\resources +minicircle.spm,2,0,True,,,8.359918866554579e-08,0.0,3.0,0.076722,0.45812,0.000449,0.088584,0.007847,12.812629,0.051678,0.175429,0.187265,tests\resources +minicircle.spm,3,0,False,linear,,2.250000000000004e-08,2.247017734178881e-08,0.0,0.004727,0.031987,0.0,0.009141,8.4e-05,0.217422,1e-05,0.003089,0.020599,tests\resources +minicircle.spm,3,1,False,linear,,2.4500000000000047e-08,1.896791626212963e-09,0.0,0.145022,0.530104,0.0,0.170444,0.029051,7.251123,0.047507,0.250232,0.422119,tests\resources +minicircle.spm,4,0,True,,,8.902174767520048e-08,0.0,1.0,0.078749,0.322261,2.8e-05,0.079927,0.006388,14.017367,0.048489,0.207879,0.215462,tests\resources +minicircle.spm,5,0,False,linear,,1.6499999999999986e-08,1.610534450292642e-08,0.0,0.036623,0.123245,0.0,0.040737,0.00166,1.245198,0.021144,0.059978,0.110153,tests\resources +minicircle.spm,5,1,False,linear,,3.050000000000003e-08,1.1584829071755763e-09,0.0,0.126217,0.325398,0.0,0.122562,0.015022,7.82545,0.132028,0.241592,0.300285,tests\resources +minicircle.spm,6,0,True,,,8.999675609590551e-08,0.0,0.0,0.078215,0.296008,0.000904,0.053243,0.002835,14.078761,0.061269,0.210523,0.132308,tests\resources +minicircle.spm,7,0,True,0_1,0_1,9.397400991677563e-08,0.0,1.0,0.091005,0.333456,0.000895,0.078648,0.006185,17.108917,0.062572,0.23822,0.203302,tests\resources 
+minicircle.spm,8,0,True,0_1,0_1,9.109317108586855e-08,0.0,0.0,0.108652,0.27816,0.000515,0.061867,0.003828,19.774698,0.09763,0.240682,0.187923,tests\resources +minicircle.spm,9,0,False,linear,,1.199999999999999e-08,1.1991070088895365e-08,0.0,0.007129,0.034128,0.0,0.01101,0.000121,0.178213,0.00054,0.012101,0.026859,tests\resources +minicircle.spm,9,1,False,linear,,6.95e-08,1.4135205807794397e-09,1.0,0.063781,0.230423,0.0,0.058602,0.003434,8.929344,0.059926,0.072484,0.141764,tests\resources +minicircle.spm,10,0,True,0_1,0_1,9.393062714946e-08,0.0,2.0,0.093511,0.531589,0.000142,0.105213,0.01107,17.580155,0.058113,0.15733,0.237949,tests\resources +minicircle.spm,11,0,False,,,8.6e-08,9.123232483375223e-09,1.0,0.06184,0.190523,0.0,0.053847,0.0029,10.698315,0.043981,0.193945,0.148025,tests\resources +minicircle.spm,12,0,True,,,9.209180247996876e-08,0.0,1.0,0.08208,0.267368,0.00036,0.065297,0.004264,15.102673,0.07187,0.244525,0.182922,tests\resources +minicircle.spm,13,0,True,,,9.122426448541514e-08,0.0,0.0,0.073363,0.341025,0.000148,0.06732,0.004532,13.352128,0.058354,0.210295,0.140135,tests\resources +minicircle.spm,14,0,True,0_1,0_1,9.358608233459712e-08,0.0,1.0,0.092977,0.397211,0.000869,0.087491,0.007655,17.386733,0.067645,0.173287,0.228745,tests\resources +minicircle.spm,15,0,True,0_1,0_1,8.231198873584995e-08,0.0,2.0,0.093792,0.515567,0.00171,0.113869,0.012966,15.475742,0.057562,0.095557,0.257171,tests\resources +minicircle.spm,16,0,True,,,9.790093974531222e-08,0.0,1.0,0.075652,0.301302,0.00176,0.055373,0.003066,14.827846,0.058509,0.191164,0.13262,tests\resources +minicircle.spm,17,0,False,,,7.699999999999994e-08,5.396094619557678e-09,1.0,0.053338,0.249331,0.0,0.059756,0.003571,8.26738,0.044108,0.137934,0.149956,tests\resources +minicircle.spm,18,0,True,,,9.118654958052414e-08,0.0,0.0,0.076075,0.312095,6.3e-05,0.066815,0.004464,13.845639,0.059934,0.217362,0.162845,tests\resources 
+minicircle.spm,19,0,True,0_1,0_1,8.845907605136479e-08,0.0,1.0,0.097891,0.415494,0.000821,0.090495,0.008189,17.326727,0.066827,0.199808,0.231132,tests\resources +minicircle.spm,20,0,True,0_1,0_1,8.816470982003954e-08,0.0,1.0,0.097414,0.531169,0.001145,0.101583,0.010319,17.144795,0.060482,0.181059,0.207453,tests\resources diff --git a/topostats/tracing/disordered_tracing.py b/topostats/tracing/disordered_tracing.py index 315a98e44fd..f8f5e9dbd6d 100644 --- a/topostats/tracing/disordered_tracing.py +++ b/topostats/tracing/disordered_tracing.py @@ -91,7 +91,7 @@ def __init__( # pylint: disable=too-many-arguments Dictionary of pruning parameters. Contains 'method', 'max_length', 'height_threshold', 'method_values', 'method_outlier' and 'only_height_prune_endpoints'. n_grain : int - Grain number being processed (only used in logging). + Grain number being processed (only used in logging). """ self.image = image self.mask = mask