Dataset Map and Reduce methods
This example shows an easy use-case of the Apify dataset
map and
reduce methods. Both methods can be used to simplify the
dataset results workflow process. Both can be called on the dataset directly.
Important to mention is that both methods return a new result (map returns a new array and reduce
can return any type) - neither method updates the dataset in any way.
Examples for both methods are demonstrated on a simple dataset containing the results scraped from a page:
the URL and a hypothetical number of h1 - h3 header elements under the headingCount key.
This data structure is stored in the default dataset under
{PROJECT_FOLDER}/apify_storage/datasets/default/.
If you want to simulate the functionality, you can use the dataset.PushData() method
to save the example JSON array to your dataset.
[
    {
        "url": "https://apify.com/",
        "headingCount": 11
    },
    {
        "url": "https://apify.com/storage",
        "headingCount": 8
    },
    {
        "url": "https://apify.com/proxy",
        "headingCount": 4
    }
]
Map
The dataset map method is very similar to standard Array mapping methods.
It produces a new array of values by mapping each value in the existing array through
a transformation function and an options parameter.
The map method used to check if are there more than 5 header elements on each page:
const Apify = require('apify');
Apify.main(async () => {
    // open default dataset
    const dataSet = await Apify.openDataset();
    // calling map function and filtering through mapped items
    const moreThan5headers = (
        await dataSet.map((item) => item.headingCount)
    ).filter((count) => count > 5);
    // saving result of map to default Key-value store
    await Apify.setValue('pages_with_more_than_5_headers', moreThan5headers);
});
The moreThan5headers variable is an array of headingCount attributes where the number
of headers is greater than 5.
The map method's result value saved to the key-value store should be:
[11, 8];
Reduce
The dataset reduce method does not produce a new array of values - it reduces a list of values down to a single value.
The method iterates through the items in the dataset using the
memo argument.
After performing the necessary calculation, the memo is sent to the next iteration,
while the item just processed is reduced (removed).
Using the reduce method to get the total number of headers scraped (all items in the dataset):
const Apify = require('apify');
Apify.main(async () => {
    // open default dataset
    const dataSet = await Apify.openDataset();
    // calling reduce function and using memo to calculate number of headers
    const pagesHeadingCount = await dataSet.reduce((memo, value) => {
        memo += value.headingCount;
        return memo;
    }, 0);
    // saving result of reduce to default Key-value store
    await Apify.setValue('pages_heading_count', pagesHeadingCount);
});
The original dataset will be reduced to a single value, pagesHeadingCount, which contains
the count of all headers for all scraped pages (all dataset items).
The reduce method's result value saved to the key-value store should be:
23;