Extracting data from an array in JSON

Hi,

I have a series of JSON records like this:

{
  "continue": {
    "clcontinue": "67535588|History_of_the_Chinese_Communist_Party",
    "continue": "||"
  },
  "query": {
    "normalized": [
      {
        "from": "100th_Anniversary_of_the_Chinese_Communist_Party",
        "to": "100th Anniversary of the Chinese Communist Party"
      }
    ],
    "pages": {
      "67535588": {
        "pageid": 67535588,
        "ns": 0,
        "title": "100th Anniversary of the Chinese Communist Party",
        "categories": [
          {"ns":14,"title":"Category:2021 in Beijing"},
          {"ns":14,"title":"Category:All articles with unsourced statements"},
          {"ns":14,"title":"Category:Articles containing Chinese-language text"},
          {"ns":14,"title":"Category:Articles containing simplified Chinese-language text"},
          {"ns":14,"title":"Category:Articles with short description"},
          {"ns":14,"title":"Category:Articles with unsourced statements from March 2022"},
          {"ns":14,"title":"Category:CS1 Chinese-language sources (zh)"},
          {"ns":14,"title":"Category:Chinese historical anniversaries"},
          {"ns":14,"title":"Category:Coordinates on Wikidata"},
          {"ns":14,"title":"Category:Events in Beijing"}
        ]
      }
    }
  }
}

I want to loop over the "categories" array in order to extract the values of all the "title" keys. I tried the following Jython script:

import json
data = json.loads(value)
lista = []
for pageid, page in data["query"]["pages"].items():
    lista.append(page['categories'][0]['title'])

return ":::".join(lista)

It returns only the first value: Category:2021 in Beijing

I know that I should loop over the "categories" array. Can you help me?

Thank you very much.

Miquel Centelles

Hi Miquel_Centelles,

You could do it in two steps: first get all the "pages" entries (here only 67535588), for example with

pages = data["query"]["pages"]
print(pages)

and then iterate over the "categories" within each record:

categories = data["query"]["pages"]["67535588"]["categories"]
for category in categories:
    print(category["title"])

A more generic solution using GREL:

with(
    ":::",
    sep,
    forEach(
        parseJson(value)["query"]["pages"],
        page,
        forEach(
            page["categories"],
            category,
            category["title"]
        ).join(sep)
    ).join(sep)
)
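
The same nested loop (every page, then every category of that page) can also be written in Jython, so that the page ID does not have to be hard-coded. This is only a minimal sketch: the function name category_titles and the sample string are illustrative assumptions, not part of the original script, and .get() is used on the assumption that some pages may have no "categories" key.

```python
import json

def category_titles(value, sep=":::"):
    """Join the "title" of every category of every page in one record.

    category_titles is a hypothetical helper name used for illustration.
    """
    data = json.loads(value)
    titles = []
    # "pages" is an object keyed by page ID, so iterate over its values
    for page in data["query"]["pages"].values():
        # assumption: a page may lack "categories"; treat that as an empty list
        for category in page.get("categories", []):
            titles.append(category["title"])
    return sep.join(titles)

# Small made-up sample with the same shape as the record in the question
sample = json.dumps({"query": {"pages": {"67535588": {"categories": [
    {"ns": 14, "title": "Category:2021 in Beijing"},
    {"ns": 14, "title": "Category:Events in Beijing"},
]}}}})
print(category_titles(sample))
```

In an OpenRefine Jython expression, the sample and the print would be dropped and the script would end with return category_titles(value).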