GREL for inverted abstract

The OpenAlex (https://docs.openalex.org/) API call built with "https://api.openalex.org/works?search=" + value.escape('url') + "&filter=has_abstract:true&per-page=5&page=1" returns the expected response, but each document's abstract is given in inverted-index form (possibly to avoid copyright issues).
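For testing outside OpenRefine, the same request URL can be assembled in Python 3. This is only a sketch: it assumes GREL's value.escape('url') behaves like form-style URL encoding (spaces become +), which is what urllib.parse.quote_plus does, and build_openalex_url is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def build_openalex_url(search_term):
    """Hypothetical helper mirroring the GREL URL expression above."""
    # quote_plus encodes spaces as "+" and reserved characters such as "&" as %XX
    return ("https://api.openalex.org/works?search=" + quote_plus(search_term)
            + "&filter=has_abstract:true&per-page=5&page=1")


print(build_openalex_url("social work students"))
```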

A GREL expression such as value.parseJson().results[0].abstract_inverted_index returns something like this:

{"This":[0],"paper":[1],"critically":[2],"explores":[3],"the":[4],"implicit":[5],"and":[6],"explicit":[7],"message":[8],"that":[9],"‘everything’s":[10],"a":[11],"learning":[12],"experience’":[13],"when":[14],"social":[15],"work":[16],"students":[17],"engage":[18],"in":[19],"practice/community":[20],"contexts":[21],"as":[22],"part":[23],"of":[24],"their":[25],"professio…":[26]}

Is there any way to turn the abstract back into plain text (rather than this inverted form) with a GREL expression?

Regards

Hm, is there a “clean” way to do this with array operations? :thinking:

The dirty way would be to just remove the position references with a regular expression (assuming that each word appears only once and that the keys are already in text order):

value.replace(/":\[\d+\](,")?/, " ").substring(2, -2)

The substring operation removes the leading brace and quotation mark, plus the trailing space and brace left over from the replace.
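Here is the same trick sketched in Python 3, for anyone who wants to check it outside OpenRefine. It assumes the object is serialized compactly (no spaces after : or ,) and that every word has exactly one position; re.sub, like GREL's regex replace, replaces every match.

```python
import re

# Compact JSON, assumed to look like what OpenRefine shows for the extracted object
index_json = '{"This":[0],"paper":[1],"critically":[2]}'

# Same pattern as the GREL replace: drop ":[n]" plus the following ," separator
joined = re.sub(r'":\[\d+\](,")?', " ", index_json)
print(joined)        # {"This paper critically }

# Equivalent of GREL substring(2, -2): strip the leading {" and the trailing " }"
print(joined[2:-2])  # This paper critically
```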

Thanks for such a prompt response.

After applying the regex-based replace, the output looks like this:

forEach(value.parseJson().results, v, v.abstract_inverted_index.replace(/":\[\d+\](,")?/, " ").substring(2, -2))

[ “This paper critically explores the implicit and explicit message that ‘everything’s a learning experience’ when social work students engage in practice/community contexts as part of their professio…”, “Previous research":[1,183],"has established that":[4,27,189],"gender":[5,162],"and":[6,15,22,30,68,96,107,119,141,153,167,197],"sexual":[7,164],"minority (2SLGBTQ+) youth":[10,53,209],"experience worse mental":[13,28,64,81,139,186],"health":[14,29,65,82],"substance":[16,31,69,84,142],"use":[17,32,85,143],"outcomes than their heterosexual cisgender counterparts. Research suggests concerns have":[34,190],"been":[35,191],"exacerbated by the":[38,88,91,185,194,207],"COVID-19":[39,92,195],"pandemic. The current study":[43,180],"used":[44,200],"self-reported online survey responses from 1404 Canadian 2SLGBTQ+":[52,208],"which included, but were":[57,112],"not limited to, questions":[61,72],"regarding previous experiences, diagnoses, use. Additional assessed whether":[74,97],"participants had":[76,99],"expressed a":[78,121,150],"need":[79,122,151],"for":[80,123,152,206],"and/or resources":[86,124],"since beginning of":[90,169],"pandemic":[93,196],"(March 2020) they experienced":[100,192],"barriers":[101,129,155],"when accessing":[103,131],"this care.":[105,157],"Bivariate":[106,134],"multinomial logistic regression":[110,159],"analyses":[111,135],"conducted to":[114,130,156,172,201],"determine associations between variables":[118,144],"expressing":[120,149],"as":[125,127],"well experiencing":[128,154],"these resources. revealed":[136,161],"multiple sociodemographic, health, significantly":[145,174],"associated with":[147,176],"both":[148,177],"Multinomial analysis identity, orientation, ethnicity, level educational attainment be":[173,199],"correlated cases. 
This supports growing on health-related harms during could inform tailored intervention plans population.”, “Canadian, US, and":[2,6,24,46,76,82,88,102,106,120,132,157,160,162,164],"UK public health":[5,13,34],"clinical research":[8,43],"has identified barriers to":[12,84,96,116,150],"service":[14,35],"access":[15,54,105,149],"for":[16,136],"Two-Spirit, lesbian, gay, bisexual, transgender, queer, non-binary, intersex (2SLGBTQ+) communities. While offering important insight into the":[33,61,86,91,143],"experiences of":[37,42,90,104,125,148],"2SLGBTQ+":[38,62,137],"communities, this":[40,71,114,141],"body only":[44,48],"recently, still minimally, reports on":[51,57],"home care":[53,110,135],"experiences. Drawing key findings from Home":[63,109],"Care Access":[65,75],"Project, a":[67,94,123],"mixed-methods, Ontario-wide study, paper animates an Equity Framework, using participant stories perspectives underscore relevance effectiveness Framework as tool":[95,115],"support":[97,140],"systematic organizational assessment, evaluation, implementation equity strategies. organizations can use assess their programs":[119,163],"services along continuum intentionally":[126,133],"inviting,":[127,129],"unintentionally":[128,130],"disinviting, disinviting people. To process, framework includes six indicators care: community engagement, leadership, environment, policies processes, education training, services.”, “The COVID-19":[1,34,68,168],"pandemic has disproportionately impacted 2SLGBTQ+":[6,160,229],"youth":[7,44,94,99,137,142,161,172,230],"experiencing":[8,45,162,231],"homelessness. Little is known about":[13,191],"vaccine":[14,35,42,69,87,202,223],"attitudes and":[16,37,39,67,73,95,122,140,143,152,193,196,213,225,233],"uptake among this":[19,27,32],"population. 
To address":[22,235],"this, the":[24,48,157,167,180,219,222,236,245],"objectives of":[26,93,131,159,184],"study":[28,217],"were":[29,52,76,102,125,164,173,204],"to":[30,54,78,82,104,117,177,200,227,234],"explore":[31,83],"group’s attitudes, facilitators barriers impacting uptake.2SLGBTQ+ homelessness":[46,163,232],"in":[47,56,106,148,166,179],"Greater Toronto Area recruited participate":[55,105],"online":[57,107],"surveys":[58,139],"assessing demographic characteristics, mental health, health":[64,188,211,238],"service use, attitudes. Descriptive statistics statistical tests used":[77,116],"analyze":[79,118],"survey data":[81,124,154],"variables associated with confidence. Additionally, a":[90,132],"select group frontline workers from serving organizations invited one-on-one":[108,149],"interviews.":[109,150],"An iterative thematic content approach was interview data. Quantitative":[121,151],"qualitative":[123,153],"merged for":[127,221],"interpretation by":[129,244],"use convergent parallel analytical design.Ninety-two completed 32 15 key informants participated showed that":[156,240],"majority confident":[165,175],"vaccine; however, numerous non-vaccine due mistrust healthcare system, lack targeted":[185,209],"vaccine-related public":[187,210],"information, concerns safety side effects, accessibility":[197,215],"issues. Solutions increase confidence provided, including fostering trust, messaging, addressing needs.Our highlights need strategy rollouts prioritize pervasive disparities have been exacerbated pandemic.”, “2SLGBTQ+":[0,4,17,27,50,66,74,86,92,194,220,236],"leisure":[1,75,113,159,187,221],"spaces":[2,22,188,222],"(e.g.,":[3,56,152],"community centres and":[7,15,39,72,107,112,128,137,148,175,183,206,212,233,240],"recreation groups) offer opportunities to":[12,29,35,134,227],"form identities":[14,55,144],"augment people’s":[18,237],"overall well-being. 
These are considered ‘safe’ for":[26,49],"people":[28,51,93,195],"escape heterosexism, while being able openly express themselves develop community. However, these":[43,63,95],"might be":[45,169],"sites of":[47,62,69,85,125,142,179,181,192,199,231],"discrimination":[48,210,232],"with":[52,123],"other":[53,149],"minoritized racialized":[57,91,193,235],"people), given the":[60,83,119,166,190,213,229],"whiteness spaces.":[64,160],"Racialized individuals’ experiences":[68,136],"discrimination, generally within":[73,157,185],"spaces,":[76,87],"can":[77,168,176],"threaten their well-being, thus highlighting value but how":[89,165,209],"do negotiate often-problematic spaces? This":[98,161],"paper":[99,162],"presents a":[101,131],"conceptual":[102,116],"framework":[103,117,167],"that":[104,217],"bridges theories":[106,124],"research across social":[110,143,204,238],"work studies. The extends minority stress theory intersectionality, whiteness, resilience":[129,184],"using socioecological lens interrogate outcomes along multiple dimensions created by racism oppressive systems":[151,180],"sexism, cisgenderism, classism, ableism) queer":[158,186],"also describes implemented as an":[172,225],"analytic tool facilitate investigations oppression from perspective through critical examination power relations, relationality, complexity, justice, whiteness. Understanding occurs multi-level resilience-promoting factors exist in will provide avenue address effects foster well-being inclusion.” ]

I don’t have any clue what to do next. :frowning:

OK, so the first example was the trivial one, where each fragment is exactly one word and each word maps to exactly one position. The others are more complicated, with fragments mapping to several positions in the text, so the regex leaves those position lists in place.
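A small hypothetical index shows the failure: "the" occurs at positions 0 and 2, so the text should read "the cat the", but the pattern \[\d+\] only matches a single position and skips [0,2]. Sketched in Python 3 with the same regex and substring:

```python
import re

# Hypothetical index: "the" at positions 0 and 2, "cat" at position 1
index_json = '{"the":[0,2],"cat":[1]}'

# \[\d+\] cannot match "[0,2]", so only the last entry is cleaned up
result = re.sub(r'":\[\d+\](,")?', " ", index_json)[2:-2]
print(result)  # the":[0,2],"cat
```

Recovering "the cat the" requires expanding every position and sorting by it, not stripping characters.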

Honestly, I would not try to solve this with GREL; instead, I would apply a Jython snippet after extracting the JSON.

So, assuming a cell looks like this:

{ "two": [2, 3], "one": [1], "third": [4]}

Then use Jython to:

  1. Create a dictionary from the JSON
  2. Create a new list with (position, text_fragment) mappings
  3. Sort the list by positions
  4. Join the fragments in the sorted list by a whitespace
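```python
import json
from operator import itemgetter

# Parse the cell text into a dict of {fragment: [position, ...]}
inverted_abstract = json.loads(value)
# One (position, fragment) pair per occurrence of each fragment
abstract = [(position, fragment) for fragment, positions in inverted_abstract.items() for position in positions]
# Sort by position to restore the original word order
abstract.sort(key=itemgetter(0))
# OpenRefine's Jython expressions hand back their result with return
return " ".join(fragment for position, fragment in abstract)
```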

Thanks.

After applying the GREL forEach(value.parseJson().results, v, v.abstract_inverted_index).join("##"), the resulting cell now looks like the one below.

Each inverted-index abstract is separated by "##".

Then I applied your Jython script, but it shows null in the Preview tab. What am I missing?

Regards

{"This":[0],"paper":[1],"critically":[2],"explores":[3],"the":[4],"implicit":[5],"and":[6],"explicit":[7],"message":[8],"that":[9],"‘everything’s":[10],"a":[11],"learning":[12],"experience’":[13],"when":[14],"social":[15],"work":[16],"students":[17],"engage":[18],"in":[19],"practice/community":[20],"contexts":[21],"as":[22],"part":[23],"of":[24],"their":[25],"professio…":[26]}##{"Previous":[0],"research":[1,183],"has":[2],"established":[3],"that":[4,27,189],"gender":[5,162],"and":[6,15,22,30,68,96,107,119,141,153,167,197],"sexual":[7,164],"minority":[8],"(2SLGBTQ+)":[9],"youth":[10,53,209],"experience":[11],"worse":[12],"mental":[13,28,64,81,139,186],"health":[14,29,65,82],"substance":[16,31,69,84,142],"use":[17,32,85,143],"outcomes":[18],"than":[19],"their":[20],"heterosexual":[21],"cisgender":[23],"counterparts.":[24],"Research":[25],"suggests":[26],"concerns":[33],"have":[34,190],"been":[35,191],"exacerbated":[36],"by":[37],"the":[38,88,91,185,194,207],"COVID-19":[39,92,195],"pandemic.":[40],"The":[41],"current":[42],"study":[43,180],"used":[44,200],"self-reported":[45],"online":[46],"survey":[47],"responses":[48],"from":[49],"1404":[50],"Canadian":[51],"2SLGBTQ+":[52,208],"which":[54],"included,":[55],"but":[56],"were":[57,112],"not":[58],"limited":[59],"to,":[60],"questions":[61,72],"regarding":[62],"previous":[63],"experiences,":[66],"diagnoses,":[67],"use.":[70],"Additional":[71],"assessed":[73],"whether":[74,97],"participants":[75],"had":[76,99],"expressed":[77],"a":[78,121,150],"need":[79,122,151],"for":[80,123,152,206],"and/or":[83],"resources":[86,124],"since":[87],"beginning":[89],"of":[90,169],"pandemic":[93,196],"(March":[94],"2020)":[95],"they":[98],"experienced":[100,192],"barriers":[101,129,155],"when":[102],"accessing":[103,131],"this":[104],"care.":[105,157],"Bivariate":[106,134],"multinomial":[108],"logistic":[109],"regression":[110,159],"analyses":[111,135],"conducted":[113],"to":[114,130,156,172,201],"determine":[115],"associations":[116],"between":[117],"variables":[118,144],"expressing":[120,149],"as":[125,127],"well":[126],"experiencing":[128,154],"these":[132],"resources.":[133],"revealed":[136,161],"multiple":[137],"sociodemographic,":[138],"health,":[140],"significantly":[145,174],"associated":[146],"with":[147,176],"both":[148,177],"Multinomial":[158],"analysis":[160],"identity,":[163],"orientation,":[165],"ethnicity,":[166],"level":[168],"educational":[170],"attainment":[171],"be":[173,199],"correlated":[175],"cases.":[178],"This":[179],"supports":[181],"growing":[182],"on":[184],"health-related":[187],"harms":[188],"during":[193],"could":[198],"inform":[202],"tailored":[203],"intervention":[204],"plans":[205],"population.":[210]}##{"Canadian,":[0],"US,":[1],"and":[2,6,24,46,76,82,88,102,106,120,132,157,160,162,164],"UK":[3],"public":[4],"health":[5,13,34],"clinical":[7],"research":[8,43],"has":[9],"identified":[10],"barriers":[11],"to":[12,84,96,116,150],"service":[14,35],"access":[15,54,105,149],"for":[16,136],"Two-Spirit,":[17],"lesbian,":[18],"gay,":[19],"bisexual,":[20],"transgender,":[21],"queer,":[22],"non-binary,":[23],"intersex":[25],"(2SLGBTQ+)":[26],"communities.":[27],"While":[28],"offering":[29],"important":[30],"insight":[31],"into":[32],"the":[33,61,86,91,143],"experiences":[36],"of":[37,42,90,104,125,148],"2SLGBTQ+":[38,62,137],"communities,":[39],"this":[40,71,114,141],"body":[41],"only":[44,48],"recently,":[45],"still":[47],"minimally,":[49],"reports":[50],"on":[51,57],"home":[52],"care":[53,110,135],"experiences.":[55],"Drawing":[56],"key":[58],"findings":[59],"from":[60],"Home":[63,109],"Care":[64],"Access":[65,75],"Project,":[66],"a":[67,94,123],"mixed-methods,":[68],"Ontario-wide":[69],"study,":[70],"paper":[72],"animates":[73],"an":[74],"Equity":[77],"Framework,":[78],"using":[79],"participant":[80],"stories":[81],"perspectives":[83],"underscore":[85],"relevance":[87],"effectiveness":[89],"Framework":[92],"as":[93],"tool":[95,115],"support":[97,140],"systematic":[98],"organizational":[99],"assessment,":[100],"evaluation,":[101],"implementation":[103],"equity":[107],"strategies.":[108],"organizations":[111],"can":[112],"use":[113],"assess":[117],"their":[118],"programs":[119,163],"services":[121],"along":[122],"continuum":[124],"intentionally":[126,133],"inviting,":[127,129],"unintentionally":[128,130],"disinviting,":[131],"disinviting":[134],"people.":[138],"To":[139],"process,":[142],"framework":[144],"includes":[145],"six":[146],"indicators":[147],"care:":[151],"community":[152],"engagement,":[153],"leadership,":[154],"environment,":[155],"policies":[156],"processes,":[158],"education":[159],"training,":[161],"services.":[165]}##{"The":[0],"COVID-19":[1,34,68,168],"pandemic":[2],"has":[3],"disproportionately":[4],"impacted":[5],"2SLGBTQ+":[6,160,229],"youth":[7,44,94,99,137,142,161,172,230],"experiencing":[8,45,162,231],"homelessness.":[9],"Little":[10],"is":[11],"known":[12],"about":[13,191],"vaccine":[14,35,42,69,87,202,223],"attitudes":[15],"and":[16,37,39,67,73,95,122,140,143,152,193,196,213,225,233],"uptake":[17],"among":[18],"this":[19,27,32],"population.":[20],"To":[21],"address":[22,235],"this,":[23],"the":[24,48,157,167,180,219,222,236,245],"objectives":[25],"of":[26,93,131,159,184],"study":[28,217],"were":[29,52,76,102,125,164,173,204],"to":[30,54,78,82,104,117,177,200,227,234],"explore":[31,83],"group’s":[33],"attitudes,":[36],"facilitators":[38],"barriers":[40],"impacting":[41],"uptake.2SLGBTQ+":[43],"homelessness":[46,163,232],"in":[47,56,106,148,166,179],"Greater":[49],"Toronto":[50],"Area":[51],"recruited":[53],"participate":[55,105],"online":[57,107],"surveys":[58,139],"assessing":[59],"demographic":[60],"characteristics,":[61],"mental":[62],"health,":[63],"health":[64,188,211,238],"service":[65],"use,":[66],"attitudes.":[70],"Descriptive":[71],"statistics":[72],"statistical":[74],"tests":[75],"used":[77,116],"analyze":[79,118],"survey":[80],"data":[81,124,154],"variables":[84],"associated":[85],"with":[86],"confidence.":[88],"Additionally,":[89],"a":[90,132],"select":[91],"group":[92],"frontline":[96],"workers":[97],"from":[98],"serving":[100],"organizations":[101],"invited":[103],"one-on-one":[108,149],"interviews.":[109,150],"An":[110],"iterative":[111],"thematic":[112],"content":[113],"approach":[114],"was":[115],"interview":[119],"data.":[120],"Quantitative":[121,151],"qualitative":[123,153],"merged":[126],"for":[127,221],"interpretation":[128],"by":[129,244],"use":[130],"convergent":[133],"parallel":[134],"analytical":[135],"design.Ninety-two":[136],"completed":[138],"32":[141],"15":[144],"key":[145],"informants":[146],"participated":[147],"showed":[155],"that":[156,240],"majority":[158],"confident":[165,175],"vaccine;":[169],"however,":[170],"numerous":[171],"non-vaccine":[174],"due":[176],"mistrust":[178],"healthcare":[181],"system,":[182],"lack":[183],"targeted":[185,209],"vaccine-related":[186],"public":[187,210],"information,":[189],"concerns":[190],"safety":[192],"side":[194],"effects,":[195],"accessibility":[197,215],"issues.":[198],"Solutions":[199],"increase":[201],"confidence":[203],"provided,":[205],"including":[206],"fostering":[207],"trust,":[208],"messaging,":[212],"addressing":[214],"needs.Our":[216],"highlights":[218],"need":[220],"strategy":[224],"rollouts":[226],"prioritize":[228],"pervasive":[237],"disparities":[239],"have":[241],"been":[242],"exacerbated":[243],"pandemic.":[246]}##{"2SLGBTQ+":[0,4,17,27,50,66,74,86,92,194,220,236],"leisure":[1,75,113,159,187,221],"spaces":[2,22,188,222],"(e.g.,":[3,56,152],"community":[5],"centres":[6],"and":[7,15,39,72,107,112,128,137,148,175,183,206,212,233,240],"recreation":[8],"groups)":[9],"offer":[10],"opportunities":[11],"to":[12,29,35,134,227],"form":[13],"identities":[14,55,144],"augment":[16],"people’s":[18,237],"overall":[19],"well-being.":[20],"These":[21],"are":[23],"considered":[24],"‘safe’":[25],"for":[26,49],"people":[28,51,93,195],"escape":[30],"heterosexism,":[31],"while":[32],"being":[33],"able":[34],"openly":[36],"express":[37],"themselves":[38],"develop":[40],"community.":[41],"However,":[42],"these":[43,63,95],"might":[44],"be":[45,169],"sites":[46],"of":[47,62,69,85,125,142,179,181,192,199,231],"discrimination":[48,210,232],"with":[52,123],"other":[53,149],"minoritized":[54],"racialized":[57,91,193,235],"people),":[58],"given":[59],"the":[60,83,119,166,190,213,229],"whiteness":[61],"spaces.":[64,160],"Racialized":[65],"individuals’":[67],"experiences":[68,136],"discrimination,":[70],"generally":[71],"within":[73,157,185],"spaces,":[76,87],"can":[77,168,176],"threaten":[78],"their":[79],"well-being,":[80],"thus":[81],"highlighting":[82],"value":[84],"but":[88],"how":[89,165,209],"do":[90],"negotiate":[94],"often-problematic":[96],"spaces?":[97],"This":[98,161],"paper":[99,162],"presents":[100],"a":[101,131],"conceptual":[102,116],"framework":[103,117,167],"that":[104,217],"bridges":[105],"theories":[106,124],"research":[108],"across":[109],"social":[110,143,204,238],"work":[111],"studies.":[114],"The":[115],"extends":[118],"minority":[120],"stress":[121],"theory":[122],"intersectionality,":[126],"whiteness,":[127],"resilience":[129,184],"using":[130],"socioecological":[132],"lens":[133],"interrogate":[135],"outcomes":[138],"along":[139],"multiple":[140],"dimensions":[141],"created":[145],"by":[146],"racism":[147],"oppressive":[150],"systems":[151,180],"sexism,":[153],"cisgenderism,":[154],"classism,":[155],"ableism)":[156],"queer":[158,186],"also":[163],"describes":[164],"implemented":[170],"as":[171],"an":[172,225],"analytic":[173],"tool":[174],"facilitate":[177],"investigations":[178],"oppression":[182],"from":[189],"perspective":[191],"through":[196],"critical":[197],"examination":[198],"power":[200],"relations,":[201],"relationality,":[202],"complexity,":[203],"justice,":[205],"whiteness.":[207],"Understanding":[208],"occurs":[211],"multi-level":[214],"resilience-promoting":[215],"factors":[216],"exist":[218],"in":[219],"will":[223],"provide":[224],"avenue":[226],"address":[228],"effects":[230],"foster":[234],"well-being":[239],"inclusion.":[241]}

I would try to put each abstract in its own cell.

You can do this by splitting the multi-valued cells on the separator ##.

In the transformation dialog, you then also have to change the language from GREL to Python / Jython.
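The null in the Preview is most likely json.loads failing: a cell holding several objects joined by ## is not a single JSON document, so the snippet raises an error instead of returning text. A minimal Python 3 sketch of why splitting on ## first helps; the sample cell and the helper name inverted_to_text are hypothetical, and it assumes ## never occurs inside an abstract.

```python
import json
from operator import itemgetter


def inverted_to_text(inverted):
    """Rebuild plain text from a {fragment: [positions]} dict."""
    pairs = [(pos, frag) for frag, positions in inverted.items() for pos in positions]
    pairs.sort(key=itemgetter(0))
    return " ".join(frag for _, frag in pairs)


# Hypothetical cell holding two inverted abstracts joined with "##"
cell = '{"two": [2, 3], "one": [1], "third": [4]}##{"b": [1], "a": [0]}'

# Parsing the whole cell fails: everything after the first object is "extra data"
try:
    json.loads(cell)
    whole_cell_ok = True
except ValueError:
    whole_cell_ok = False
print("whole cell parses:", whole_cell_ok)  # whole cell parses: False

# Splitting on the separator first leaves one valid JSON object per piece
abstracts = [inverted_to_text(json.loads(part)) for part in cell.split("##")]
print(abstracts)  # ['one two two third', 'a b']
```

Splitting the multi-valued cells in OpenRefine achieves the same thing, with the advantage that each abstract ends up in its own row.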


It now works like magic. Many, many thanks.
I now have a plain-text corpus, as shown below, instead of the inverted-index format.

Regards

The COVID-19 pandemic has disproportionately impacted 2SLGBTQ+ youth experiencing homelessness. Little is known about vaccine attitudes and uptake among this population. To address this, the objectives of this study were to explore this group's COVID-19 vaccine attitudes, and facilitators and barriers impacting vaccine uptake.2SLGBTQ+ youth experiencing homelessness in the Greater Toronto Area were recruited to participate in online surveys assessing demographic characteristics, mental health, health service use, and COVID-19 vaccine attitudes. Descriptive statistics and statistical tests were used to analyze survey data to explore variables associated with vaccine confidence. Additionally, a select group of youth and frontline workers from youth serving organizations were invited to participate in online one-on-one interviews. An iterative thematic content approach was used to analyze interview data. Quantitative and qualitative data were merged for interpretation by use of a convergent parallel analytical design.Ninety-two youth completed surveys and 32 youth and 15 key informants participated in one-on-one interviews. Quantitative and qualitative data showed that the majority of 2SLGBTQ+ youth experiencing homelessness were confident in the COVID-19 vaccine; however, numerous youth were non-vaccine confident due to mistrust in the healthcare system, lack of targeted vaccine-related public health information, concerns about safety and side effects, and accessibility issues. Solutions to increase vaccine confidence were provided, including fostering trust, targeted public health messaging, and addressing accessibility needs.Our study highlights the need for the vaccine strategy and rollouts to prioritize 2SLGBTQ+ youth experiencing homelessness and to address the pervasive health disparities that have been exacerbated by the pandemic.
