mcyzyk
October 6, 2023, 8:08pm
1
Hello, OpenRefine newbie here...
I have what I think is probably a really simple question regarding GREL:
Given this in my HTML:
<meta name="DC.date" content="1739">
How do I extract the DC.date "content" value ("1739")?
value.parseHtml().select("meta") gets me to all the meta tags in my HTML, but how to drill down to DC.date --> content?
Advice much appreciated,
Mark
We have a lot of great examples in our wiki, including a Recipes section. Take a look at the many HTML parsing recipes there and see if you can work it out. If not, just reply back here and we’ll be glad to help further!
mcyzyk
October 7, 2023, 3:31pm
3
Bingo!
value.parseHtml().select("meta[name='DC.title']")[0].htmlAttr("content").toString()
OpenRefine is about the coolest thing, evah!
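For anyone who finds this thread later: the expression above targets DC.title, while the original question asked about DC.date. Here is a minimal sketch for that case. It assumes the cell holds the page's full HTML and that at least one matching meta tag exists (otherwise the [0] index has nothing to return):

```
value.parseHtml().select("meta[name='DC.date']")[0].htmlAttr("content")
```

Against the sample tag from the first post this should return 1739. The selector string is ordinary Jsoup/CSS attribute syntax, so swapping in another name value should be all that is needed for other DC fields.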
Thanks! We think so too. And to be fair, special thanks go to Leonard Richardson, the author of Beautiful Soup, which was the basis for the Jsoup library we use in OpenRefine!
mcyzyk
October 8, 2023, 9:46am
6
Jsoup, very powerful!
Now I'm trying to figure out why this:
forEach(value.parseHtml().select("meta[name='DC.subject']").htmlAttr("content"), e, e.ownText()).join("|")
Does not work against this:
<meta name="DC.subject" lang="fre" content="Vierges Britanniques, Îles">
<meta name="DC.subject" lang="fre" content="Indes occidentales danoises">
<meta name="DC.subject" lang="fre" content="Anguilla, Île d'">
<meta name="DC.subject" lang="fre" content="Saint-Martin, Île de">
Learning!
mcyzyk
October 8, 2023, 9:56am
7
Et voilà!
forEach(value.parseHtml().select("meta[name='DC.subject']"), e, e.htmlAttr("content").toString()).join("|")
Spinning in chair,
Mark
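For later readers, my understanding of why the first attempt failed: select() returns a list of elements, so htmlAttr() has to be called on each element inside forEach rather than on the whole list, and ownText() reads an element's text, not one of its attributes. The same per-element pattern can be narrowed with a second attribute selector. The sketch below is only an illustration and assumes you want just the subjects tagged lang="fre":

```
forEach(value.parseHtml().select("meta[name='DC.subject'][lang='fre']"), e, e.htmlAttr("content")).join("|")
```

For the four sample tags above, this should join the four content values with a | between them.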
Now you are a parsing expert and can help others! Great perseverance!