Meta --> name --> content. GREL expression?

Hello, OpenRefine newbie here...

I have what I think is probably a really simple question regarding GREL:

Given this in my HTML:

<meta name="DC.date" content="1739">

How do I extract the DC.date "content" value ("1739")?

value.parseHtml().select("meta") gets me all the meta tags in my HTML, but how do I drill down to the DC.date "content" value?

Advice much appreciated,

Mark

We have a lot of great examples in our wiki, which includes a Recipes section. Take a look at the many HTML parsing recipes there and see if you can work it out. If not, just reply back here and we'll be glad to help further!

Bingo!

value.parseHtml().select("meta[name='DC.title']")[0].htmlAttr("content").toString()
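
For anyone following along, here is a rough breakdown of how that expression seems to work, built up one step at a time. It assumes the cell value holds the page's full HTML. Each line is meant to be tried on its own in the expression editor:

```
value.parseHtml().select("meta[name='DC.title']")
value.parseHtml().select("meta[name='DC.title']")[0]
value.parseHtml().select("meta[name='DC.title']")[0].htmlAttr("content")
```

The first line should return an array of every matching meta element, the second picks the first match, and the third reads its content attribute. Since htmlAttr() should already return a string, the trailing toString() is probably optional. Swapping the selector to meta[name='DC.date'] should give the "1739" from the original question.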

OpenRefine is about the coolest thing, evah!

Thanks! We think so too :) And to be fair, special thanks here go to Leonard Richardson, the author of Beautiful Soup, and to Jsoup, the library we use in OpenRefine!

Jsoup, very powerful!

Now I'm trying to figure out why this:

forEach(value.parseHtml().select("meta[name='DC.subject']").htmlAttr("content"), e, e.ownText()).join("|")

Does not work against this:

<meta name="DC.subject" lang="fre" content="Vierges Britanniques, Îles">
<meta name="DC.subject" lang="fre" content="Indes occidentales danoises">
<meta name="DC.subject" lang="fre" content="Anguilla, Île d'">
<meta name="DC.subject" lang="fre" content="Saint-Martin, Île de">

Learning!

et voilà!

forEach(value.parseHtml().select("meta[name='DC.subject']"), e, e.htmlAttr("content").toString()).join("|")
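
A likely explanation for why the first attempt failed, offered as a sketch and not an authoritative answer: select() returns an array of elements, while htmlAttr() expects a single element, so it has to be called on each e inside forEach and not on the array itself. ownText() is not needed either, because a meta tag has no text of its own; the value lives in the content attribute. Assuming htmlAttr() already returns a string, the toString() can probably be dropped as well:

```
forEach(value.parseHtml().select("meta[name='DC.subject']"), e, e.htmlAttr("content")).join("|")
```

Against the four DC.subject tags above, this should produce: Vierges Britanniques, Îles|Indes occidentales danoises|Anguilla, Île d'|Saint-Martin, Île de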

Spinning in chair,

Mark

Now you are a parsing expert and can help others! Great perseverance!
