Dear Owen
Sorry for the late response. I was trying out the approaches you suggested.
I’m actually exploring the following: 1) scrape the data from a library catalogue (details given below); 2) use that dataset to extract a set of selected MARC tags (with content); and 3) prepare a text corpus containing the title (245 tag) and summary notes (5xx tags) in one column and the subject indexing terms (650 tag) in another column.
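Step 3 could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name `build_corpus_row`, the `(tag, content)` input shape, and the " | " separator between subject terms are all hypothetical, and the subfield codes are assumed to have been stripped already.

```python
def build_corpus_row(fields):
    """Turn one record's (tag, content) pairs into a two-column corpus row.

    Column 1: title (245) plus any 5xx notes, joined with spaces.
    Column 2: 650 subject terms, joined with ' | ' (an assumed separator).
    """
    text_parts = []
    subjects = []
    for tag, content in fields:
        if tag == "245" or (len(tag) == 3 and tag.startswith("5")):
            text_parts.append(content)
        elif tag == "650":
            subjects.append(content)
        # every other tag (100, 260, 651, ...) is ignored here
    return " ".join(text_parts), " | ".join(subjects)


# Hypothetical input: one record, already reduced to (tag, content) pairs.
record = [
    ("100", "Laband, John, 1947-"),
    ("245", "Zulu warriors : the battle for the South African frontier / John Laband."),
    ("500", "Includes index."),
    ("650", "Zulu War, 1879."),
    ("650", "Sociology, Military South Africa Zululand."),
    ("651", "Zululand (South Africa) History, Military 19th century."),
]
text_column, subject_column = build_corpus_row(record)
```

Because the fields are selected by tag rather than by position, a record without a 100 or 5xx field still produces a well-formed row.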
The source site is https://librarycatalogue.nvli.in/ (unfortunately, OAI-PMH metadata harvesting is not working for this union catalogue).
We can extract records from it in the following ways:
type 1
https://librarycatalogue.nvli.in/cgi-bin/koha/opac-MARCdetail.pl?biblionumber=2402943
type 2 (html)
https://librarycatalogue.nvli.in/cgi-bin/koha/opac-showmarc.pl?id=18687&viewas=html
type 3 (xml)
https://librarycatalogue.nvli.in/cgi-bin/koha/opac-showmarc.pl?id=18687&viewas=xml
Biblio numbers (id=) are sequential, from 1 to 3,450,500. In my earlier mail I produced the results by fetching the type 3 (xml) data.
Fetching the type 2 (html) data gives me the following result (example):
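Since the ids are sequential, the list of URLs to fetch can be generated rather than typed. A minimal Python sketch, assuming the type 2 and type 3 URL pattern shown above holds for every id (the function names are hypothetical, and no request is actually sent here):

```python
from urllib.parse import urlencode

# URL pattern taken from the type 2 / type 3 examples above.
BASE = "https://librarycatalogue.nvli.in/cgi-bin/koha/opac-showmarc.pl"


def showmarc_url(record_id, view="xml"):
    """Build the type 2 (html) or type 3 (xml) URL for one record id."""
    if view not in ("html", "xml"):
        raise ValueError("view must be 'html' or 'xml'")
    return f"{BASE}?{urlencode({'id': record_id, 'viewas': view})}"


def showmarc_urls(first, last, view="xml"):
    """Yield the URLs for every id in the inclusive range first..last."""
    for record_id in range(first, last + 1):
        yield showmarc_url(record_id, view)


# Example: the first three ids in the html view.
first_three = list(showmarc_urls(1, 3, view="html"))
```

For a crawl of this size it would be sensible to work in batches and pause between requests so the catalogue server is not overloaded.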
<html xmlns:marc="http://www.loc.gov/MARC21/slim">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>MARC View</title>
</head>
<body>
<table>
<tr>
<th style="white-space:nowrap">
000
</th>
<td colspan="2"></td>
<td>01320nam a2200325 i 4500</td>
</tr>
<tr>
<th style="white-space:nowrap">001</th>
<td colspan="2"></td>
<td>vtls001911996</td>
</tr>
<tr>
<th style="white-space:nowrap">003</th>
<td colspan="2"></td>
<td>NLI</td>
</tr>
<tr>
<th style="white-space:nowrap">005</th>
<td colspan="2"></td>
<td>20210807000205.0</td>
</tr>
<tr>
<th style="white-space:nowrap">007</th>
<td colspan="2"></td>
<td>cr cn|||||||||</td>
</tr>
<tr>
<th style="white-space:nowrap">008</th>
<td colspan="2"></td>
<td>160914t20142014ctuab s 001 0 eng d</td>
</tr>
<tr>
<th style="white-space:nowrap">020</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>9780300180312 </td>
</tr>
<tr>
<th style="white-space:nowrap">020</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>9780300206197 (e-book) </td>
</tr>
<tr>
<th style="white-space:nowrap">039</th>
<td> </td>
<td>9</td>
<td>
<strong>_a</strong>201610171143<br><strong>_b</strong>gopag<br><strong>_y</strong>201609141656<br><strong>_z</strong>gopag </td>
</tr>
<tr>
<th style="white-space:nowrap">044</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>ctu </td>
</tr>
<tr>
<th style="white-space:nowrap">100</th>
<td>1</td>
<td> </td>
<td>
<strong>_a</strong>Laband, John,<br><strong>_d</strong>1947-<br><strong>_9</strong>1270208 </td>
</tr>
<tr>
<th style="white-space:nowrap">245</th>
<td>1</td>
<td>0</td>
<td>
<strong>_a</strong>Zulu warriors :<br><strong>_b</strong>the battle for the South African frontier /<br><strong>_c</strong>John Laband. </td>
</tr>
<tr>
<th style="white-space:nowrap">260</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>New Haven, Connecticut :<br><strong>_b</strong>Yale University Press,<br><strong>_c</strong>2014. </td>
</tr>
<tr>
<th style="white-space:nowrap">300</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>1 online resource (390 p.) :<br><strong>_b</strong>ill., maps </td>
</tr>
<tr>
<th style="white-space:nowrap">500</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>Includes index. </td>
</tr>
<tr>
<th style="white-space:nowrap">650</th>
<td> </td>
<td>0</td>
<td>
<strong>_a</strong>Zulu War, 1879.<br><strong>_9</strong>1270209 </td>
</tr>
<tr>
<th style="white-space:nowrap">650</th>
<td> </td>
<td>0</td>
<td>
<strong>_a</strong>Zulu (African people)<br><strong>_x</strong>History<br><strong>_y</strong>19th century.<br><strong>_9</strong>1270210 </td>
</tr>
<tr>
<th style="white-space:nowrap">650</th>
<td> </td>
<td>0</td>
<td>
<strong>_a</strong>Sociology, Military<br><strong>_z</strong>South Africa<br><strong>_z</strong>Zululand.<br><strong>_9</strong>1270211 </td>
</tr>
<tr>
<th style="white-space:nowrap">651</th>
<td> </td>
<td>0</td>
<td>
<strong>_a</strong>Zululand (South Africa)<br><strong>_x</strong>History, Military<br><strong>_y</strong>19th century.<br><strong>_9</strong>1270212 </td>
</tr>
<tr>
<th style="white-space:nowrap">856</th>
<td>4</td>
<td>0</td>
<td>
<strong>_u</strong>http://site.ebrary.com/lib/nationallibgovin/Doc?id=10856661<br><strong>_z</strong>An electronic book accessible through the World Wide Web; click to view </td>
</tr>
<tr>
<th style="white-space:nowrap">887</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong> Gopa </td>
</tr>
<tr>
<th style="white-space:nowrap">905</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>Gopa </td>
</tr>
<tr>
<th style="white-space:nowrap">949</th>
<td> </td>
<td> </td>
<td>
<strong>_A</strong>VIRTUAITEM<br><strong>_D</strong>10000<br><strong>_X</strong>206<br><strong>_6</strong>EBK000017588ENG<br><strong>_e</strong>EBK17588 </td>
</tr>
<tr>
<th style="white-space:nowrap">942</th>
<td> </td>
<td> </td>
<td>
<strong>_2</strong>ddc<br><strong>_c</strong>BKS </td>
</tr>
<tr>
<th style="white-space:nowrap">999</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>VIRTUA<br><strong>_c</strong>2402943<br><strong>_d</strong>2402943 </td>
</tr>
<tr>
<th style="white-space:nowrap">999</th>
<td> </td>
<td> </td>
<td>
<strong>_a</strong>VTLSSORT0070*0080*0200*0201*0440*1000*2450*2600*3000*5000*6500*6501*6502*6510*8560*9050*9992 </td>
</tr>
</table>
</body>
</html>
A GREL expression like the one below gives me a way to produce data for export from OpenRefine into MarcEdit (RecordNumber | Tag | Indicators | Content). The problem is that in many cases the result is inconsistent, for obvious reasons: for example, when tag 100 is absent, the content of tag 245 ends up in the column meant for tag 100.
forEach(value.parseHtml().select("tr"),e,e.htmlText()).join("@@")
Result:
000 01320nam a2200325 i 4500@@001 vtls001911996@@003 NLI@@005 20210807000205.0@@007 cr cn|||||||||@@008 160914t20142014ctuab s 001 0 eng d@@020 _a9780300180312@@020 _a9780300206197 (e-book)@@039 9 _a201610171143 _bgopag _y201609141656 _zgopag@@044 _actu@@100 1 _aLaband, John, _d1947- _91270208@@245 1 0 _aZulu warriors : _bthe battle for the South African frontier / _cJohn Laband.@@260 _aNew Haven, Connecticut : _bYale University Press, _c2014.@@300 _a1 online resource (390 p.) : _bill., maps@@500 _aIncludes index.@@650 0 _aZulu War, 1879. _91270209@@650 0 _aZulu (African people) _xHistory _y19th century. _91270210@@650 0 _aSociology, Military _zSouth Africa _zZululand. _91270211@@651 0 _aZululand (South Africa) _xHistory, Military _y19th century. _91270212@@856 4 0 _uhttp://site.ebrary.com/lib/nationallibgovin/Doc?id=10856661 _zAn electronic book accessible through the World Wide Web; click to view@@887 _a Gopa@@905 _aGopa@@949 _AVIRTUAITEM _D10000 _X206 _6EBK000017588ENG _eEBK17588@@942 _2ddc _cBKS@@999 _aVIRTUA _c2402943 _d2402943@@999 _aVTLSSORT0070*0080*0200*0201*0440*1000*2450*2600*3000*5000*6500*6501*6502*6510*8560*9050*9992
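One way to avoid the shifting-column problem is to keep each table row as its own (tag, indicator 1, indicator 2, content) record, so a missing tag simply yields no row instead of moving the others. Below is a minimal Python sketch using only the standard library. It assumes the type 2 HTML always has the layout shown above (one `<tr>` per field, control fields using `colspan="2"` in place of the two indicator cells); the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MarcTableParser(HTMLParser):
    """Collect one (tag, ind1, ind2, content) tuple per <tr> of the MARC view."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None   # cells of the <tr> being read, or None
        self._text = None    # text pieces of the <th>/<td> being read, or None
        self._span = 1       # colspan of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td") and self._cells is not None:
            self._text = []
            self._span = int(dict(attrs).get("colspan") or 1)
        elif tag == "br" and self._text is not None:
            self._text.append(" ")  # <br> separates subfields

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._text is not None:
            # Collapse runs of whitespace and trim the cell text.
            self._cells.append(" ".join("".join(self._text).split()))
            # A colspan="2" cell stands in for both (empty) indicators.
            self._cells.extend([""] * (self._span - 1))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) == 4:
                self.rows.append(tuple(self._cells))
            self._cells = None


def parse_marc_html(html_text):
    """Return a list of (tag, ind1, ind2, content) tuples."""
    parser = MarcTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Shortened sample in the same layout as above; note there is no tag 100.
sample = """<table>
<tr><th>001</th><td colspan="2"></td><td>vtls001911996</td></tr>
<tr><th>245</th><td>1</td><td>0</td><td>
<strong>_a</strong>Zulu warriors :<br><strong>_b</strong>the battle for the South African frontier /<br><strong>_c</strong>John Laband. </td></tr>
<tr><th>650</th><td> </td><td>0</td><td>
<strong>_a</strong>Zulu War, 1879.<br><strong>_9</strong>1270209 </td></tr>
</table>"""
rows = parse_marc_html(sample)
titles = [content for tag, _, _, content in rows if tag == "245"]
```

Selecting by tag value (as in the last line) rather than by position means a record without a 100 field still gives the 245 content under 245.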
I’m still looking for a way out.
Best regards
Parthasarathi Mukhopadhyay