I need to reconcile a CSV column of author names, which is a common task. The column looks like this:
Александр Дюма
Александр Дюма
Александр Дюма.
Дюма Александр
Дюма, Александр
Сельма Лагерлёф
Лагерлеф Сельма
Лагерлёф Сельма
Лагерлёф, Сельма
Сельма Лагерлеф
My first approach: I applied clustering and then reconciled the resulting unique names.
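For reference, the key-collision idea behind this kind of clustering can be sketched in Python: normalize each name into a key (lowercase, punctuation removed, tokens sorted so word order does not matter) and group values that share a key. This is only an approximation in the spirit of OpenRefine's "fingerprint" keying method, not its actual implementation. Folding "ё" to "е" is an extra assumption added for Russian names. Unlike a merge, the sketch keeps every distinct spelling in each cluster instead of collapsing them to one value.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Approximate fingerprint key: order- and punctuation-insensitive."""
    # Compose characters so a decomposed "ё" (е + combining diaeresis) becomes one char.
    s = unicodedata.normalize("NFC", name).strip().lower()
    # Assumption: treat "ё" and "е" as the same letter (Лагерлёф vs Лагерлеф).
    s = s.replace("ё", "е")
    # Replace punctuation such as commas and periods with spaces.
    s = re.sub(r"[^\w\s]", " ", s)
    # Deduplicate and sort tokens so "Дюма, Александр" == "Александр Дюма".
    return " ".join(sorted(set(s.split())))


def cluster(names):
    """Group raw values by key, keeping every distinct variant in input order."""
    groups = defaultdict(list)
    for n in names:
        key = fingerprint(n)
        if n not in groups[key]:
            groups[key].append(n)
    return dict(groups)


# Sample values copied from the column above.
names = [
    "Александр Дюма", "Александр Дюма", "Александр Дюма.",
    "Дюма Александр", "Дюма, Александр",
    "Сельма Лагерлёф", "Лагерлеф Сельма", "Лагерлёф Сельма",
    "Лагерлёф, Сельма", "Сельма Лагерлеф",
]

for key, variants in cluster(names).items():
    print(key, "->", variants)
```

Keeping the full variant list per key is what would let a later step try every spelling rather than only the one the clustering picked.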
However, the reconciliation results were not very good. I think it would work better if reconciliation took into account not just the single final name produced by clustering, but all known variants of that name. Is this possible, and if so, how?
I could join all the variants into a list in a single column, but that doesn't seem right.
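One alternative to a joined list cell is a long layout with one row per variant, so each spelling can be reconciled on its own and the per-variant matches can then be combined per cluster, for example by majority vote. The Python sketch below assumes a copy of the original column was kept before clustering, so each row has both the clustered name and its original spelling. The column names `cluster_name`, `variant` and `match_id`, and the majority-vote rule, are hypothetical choices for illustration, not a feature of any particular tool.

```python
from __future__ import annotations

import csv
import sys
from collections import Counter, defaultdict
from typing import Iterable, Optional


def to_long(pairs: Iterable[tuple[str, str]]) -> list[dict]:
    """Turn (clustered name, original spelling) pairs into one row per distinct variant."""
    seen = set()
    rows = []
    for cluster_name, variant in pairs:
        if (cluster_name, variant) not in seen:
            seen.add((cluster_name, variant))
            rows.append({"cluster_name": cluster_name, "variant": variant})
    return rows


def consensus(rows: Iterable[dict]) -> dict[str, Optional[str]]:
    """Pick the most frequent non-empty match_id per cluster (ties: first seen wins)."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for r in rows:
        counter = votes[r["cluster_name"]]  # registers the cluster even if unmatched
        if r.get("match_id"):
            counter[r["match_id"]] += 1
    return {c: (cnt.most_common(1)[0][0] if cnt else None) for c, cnt in votes.items()}


# Sample pairs built from the column above (clustered name, original spelling).
pairs = [
    ("Александр Дюма", "Александр Дюма"),
    ("Александр Дюма", "Дюма, Александр"),
    ("Сельма Лагерлёф", "Лагерлеф Сельма"),
    ("Сельма Лагерлёф", "Лагерлёф, Сельма"),
]

# Write the long table as CSV; the "variant" column is what would be reconciled.
writer = csv.DictWriter(sys.stdout, fieldnames=["cluster_name", "variant"])
writer.writeheader()
writer.writerows(to_long(pairs))
```

After reconciliation, each row would carry a `match_id` (whatever identifier the reconciliation service returns), and `consensus` reduces those back to one match per cluster. Clusters where the variants disagree are probably worth a manual look rather than trusting the vote.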
The original data structure is not important; I can transform it freely. Thanks in advance.
Also, if there is an effective technique for this common task that would make my original question irrelevant, feel free to share it!