For Developers

Source Code

The MIT-licensed code is available on GitHub. It uses Apache Spark to group occurrence records by the raw entries in their recordedBy and identifiedBy fields and to import the results into MySQL, Elasticsearch to support searching people's names once they have been parsed and cleaned, Redis to coordinate the processing queues, and Sinatra (Ruby) for the application layer.
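To make the grouping step concrete, here is a minimal plain-Ruby stand-in for what the Spark job is described as doing: collecting occurrence records under each distinct raw recordedBy or identifiedBy string. It is a sketch under stated assumptions, not the project's Spark code. The record layout (a Hash with "gbifID", "recordedBy" and "identifiedBy" keys), the sample data, and the helper name group_by_raw_agent are all hypothetical. One plausible benefit of grouping this way is that each distinct raw string then only has to be parsed once.

```ruby
# Hypothetical stand-in for the Spark grouping step; NOT the project's code.
# Field names ("gbifID", "recordedBy", "identifiedBy") are assumed for illustration.
records = [
  { "gbifID" => 1, "recordedBy" => "Lepschi BJ; Albrecht DE", "identifiedBy" => "Albrecht DE" },
  { "gbifID" => 2, "recordedBy" => "Lepschi BJ; Albrecht DE", "identifiedBy" => nil },
  { "gbifID" => 3, "recordedBy" => "Smith, J.", "identifiedBy" => "Lepschi BJ; Albrecht DE" }
]

# Maps each distinct, non-blank raw string in `field` to the IDs of the
# occurrence records that contain it.
def group_by_raw_agent(records, field)
  records
    .reject { |r| r[field].nil? || r[field].strip.empty? }
    .group_by { |r| r[field] }
    .transform_values { |rs| rs.map { |r| r["gbifID"] } }
end

recorded   = group_by_raw_agent(records, "recordedBy")
identified = group_by_raw_agent(records, "identifiedBy")

puts recorded.inspect
puts identified.inspect
```

Here the two records sharing "Lepschi BJ; Albrecht DE" in recordedBy collapse into a single key, and the record with no identifiedBy value is simply skipped for that field.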

Parsing Names

Ruby gem

A stand-alone Ruby gem, dwc_agent, can be used to parse people's names and to score pairs of given names for structural similarity. It also provides a command-line executable, dwcagent, that parses and cleans a string of names in one step and prints the result as JSON.
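In the example session in this section, DwcAgent.parse first reads "Lepschi BJ" with the initials in the family slot, and DwcAgent.clean then returns family "Lepschi" with given "B.J.". The toy sketch below only illustrates that kind of correction; it is NOT dwc_agent's parsing or cleaning logic. ToyName, toy_parse, toy_clean, punctuate_initials and the INITIALS heuristic (one to three capital letters) are all hypothetical names invented for this illustration.

```ruby
# Toy illustration only; NOT how dwc_agent actually parses or cleans names.
ToyName = Struct.new(:family, :given)

# Assumed heuristic: a short run of capitals such as "BJ" or "DE" is initials.
INITIALS = /\A[A-Z]{1,3}\z/

# "BJ" -> "B.J."
def punctuate_initials(initials)
  initials.chars.map { |c| "#{c}." }.join
end

# Naive parse: split on ";" and treat the last whitespace-separated token as
# the family name, which puts trailing initials in the family slot.
def toy_parse(raw)
  raw.split(";").map(&:strip).reject(&:empty?).map do |part|
    *given, family = part.split(/\s+/)
    ToyName.new(family, given.join(" "))
  end
end

# If the family slot holds initials and the given slot does not, swap the
# two and punctuate the initials; otherwise leave the name unchanged.
def toy_clean(name)
  if name.family.to_s.match?(INITIALS) && !name.given.to_s.match?(INITIALS)
    ToyName.new(name.given, punctuate_initials(name.family))
  else
    name
  end
end

names = toy_parse("Lepschi BJ; Albrecht DE").map { |n| toy_clean(n) }
names.each { |n| puts "#{n.family}, #{n.given}" }
```

Real author strings are far messier than this heuristic allows, which is why a dedicated gem is worth using instead of ad hoc rules like these.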

$ gem install dwc_agent
$ irb
> require "dwc_agent"
  => true
> parsed = DwcAgent.parse "Lepschi BJ; Albrecht DE"
  => [#<Name family="BJ" given="Lepschi">, #<Name family="DE" given="Albrecht">]
> DwcAgent.clean parsed[0]
  => #<Name family="Lepschi" given="B.J.">
> DwcAgent.similarity_score "J.R.", "Jill R."
  => 2
> exit
$ dwcagent "Lepschi BJ; Albrecht DE"
[{"family":"Lepschi","given":"B.J.","suffix":null,"particle":null,"dropping_particle":null,"nick":null,"appellation":null,"title":null},{"family":"Albrecht","given":"D.E.","suffix":null,"particle":null,"dropping_particle":null,"nick":null,"appellation":null,"title":null}]
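Because dwcagent prints plain JSON, its output can be consumed from any language. The sketch below assumes Ruby's standard json library and hard-codes the output shown above so that it runs without the gem installed; in practice the string could instead be captured from the executable (for example with Open3.capture2("dwcagent", "Lepschi BJ; Albrecht DE")). The display format, given name, then particle, family name and suffix joined by spaces, is our own choice for illustration and not something the gem prescribes.

```ruby
require "json"

# Output of `dwcagent "Lepschi BJ; Albrecht DE"`, copied from the example above.
output = '[{"family":"Lepschi","given":"B.J.","suffix":null,"particle":null,"dropping_particle":null,"nick":null,"appellation":null,"title":null},{"family":"Albrecht","given":"D.E.","suffix":null,"particle":null,"dropping_particle":null,"nick":null,"appellation":null,"title":null}]'

# Each element becomes a Hash of name parts; JSON null becomes nil.
people = JSON.parse(output)

# Assumed display format: join the non-nil parts with spaces.
display = people.map do |name|
  [name["given"], name["particle"], name["family"], name["suffix"]].compact.join(" ")
end

puts display
```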