[Ambra-dev] migration not working for all articles.
Pradeep Krishnan
pradeepk at soft-point.com
Tue Dec 23 02:06:38 PST 2008
Hi Russ,
Russell Uman wrote:
>> No. This means there is an 'order' mismatch. The existing
>> order in the database for the list of authors is most likely
>> wrong. (Note that this is for the bibliographic citation, not
>> the reference list.)
>
> yes. dragisa's migrator deals with the bibliographic citation, and not
> the reference list.
>
You mean the #921-related changes? No, it affects both the bibliographic
citation and the reference list.
>> In this case you would need to delete and re-ingest this
>> article to fix it. CitationMigrator will not fix it. (Or,
>> alternatively, correct the order in Mulgara to match the
>> article XML using ITQL.)
>
> it may be the case that this is just happening on branch for some
> reason, and won't happen on production.
It is quite likely that this is the case. On moody.topazproject.org,
pone.0000285 migrated successfully. See
/var/log/topaz/ambra.log.2008-12-20 and look for pone.0000285.
> but if this will also happen in
> production, then i don't think it's acceptable. the migrator really
> needs to migrate everything automatically - if there are some edge cases
> that this kind of migration can't handle, then the migrator should
> produce some helpful output, and we should work on a second automated
> process to deal with these...
>
Look, this is not a migrator issue. All it is supposed to do is add a
'suffix' to the UserProfile, so it searches the author list for the
appropriate entry to add it to. That search is done by index, since
the author list is an rdf:Seq. However, there is a sanity check in
there that verifies the author names match before the entry is
updated with the suffix information.
So the fact that this sanity check is triggered for some articles
indicates that the Mulgara data has gone out of sync with the XML. As
I mentioned in my update to the ticket, this can easily be taken care
of in the Migrator. However, since the migration succeeded for
pone.0000285 on moody (without a single author-list order failure), it
looks more like the data that you have is corrupted. Try this on a
recent backup of the production data to verify.
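The index-based lookup with a name sanity check can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual CitationMigrator code: the `Author` class, the `names_match`/`add_suffixes` helpers, the case-insensitive name comparison, and the choice to leave the whole article untouched on any mismatch (so it is reported as failed and retried later) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    # Hypothetical stand-in for one member of an article's rdf:Seq author list.
    given_names: str
    surname: str
    suffix: Optional[str] = None


def _norm(name: str) -> str:
    # Assumed normalization: ignore surrounding whitespace and letter case.
    return name.strip().casefold()


def names_match(a: Author, b: Author) -> bool:
    """Sanity check: do two author entries carry the same name?"""
    return _norm(a.given_names) == _norm(b.given_names) and _norm(a.surname) == _norm(b.surname)


def add_suffixes(stored: list[Author], from_xml: list[Author]) -> list[int]:
    """Copy suffixes from the article XML onto the stored author list, matched by index.

    Returns the positions that failed the name check. On any failure nothing is
    updated (an assumption), so the article counts as failed and is retried later.
    """
    n = max(len(stored), len(from_xml))
    mismatches = [
        i for i in range(n)
        if i >= len(stored) or i >= len(from_xml) or not names_match(stored[i], from_xml[i])
    ]
    if mismatches:
        return mismatches
    for stored_author, xml_author in zip(stored, from_xml):
        stored_author.suffix = xml_author.suffix
    return []


# Same two authors, but stored in a different order than in the XML:
stored = [Author("Ann", "Lee"), Author("Bo", "Chan")]
xml = [Author("Bo", "Chan", "Jr"), Author("Ann", "Lee")]
print(add_suffixes(stored, xml))  # [0, 1]
```

Because the lookup is purely positional, a reordered author list in the store surfaces as a sanity-check failure rather than as a bug in the suffix logic itself.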
Ronald pointed out how you may have arrived at the 50% failure figure:
you are basing it on the log message from CitationMigrator. Inferring
a failure rate from that is not valid, for the following reasons:
1. Not all failures were due to an author-name order mismatch.
2. A successfully migrated article is not considered in subsequent
   migration runs, but an article that failed in a previous run is
   tried again in every subsequent run. So the last run only has the
   remaining failure cases left. E.g. moody reports 0 successes and
   5 failures. This does not indicate a 100% migration failure rate.
   It just means that, other than the 5 that were attempted,
   everything else was migrated in previous runs (or never needed
   migration, e.g. articles ingested by the 0.9.1 ingester).
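To see why the last run's log overstates the failure rate, here is a small hypothetical simulation in Python. `run_migration` and the 100-article, 5-stuck setup are made up for illustration; the only behavior taken from the description above is that migrated articles are skipped and failed ones are retried on the next run.

```python
from typing import Callable


def run_migration(
    article_ids: list[str],
    migrated: set[str],
    can_migrate: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Run one migration pass and return (successes, failures) for this pass only.

    Hypothetical model: articles already in `migrated` (migrated in an earlier
    run, or never needing migration) are skipped and never logged again;
    failed articles stay out of `migrated`, so the next pass retries them.
    """
    successes: list[str] = []
    failures: list[str] = []
    for article_id in article_ids:
        if article_id in migrated:
            continue
        if can_migrate(article_id):
            migrated.add(article_id)
            successes.append(article_id)
        else:
            failures.append(article_id)
    return successes, failures


# Hypothetical corpus: 100 articles, 5 of which can never pass the sanity check.
articles = [f"article-{i}" for i in range(100)]
stuck = set(articles[:5])
done: set[str] = set()

first = run_migration(articles, done, lambda a: a not in stuck)
last = run_migration(articles, done, lambda a: a not in stuck)

print(len(first[0]), len(first[1]))  # 95 5
print(len(last[0]), len(last[1]))    # 0 5  (looks like "100% failure")
print(len(stuck) / len(articles))    # 0.05 (actual failure rate)
```

The last pass reports 0 successes and 5 failures, the same shape as the moody log, even though only 5 of the 100 simulated articles actually fail.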
Cheers,
Pradeep