This study was started by Nandana and Mariano in 2016. We began with unsupervised methods, but we could not find good clusters. In 2017 we moved to annotated data, and here we are.
> DBpedia releases consist of more than 70 multilingual datasets that
> cover data extracted from different language-specific Wikipedia
> instances. The data extracted from those Wikipedia instances are
> transformed into RDF using mappings created by the DBpedia community.
> Nevertheless, not all the mappings are correct and consistent
> across all the distinct language-specific DBpedia datasets.
> As these incorrect mappings are spread in a large number of mappings,
> it is not feasible to inspect all such mappings manually to
> ensure their correctness. Thus, the goal of this work is to propose
> a data-driven method to detect incorrect mappings automatically
> by analyzing the information from both instance data as well
> as ontological axioms. We propose a machine learning based approach
> to building a predictive model which can detect incorrect
> mappings. We have evaluated different supervised classification algorithms
> for this task and our best model achieves 93% accuracy.
> These results help us to detect incorrect mappings and achieve a
> high-quality DBpedia.
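
The abstract describes training a supervised classifier on features derived from instance data and ontological axioms, then measuring its accuracy. The following is a minimal, hypothetical sketch of that workflow, not the method from the paper. The three features (`range_conformance`, `domain_conformance`, `cross_language_agreement`), the names `MappingFeatures` and `NearestCentroid`, and the toy values are all illustrative assumptions. A nearest-centroid classifier stands in for the classification algorithms actually evaluated in the paper.

```python
# Minimal sketch of supervised mapping classification. Hypothetical
# assumptions: the three features below and the nearest-centroid classifier
# are illustrative stand-ins, not the features or models used in the paper.
from dataclasses import dataclass
from math import dist
from statistics import fmean


@dataclass(frozen=True)
class MappingFeatures:
    """Hypothetical per-mapping features, each a ratio in [0, 1]."""

    # Share of extracted object values whose type fits the rdfs:range
    # declared for the target DBpedia ontology property.
    range_conformance: float
    # Share of subject resources whose class fits the property's rdfs:domain.
    domain_conformance: float
    # Share of shared entities for which another language edition maps the
    # corresponding infobox attribute to the same ontology property.
    cross_language_agreement: float

    def vector(self) -> tuple:
        return (self.range_conformance, self.domain_conformance,
                self.cross_language_agreement)


class NearestCentroid:
    """Labels each mapping with the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):  # sorted for deterministic tie-breaking
            rows = [x.vector() for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of the feature vectors of this class.
            self.centroids[label] = tuple(fmean(col) for col in zip(*rows))
        return self

    def predict(self, X):
        # Euclidean distance from each mapping to every class centroid.
        return [
            min(self.centroids, key=lambda lab: dist(x.vector(), self.centroids[lab]))
            for x in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if not y_true:
        raise ValueError("accuracy is undefined for an empty label list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, hand-made values for illustration only (not data from the paper).
train_X = [
    MappingFeatures(0.95, 0.90, 0.85),
    MappingFeatures(0.90, 0.85, 0.90),
    MappingFeatures(0.10, 0.30, 0.20),
    MappingFeatures(0.20, 0.15, 0.10),
]
train_y = ["correct", "correct", "incorrect", "incorrect"]

model = NearestCentroid().fit(train_X, train_y)
test_X = [MappingFeatures(0.88, 0.80, 0.75), MappingFeatures(0.25, 0.40, 0.30)]
predictions = model.predict(test_X)
print(predictions)  # ['correct', 'incorrect']
print(accuracy(["correct", "incorrect"], predictions))  # 1.0
```

The nearest-centroid model is used here only because it fits in a few lines without external dependencies. A realistic setup would compare several classifiers (for example with a library such as scikit-learn) and evaluate them on held-out annotated mappings.
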

[Google Summer of Code](https://summerofcode.withgoogle.com/) has selected our idea as a funded project.
Here is [the entry](https://github.com/dbpedia/GSoC/issues/15).

[Predicting Incorrect Mappings: A Data-Driven Approach Applied to DBpedia](https://svn.aksw.org/papers/2018/SAC_DBpedia_mappings_alignment/public.pdf).
M. Rico, N. Mihindukulasooriya, et al. The 33rd ACM/SIGAPP Symposium on Applied Computing (SAC), 2018.