Relative Content

Tag Archive for data

Deduplication of complex records / Similarity Detection

I’m working on a project that involves records with a fairly large number of fields (~15-20), and I’m trying to figure out a good way to implement deduplication. Essentially, the records are people along with some additional data. For example, the records are likely to include personal information like first name, last name, postal address, email address, etc., but not all records have the same amount of data.
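One possible starting point for the problem described above is a weighted, per-field similarity score that skips fields missing from either record. The sketch below is only an illustration under stated assumptions: the field names, the weights, and the `record_similarity` helper are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever string metric suits the real data.

```python
from difflib import SequenceMatcher

# Hypothetical field names and weights (not from the question); tune for real data.
WEIGHTS = {
    "first_name": 1.0,
    "last_name": 2.0,
    "email": 3.0,
    "postal_address": 2.0,
}


def normalize(value):
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(str(value).lower().split())


def record_similarity(a, b, weights=WEIGHTS):
    """Return a score in [0, 1] computed over fields present in both records.

    Fields that are missing or empty in either record are skipped, so sparse
    records are not penalized for data they never had.
    """
    total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # field missing on at least one side
        ratio = SequenceMatcher(None, normalize(va), normalize(vb)).ratio()
        total += weight * ratio
        weight_sum += weight
    # No comparable fields at all: treat the pair as not similar.
    return total / weight_sum if weight_sum else 0.0
```

Pairs scoring above some chosen threshold could then be flagged as candidate duplicates for review. Comparing every pair is quadratic in the number of records, so a larger dataset would likely need some grouping step first (for example, only comparing records that share a postal code).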

Architecture for a system where users want to use Excel for collecting and merging data

My organization handles batches of stuff that need to be checked by Quality Control. What I need is a way for the people who work in Quality Control to register and enter data inside the laboratory. They will be walking around, so we thought that tablets running Excel would work pretty well, but we ran into problems with merging files and losing data.

Why is it called Data-Oriented Design?

I find the name data-oriented design very confusing: it sounds like data-driven development (letting hard data determine decisions in development) or data-driven programming (control flow determined by loaded data rather than hard-coded), except it’s something completely different: designing for current cache-dominated CPU architectures, combined with an emphasis on making a few highly reusable functions.
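To make the distinction concrete, here is a rough sketch contrasting an object-per-entity layout with the kind of field-per-array layout that data-oriented design tends to favor. The `Particle` and `Particles` types are hypothetical examples, and Python is used only to show the shape of the data; the cache benefits the question refers to show up in languages with direct control over memory layout.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Array-of-structures style: each object bundles every field together."""
    x: float
    vx: float
    name: str  # rarely-used data sits next to the hot fields


def step_aos(particles, dt):
    """Advance positions by visiting one whole object at a time."""
    for p in particles:
        p.x += p.vx * dt


class Particles:
    """Structure-of-arrays style: one contiguous array per field."""

    def __init__(self, xs, vxs):
        self.x = array("d", xs)    # all positions stored together
        self.vx = array("d", vxs)  # all velocities stored together


def step_soa(ps, dt):
    """Advance positions by streaming over only the fields the update needs."""
    x, vx = ps.x, ps.vx
    for i in range(len(x)):
        x[i] += vx[i] * dt
```

Both functions compute the same result. The difference is that the second touches only the two arrays it needs, which is the access pattern the "data-oriented" name is pointing at.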