LCSk: A refined similarity measure

G. Benson, A. Levy, S. Maimoni, D. Noifeld, B. R. Shalom

Research output: Contribution to journalArticlepeer-review

13 Scopus citations

Abstract

In this paper we define a new similarity measure: LCSk, aiming at finding the maximal number of k length substrings matching in both input strings while preserving their order of appearance, for which the traditional LCS is a special case, where k=1. We examine this generalization in both theory and practice. We first describe its basic solution and give an experimental evidence in real data for its ability to differentiate between sequences that are considered similar according to the LCS measure. We then examine extensions of the LCSk definition to LCS in at least k-length substrings (LCSk) and 2-dimensional LCSk and also define complementary EDk and EDk distances.

Original languageEnglish
Pages (from-to)11-26
Number of pages16
JournalTheoretical Computer Science
Volume638
DOIs
StatePublished - 25 Jul 2016
Externally publishedYes

Keywords

  • Dynamic programming
  • Edit distance
  • Longest common subsequence
  • Similarity of strings

Fingerprint

Dive into the research topics of 'LCSk: A refined similarity measure'. Together they form a unique fingerprint.

Cite this