Class: Legion::Extensions::MicrosoftTeams::LocalCache::RecordParser

Inherits:
Object
Defined in:
lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb

Overview

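Parses Chromium IndexedDB values read from Teams LevelDB records. Within a value, each string is introduced by the marker byte 0x22 (ASCII double quote), followed by a varint length and then that many bytes of string data. Teams stores conversation objects as a sequence of key-value string pairs, so a record reads as an alternating stream of field names and field values.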

Gotchas:

- Boolean fields (isSanitized, isModerator, etc.) have non-string values
  that get skipped by the string extractor, causing the next field name
  to appear immediately.
- HTML content strings get split on internal 0x22 bytes (from HTML attributes
  like href="..."), producing multiple string fragments for one content field.
- Field names are well-known and can be used to detect key vs value.
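
The first gotcha can be reproduced with a hand-built byte string. The following is a minimal sketch, not the parser's own code: `encode_string` and `scan_strings` are hypothetical helpers, only single-byte lengths (below 0x80) are handled, and the `T` byte standing in for a serialized `true` is an assumption about the value encoding.

```ruby
# Hypothetical helper: 0x22 marker, one length byte, then the raw bytes.
# Only strings shorter than 0x80 bytes are supported in this sketch.
def encode_string(str)
  bytes = str.b
  raise ArgumentError, 'string too long for this sketch' if bytes.bytesize >= 0x80

  [0x22, bytes.bytesize].pack('C2') + bytes
end

# Hypothetical scanner: collect every marker-prefixed string, skip other bytes.
def scan_strings(data)
  strings = []
  pos = 0
  while pos < data.bytesize
    len = data.getbyte(pos + 1)
    if data.getbyte(pos) == 0x22 && len && len.positive? && len < 0x80 &&
       pos + 2 + len <= data.bytesize
      strings << data.byteslice(pos + 2, len).force_encoding('UTF-8')
      pos += 2 + len
    else
      pos += 1
    end
  end
  strings
end

# The boolean value after isSanitized is a single non-string byte (assumed 'T'),
# so the scanner never emits anything for it.
record = encode_string('isSanitized') + 'T'.b +
         encode_string('messagetype') + encode_string('Text')

strings = scan_strings(record)
# strings => ["isSanitized", "messagetype", "Text"]
```

In the resulting array, `isSanitized` is followed directly by the next field name, which is why a naive name/value pairing would mis-assign `messagetype` as its value.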

Constant Summary

KNOWN_FIELDS =

Known field names in Teams conversation records. Used to distinguish field names from field values in the string stream.

Set.new(%w[
  id source version type content contentHash isSanitized messagetype messageType
  contenttype contentType activitytype activityType clientmessageid clientMessageId
  sequenceId prioritizeimdisplayname prioritizeImDisplayName imdisplayname
  fromDisplayNameInToken fromFamilyNameInToken fromGivenNameInToken
  fromAgentIdentityBlueprintId properties mentions cards importance subject title
  links files formatVariant languageStamp draftDetails innerThreadId state
  inlineImages callId composetime composeTime originalarrivaltime originalArrivalTime
  from fromUserId conversationLink skypeguid translation deletionInfo
  annotationsSummary threadtype threadType postType dlpData crossPostData
  callLogsOwnerId sendPipelineStatus streamingMetadata originalParentMessageId
  skypeeditedid importMetadata recipientId isPlainTextConvertedToHtml
  clientArrivalTime lastMessage members botMembers rosterVersion rosterSummary
  nonFilteredLastMessageTimeUtc __typename localClientId memberProperties
  memberExpirationTime role explicitlyAdded isModerator isFollowing isReader
  channelOnlyMember messages lastMessageTimeUtc detailsVersion
  consumptionHorizonForPinnedMessages consumptionhorizon consumptionHorizonBookmark
  rclch rclchBookmark lastTimeFavorited favorite ispinned
  lastimportantimreceivedtime lasturgentimreceivedtime isfollowed followAllRc
  notifyAllRc collapsed isGeneralChannelFavorite pinnedVersion pinnedOrder
  hasMessageDraft targetLink teamId threadProperties topic topicThreadTopic
  spaceThreadTopic spaceThreadVersion description favDefault
  channelDocsFolderRelativeUrl channelDocsDocumentLibraryId sharepointRootLibrary
  isdeleted tenantid creator retentionHorizon retentionHorizonV2
  sharedInSpaces spaceId gapDetectionEnabled createdat groupId
  extensionDefinitionContainer lastjoinat lastleaveat chatModalityType
  threadingMode csav1 teamSmtpAddress spaceType spaceTypes classification
  dynamicMembership isMaxMemberLimitExceeded isTeamLocked
  isUnlockMembershipSyncRequired picture pictureETag sharepointSiteUrl
  notebookId sensitivityLabelDisplayName sensitivityLabelId sensitivityLabelName
  sensitivityLabelToolTip sensitivityLabelParentDisplayName
  sensitivityLabelParentName sensitivityLabelParentTooltip
  sensitivityLabelIsCopyBlocked teamStatus spaceAdminSettings visibility
  topics threadVersion lastContentMessageTime identityMaskEnabled
  lastL2MessageIdNotFromSelf parentId clientUpdateTime isMigrated chatSubType
  conversationId replyChainId latestDeliveryTime parentMessageVersion
  messageMap dedupeKey parentMessageId searchKey edittime skypeGuid
  isConversationLastMessage isConversationLastMessageSanitized
  originalNonLieMessage hasAnnotated messageSearchKey
]).freeze
BOOLEAN_FIELDS =

Fields that have boolean or numeric values (not strings). When we see these, the next string is NOT their value — it’s the next field.

Set.new(%w[
  isSanitized isModerator isFollowing isReader channelOnlyMember
  explicitlyAdded hasMessageDraft ispinned isfollowed collapsed
  isGeneralChannelFavorite favDefault isdeleted isMaxMemberLimitExceeded
  isTeamLocked isUnlockMembershipSyncRequired isPlainTextConvertedToHtml
  gapDetectionEnabled dynamicMembership identityMaskEnabled
  sensitivityLabelIsCopyBlocked isMigrated prioritizeimdisplayname
  prioritizeImDisplayName isConversationLastMessage
  isConversationLastMessageSanitized hasAnnotated
]).freeze

Class Method Summary

Class Method Details

.consume_field(strings, idx, str, target, past_last_message) ⇒ Object

Consume one field token from the strings array and return how many positions to advance.



# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 132

def self.consume_field(strings, idx, str, target, past_last_message)
  if KNOWN_FIELDS.include?(str)
    consume_known_field(strings, idx, str, target)
  else
    target['content'] = "#{target['content']}#{str}" if target.key?('content') && html_fragment?(str) && !past_last_message
    1
  end
end
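
The `else` branch above is where split HTML content is stitched back together. The following sketch illustrates that branch in isolation, under stated assumptions: `looks_like_html?` is a hypothetical, reduced stand-in for `html_fragment?`, `append_fragment` is an illustrative name, and the fragment strings are invented for the example.

```ruby
# Hypothetical stand-in for html_fragment?: only the two simplest checks.
def looks_like_html?(str)
  str.include?('<') || str.start_with?('http')
end

# Append an unknown token to an existing 'content' value when it looks like HTML
# and the scan has not yet moved past the lastMessage section.
def append_fragment(target, str, past_last_message)
  return target unless target.key?('content') && looks_like_html?(str) && !past_last_message

  target['content'] = "#{target['content']}#{str}"
  target
end

message = { 'content' => '<a href=' }
append_fragment(message, 'https://example.com', false)
append_fragment(message, '>docs</a>', false)
append_fragment(message, 'plain text', false) # ignored: does not look like HTML

message['content']
# => "<a href=https://example.com>docs</a>"
```

Fragments are concatenated as-is, so in this sketch the quote characters that caused the split are not restored in the reassembled content.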

.consume_known_field(strings, idx, str, target) ⇒ Object

Pair a known field name with the string that follows it and return how many positions to advance: 1 when no value is consumed (boolean field, end of input, or the next string is itself a field name), 2 when the next string is stored as the value.



# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 141

def self.consume_known_field(strings, idx, str, target)
  return 1 if BOOLEAN_FIELDS.include?(str)
  return 1 if idx + 1 >= strings.length

  value = strings[idx + 1]
  if KNOWN_FIELDS.include?(value)
    1
  else
    target[str] = value
    2
  end
end
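
The return values can be traced on a short token list. The sketch below restates the pairing rule with hypothetical stand-ins: `KNOWN`, `BOOLEAN`, `step`, and `advances` are illustrative names, and the two sets are small subsets of the real constants.

```ruby
require 'set'

# Illustrative subsets of KNOWN_FIELDS and BOOLEAN_FIELDS.
KNOWN   = Set.new(%w[id isSanitized messagetype content]).freeze
BOOLEAN = Set.new(%w[isSanitized]).freeze

# Hypothetical restatement of the pairing rule: returns how far to advance.
def step(strings, idx, target)
  name = strings[idx]
  return 1 if BOOLEAN.include?(name)      # value is not a string, nothing to pair
  return 1 if idx + 1 >= strings.length   # field name is the last token

  value = strings[idx + 1]
  return 1 if KNOWN.include?(value)       # next token is another field name

  target[name] = value
  2
end

# Walk the whole token list and record each advance.
def advances(strings)
  target = {}
  steps = []
  idx = 0
  while idx < strings.length
    steps << step(strings, idx, target)
    idx += steps.last
  end
  [steps, target]
end

tokens = %w[id 19:abc isSanitized content messagetype Text]
steps, target = advances(tokens)
# steps  == [2, 1, 1, 2]
# target == { 'id' => '19:abc', 'messagetype' => 'Text' }
```

Here `content` advances by 1 without storing anything because the string after it is another known field name, which is the same guard the method applies via `KNOWN_FIELDS.include?(value)`.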

.extract_strings(data) ⇒ Object

Extract ordered string array from a binary IDB value.



# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 76

def self.extract_strings(data)
  strings = []
  pos = 0

  while pos < data.bytesize
    if data.getbyte(pos) == 0x22
      str, new_pos = read_length_prefixed_string(data, pos + 1)
      if str
        strings << str
        pos = new_pos
        next
      end
    end
    pos += 1
  end

  strings
end

.html_fragment?(str) ⇒ Boolean

Check if a string looks like an HTML fragment (split from content field).

Returns:

  • (Boolean)


# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 155

def self.html_fragment?(str)
  str.include?('<') || str.start_with?('http') ||
    str.match?(/\A(width|height|alt|id|itemid|src|href|target|rel|style)/) ||
    str.match?(/\A[a-z]+=/)
end
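
To see which strings the heuristic accepts, the predicate can be restated as a standalone method and applied to a few samples. The sample strings below are invented for illustration.

```ruby
# Same predicate as the listing above, restated outside the class.
def html_fragment?(str)
  str.include?('<') || str.start_with?('http') ||
    str.match?(/\A(width|height|alt|id|itemid|src|href|target|rel|style)/) ||
    str.match?(/\A[a-z]+=/)
end

samples = ['<p>Hello</p>', 'https://example.com/a', 'href=', 'class=', 'Hello there', 'identity']
results = samples.to_h { |s| [s, html_fragment?(s)] }

results['<p>Hello</p>']          # => true  (contains '<')
results['https://example.com/a'] # => true  (starts with 'http')
results['class=']                # => true  (lowercase word followed by '=')
results['Hello there']           # => false
results['identity']              # => true  (starts with 'id')
```

The attribute-name check is a prefix match, so an ordinary word that begins with `id`, `alt`, or `rel` is also treated as a fragment.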

.parse_conversation(strings) ⇒ Object

Parse a conversation record into a structured hash. Uses known field names to correctly pair keys with values, handling boolean fields (no string value) and fragmented HTML content.



# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 98

def self.parse_conversation(strings)
  fields = {}
  last_message = {}
  in_last_message = false
  past_last_message = false

  i = 0
  while i < strings.length
    str = strings[i]

    # Detect section boundaries
    if str == 'lastMessage'
      in_last_message = true
      i += 1
      next
    end

    if in_last_message && %w[members botMembers rosterVersion rosterSummary
                             nonFilteredLastMessageTimeUtc __typename
                             localClientId parentId clientUpdateTime].include?(str)
      in_last_message = false
      past_last_message = true
    end

    target = in_last_message ? last_message : fields

    advance = consume_field(strings, i, str, target, past_last_message)
    i += advance
  end

  { fields: fields, last_message: last_message }
end
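
The section handling can be illustrated with a reduced version of the loop. This is a sketch under stated assumptions, not the method itself: `route`, `FIELD_NAMES`, and `SECTION_ENDS` are hypothetical names, the sets are small subsets, and boolean handling and HTML reassembly are left out.

```ruby
require 'set'

# Illustrative subsets; the real parser uses KNOWN_FIELDS and a longer boundary list.
FIELD_NAMES  = Set.new(%w[id lastMessage content messagetype members topic]).freeze
SECTION_ENDS = %w[members].freeze

# Hypothetical, simplified router: strings after 'lastMessage' go to last_message
# until a boundary field name switches the target back to fields.
def route(strings)
  fields = {}
  last_message = {}
  in_last_message = false
  i = 0
  while i < strings.length
    str = strings[i]
    if str == 'lastMessage'
      in_last_message = true
      i += 1
      next
    end
    in_last_message = false if in_last_message && SECTION_ENDS.include?(str)

    target = in_last_message ? last_message : fields
    value = strings[i + 1]
    if FIELD_NAMES.include?(str) && value && !FIELD_NAMES.include?(value)
      target[str] = value
      i += 2
    else
      i += 1
    end
  end
  { fields: fields, last_message: last_message }
end

result = route(%w[id 19:abc lastMessage content hello messagetype Text members topic Standup])
# result[:fields]       == { 'id' => '19:abc', 'topic' => 'Standup' }
# result[:last_message] == { 'content' => 'hello', 'messagetype' => 'Text' }
```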

.read_length_prefixed_string(data, pos) ⇒ Object

Read one length-prefixed string whose varint length starts at pos (the byte after the 0x22 marker). Returns [string, next_position], or nil when the length is zero, exceeds 1,000,000 bytes, runs past the end of the data, or the bytes are not valid UTF-8.



# File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 161

def self.read_length_prefixed_string(data, pos)
  return nil if pos >= data.bytesize

  len = data.getbyte(pos)
  return nil unless len&.positive?

  if len < 0x80
    str_start = pos + 1
    actual_len = len
  else
    next_byte = data.getbyte(pos + 1)
    return nil unless next_byte

    actual_len = (len & 0x7F) | (next_byte << 7)
    str_start = pos + 2

    if next_byte >= 0x80 && pos + 2 < data.bytesize
      third = data.getbyte(pos + 2)
      return nil unless third

      actual_len = (len & 0x7F) | ((next_byte & 0x7F) << 7) | ((third & 0x7F) << 14)
      str_start = pos + 3
    end
  end

  return nil if actual_len <= 0 || actual_len > 1_000_000
  return nil if str_start + actual_len > data.bytesize

  str = data.byteslice(str_start, actual_len)
  str.force_encoding('UTF-8')
  return nil unless str.valid_encoding?

  [str, str_start + actual_len]
end
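
The length arithmetic above is base-128 varint decoding: each byte contributes its low 7 bits, least significant group first, and a set high bit means another byte follows. A worked round trip may make this concrete. `encode_varint` and `decode_varint` are hypothetical helpers written for this sketch; unlike the method above, the decoder loops over any number of bytes instead of stopping at three.

```ruby
# Hypothetical helper: encode a non-negative integer as a base-128 varint.
def encode_varint(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    bytes << (value.zero? ? byte : byte | 0x80) # high bit marks "more bytes follow"
    break if value.zero?
  end
  bytes.pack('C*')
end

# Hypothetical helper: decode a varint at pos.
# Returns [value, next_pos], or nil if the data ends mid-varint.
def decode_varint(data, pos)
  value = 0
  shift = 0
  loop do
    byte = data.getbyte(pos)
    return nil unless byte

    value |= (byte & 0x7F) << shift
    pos += 1
    return [value, pos] if byte < 0x80

    shift += 7
  end
end

encode_varint(300).bytes              # => [172, 2]   (0xAC, 0x02)
decode_varint(encode_varint(300), 0)  # => [300, 2]
```

For 300, the low seven bits are 0x2C, emitted as 0xAC with the continuation bit set, and the remaining bits are 0x02, so decoding gives 0x2C | (0x02 << 7) = 300.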