User:Brian McNeil/Project INDECT/Libraries

Outline

  1. There is an expectation, if not an outright assumption, that real-world libraries and privately owned 'semi-public' places such as bars and coffee shops are not actively monitored.
    1. Before affordable, readily available computer technology, all library records were kept on paper and maintained by hand. I have no idea whether historical records were kept, such as a list of people who borrowed a specific book. The visible system I experienced as a child in the 1970s used a physical card in each book, and each library member had a set of small folders into which the book cards were placed. The pair, forming a record of a borrowed book, was filed under the return due date, which was also stamped on a card glued into the book. The clear aim of such a system was to keep a manually maintained record that made it possible to chase up and recover unreturned books.
    2. Visible computerisation began in the 1980s and 1990s.
    3. Current state of computerisation: all records are computerised and online. The public may browse a subset of these records, search for books &c. For any specific book they can see whether it is available or on loan, and when it is due back. They may also file a request to have the book next.
      • Library staff (as related to me by an ex-employee of a college library) can perform all the checks available to the public. In addition, they can review the lending history of any book, seeing the list of library members who have taken it out, and they can review the borrowing history of any library member.
    4. Bars and coffee shops are defined here as 'semi-public' spaces; the responsibility of the private owners is to keep customers returning and spending money. For bars, this frequently means employing bouncers who may liaise with police.
      1. Any speech or discussion in such a place is not readily available to anyone not present at that specific point in time.
      2. Current security services action/monitoring of such places generally relates to investigation of clearly identified suspects.
      3. Uninvolved patrons could actively report anything they considered dangerous or threatening, or that appeared to be a plot to commit a crime.
      4. CCTV is becoming more common, so there may be historical records of who visited a place and who they spoke with. Audio monitoring is less likely: it is an additional cost to the owner of the premises, the recordings are frequently useless because of noise levels, and directional microphones require active control. [Aside: In true iPhone style, you can probably say "there's an app for that" within Project INDECT.]
  2. Online is considerably different: many parties collect records, and these may include records kept by the user's ISP. Collection is not limited to the site you use as an online substitute for a library or coffee shop.
    1. Wikipedia is, for many, the most useful online substitute for a library. The Wikimedia Foundation has a robust privacy policy, and works from the philosophy of keeping the least-detailed records required to identify abuse and eliminate or reduce its threat to the website. Publicly visible article histories are available, as these are required to fairly attribute copyright to the article authors. Contributors who have registered a pseudonym cannot publicly be linked to any IP address; if they provide an email address, only the WMF has that data.
    2. Facebook and other social networking sites are the online coffee shops and bars. As for-profit private entities, they have additional uses for data on users' browsing habits, the most obvious being targeted marketing. Such sites frequently include convenient alternatives to established means of communication such as email, e.g. Facebook allowing you to send private messages to friends.
      1. Data retention by such sites is governed by commercial concerns and by any legislation on the duration of data retention and the level of detail retained.
    3. Use of encryption is the exception, not the rule. This means that, for most traffic, the ISP has access to every single byte of data you exchange with these sites.
      1. ISP data retention is, again, governed by commercial concerns and by any legislation on the duration of data retention and the level of detail retained.
      2. The UK has either implemented, or plans to implement, legislation requiring ISPs to retain various records. Because of vocal resistance from ISPs, the level of detail retained is likely to be as follows. For email: who you emailed and when, and who emailed you and when. For browsing: which sites you visited (possibly which pages you looked at); the most convenient and efficient approach would be to retain every request for a web page (the HTTP request history). Opposition to more detailed records rests on the cost of a storage and archiving system that has no real commercial benefit.
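The difference between the public view and the staff view of a computerised library catalogue, described earlier in this outline, can be sketched with a small data model. This is a minimal illustration under stated assumptions: the `Loan` record, the `Catalogue` class, its method names and the sample members and book are all hypothetical and are not taken from any real library management system.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Loan:
    """One borrowing event: which member took which book, and when it is due back."""
    member: str
    book: str
    due: date
    returned: bool = False


class Catalogue:
    """Hypothetical computerised lending system that keeps every loan record."""

    def __init__(self) -> None:
        self.loans: list[Loan] = []

    def lend(self, member: str, book: str, due: date) -> None:
        self.loans.append(Loan(member, book, due))

    def return_book(self, book: str) -> None:
        # Mark the open loan as returned. The record is kept rather than
        # discarded, which is what makes the staff queries below possible.
        for i, loan in enumerate(self.loans):
            if loan.book == book and not loan.returned:
                self.loans[i] = replace(loan, returned=True)
                return

    # Public view: availability and due date only.
    def status(self, book: str) -> str:
        for loan in self.loans:
            if loan.book == book and not loan.returned:
                return f"on loan, due back {loan.due.isoformat()}"
        return "available"

    # Staff view: every member who has ever borrowed this book.
    def borrowers_of(self, book: str) -> list[str]:
        return [loan.member for loan in self.loans if loan.book == book]

    # Staff view: every book this member has ever borrowed.
    def history_of(self, member: str) -> list[str]:
        return [loan.book for loan in self.loans if loan.member == member]


if __name__ == "__main__":
    cat = Catalogue()
    cat.lend("alice", "Das Kapital", date(2010, 3, 1))
    cat.return_book("Das Kapital")
    cat.lend("bob", "Das Kapital", date(2010, 4, 1))
    print(cat.status("Das Kapital"))        # on loan, due back 2010-04-01
    print(cat.borrowers_of("Das Kapital"))  # ['alice', 'bob']
```

The design choice that matters is in `return_book`: a returned loan is marked rather than deleted. A public query only ever needs the open loan, but keeping the closed ones is what lets staff answer "who has read this book?" and "what has this person read?".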

Encrypted versus Unencrypted

  1. Email encryption is well-established, although not widely used.
    1. PGP was released in 1991. With it, the content of any email may be encrypted so that only the intended recipients can read it. While who talks to whom remains visible in the clear, what they say is secret. This rests on a considerable body of research into secure communication over an insecure network.
  2. Encrypted web traffic is less common; the option is very frequently not even available to a site's users.
    1. Addresses starting http:// are unencrypted, those starting https:// are encrypted.
    2. The public at large has a limited understanding of this, although they do know that encryption should be mandatory for online banking and e-commerce.
    3. Earlier US restrictions on strong encryption export (it was classed as a 'munition') had to be relaxed. This was a prerequisite for widespread adoption of e-commerce.
    4. The publicly known state of the art is that https://, with the strongest supported key strength, is unbreakable, possibly excepting extremely well-funded adversaries such as the NSA. 'Convenient' cracking of message content would require an extremely difficult mathematical breakthrough, such as an efficient method of factoring very large numbers; a resolution of the open P versus NP question in the form P = NP would imply that such methods exist. The same assumed hardness is what makes PGP secure.
  3. Unencrypted web traffic - http://
    1. Your ISP has access to all of it.
    2. Any eavesdropper between you and a site you visit has access to all of it.
    3. The site you visit has access to all of it.
  4. Encrypted web traffic - https://
    1. Your ISP knows which sites you visit and when you visit them, but nothing of the content exchanged with the site, including which specific pages you view.
    2. Any eavesdropper between you and a site you visit sees the same as the ISP, but lacks the subscriber records that match an IP address to a name, address, credit card &c.
    3. The site you visit - and only the site you visit - has access to the same quantity and quality of data as when using an unencrypted connection.
  5. Encryption has a cost: it is computationally expensive. A web server providing a fully encrypted service can either support fewer users, or be upgraded with expensive dedicated cryptographic hardware.
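The public-key idea behind PGP and https:// can be illustrated with textbook RSA and deliberately tiny numbers. This is a toy for illustration only, and an assumption on my part about which scheme to show: real PGP and TLS use far larger keys, padding, and hybrid encryption (the public-key step only protects or agrees a symmetric session key), none of which appears here. The primes 61 and 53 and the message 65 are the usual classroom values. Anyone can encrypt with the public pair (n, e); decrypting needs d, which an eavesdropper can only compute by factoring n.

```python
def make_toy_keypair(p: int, q: int, e: int = 17) -> tuple[tuple[int, int], int]:
    """Textbook RSA key generation from two primes. Toy sizes, no padding: insecure."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+); fails if gcd(e, phi) != 1
    return (n, e), d


def encrypt(message: int, public_key: tuple[int, int]) -> int:
    """Anyone holding the public key (n, e) can encrypt."""
    n, e = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, d: int, n: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(ciphertext, d, n)


def factor_by_trial_division(n: int) -> tuple[int, int]:
    """The eavesdropper's route to d: factor n. Instant here, infeasible for 2048-bit n."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factors")


if __name__ == "__main__":
    public, d = make_toy_keypair(61, 53)
    c = encrypt(65, public)
    print(public, d, c)              # (3233, 17) 2753 2790
    print(decrypt(c, d, public[0]))  # 65
    p, q = factor_by_trial_division(public[0])
    print(pow(public[1], -1, (p - 1) * (q - 1)) == d)  # True: factoring n recovers d
```

The last line is the point of the example: the private key falls out immediately once n is factored, so the security of the scheme is exactly the difficulty of that factoring step at real key sizes.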

The influence of Project INDECT

  1. Project INDECT is, in general, building tools that are extremely useful in analysing the above data.
  2. The private sector and universities are involved; many research papers are available.
  3. Users of our aforementioned online libraries and coffee shops will find advances in search technology very useful.
  4. Implementation of Project INDECT would, per the EU report on ECHELON, only be legal in a national security or criminal investigation context.
  5. There is a strong indication that an implementation of Project INDECT would be subject to usage creep, with an ever-wider set of data records fed into it. [From a security service perspective: we can add suspects to such a system at will; each suspect's web of contacts needs analysing and possibly adding; legal checks and balances (such as requiring a warrant) will significantly slow investigations. Conclusion: we want everything.]
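The 'web of contacts' reasoning in the last point is, computationally, a breadth-first search over communication metadata of the kind ISPs may be required to retain (who emailed whom). The sketch below is not INDECT code; it assumes a hypothetical list of (sender, recipient) records with made-up names, ignores timestamps, and shows how the set of people swept in grows with each extra hop away from a single suspect.

```python
from collections import defaultdict, deque


def build_contact_graph(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected graph: an edge means two parties exchanged at least one email."""
    graph: dict[str, set[str]] = defaultdict(set)
    for sender, recipient in records:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph


def contacts_within(graph: dict[str, set[str]], suspect: str, hops: int) -> set[str]:
    """Breadth-first search: everyone within `hops` steps of the suspect (suspect excluded)."""
    seen = {suspect}
    queue = deque([(suspect, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the permitted number of hops
        for contact in graph.get(person, set()):
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, depth + 1))
    seen.discard(suspect)
    return seen


if __name__ == "__main__":
    # Hypothetical retained metadata: (sender, recipient) pairs, no message content.
    records = [("suspect", "ann"), ("suspect", "ben"), ("ann", "cat"),
               ("ben", "dan"), ("dan", "eve"), ("fay", "gus")]
    graph = build_contact_graph(records)
    for hops in (1, 2, 3):
        print(hops, sorted(contacts_within(graph, "suspect", hops)))
```

With real address books, each hop multiplies the number of people involved, which is why a warrant per suspect is a meaningful brake and why the `hops` limit is the natural place for usage creep to appear.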

Encryption's impact on INDECT or ECHELON

  1. With encryption, detailed data must be retrieved from the online library or coffee shop itself.
  2. This is a clear point where checks and balances can be enforced.
  3. 'Fishing expeditions', such as who read or contributed to an article on Marxism, cannot readily be carried out.
  4. Data retention requirements legislatively applied to ISPs result in significantly less valuable data being kept.
    • "Ideal" access by security services would require global legislation on data retention, disclosure, and sharing.
    • Moves to legislatively provide "ideal" access and monitoring would be extremely difficult to get past public scrutiny.
  5. Interception and eavesdropping between citizen and website is significantly less useful, and yields no communication content.
  6. Selective content filtering is extremely difficult, and has a high risk of being discovered. (See the IWF attempt to block the Virgin Killer article on Wikipedia: even if the IWF had added the corresponding page on the secure Wikipedia site to its blacklist, the filtering technology cannot see a request for that page, so the entire site must be blocked to censor one page.)
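The filtering point follows from the visibility rules listed under 'Encrypted versus Unencrypted'. The sketch below models what a network-level filter holding a page-level blacklist can match on: the full URL for http://, but only the host name for https://. It assumes the host name leaks through the DNS lookup and the TLS handshake while the path stays encrypted; the function names are hypothetical, the URLs are illustrative, and this does not describe how the IWF's system actually works.

```python
from urllib.parse import urlsplit


def observable_request(url: str) -> str:
    """What an on-path observer (an ISP or a filter) can read from a request for `url`.

    Assumption: for https:// only the host name leaks (through the DNS lookup and
    the TLS handshake); the path, query and page content are encrypted.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return url
    if parts.scheme == "https":
        return parts.hostname or ""
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


def filter_decision(url: str, blacklist: set[str], block_whole_host: bool) -> str:
    """Return 'blocked' or 'allowed' for a filter holding a page-level blacklist.

    Blacklist entries are full http:// URLs of individual pages.
    """
    seen = observable_request(url)
    if urlsplit(url).scheme == "http":
        # The exact page is visible, so one page can be blocked on its own.
        return "blocked" if seen in blacklist else "allowed"
    # https://: only the host is visible, so the page-level entry cannot match.
    listed_hosts = {urlsplit(entry).hostname for entry in blacklist}
    if block_whole_host and seen in listed_hosts:
        return "blocked"  # censoring one page means censoring the whole site
    return "allowed"
```

For example, with the blacklist `{"http://en.wikipedia.org/wiki/Virgin_Killer"}`, a secure request for that very page is allowed unless `block_whole_host` is set, and once it is set a secure request for any other page on the same site is blocked too. Those are the only two outcomes available to the filter.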

Alternatives

  • Invariably involve encryption; Tor, based on U.S. Navy research, is likely the best example
  • Tor - advantages
    • Data is encrypted multiple times - layers, like an onion
    • Requests pass through three randomly chosen Tor nodes
    • Each Tor node only knows the node the encrypted data came from and the node it goes to next
    • Each Tor node decrypts the 'onion layer' meant for it, and forwards it to the next Tor node
    • When the now-unencrypted data leaves the last Tor node (the exit node), neither that node nor an observer can tell where the data originated.
    • When the website responds, it responds to the Tor end-point/exit node
      • The exit node has data from the user specifying how to encrypt the response, and encrypted instructions for the return route; it only knows the next Tor node on the data's return journey
    • Breaking Tor would require compromising all three computers making up the circuit through the Tor network
    • A user may reset Tor at any point, thus forcing use of a different set of Tor nodes
    • The response goes back through the Tor network to the user, fully encrypted
    1. Your ISP only knows that you use Tor; it no longer knows which sites you are visiting
    2. Any in-the-clear data comes from a Tor exit node. The node, and observers between it and the visited site, can only tell the actual user's identity if the unencrypted content contains it, or links the communication with other traffic that contains it
    3. The Tor exit-node may be outside the jurisdiction of a user's potential adversaries
  • Tor - disadvantages
    1. Data still travels in-the-clear for part of the journey
    2. Per-user overhead of running encryption software, and data-scrubbing software to remove identifying information from data travelling over Tor
    3. Slow, very slow: completely unacceptable for most JavaScript-heavy Web 2.0 sites or streaming video
    4. The route taken across the Internet when using Tor will in most cases be at least four times longer than the average direct route
    5. Many sites block Tor in some way, usually for abuse prevention (e.g. Wikipedia blocks editing via Tor, but not reading)
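The layering described under 'Tor - advantages' can be sketched in a few lines. This is a toy model only: the keystream cipher below (SHA-256 in counter mode, XORed with the data) stands in for the real ciphers Tor negotiates with each relay, it provides no authentication, and the relay names, keys and layer format are made up for illustration. What it does show is the structure: the client wraps the request once per relay, each relay removes exactly one layer and learns only the next hop, and only after the exit node's layer is removed is the request readable.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    Applying it twice with the same key restores the input. NOT secure; it
    stands in for the ciphers Tor actually negotiates with each relay.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def build_onion(request: bytes, circuit: list[tuple[str, bytes]], destination: str) -> bytes:
    """Client side: wrap the request once per relay, innermost (exit) layer first.

    Each layer's plaintext is b"<next hop>|<inner payload>", so a relay that
    removes its layer learns only where to forward the data next.
    """
    payload, next_hop = request, destination
    for name, key in reversed(circuit):
        payload = keystream_xor(key, next_hop.encode() + b"|" + payload)
        next_hop = name
    return payload


def peel(layer: bytes, key: bytes) -> tuple[str, bytes]:
    """Relay side: remove exactly one layer and read the next hop."""
    plain = keystream_xor(key, layer)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


if __name__ == "__main__":
    # Hypothetical three-relay circuit with random per-relay keys.
    circuit = [("guard", os.urandom(16)), ("middle", os.urandom(16)), ("exit", os.urandom(16))]
    data = build_onion(b"GET /wiki/Tor", circuit, "en.wikipedia.org")
    for name, key in circuit:
        next_hop, data = peel(data, key)
        print(f"{name} forwards to {next_hop}")
    print(data)  # only after the exit node's layer is removed is the request readable
```

On the return journey the order is reversed: each relay adds its own layer to the response, and the client removes all three.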

Filesharing


WinNY

  • WinMX/WinNY
    WinMX was, at one time, the most popular file-sharing program in Japan, and was also well known outside Japan.
    In response to a rash of reports of WinMX users being prosecuted, a developer created WinNY
    WinNY anonymises the traffic, so monitoring no longer leads to individuals being prosecuted
    The WinNY developer was prosecuted and found guilty; the law and charges involved are unclear
    The guilty verdict was overturned by a higher court, with a bit of a 'slap' to the lower court

Bittorrent

  • BitTorrent: the most widespread and commonly used P2P software.
    Developed for speed, not anonymity
    Relied upon by The Pirate Bay (TPB)