orbiter
a196f24f60
prevent enqueueing of non-loggable logging entries
13 years ago
orbiter
482afed07c
reduced logging overhead (a bit)
13 years ago
orbiter
e76159040b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter
bbfa497a3c
replaced more size() > 0 by !isEmpty()
13 years ago
Michael Peter Christen
58e7d1952f
reduction of logging to prevent too much IO caused by logging
13 years ago
Michael Peter Christen
83da68c4c1
fixed a memory leak inside the logger which appeared if the log was
written faster than the logger was able to print it out to its output
stream. A very large collection of unwritten log outputs had been seen
during strong crawling. The new ArrayBlockingQueue is limited to prevent
this case.
13 years ago
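The bounded queue from commit 83da68c4c1, together with the filtering from a196f24f60, can be sketched roughly as below. The class name, the `Level` threshold and the counters are assumptions for illustration, not YaCy's logger code. The commit does not say whether a full queue blocks or drops records. This sketch assumes records are dropped via `offer()`, so a slow output stream cannot make memory grow without bound.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Hypothetical bounded log buffer; not YaCy's actual logger class. */
final class BoundedLogQueue {
    private final BlockingQueue<LogRecord> queue;
    private final Level threshold;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogQueue(int capacity, Level threshold) {
        this.queue = new ArrayBlockingQueue<>(capacity); // fixed capacity, never grows
        this.threshold = threshold;
    }

    /** Returns true only if the record was actually queued. */
    boolean enqueue(LogRecord record) {
        // Filter first: records below the threshold never occupy queue space.
        if (record.getLevel().intValue() < threshold.intValue()) return false;
        // offer() fails fast instead of blocking the caller when the queue is full.
        if (!queue.offer(record)) {
            dropped.incrementAndGet();
            return false;
        }
        return true;
    }

    /** Called by the writer thread; returns null when nothing is pending. */
    LogRecord poll() { return queue.poll(); }

    int size() { return queue.size(); }

    long dropped() { return dropped.get(); }

    public static void main(String[] args) {
        BoundedLogQueue q = new BoundedLogQueue(2, Level.INFO);
        q.enqueue(new LogRecord(Level.FINE, "debug detail"));  // filtered out
        q.enqueue(new LogRecord(Level.INFO, "crawl started"));
        q.enqueue(new LogRecord(Level.WARNING, "slow host"));
        q.enqueue(new LogRecord(Level.SEVERE, "queue full"));  // dropped, capacity is 2
        System.out.println(q.size() + " queued, " + q.dropped() + " dropped"); // 2 queued, 1 dropped
    }
}
```

A blocking `put()` would preserve every record but could stall crawler threads behind a slow disk. The dropping variant trades completeness for bounded memory and latency.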
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
13 years ago
orbiter
28b30231c3
fix for url matcher of multiple amp& in an url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650
13 years ago
Roland 'Quix0r' Haeder
aef9dd0350
- removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
13 years ago
orbiter
c7afa8bc48
using SwitchboardConstants for solr attributes
13 years ago
orbiter
c6d8950651
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter
5f3b8dc040
fix for RSS reader
13 years ago
orbiter
62202e2d71
refactoring of query attribute variable names for better consistency
with (next) stored query words
13 years ago
Michael Peter Christen
1addbc792c
use less memory for md5 cache
13 years ago
Michael Peter Christen
f32de94723
more logging
13 years ago
Michael Peter Christen
d09d9f2364
filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
13 years ago
Michael Peter Christen
434ee90c59
added classification for control file types which shall not be loaded
but placed onto the noload-queue
13 years ago
Michael Peter Christen
a90bcb48f6
added webm
13 years ago
Michael Peter Christen
801972fe6f
fix for url camel case parser and sentence reader
13 years ago
Michael Peter Christen
fbc1a2030d
fix for sitemap importer: can now also import very large sitemaps within
small memory configurations
13 years ago
Michael Peter Christen
92731e5287
fix for sevenzip parser
13 years ago
Michael Peter Christen
45641b0c23
catch and log a warning in RasterPlotter
13 years ago
Michael Peter Christen
8efc1c1078
- fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
13 years ago
Michael Peter Christen
c3db015410
prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
13 years ago
Michael Peter Christen
b1e7c11fba
fix for pattern matcher in html parser
13 years ago
Michael Peter Christen
8a6edc0031
fix for solr shutdown
13 years ago
Michael Peter Christen
b8bcc06283
fix for urls beginning with "//"
13 years ago
Michael Peter Christen
b0c408788b
made class methods static where possible
13 years ago
Michael Peter Christen
5bd3c90907
- removed unnecessary semicolons
- added default case for switch
13 years ago
Michael Peter Christen
132afaf687
removed inaccessible code
13 years ago
Michael Peter Christen
7c1ba99755
removed more unused method parameters
13 years ago
Michael Peter Christen
83701a1b4c
removed unused ImageReference package
13 years ago
Michael Peter Christen
0301aba1e9
removed unused method parameters
13 years ago
Michael Peter Christen
241dd8410a
removed snippet pattern filter - it was not used
13 years ago
Michael Peter Christen
d3964253ae
- added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
13 years ago
Michael Peter Christen
ea10766bfd
cleaned unnecessary nested code
13 years ago
Michael Peter Christen
1481037820
replaced non-generic array with collection
13 years ago
orbiter
fc0f9543fe
More SentenceReader cleanup
13 years ago
orbiter
586bb0eb6a
Simplified SentenceReader (no more Reader inside..)
13 years ago
orbiter
7f851d62a7
replaced HashARC with SizeLimited Objects which are less costly
13 years ago
orbiter
d4291ac1f3
more tolerance when creating solr document
13 years ago
orbiter
78fc3cf8f8
refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used an overly complex input stream, generated from sources and UTF8
byte conversions. The BufferedReader added a strong overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls had
been applied until a buffered Reader from an input stream was possible.
These superfluous conversions have been removed.
- the best source for the SentenceReader is a String. Therefore the
production of Strings has been forced inside the Document class.
13 years ago
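Commit 78fc3cf8f8 concludes that a String is the best input for sentence splitting, because it avoids the Reader and byte-conversion layers. A minimal sketch of a splitter that scans a String directly with `charAt` is shown below. The class name and the split rules (`.`, `!`, `?` and line breaks) are assumptions and are not SentenceReader's actual rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sentence splitter working directly on a String (no Reader, no byte conversion). */
final class StringSentenceSplitter {

    /** Splits at '.', '!', '?' and line breaks; the punctuation stays with its sentence. */
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?' || c == '\n') {
                addTrimmed(out, text.substring(start, i + 1));
                start = i + 1;
            }
        }
        addTrimmed(out, text.substring(start)); // trailing text without end punctuation
        return out;
    }

    private static void addTrimmed(List<String> out, String s) {
        String t = s.trim();
        if (!t.isEmpty()) out.add(t); // skip fragments that were only whitespace
    }

    public static void main(String[] args) {
        System.out.println(sentences("Hello world. How are you?\nFine"));
        // [Hello world., How are you?, Fine]
    }
}
```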
orbiter
bb8dcb4911
automatically adapt size of word cache to available memory
13 years ago
Michael Peter Christen
ad09b786bf
clean up parser data
13 years ago
Michael Peter Christen
276a66a793
Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that they can easily cause an OOM just by the number of links.
Whether 1000 links is sufficient or too few must be answered by
testing this feature.
13 years ago
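The link cap from commit 276a66a793 can be sketched as a collector that stops storing links once the limit is reached and only counts the rest. The class and method names are hypothetical and are not YaCy's parser API. Only the value 1000 comes from the commit message.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical per-document link store with a hard cap; not YaCy's parser API. */
final class LinkCollector {
    static final int DEFAULT_MAX_LINKS = 1000; // limit named in the commit message

    private final int maxLinks;
    private final Set<String> links = new LinkedHashSet<>(); // keeps document order, drops duplicates
    private int ignored = 0;

    LinkCollector(int maxLinks) { this.maxLinks = maxLinks; }

    LinkCollector() { this(DEFAULT_MAX_LINKS); }

    /** Returns true if the link was stored; links beyond the cap are only counted. */
    boolean add(String url) {
        if (links.contains(url)) return false;              // duplicate, costs no extra memory
        if (links.size() >= maxLinks) { ignored++; return false; }
        return links.add(url);
    }

    Set<String> links() { return Collections.unmodifiableSet(links); }

    int ignored() { return ignored; }

    public static void main(String[] args) {
        LinkCollector c = new LinkCollector();
        for (int i = 0; i < 1500; i++) c.add("http://example.org/page" + i);
        System.out.println(c.links().size() + " stored, " + c.ignored() + " ignored");
        // 1000 stored, 500 ignored
    }
}
```

The counter shows how often the cap was hit. That is the kind of data needed to answer the open question in the commit message about whether 1000 is enough.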
Michael Peter Christen
613b45f604
- better data structures in secondary search
- fixed a big memory leak in secondary search
13 years ago
Michael Peter Christen
de903a53a0
parser refactoring & hacks
13 years ago
Michael Peter Christen
8a82609360
- smaller caches to save memory
- close cloneable iterators to free memory
13 years ago
Michael Peter Christen
7249d9c9de
bugfix for concurrent seed loader
13 years ago
Michael Peter Christen
c72d3b12cd
concurrently initialize the seed list during p2p network bootstrap
13 years ago
Michael Peter Christen
1825f165b8
better integration of blacklist according to use case
13 years ago
Michael Peter Christen
c18fa9fa75
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
13 years ago
Michael Peter Christen
ce8d4b87d9
fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen
0c345d1559
giving threads names so it's easier to see what's happening during
debugging and within a thread dump
13 years ago
reger
067728bccc
add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)
13 years ago
Michael Peter Christen
03280fb161
removed segments-concept and the Segments class:
the segments had been there to create a tenant-infrastructure but were
never used since that was all much too complex. There will be a
replacement using a solr navigation using a segment field in the search
index.
13 years ago
Michael Peter Christen
508a81b86c
added solr field 'refresh_s' which stores the refresh url contained in
the meta-refresh html header field.
13 years ago
Michael Peter Christen
f3167def64
do not fill the keywords with title content if keywords do not exist.
13 years ago
Michael Peter Christen
9116013c64
- allow lazy initialization of solr values (if using 'lazy', then no
0-values and no empty strings are written). This may save a lot of
memory (in ram and on disc) if excessive 0-values or empty strings
appear.
- do not allow default boolean values for checkboxes because that does
not make sense: browsers may omit the checkbox attribute name if the box
is not checked. A default value 'true' would not comply with the
semantics of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields.
13 years ago
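The 'lazy' mode of commit 9116013c64 can be sketched as a writer that skips 0-values and empty strings when the flag is set. To keep the example self-contained, a plain `Map` stands in for a Solr input document. The class and the field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical field writer; a Map stands in for a Solr input document. */
final class LazyFieldWriter {
    private final boolean lazy;
    private final Map<String, Object> fields = new LinkedHashMap<>();

    LazyFieldWriter(boolean lazy) { this.lazy = lazy; }

    void add(String name, int value) {
        if (lazy && value == 0) return;                           // lazy mode: omit 0-values
        fields.put(name, value);
    }

    void add(String name, String value) {
        if (lazy && (value == null || value.isEmpty())) return;  // lazy mode: omit empty strings
        fields.put(name, value);
    }

    Map<String, Object> fields() { return fields; }

    public static void main(String[] args) {
        for (boolean lazy : new boolean[] {false, true}) {
            LazyFieldWriter w = new LazyFieldWriter(lazy);
            w.add("count_i", 0);      // illustrative field names
            w.add("keywords", "");
            w.add("title", "YaCy");
            System.out.println("lazy=" + lazy + " -> " + w.fields().keySet());
        }
        // lazy=false -> [count_i, keywords, title]
        // lazy=true -> [title]
    }
}
```

With lazy writing, any reader of the index must treat a missing field as 0 or as the empty string.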
sixcooler
97f60010d8
fix crawl start from file
13 years ago
Michael Peter Christen
0294a53459
- add canonical field only if requested by solr schema
- remove canonical url from in/outbound urls if present
13 years ago
Michael Peter Christen
3fd4a01286
added option to record urls that are forwarded to the solr index
13 years ago
Michael Peter Christen
d763e4d94b
fixed bad referer computation in SSIs which caused an NPE during host
computation. This error was there before the latest IPv6 hack but did
not cause an NPE. The IPv6 hack was not the cause of this bug, but it
exposed the misconfiguration of the 'referer' referrer.
13 years ago
Michael Peter Christen
358b04885e
more IPv6 hacks
13 years ago
Michael Peter Christen
96aeb127e3
generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
13 years ago
Michael Peter Christen
77f795756c
fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
13 years ago
Michael Peter Christen
8dd469b9dd
added option to configure the autocommit delay time of solr on-the-fly
13 years ago
Michael Peter Christen
b9dfca4b0a
- fixed IndexFederated Servlet / an embedded Solr can now be selected
- added code stub for an embedded Solr but generation of Solr store is
still commented out (it works but is not yet ready for usage)
13 years ago
Michael Peter Christen
fad3b14813
added jetty libraries, needed for future use as web server and as
application server for the solr search interface
13 years ago
Michael Peter Christen
a38b0a2c46
extended embedded solr tests to ensure that it will be usable within a
jetty instance
13 years ago
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
13 years ago
Michael Peter Christen
a5eb91fa60
refactoring
13 years ago
Michael Peter Christen
1be0025a9c
- added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (all) files needed for an embedded solr
13 years ago
Michael Peter Christen
dbdd697f4d
moved RDFaParser.xsl configuration file to defaults
13 years ago
Michael Peter Christen
90b82ce994
using guava for host resolution (non-blocking for ips) and time-out
13 years ago
Michael Peter Christen
e12bb254b4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
3f55dc7c1e
- added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
13 years ago
Michael Peter Christen
c337190a00
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
cominch
c63c3a4495
Show additional interaction elements in footer section on each page, if
activated in ConfigPortal.html.
This footer is also visible in augmented browsing proxy mode.
13 years ago
Michael Peter Christen
786be7d175
better integration of RDFaParser
13 years ago
Michael Peter Christen
de3ef8ad73
removed unimportant warnings
13 years ago
Michael Peter Christen
82a682b31d
fixed problem with seed when switching network
13 years ago
Michael Peter Christen
8c544edee4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
7dc59979bc
fix for npe, possibly for http://bugs.yacy.net/view.php?id=195
13 years ago
Michael Peter Christen
1d4e206b2b
bugfix in vocabulary generation
13 years ago
cominch
2c89975378
Merge remote-tracking branch 'original yacy/master'
13 years ago
Michael Peter Christen
52f5d40043
better abstraction of document model generation
13 years ago
Michael Peter Christen
8b7c4d3144
produce a rdf output containing the triplestore with yacydoc; ie:
http://localhost:8090/api/yacydoc.rdf?urlhash=yOiCM7Fh1hyQ
13 years ago
cominch
f7160dae5c
Merge remote-tracking branch 'original yacy/master'
13 years ago
cominch
e4555cbee3
Augmented browsing: Pass on additional action parameter
13 years ago
Michael Peter Christen
24bbe359ca
integrate also geonames library files with fewer cities. These are more
useful for tagging since fewer normal words are falsely identified as
locations
13 years ago
Michael Peter Christen
223a5440ab
preventing that an empty pnd is inserted into the vocabularies
13 years ago
Michael Peter Christen
8e97ada7c9
IPv6 bugfix
13 years ago
Michael Peter Christen
963f92ed9a
- merged files
- changed behaviour of delete button in vocabulary edit
- fixed size number in vocabulary listing
13 years ago
Michael Peter Christen
dd88d0ace2
more logging
13 years ago
Michael Peter Christen
94d54e2d91
added recognition of multi-word terms in vocabulary matching
this makes the PND usable: it is now possible to recognize persons and
navigate with a 'Persons' facet.
13 years ago
Michael Peter Christen
64c0268b2b
show triplestore metadata in yacydoc and viewfile
13 years ago
Michael Peter Christen
0fbd749207
ipv6 update
13 years ago
Michael Peter Christen
c2f0d16d2c
fixed vocabulary initialization
13 years ago
Michael Peter Christen
fbded1f466
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
df3531f8d5
added the generation of virtual vocabularies using the pnd
13 years ago
Michael Peter Christen
e806106b10
jquery bugfix
13 years ago
Michael Peter Christen
a0f1decd82
- added loading of the dbpedia pnd triplestore in the dictionary loader
- renamed the dictionary loader to knowledge loader
- some refactoring in the library provider method names
13 years ago
cominch
2ac7a5c1f2
Augmented browsing: Add overlay bar which shows the vocabulary tags
13 years ago
cominch
3c255c025b
Show tags in search results (if activated in ConfigPortal_p.html)
13 years ago
Michael Peter Christen
16d8f33795
added objectlink generation to vocabulary generation and editor
13 years ago
cominch
f49d92d8da
Cleanup of interaction class and helper routines
13 years ago
cominch
56b0115054
Triplestore: modify routines to access per user store
13 years ago
cominch
a95127c9af
Triplestore: initialize per-user triplestores
13 years ago
Michael Peter Christen
d45718251e
refactoring (Localization -> Location)
13 years ago
Michael Peter Christen
b8b3c87ba7
- renamed localization to location (that was confusing)
- renamed 'Locale' navigator to 'Location'
- produce Location navigation only if geolocation libraries are loaded
13 years ago
Michael Peter Christen
e89747bb67
- added automated generation of vocabularies from url stubs
- added clear of all terms for vocabularies
- added deletion of vocabularies
13 years ago
Michael Peter Christen
79464189a4
The 'Locale' vocabulary, which is generated by geo data, has now the
objectspace "http://dbpedia.org/resource/ "
13 years ago
Michael Peter Christen
eca38c53e7
added a vocabulary editor
13 years ago
Michael Peter Christen
61bb52d55c
- using http://purl.org/dc/terms/references to refer from an
auto-annotated document to a 'pseudo-linked' document which has an url
created with an object-prefix as defined in the vocabulary file
13 years ago
Michael Peter Christen
2bbb6c52cf
added option to clean the triplestore when deleting the index
13 years ago
Michael Peter Christen
50c576599b
allow multiple parser options instead of printing an error
13 years ago
Michael Peter Christen
c02d742e53
proper namespaces in triplestore dump
13 years ago
Michael Peter Christen
8b53771db2
changed behavior of navigation processing:
- vocabulary annotation is not done any more into the metadata of urldb
- vocabularies are written into the jena triplestore using an rdf
vocabulary
- vocabularies for rdf triples must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- added also an Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameAs predicate
13 years ago
Michael Peter Christen
5fc6524ca8
- moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
13 years ago
cominch
8d2e6355f8
augmented browsing: remove non-existing external snippet file
13 years ago
cominch
c90f174799
preparation and generalization of augmented browsing methods
13 years ago
Michael Peter Christen
bef823c247
close the reader if finished
13 years ago
Michael Peter Christen
4ee6fb1de9
added missing blacklist dht cache storage (maybe due to mistakes in
cherry picking)
13 years ago
Roland 'Quix0r' Haeder
e4d36fa5eb
Fix to make all values lower-case (this should make all existing blacklists compatible with the new enum)
13 years ago
Roland 'Quix0r' Haeder
edaa09b9b1
Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143
Conflicts:
htroot/Supporter.java
htroot/yacy/crawlReceipt.java
htroot/yacy/transferRWI.java
htroot/yacy/transferURL.java
source/de/anomic/crawler/CrawlStacker.java
source/de/anomic/data/ListManager.java
source/net/yacy/peers/Protocol.java
source/net/yacy/repository/Blacklist.java
source/net/yacy/repository/LoaderDispatcher.java
source/net/yacy/search/Switchboard.java
source/net/yacy/search/index/MetadataRepository.java
source/net/yacy/search/index/Segment.java
source/net/yacy/search/query/RWIProcess.java
source/net/yacy/search/snippet/MediaSnippet.java
13 years ago
Roland 'Quix0r' Haeder
af5a597e47
Scroogle is not coming back, remove dead code
Conflicts:
source/net/yacy/search/Switchboard.java
13 years ago
cominch
7a4dab6d1d
- removed unused variables
- do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835
6c8d7289-2bf4-0310-a012-ef5d649a1542
Conflicts:
source/de/anomic/http/server/HTTPDFileHandler.java
13 years ago
Michael Peter Christen
ca93835713
removed usage of deprecated methods
13 years ago
Michael Peter Christen
23e38bd918
do not load the "_triplestore.rdf" files which are of special use in
sciencenet
13 years ago
Michael Peter Christen
90c6fc4b63
load all - but not the persistent local.rdf - triples from
DATA/TRIPLESTORE at startup time. The local.rdf is loaded only if the
persistent switch is on (as before).
13 years ago
cominch
bbfc53b663
bugfix
13 years ago
cominch
65c5826d93
bugfix
Conflicts:
source/net/yacy/document/parser/augment/AugmentParser.java
13 years ago
cominch
aa0295917c
augmentation
Conflicts:
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
ed2ea0f08e
augmented browsing modification
Conflicts:
htroot/interaction/OverlayInteraction.html
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
6b32f7c1f6
re-enable augmented proxy
13 years ago
cominch
3b08edec2e
bugfix
Conflicts:
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
5f8ba7f4f2
small changes
Conflicts:
source/net/yacy/document/parser/augment/AugmentParser.java
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
300b235ce8
Updated Demo Servlet
Conflicts:
htroot/About.html
htroot/DemoServlet.html
htroot/DemoServlet.java
htroot/interaction/interaction.js
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
90512640bf
Added config switches for custom parser
Conflicts:
source/net/yacy/document/TextParser.java
13 years ago
cominch
b5a8fb5fd8
Catch malformed URL when submitted in encoded style
13 years ago
cominch
df47f31235
interaction: add special table interaction
Conflicts:
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
e14f2881ae
interaction: add special table interaction
Conflicts:
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
d7326079a8
interaction: add global variable store
Conflicts:
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
4e4e7a99f8
interaction: add global variable store
Conflicts:
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
8e80894812
create virtual web folder /currentyacypeer/ which always points to local
peer, even when using the urlproxy
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
13 years ago
cominch
bde07ed7a8
Add tagging overlay element
Conflicts:
htroot/env/templates/jqueryheader.template
htroot/yacysearchitem.java
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
b0bc0b4572
Add new demonstration module for client-side key-value store (backend:
triplestore): /DemoServletInteraction.html
Conflicts:
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
c9dc6cda02
Demonstration: include value from interaction in search results
...
Conflicts:
htroot/interaction/OverlayInteraction.html
htroot/yacysearchitem.java
13 years ago
cominch
ae8adb0e58
Small changes
13 years ago
cominch
bcbd8eee33
Add several parsers, for RDFa and rdf files.
Conflicts:
source/net/yacy/document/TextParser.java
13 years ago
cominch
9ef5a80f4e
add interaction for triples and selector for augmented browsing
Conflicts:
htroot/interaction/interaction.js
source/net/yacy/interaction/Interaction.java
13 years ago
cominch
282c1620d6
Allow TripleStore to be persistent after reboot
13 years ago
cominch
5d20cd324a
Add Triplestore and RDF query interface
Conflicts:
build.xml
defaults/yacy.init
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
bc9a618e0a
augmented browsing: ignore js and css, integrate more user interaction
Conflicts:
htroot/interaction/Footer.html
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
b21048892b
augmentedParser add features and integrate external html parser to
modify existing web pages
Conflicts:
addon/YaCy.app/Contents/Info.plist
build.xml
13 years ago
cominch
9cbfc1a1c0
augmentedProxy, which forwards every proxy request to a
rewrite engine to customize existing webpages. originally implemented by
Florian Richter.
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
13 years ago
Michael Peter Christen
3b992e6b00
using utf8 String compression in Webstructure database
13 years ago
Michael Peter Christen
26301a538d
bugfix in Domains - dns-lookup
13 years ago
Michael Peter Christen
cde20911bb
saved a bit more ram using UTF8 String compression for OpenGeoDB and
Geonames data files.
13 years ago
Michael Peter Christen
225ee42879
made the GeoLocation into an interface, with the current
integer implementation as an implementation with an accuracy of 1.863cm
13 years ago
Michael Peter Christen
2280a7b276
- changed initialization order to prefer allocation of memory for table
files first
- bugfixes in memory amount calculation
13 years ago
Michael Peter Christen
0746308bc2
only the metadata tables shall be able to use the tail cache
13 years ago
Michael Peter Christen
7ec9bef0c3
fix for OOM
13 years ago
Michael Peter Christen
41c02cb10e
- less restrictions for usage of Table RAM copy
- new limit to use the table copy (instead of flag): 400MB available. If
less is available, then a copy is never used. If more is available, then
it can be used if there is a remaining space of at least 200MB
- flush caches more often: flush the Digest cache
13 years ago
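The thresholds in commit 41c02cb10e (never use a RAM copy of a table below 400MB available; otherwise use it only if at least 200MB remain) can be expressed as a small decision function. Reading "remaining space" as available memory minus the size of the table copy is an assumption, and the class and method names are hypothetical.

```java
/** Hypothetical decision helper for the thresholds named in commit 41c02cb10e. */
final class TableCopyPolicy {
    static final long MB = 1024L * 1024L;
    static final long MIN_AVAILABLE = 400 * MB; // below this, never build a RAM copy
    static final long MIN_REMAINING = 200 * MB; // headroom that must be left after the copy

    /**
     * @param availableBytes memory currently available to the JVM (supplied by the caller)
     * @param tableBytes     estimated size of the RAM copy of the table
     */
    static boolean mayUseRamCopy(long availableBytes, long tableBytes) {
        if (availableBytes < MIN_AVAILABLE) return false;
        return availableBytes - tableBytes >= MIN_REMAINING; // assumed meaning of "remaining"
    }

    public static void main(String[] args) {
        System.out.println(mayUseRamCopy(300 * MB, 10 * MB));  // false: under 400MB available
        System.out.println(mayUseRamCopy(500 * MB, 250 * MB)); // true: 250MB would remain
        System.out.println(mayUseRamCopy(500 * MB, 350 * MB)); // false: only 150MB would remain
    }
}
```

A caller could estimate `availableBytes` from `Runtime.getRuntime()` as `maxMemory() - (totalMemory() - freeMemory())`. How YaCy itself measures available memory is not stated here.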
Michael Peter Christen
b8f56a9803
npe bugfix
13 years ago
Michael Peter Christen
dd14b19c26
lazy initialization of block rank table ... only normal web search uses
this. When interactive search or location search is used, the block rank
is switched off
13 years ago
Michael Peter Christen
ba10caf89a
lazy initialization of database tables
13 years ago
Michael Peter Christen
701b9a28a0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
htroot/PerformanceMemory_p.java
13 years ago
Michael Peter Christen
ab7107b34b
fixed RWIProcess queue limits: now discovering hidden results for mass
result retrieval
13 years ago
Michael Peter Christen
10c9c17d51
fixed handlemap spread factor and null iterator handling
13 years ago
Michael Peter Christen
b0095c8d3c
flush the compressor cache when a cleanup is done
13 years ago
Michael Peter Christen
a61f44f9e4
lazy initialization of block rank table.
...
this causes that the table is not initialized when there is no search is
done. the effect is most strong if YaCy is started headless which causes
no browser pop-up which otherwise would load the search page and
therefore trigger the initialization of the table.
13 years ago
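The lazy initialization described in commits a61f44f9e4 and dd14b19c26 defers building the block rank table until the first search needs it. A common thread-safe way to do this in Java is double-checked locking on a `volatile` field, sketched below as a generic holder. The `Lazy` class is hypothetical and is not taken from YaCy.

```java
import java.util.function.Supplier;

/** Thread-safe lazy holder: the expensive factory runs at most once, on first get(). */
final class Lazy<T> {
    private final Supplier<T> factory;
    private volatile T value; // volatile makes double-checked locking safe (Java 5+ memory model)

    Lazy(Supplier<T> factory) { this.factory = factory; }

    T get() {
        T v = value;
        if (v == null) {                  // fast path after initialization: no locking
            synchronized (this) {
                v = value;
                if (v == null) {          // re-check: another thread may have initialized it
                    v = factory.get();
                    value = v;
                }
            }
        }
        return v;
    }

    boolean isInitialized() { return value != null; }

    public static void main(String[] args) {
        Lazy<long[]> blockRank = new Lazy<>(() -> {
            System.out.println("loading table...");
            return new long[1 << 20]; // placeholder for an expensive table
        });
        System.out.println("started, initialized=" + blockRank.isInitialized());
        blockRank.get(); // first search triggers the load
        blockRank.get(); // later searches reuse it
        System.out.println("initialized=" + blockRank.isInitialized());
    }
}
```

The factory must not return `null`. If it did, every `get()` would run it again.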
Michael Peter Christen
96e9d77270
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
13 years ago
Michael Peter Christen
00f2df1120
a variety of possible memory leak fixes
13 years ago
Michael Peter Christen
3dd8376825
added automatic cleaning of cache if metadata and file database size are
not equal. These may differ because one of the caches is cleaned after a
while or when it is too big. The metadata was then not cleaned, but is
now wiped after a checkup process at every application start. This
should cause a bit less memory usage.
13 years ago
Michael Peter Christen
d0ec8018f5
fixes for bad long computation
13 years ago
Michael Peter Christen
6bb07afcc3
accept also files with other file prefix; used to read 'foreign' cache
files
13 years ago
Michael Peter Christen
96c8119b50
added GeoLocation / GeoPoint classes which use less memory than
Location/Coordinates and have initializers with correct order of lat,lon
coordinates
13 years ago
Michael Peter Christen
461a0ce052
removed warnings
13 years ago
Michael Peter Christen
62ae9bbfda
allow more POIs, get more at once
13 years ago
Michael Peter Christen
407fdf6968
more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen
a1fe65b115
performance hacks
13 years ago
Michael Peter Christen
2fe207f813
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
5aee19daa4
added show from cache in search results (not yet finished)
13 years ago
Michael Peter Christen
5e562dcdb7
adapted vocabulary usage within annotation/navigation feature of search
to new SimpleVocabulary class
13 years ago
Michael Peter Christen
514700291a
moved Vocabulary to cora package (added in git 964406ad17)
13 years ago
Michael Peter Christen
0284a4d88f
more fixes for double precision of coordinates
13 years ago
Michael Peter Christen
964406ad17
added concurrency enhancement to xml parser
13 years ago
Michael Peter Christen
240045cf7c
fix for bad distance computation
13 years ago
Michael Peter Christen
e0d8643226
- performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
13 years ago
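Commit e0d8643226 fixes search termination by adding a missing poison pill. In this pattern, a consumer blocked in `take()` on a queue stops only when a special marker element arrives. If the marker is never sent, the consumer waits forever. The minimal sketch below uses a named thread, in the spirit of commit 0c345d1559. All class, method and thread names are illustrative and not YaCy's search classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Minimal poison-pill shutdown of a consumer thread; names are illustrative. */
final class PoisonPillDemo {
    // Compared by identity (==); the private instance can never be a real result.
    private static final String POISON = new String("<end-of-results>");

    static List<String> consumeAll(List<String> results) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> consumed = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String r = queue.take();   // blocks until an element arrives
                    if (r == POISON) break;    // terminate instead of waiting forever
                    consumed.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "result-consumer"); // named thread, easier to spot in a thread dump
        consumer.start();
        try {
            for (String r : results) queue.put(r);
            queue.put(POISON);                 // without this, take() would block forever
            consumer.join();                   // join() also makes 'consumed' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding results", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(List.of("a", "b", "c"))); // [a, b, c]
    }
}
```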
Michael Peter Christen
7a329465b3
using pre-compiled patterns in blacklist; should enhance search speed
13 years ago
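The idea behind commit 7a329465b3 is to compile each blacklist regex once, when the list is loaded, and not on every check. The sketch below is a simplification. YaCy blacklist entries have a host part and a path part, while here each entry is a single hypothetical path regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical path blacklist that compiles each regex once, when the entry is added. */
final class PatternBlacklist {
    private final List<Pattern> patterns = new ArrayList<>();

    void add(String regex) {
        patterns.add(Pattern.compile(regex)); // compile cost paid once, at load time
    }

    boolean isListed(String path) {
        for (Pattern p : patterns) {
            // A compiled Pattern only needs a cheap Matcher per check, whereas
            // String.matches(regex) recompiles the regex on every call.
            if (p.matcher(path).matches()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PatternBlacklist bl = new PatternBlacklist();
        bl.add(".*\\.exe");
        bl.add("/ads/.*");
        System.out.println(bl.isListed("/download/setup.exe")); // true
        System.out.println(bl.isListed("/index.html"));         // false
    }
}
```

`Pattern` instances are immutable and safe to share between search threads. `Matcher` instances are not, which is why a new one is created per check.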
Michael Peter Christen
6e83b02b83
- bugfix for surrogate file reader
- bugfix for location search: suppress empty search
13 years ago
Michael Peter Christen
9b4c699526
enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
13 years ago
Michael Peter Christen
834dc6b263
store more data from interface access
13 years ago
Michael Peter Christen
1f48d1528b
performance hacks
13 years ago
Michael Peter Christen
c70aaccdc9
better location to generate a guid for rss messages
13 years ago
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed by a
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
13 years ago
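Commit 10da7335ea avoids recomputing hashes of byte arrays that are used as `HashMap` keys. Java arrays only have identity `hashCode`/`equals`, so such keys need a wrapper. One way to avoid recomputation is to memoize the hash inside that wrapper, as `java.lang.String` does. The sketch below uses this per-object memoization. Whether YaCy caches per object or in a shared cache is not stated, so treat this as an assumption. `ByteKey` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical byte[] map key that computes its hash once and then reuses it. */
final class ByteKey {
    private final byte[] bytes;
    private int hash; // 0 = not computed yet (same trick as java.lang.String)

    ByteKey(byte[] bytes) { this.bytes = bytes.clone(); } // defensive copy keeps the hash valid

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(bytes); // O(n) work, normally done once per key
            hash = h;                   // benign race: every thread computes the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }

    public static void main(String[] args) {
        Map<ByteKey, String> map = new HashMap<>();
        map.put(new ByteKey("yOiCM7Fh1hyQ".getBytes(StandardCharsets.US_ASCII)), "doc");
        System.out.println(map.get(new ByteKey("yOiCM7Fh1hyQ".getBytes(StandardCharsets.US_ASCII)))); // doc
    }
}
```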
Michael Peter Christen
f8a0cf6d7c
RSSMessages do not need a concurrent hash map -> removed overhead
13 years ago
Michael Peter Christen
07ca7e4dd1
enhanced RSS parsing by ensuring that it is parsed with a buffered input
stream
13 years ago
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
13 years ago
Michael Peter Christen
8d997d55b6
better logging
13 years ago
Michael Peter Christen
65d37e6a20
only ASCII needed in seed bitflags
13 years ago
Michael Peter Christen
0f82fb3628
using double instead of float for a better release ordering
13 years ago
Michael Peter Christen
43c2c6e588
better logging
13 years ago
sixcooler
56087c1f23
bump to httpclient- httpcore-, httpmime- 4.2
13 years ago
Michael Peter Christen
20e0cc0822
fix for bad location evaluation
13 years ago
Michael Peter Christen
71c3163f3d
- fixes to node identification
- added link to node in network list
- added marking of portal search node peers
13 years ago
Michael Peter Christen
4d3cc02168
replaced old bzip2 library with the better documented commons-compress
package from http://commons.apache.org/compress/
13 years ago
Michael Peter Christen
ad222be7f8
added node state icon in network list
13 years ago
Michael Peter Christen
eff7667554
fix for http://bugs.yacy.net/view.php?id=188
13 years ago
Michael Peter Christen
3c2bec681f
added a root node flag: identifies peers with short ping time
13 years ago
Michael Peter Christen
c846e9ca14
redesign of the crawler monitor page: show crawled pages instead of
queue of urls that shall be crawled
13 years ago
Michael Peter Christen
8b974905ee
changed log-in text for all servlets with authentication:
- added hint how to set the password using a shell script
- added a shell script to change the password
13 years ago
Michael Peter Christen
16b21f7a5b
Added more steering in Crawler_p.html interface
13 years ago
Michael Peter Christen
acc19e190d
hack against 100% cpu during crawl delete
13 years ago
Michael Peter Christen
c15fcde1c8
add-on to latest commit
13 years ago
Michael Peter Christen
cf47d94888
performance hack to parse numbers inside of substrings without actually
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should reduce CPU load during RWI
transmission.
13 years ago
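Commit cf47d94888 parses numbers directly from a range of characters, so no substring is allocated. A minimal sketch of such a range parser is shown below. The class name is hypothetical, and unlike the JDK method this sketch does not detect overflow. On Java 9 and later the JDK offers the same capability as `Long.parseLong(CharSequence, int, int, int)`.

```java
/** Parses a decimal long from s[from, to) without allocating a substring. */
final class SubstringNumbers {

    static long parseLong(CharSequence s, int from, int to) {
        if (from >= to) throw new NumberFormatException("empty range");
        int i = from;
        boolean negative = false;
        char first = s.charAt(i);
        if (first == '-' || first == '+') {
            negative = first == '-';
            if (++i == to) throw new NumberFormatException("sign without digits");
        }
        long result = 0;
        for (; i < to; i++) {
            int digit = Character.digit(s.charAt(i), 10);
            if (digit < 0) throw new NumberFormatException("not a digit at index " + i);
            // Accumulate as a negative number so Long.MIN_VALUE is representable.
            // Overflow beyond the long range is NOT detected in this sketch.
            result = result * 10 - digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        String line = "count=-123;size=42";
        System.out.println(parseLong(line, 6, 10));  // -123
        System.out.println(parseLong(line, 16, 18)); // 42
    }
}
```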
Michael Peter Christen
7e0ddbd275
added a "fromCache" flag in Response object to omit one cache.has()
check during snippet generation. This should cause less blocking.
13 years ago
Michael Peter Christen
81737dcb18
removed stack trace from swf parser since we can't do anything there
13 years ago
Michael Peter Christen
7bf421b9dd
- fixed image search page navigation
- removed some deadlocks and ConcurrentModificationExceptions during
DidYouMean collection
13 years ago
Michael Peter Christen
125d47b3c1
added more interruptions in DidYouMean because that was the cause for
some blockings during search
13 years ago
Michael Peter Christen
c6a09eab0b
synchronization needed
13 years ago
Michael Peter Christen
fb94b47b1a
changed queue sizes to have less memory occupied during indexing
13 years ago
Michael Peter Christen
76157dc2c3
bugfix for http://bugs.yacy.net/view.php?id=173
13 years ago
reger
6696cb1313
bugfix: lookup of peernames returned no result for active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer
SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses the peername as is in the keyset.
Changed the index init to insert lowercase peer names as keys
13 years ago
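The fix boils down to normalizing keys identically on insert and on lookup. A sketch of that invariant, assuming a plain HashMap in place of YaCy's MapColumnIndex; PeerNameIndex and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical name index: keys are normalized the same way on insert and on lookup. */
final class PeerNameIndex {
    private final Map<String, String> hashByName = new HashMap<>();

    // Locale.ROOT avoids locale-dependent case mapping (e.g. the Turkish dotless i).
    private static String normalize(final String peerName) {
        return peerName.toLowerCase(Locale.ROOT);
    }

    void put(final String peerName, final String peerHash) {
        hashByName.put(normalize(peerName), peerHash);
    }

    /** Returns the peer hash, or null if no peer with this name (in any letter case) is known. */
    String lookupByName(final String peerName) {
        return hashByName.get(normalize(peerName));
    }
}
```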
Michael Peter Christen
c6558cba08
more classification bugs
13 years ago
Michael Peter Christen
082831b9d6
search contentdom was checked the wrong way - fixed
13 years ago
reger
ee553d971e
correct typo in scripts_txt comment
13 years ago
Michael Peter Christen
f294f2e295
bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' towards the DNS server
also included: fewer processes in snippet fetch to reduce load during
search on small computers
13 years ago
Michael Peter Christen
acf8d521a2
fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen
bb88878b4d
the last commit was incomplete.
13 years ago
Michael Peter Christen
d320a31ae1
bugfix for http://bugs.yacy.net/view.php?id=186
13 years ago
Michael Peter Christen
fa735f4f04
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
3e1bc9477f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
6f8a2fef1f
small speed enhancement using a column factory
13 years ago
Roland 'Quix0r' Haeder
d10627d591
More sync in close() methods
Conflicts:
source/net/yacy/kelondro/logging/GuiHandler.java
source/net/yacy/kelondro/workflow/InstantBusyThread.java
13 years ago
Roland 'Quix0r' Haeder
b3ae2aa41f
With or without 'final'? At least please try it in other methods
Conflicts:
source/de/anomic/tools/tarTools.java
13 years ago
Roland 'Quix0r' Haeder
fbb946f913
Made a method static (Eclipse suggested it), removed unused import, pk == null check now outputs a warning to the log file
13 years ago
Michael Peter Christen
52d307c735
prevent the snippet fetch process from removing catchall entries
13 years ago
Michael Peter Christen
7eece0256f
moved yacy.logging to defaults as requested in
http://bugs.yacy.net/view.php?id=55
13 years ago
Michael Peter Christen
5b3acc12cd
Pattern.quote() replaces manual \\Q ... \\E quoting, according to the publication in
http://www.cs.washington.edu/homes/mernst/pubs/regex-types-ftfjp2012.pdf
13 years ago
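For illustration, a hedged Java sketch of literal matching with java.util.regex (LiteralMatch is a hypothetical name). Pattern.quote() also copes with a literal that itself contains the sequence \E, which would end a hand-built \Q ... \E block early:

```java
import java.util.regex.Pattern;

/** Hypothetical helper: searches for a literal string using the regex engine. */
final class LiteralMatch {
    private LiteralMatch() {}

    // A hand-built "\\Q" + literal + "\\E" breaks if literal contains "\\E";
    // Pattern.quote() escapes that case correctly.
    static boolean containsLiteral(final String text, final String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }
}
```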
Michael Peter Christen
89142d1e8d
removed some (not all) warnings
13 years ago
Michael Peter Christen
5deebd02ea
added serialization
13 years ago
reger
b2175ea4ef
Add possibility to set custom Solr field names for the YaCy default Solr attributes.
- Changing the format of YaCy's solr.key.list while maintaining backward compatibility
Federated index config screens adjusted accordingly
- modified the Solr update request to use a 3 min Solr autocommit interval
13 years ago
Michael Peter Christen
15db703808
added missing serialization to remove all warnings
13 years ago
Michael Peter Christen
1795a7325b
made HandleSet serializable
13 years ago
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen
2717c1b749
fixed bug in solr interface
13 years ago
Michael Peter Christen
70505107ca
enhanced crawler/balancer: better remaining waiting-time guessing
13 years ago
Michael Peter Christen
f150bc218b
fixed bug in solr error document
13 years ago
Michael Peter Christen
cb54c1737b
solrj connector bugfix
13 years ago
Roland 'Quix0r' Haeder
a093ccf5eb
Now using synchronization in all close() methods to make sure all objects
are 'closed' in an orderly way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
13 years ago
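A minimal sketch of a synchronized, idempotent close(), assuming a wrapper around an arbitrary close action; GuardedResource is a hypothetical name and not one of the classes listed in the commit. Because close() is synchronized, two threads calling it concurrently cannot both run the close action:

```java
/** Hypothetical wrapper showing a synchronized, idempotent close(). */
final class GuardedResource implements AutoCloseable {
    private final Runnable closeAction;
    private boolean closed = false; // guarded by "this"

    GuardedResource(final Runnable closeAction) {
        this.closeAction = closeAction;
    }

    @Override
    public synchronized void close() {
        if (closed) return; // a second call is a no-op
        closed = true;
        closeAction.run();
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```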
Michael Peter Christen
49cab2b85f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
0d58fea210
made multiple connector default
13 years ago
Michael Peter Christen
7740c02c56
- enhanced the solr connector
- added new multiple connector (to replace singleConnector)
13 years ago
Michael Peter Christen
0cf3d36eae
more tolerance in case of corrupted file
13 years ago
Michael Peter Christen
acc6db28ff
added missing classes for solr interface
13 years ago
Michael Peter Christen
adeb33bb36
better abstraction for solr objects
13 years ago
Michael Peter Christen
8864141872
more abstraction in solr connection classes
13 years ago
Michael Peter Christen
c00efc2717
made the solr connection more generic
13 years ago
Michael Peter Christen
ea2bd43b28
patch for broken configurations
13 years ago
Michael Peter Christen
e5ca7f22b1
enhancement in circle drawing
13 years ago
Michael Peter Christen
34f4225d7e
fewer 'wellformed' calls without asserts
13 years ago
Marc Nause
a691023d04
*) better formatting for network QPM
*) refactoring
13 years ago
Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
13 years ago
Michael Peter Christen
942896fe46
removed methods not supported by the new solrj connector for httpclient 4
Error was:
java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer
    at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614)
    at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128)
    at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55)
    at net.yacy.search.Switchboard.<init>(Switchboard.java:657)
    at net.yacy.yacy.startup(yacy.java:222)
    at net.yacy.yacy.main(yacy.java:1018)
13 years ago
Michael Peter Christen
22e1f68c0b
solrj user authentication patch
13 years ago
Michael Peter Christen
09484955dc
added new entry class for embed tags
13 years ago
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
13 years ago
Michael Peter Christen
a6d60fc21f
concurrency enhancement in ConfigurationSet
13 years ago
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
13 years ago
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video, apps)
13 years ago
Michael Peter Christen
7860c1df80
fix needed for new solrj library
13 years ago
Michael Peter Christen
0e13022147
- enhanced solr field documentation
- added xml api button to IndexFederated_p - the solr schema.xml file
can be generated by YaCy
13 years ago
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
- choose pushing to NOLOAD as default for site crawl
13 years ago
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
13 years ago
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
13 years ago
Michael Peter Christen
f5efdb21fd
refactoring
13 years ago
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
13 years ago
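A sketch of the port handling described in this commit, assuming -1 as the 'unknown port' marker; PeerAddress, PeerLookup and this lookupByIP signature are hypothetical and much simpler than YaCy's seed database:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified peer address. */
final class PeerAddress {
    final String name;
    final String ip;
    final int port;

    PeerAddress(final String name, final String ip, final int port) {
        this.name = name;
        this.ip = ip;
        this.port = port;
    }
}

final class PeerLookup {
    static final int UNKNOWN_PORT = -1;

    private PeerLookup() {}

    /** Finds a peer by IP; the port is only compared when the caller actually knows it. */
    static Optional<PeerAddress> lookupByIP(final List<PeerAddress> peers, final String ip, final int port) {
        for (final PeerAddress p : peers) {
            if (!p.ip.equals(ip)) continue;
            if (port == UNKNOWN_PORT || p.port == port) return Optional.of(p);
        }
        return Optional.empty();
    }
}
```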
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
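The type filter in the last point could look roughly like this. It is only a sketch that assumes classification by file extension; the DocType enum, the extension lists and DomainFilter are hypothetical and far smaller than YaCy's Classification:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, heavily reduced set of content types. */
enum DocType { TEXT, IMAGE, AUDIO, VIDEO, APP }

final class DomainFilter {
    private DomainFilter() {}

    /** Classifies a URL by the file extension of its path; anything unknown counts as TEXT. */
    static DocType classify(final String url) {
        String path = url.toLowerCase(Locale.ROOT);
        for (final char cut : new char[] {'?', '#'}) { // drop query and fragment
            final int i = path.indexOf(cut);
            if (i >= 0) path = path.substring(0, i);
        }
        final int slash = path.lastIndexOf('/');
        final int dot = path.lastIndexOf('.');
        final String ext = dot > slash ? path.substring(dot + 1) : "";
        switch (ext) {
            case "jpg": case "jpeg": case "png": case "gif": return DocType.IMAGE;
            case "mp3": case "ogg": case "wav": return DocType.AUDIO;
            case "mp4": case "avi": case "webm": return DocType.VIDEO;
            case "exe": case "apk": case "dmg": return DocType.APP;
            default: return DocType.TEXT;
        }
    }

    /** Keeps only the URLs whose type matches the requested one. */
    static List<String> filter(final List<String> urls, final DocType wanted) {
        final List<String> result = new ArrayList<>();
        for (final String url : urls) {
            if (classify(url) == wanted) result.add(url);
        }
        return result;
    }
}
```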
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen
8a08c96a82
removed dependency on logging
13 years ago
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen
a5d7da68a0
refactoring: removed dependency on switchboard in Balancer/CrawlQueues
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
4d5da75814
fix for parser problem if an <a>-tag is 'within' html tags with unclosed
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
13 years ago
Michael Peter Christen
91a86f0b06
fixes to network graph testing
13 years ago
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
13 years ago
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
13 years ago
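A hedged sketch of such a check, assuming the archive is recognized by the gzip magic bytes 0x1f 0x8b defined in RFC 1952; ReleaseFileFix is a hypothetical name, not the code used in YaCy:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: repairs the name of a release archive that arrived already decompressed. */
final class ReleaseFileFix {
    private ReleaseFileFix() {}

    /** True if the file starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean isGzip(final Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    /** Renames "x.tar.gz" to "x.tar" if its content is not gzip-compressed; returns the current path. */
    static Path fixExtension(final Path file) throws IOException {
        final String name = file.getFileName().toString();
        if (!name.endsWith(".tar.gz") || isGzip(file)) return file;
        final Path target = file.resolveSibling(name.substring(0, name.length() - ".gz".length()));
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```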
Michael Christen
02e4dedff2
fix to url citation collection
13 years ago
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
13 years ago
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method (the ranking coefficient is fixed and needs to be made configurable)
13 years ago
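One possible shape of such a post-ranking step, with the coefficient passed in as a parameter as the message suggests. The log1p damping, the Result type and all names are assumptions for illustration, not YaCy's formula:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical post-ranking step: re-sorts already scored results with a citation bonus. */
final class CitationPostRanking {
    static final class Result {
        final String url;
        final double score;

        Result(final String url, final double score) {
            this.url = url;
            this.score = score;
        }
    }

    private CitationPostRanking() {}

    /**
     * Returns the URLs ordered by score + coefficient * log(1 + citations), highest first.
     * log1p dampens the influence of very frequently cited pages.
     */
    static List<String> rerank(final List<Result> ranked, final Map<String, Integer> citations,
                               final double coefficient) {
        final List<Result> sorted = new ArrayList<>(ranked);
        sorted.sort(Comparator.comparingDouble(
                (Result r) -> r.score + coefficient * Math.log1p(citations.getOrDefault(r.url, 0))).reversed());
        final List<String> urls = new ArrayList<>();
        for (final Result r : sorted) urls.add(r.url);
        return urls;
    }
}
```

With a coefficient of 0 the original order by score is kept, so the bonus can be switched off without a code change.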
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
13 years ago
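In memory, the reverse link index can be pictured as in this sketch: each link target maps to the set of pages that cite it, in the same way a word index maps a word to the pages containing it. CitationIndex is a hypothetical name; YaCy itself stores this structure in the citation.index.*.blob files, not in a HashMap:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory reverse link index: link target -> pages linking to it. */
final class CitationIndex {
    private final Map<String, Set<String>> citingPagesByTarget = new HashMap<>();

    /** Records that page 'source' contains anchors pointing to each of 'targets'. */
    void addPage(final String source, final Iterable<String> targets) {
        for (final String target : targets) {
            citingPagesByTarget.computeIfAbsent(target, t -> new LinkedHashSet<>()).add(source);
        }
    }

    /** Pages that link to 'target' (read-only view, empty if unknown). */
    Set<String> citingPages(final String target) {
        return Collections.unmodifiableSet(citingPagesByTarget.getOrDefault(target, Collections.emptySet()));
    }

    int citationCount(final String target) {
        return citingPages(target).size();
    }
}
```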
Michael Christen
22f05c83ff
fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
13 years ago
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
13 years ago
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
13 years ago
Michael Peter Christen
aba9b1bfa0
better names for elements of a linked graph
13 years ago